Your Production Floor Is a Cyber-Physical System (Act Like It)

Part 5 of the PPOS series. When you frame production operations as a discrete event system under supervisory control, you get formal safety guarantees that no amount of process documentation can provide.

In 1987, Ramadge and Wonham published a paper that formalized how to control discrete event systems using a supervisor that enables or disables events based on the current state. The framework was designed for manufacturing automation, but its implications extend to any system where a digital control layer governs physical operations.

If your production floor uses software to track orders, manage inventory, and coordinate workers, it is exactly such a system. It’s a cyber-physical system in which the cyber component (database, state machine, scheduler) controls the physical component (production floor, warehouse, shipping). The question is whether you’re treating it as one.

The Plant and the Supervisor

In supervisory control theory, the uncontrolled plant is the physical system with all its possible behaviors. Our plant is defined by the state set (13 stages), the event set (approve, produce, batch, pick, personalize, QC, pack, complete, cancel), and the transition function that maps states and events to new states.

The plant, left uncontrolled, can do anything the physics allows. Orders could jump from Pending to Complete. Cancellation could happen after shipment. Inventory could go negative. The plant doesn’t enforce business rules. It just executes events.

The supervisor sits above the plant and restricts its behavior. At each state, the supervisor defines which events are enabled, that is, which transitions are allowed. The closed-loop system is the composition of plant and supervisor, and it produces only behaviors that satisfy both physical possibility (plant) and business rules (supervisor).
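To make the composition concrete, here is a minimal sketch. It assumes a reduced, hypothetical lifecycle (five stages and four events rather than the full 13-stage model), and the names `plant_step`, `ENABLED`, and `closed_loop_step` are illustrative, not taken from the actual system.

```python
from typing import Optional

# Hypothetical, reduced model: each event simply drives the order to one stage.
TARGET = {
    "approve": "Approved",
    "produce": "Produced",
    "complete": "Complete",
    "cancel": "Cancelled",
}

def plant_step(state: str, event: str) -> Optional[str]:
    """Uncontrolled plant: every known event is possible in every state,
    so the current state is deliberately ignored."""
    return TARGET.get(event)

# Supervisor: the events enabled in each state (the business rules).
ENABLED = {
    "Pending": {"approve", "cancel"},
    "Approved": {"produce", "cancel"},
    "Produced": {"complete"},
    "Complete": set(),
    "Cancelled": set(),
}

def closed_loop_step(state: str, event: str) -> Optional[str]:
    """Plant composed with supervisor: a step occurs only if both allow it."""
    if event not in ENABLED[state]:
        return None  # disabled by the supervisor
    return plant_step(state, event)
```

In this sketch, `plant_step("Pending", "complete")` returns `"Complete"` because the plant alone permits the jump, while `closed_loop_step("Pending", "complete")` returns `None` because the supervisor disables it.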

In our system, the supervisor is the control plane: the software layer that validates transitions, enforces invariants, and maintains the audit log. The execution plane, where workers perform physical operations and emit status updates, is the plant. The feedback loop between them is event-driven: the plant emits events (order completed production, item picked, QC passed), and the supervisor enables or disables the next available transitions based on the current state.
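One way to picture that feedback loop is a small event handler that validates each incoming status update and records the decision. This is a sketch under assumptions: the `ControlPlane` class, its transition table, and the audit-log tuple layout are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical transition table for a single work order.
ALLOWED = {
    ("Pending", "approve"): "Approved",
    ("Approved", "produce"): "Produced",
    ("Produced", "qc"): "Checked",
}

@dataclass
class ControlPlane:
    state: str = "Pending"
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def enabled(self) -> Set[str]:
        """Events the supervisor currently enables."""
        return {event for (src, event) in ALLOWED if src == self.state}

    def on_event(self, event: str) -> bool:
        """Handle one status update emitted by the execution plane."""
        nxt = ALLOWED.get((self.state, event))
        verdict = "accepted" if nxt is not None else "rejected"
        # Every decision is logged, including rejected attempts.
        self.audit_log.append((self.state, event, verdict))
        if nxt is None:
            return False
        self.state = nxt
        return True
```

Rejected events leave the state untouched but still appear in the log, which keeps the audit trail complete.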

Controllability and Observability

Not all events are controllable. The supervisor can prevent a worker from marking an order as “production complete” (by validating prerequisites), but it can’t prevent an external webhook from firing (a customer submitting a change request, a supplier notification arriving). Controllable events are the ones the supervisor can enable or disable. Uncontrollable events are the ones that happen regardless.

The controllability condition requires that the supervisor never attempt to disable uncontrollable events. In practice, this means the system must be designed to handle external events gracefully, absorbing them into the state machine without violating invariants, even if an event arrives at an inconvenient moment.
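A design like this can be checked mechanically. The sketch below walks the reachable states of the closed loop and reports every place where a supervisor disables an uncontrollable event that the plant can produce. The plant table, the `NeedsReview` stage, and the `change_request` event are hypothetical stand-ins.

```python
# Hypothetical plant: "change_request" models an external webhook.
PLANT = {
    ("Pending", "approve"): "Approved",
    ("Pending", "change_request"): "NeedsReview",
    ("Approved", "produce"): "Produced",
    ("Approved", "change_request"): "NeedsReview",
    ("NeedsReview", "cancel"): "Cancelled",
}
UNCONTROLLABLE = {"change_request"}

def violations(plant, enabled, uncontrollable, start):
    """Return (state, event) pairs where an uncontrollable event that the
    plant can produce is disabled in a reachable closed-loop state."""
    seen, stack, bad = {start}, [start], []
    while stack:
        state = stack.pop()
        for (src, event), dst in plant.items():
            if src != state:
                continue
            if event not in enabled.get(state, set()):
                if event in uncontrollable:
                    bad.append((state, event))
                continue  # disabled events are not followed
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return bad

# A supervisor that absorbs change requests everywhere they can occur.
GOOD = {
    "Pending": {"approve", "change_request"},
    "Approved": {"produce", "change_request"},
    "NeedsReview": {"cancel"},
}
# A supervisor that tries to ignore change requests after approval.
BAD = {**GOOD, "Approved": {"produce"}}
```

Here `violations(PLANT, GOOD, UNCONTROLLABLE, "Pending")` is empty, while the `BAD` supervisor is flagged at `("Approved", "change_request")`.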

Observability is the dual requirement: the supervisor can only make decisions based on events it can see. If a physical operation completes but the system isn’t notified (a worker finishes production but forgets to scan the barcode), the supervisor’s model diverges from reality. Observable events are the ones the system can detect. Unobservable events create blind spots.

Our design minimizes observability gaps by making every physical operation require a digital confirmation (scan, button press, system update) before the state machine advances. This doesn’t eliminate unobservable events entirely (a worker could physically move an item without scanning it), but it ensures that the supervisor’s model is never ahead of reality. It might be behind (not yet aware of completed work), but never ahead (believing work is done that hasn’t been).
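The "behind but never ahead" property can be illustrated with a toy simulation. It assumes that confirmations are only issued after the corresponding physical work, which the software cannot verify on its own. The `Order` class and the three-stage list are hypothetical.

```python
from dataclasses import dataclass

STAGES = ["Pending", "Produced", "Packed"]  # hypothetical subset of stages

@dataclass
class Order:
    physical: int = 0  # index of the stage actually reached on the floor
    recorded: int = 0  # index of the stage the supervisor believes

    def do_work(self) -> None:
        """A worker finishes the next physical step; unobserved by itself."""
        if self.physical < len(STAGES) - 1:
            self.physical += 1

    def confirm(self) -> bool:
        """Scan or button press: the model advances one stage, and only
        when there is completed work that has not been recorded yet."""
        if self.recorded < self.physical:
            self.recorded += 1
            return True
        return False

    def lag(self) -> int:
        """How far the model trails reality; never negative in this sketch."""
        return self.physical - self.recorded
```

After `do_work()` without a scan, `lag()` is 1: the model is behind. No sequence of calls in this sketch makes `lag()` negative.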

The supervisor enforces a legal language: the set of all event sequences that satisfy the invariants. Every execution of the closed-loop system must produce an event sequence that belongs to this legal language. Sequences that would violate invariants are prevented by the supervisor’s event disabling.

This is a powerful framing because it separates what from how. The legal language defines what behaviors are acceptable. The supervisor implementation defines how those behaviors are enforced. You can change the implementation (different database, different UI, different worker interface) without changing the legal language, and the safety guarantees transfer.
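As a sketch of that separation, the legal language can be written down as plain data and checked independently of whatever enforces it. The transition table below is a hypothetical, reduced specification, and `is_legal` is an illustrative name.

```python
from typing import Iterable

# Hypothetical specification: the legal language as a small automaton.
LEGAL = {
    ("Pending", "approve"): "Approved",
    ("Pending", "cancel"): "Cancelled",
    ("Approved", "produce"): "Produced",
    ("Approved", "cancel"): "Cancelled",
    ("Produced", "complete"): "Complete",
}

def is_legal(events: Iterable[str], start: str = "Pending") -> bool:
    """True if the event sequence is a prefix of some legal execution."""
    state = start
    for event in events:
        nxt = LEGAL.get((state, event))
        if nxt is None:
            return False  # this event is not allowed in the current state
        state = nxt
    return True
```

Any implementation, whatever its database or UI, can be tested by replaying the event sequences it produces through `is_legal`. For example, `["approve", "produce", "complete"]` is accepted, while `["complete"]` and `["approve", "produce", "cancel"]` are rejected.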

Hierarchical Control

The supervisor isn’t monolithic. It’s hierarchical, with three levels. The global supervisor enforces system-wide constraints: inventory conservation, overall capacity limits, and cross-order consistency. The batch supervisor enforces batch-level constraints: capacity per batch, SKU compatibility within batches, and due-date coherence. The work order supervisor enforces lifecycle legality: valid transitions, personalization immutability, and cancellation rules.

Each level has authority over its domain and cannot be overridden by lower levels. The global supervisor can prevent a batch from forming even if the batch supervisor would allow it (because global inventory is insufficient). The batch supervisor can prevent a work order transition even if the work order supervisor would allow it (because the batch isn’t ready for that stage).
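One simple way to realize this is to treat each level as a predicate and allow an event only when every level agrees, checking from the top down. The sketch below assumes a hypothetical context dictionary and deliberately simplified checks; the real constraints would be richer.

```python
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, int]
Check = Callable[[str, Context], bool]

def global_ok(event: str, ctx: Context) -> bool:
    # Inventory conservation: forming a batch must not overdraw stock.
    return event != "batch" or ctx["stock"] >= ctx["needed"]

def batch_ok(event: str, ctx: Context) -> bool:
    # Capacity per batch: a full batch accepts no more orders.
    return event != "batch" or ctx["batch_size"] < ctx["batch_capacity"]

def work_order_ok(event: str, ctx: Context) -> bool:
    # Lifecycle legality, reduced here to "the order is still open".
    return ctx["order_open"] == 1

LEVELS: List[Tuple[str, Check]] = [
    ("global", global_ok),
    ("batch", batch_ok),
    ("work_order", work_order_ok),
]

def decide(event: str, ctx: Context) -> Tuple[bool, Optional[str]]:
    """Return (allowed, vetoing_level). Levels are consulted top-down, and
    approval from a lower level cannot override a higher-level veto."""
    for name, check in LEVELS:
        if not check(event, ctx):
            return False, name
    return True, None
```

Because the result is a conjunction, a `batch` event with insufficient global stock is refused at the global level even when the batch itself has room.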

This hierarchy mirrors military command structure, not coincidentally. Strategic decisions (global resource allocation) aren’t made at the tactical level (individual work order processing). Each level has the information and authority appropriate to its scope.

The Nonblocking Guarantee

The supervisory control framework provides one more critical guarantee: the closed-loop system is nonblocking. From every reachable state, there exists a path to a terminal state (Complete or Cancelled). The supervisor never disables all forward transitions from a non-terminal state. There’s always a way out.

This means the system cannot create deadlocks at the operational level. An item can’t get stuck in a state where no valid transition exists. Combined with the acyclicity guarantee (no infinite loops) and the deadlock avoidance in concurrent transactions, this provides a complete progress guarantee: as long as enabled events keep occurring, every work order will eventually reach a terminal state.
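The nonblocking property can be verified on a finite model by comparing two sets: the states reachable from the start and the states from which a terminal state can still be reached. The sketch below uses a hypothetical closed-loop table, and the `hold` event and `OnHold` state exist only to demonstrate a failure.

```python
def reachable(transitions, start):
    """States reachable from `start` by following enabled transitions."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for (src, _event), dst in transitions.items():
            if src == state and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def coreachable(transitions, terminals):
    """States from which at least one terminal state can be reached."""
    seen = set(terminals)
    changed = True
    while changed:
        changed = False
        for (src, _event), dst in transitions.items():
            if dst in seen and src not in seen:
                seen.add(src)
                changed = True
    return seen

def blocking_states(transitions, start, terminals):
    """Reachable states with no path to a terminal state; empty if nonblocking."""
    return reachable(transitions, start) - coreachable(transitions, terminals)

# Hypothetical closed-loop behavior (plant already restricted by the supervisor).
CLOSED_LOOP = {
    ("Pending", "approve"): "Approved",
    ("Pending", "cancel"): "Cancelled",
    ("Approved", "produce"): "Produced",
    ("Produced", "complete"): "Complete",
}
TERMINALS = {"Complete", "Cancelled"}

# Adding a hold with no way to resume introduces a blocking state.
BROKEN = {**CLOSED_LOOP, ("Approved", "hold"): "OnHold"}
```

For `CLOSED_LOOP` the result is the empty set. For `BROKEN` it is `{"OnHold"}`, which is exactly the kind of stuck state the supervisor must rule out.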

In Part 6, we’ll address what happens when things go wrong: crash recovery, distributed consistency, and why event sourcing gives you deterministic replay as a free architectural bonus.

Adam Bishop

Veteran, entrepreneur, and independent researcher. Writing about formal methods, AI governance, production systems, and the operational discipline that connects them. Every project here demonstrates hard thinking on simple infrastructure.