Real-Time Systems
RT at the interview
2017. A SpaceX Falcon 9 lands on an Atlantic barge; during the 8-minute descent the onboard computer processes 100,000 sensor readings per second while holding a hard 1 ms control loop. A staff embedded engineer candidate at SpaceX gets the question: *"How would you design this system?"* The right answer does not start with 'I would pick VxWorks' - it starts with clarifying the latency budget, criticality classification, and failure modes. The RT interview is not a terminology quiz but a check on the ability to think structurally under hard timing constraints, and that is exactly what staff engineers at Tesla, SpaceX, NVIDIA Drive get paid USD 400K base plus equity for.
- **Tesla Autopilot**: staff embedded interview is 4 sessions of 45 minutes, one of them posed as 'design adaptive suspension', graded at L5+ design
- **SpaceX Avionics**: ARINC 653-style partitioning on a custom RTOS, candidates are expected to know DO-178C DAL B/C at the interview
- **NVIDIA Drive**: mixed-criticality systems on DRIVE Thor (2000 TOPS), the interview covers hypervisor partitioning and cache QoS
- **Cruise/Waymo**: ROS 2 + DDS + custom scheduler, candidates get live-coding questions on schedulability and priority inversion
Scheduling questions: Rate Monotonic and EDF
A typical opener at Tesla/SpaceX/Rivian: *"Given three tasks with periods 10, 20, 40 ms and WCET 3, 5, 8 ms. Schedule them under Rate Monotonic. Is the system schedulable?"* This is not a definitions question - the interviewer is checking whether you can apply Liu-Layland $U \leq n(2^{1/n} - 1)$ in two minutes. For $n=3$ that bound is $\approx 0.78$. Utilization: $3/10 + 5/20 + 8/40 = 0.3 + 0.25 + 0.2 = 0.75 < 0.78$ - schedulable by the sufficient condition. A candidate who stops here misses the trap: Liu-Layland is *sufficient, not necessary*. Between this bound and 100% utilization (the necessary condition) lies the zone where response-time analysis is required - and that is exactly the zone interviewers love. Same pattern as ML interviews with SGD convergence: sufficient conditions are easy, necessary ones are subtle.
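The arithmetic above, and the response-time analysis needed once utilization passes the Liu-Layland bound, can be sketched in a few lines of Python. This is a minimal sketch assuming independent periodic tasks, deadlines equal to periods, no blocking, and zero context-switch cost; the function names are illustrative and do not come from any library.

```python
def liu_layland_bound(n):
    """Sufficient RM utilization bound for n tasks: n * (2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)


def utilization(tasks):
    """Total utilization of a list of (wcet, period) pairs."""
    return sum(c / t for c, t in tasks)


def rm_response_times(tasks):
    """Exact response-time analysis under Rate Monotonic priorities.

    Solves R = C_i + sum over higher-priority j of ceil(R / T_j) * C_j
    by fixed-point iteration. Returns one entry per task, ordered by period;
    None means the task cannot meet its deadline (deadline == period).
    """
    ordered = sorted(tasks, key=lambda ct: ct[1])  # shorter period = higher priority
    results = []
    for i, (c_i, t_i) in enumerate(ordered):
        r = c_i
        while True:
            # -(-r // t_j) is an integer ceil(r / t_j)
            interference = sum(-(-r // t_j) * c_j for c_j, t_j in ordered[:i])
            nxt = c_i + interference
            if nxt > t_i:
                results.append(None)
                break
            if nxt == r:
                results.append(r)
                break
            r = nxt
    return results


tasks = [(3, 10), (5, 20), (8, 40)]  # (WCET, period) in ms, from the question above
print(round(utilization(tasks), 3), round(liu_layland_bound(3), 3))  # 0.75 0.78
print(rm_response_times(tasks))  # [3, 8, 19]
```

For the interview task set the worst-case response times are 3, 8 and 19 ms, all within their periods, so the exact test agrees with the sufficient one. The same function is what you reach for when utilization sits between the bound and 1.0 and the Liu-Layland test is inconclusive.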
Second classic question: *"When is EDF preferable to Rate Monotonic?"* The correct answer has three layers. **Surface**: on a single core EDF can schedule any periodic task set up to 100% utilization, versus a guaranteed bound of only 69-78% for RM with three or more tasks. **Middle**: EDF handles tasks without strict periodicity (aperiodic, sporadic). **Deep**: EDF behaves worse under overload - a single missed deadline can trigger a cascade of further misses, because dynamic priorities do not preserve the 'critical-first' property. RM degrades predictably under overload: low-priority tasks miss deadlines, high-priority ones survive. So avionics (DAL A, overload = catastrophe) picks RM; multimedia processing (soft RT, overload = jittery audio, not death) picks EDF.
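The 'surface' layer can be checked with a small tick-based simulation. This is a sketch under simplifying assumptions: one CPU, integer task parameters, deadlines equal to periods, all tasks released at time 0, and a job that misses its deadline is dropped. The task set `(2, 5), (4, 7)` is a hypothetical example chosen so that utilization (34/35) lies above the RM bound but below 1.

```python
def count_deadline_misses(tasks, policy, horizon):
    """Simulate unit-length ticks on one CPU; return the number of missed deadlines.

    tasks   -- list of (wcet, period), deadline == period, first release at t = 0
    policy  -- "EDF" (earliest absolute deadline first) or "RM" (shortest period first)
    A job that reaches its deadline unfinished is counted as a miss and dropped.
    """
    if policy not in ("EDF", "RM"):
        raise ValueError("policy must be 'EDF' or 'RM'")
    key = "deadline" if policy == "EDF" else "period"
    jobs, misses = [], 0
    for t in range(horizon):
        for wcet, period in tasks:
            if t % period == 0:  # periodic release
                jobs.append({"left": wcet, "deadline": t + period, "period": period})
        if jobs:
            running = min(jobs, key=lambda j: j[key])
            running["left"] -= 1  # run the chosen job for the tick [t, t + 1)
        jobs = [j for j in jobs if j["left"] > 0]  # completed jobs leave the queue
        misses += sum(1 for j in jobs if j["deadline"] <= t + 1)
        jobs = [j for j in jobs if j["deadline"] > t + 1]
    return misses


tasks = [(2, 5), (4, 7)]  # hypothetical task set, utilization 34/35
print("EDF misses:", count_deadline_misses(tasks, "EDF", 35))
print("RM misses:", count_deadline_misses(tasks, "RM", 35))
```

Over one hyperperiod (35 ticks) EDF meets every deadline, while RM misses at least one: the short-period task preempts the long-period one just before its deadline at t = 7.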
At L6+/staff levels the interviewer pushes into **priority inheritance** and **priority ceiling protocol** - mechanisms against priority inversion. Everyone knows the Mars Pathfinder 1997 case, but few can articulate the difference between **inheritance** (priority inherited only for the blocking interval) and **ceiling protocol** (acquiring a resource raises priority to the maximum across its users). Inheritance is simpler but does not prevent deadlock; ceiling protocol is more expensive but guarantees a bound on blocking. In real code FreeRTOS uses priority inheritance, VxWorks offers both. Knowing this distinction separates 'read a chapter in a book' from 'wrote a driver under VxWorks'.
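The effect of priority inheritance on the classic three-task inversion scenario can be illustrated with a toy simulation. This is a sketch, not a model of any real RTOS: it assumes one CPU, unit ticks, a single lock, and that a lock-using task holds the lock for its whole execution. The task names, priorities, and execution times are hypothetical.

```python
def simulate(inherit):
    """Return finish times for tasks L, M, H sharing one CPU and one lock.

    inherit -- if True, the lock owner runs at the highest priority among
               the tasks currently blocked on the lock (priority inheritance).
    """
    tasks = [
        {"name": "L", "prio": 1, "arrival": 0, "left": 4, "uses_lock": True},
        {"name": "H", "prio": 3, "arrival": 1, "left": 2, "uses_lock": True},
        {"name": "M", "prio": 2, "arrival": 2, "left": 5, "uses_lock": False},
    ]
    owner, ready, finish, t = None, [], {}, 0

    def is_blocked(k):
        return k["uses_lock"] and owner is not None and owner is not k

    def effective_prio(k):
        if inherit and k is owner:
            waiting = [w["prio"] for w in ready if is_blocked(w)]
            return max([k["prio"]] + waiting)
        return k["prio"]

    while len(finish) < len(tasks):
        ready = [k for k in tasks if k["arrival"] <= t and k["left"] > 0]
        runnable = [k for k in ready if not is_blocked(k)]
        if runnable:
            current = max(runnable, key=effective_prio)
            if current["uses_lock"]:
                owner = current  # lock is taken on the first tick of execution
            current["left"] -= 1
            if current["left"] == 0:
                finish[current["name"]] = t + 1
                if owner is current:
                    owner = None  # lock is released on completion
        t += 1
    return finish


print("no inheritance:", simulate(False))
print("inheritance:   ", simulate(True))
```

Without inheritance the medium-priority task M preempts the lock holder L, so H (released at t = 1) finishes at t = 11. With inheritance L runs at H's priority until it releases the lock, and H finishes at t = 6. Unbounded delay from unrelated medium-priority work is exactly the Pathfinder failure mode.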
Liu-Layland gives $U \leq 0.78$ for $n=3$. What do you do if your system utilization is 0.85?
WCET analysis: how to bound worst-case without a 10x margin
A favourite question at Airbus or GM: *"How do you measure WCET for a camera signal processing function?"* A bad answer: *"Run it 10000 times and take the maximum."* That is measurement-based WCET, and it is unreliable for two reasons. First, measurements do not cover all pathways through code (branch coverage is not path coverage). Second, cache misses, branch mispredictions, and interrupt timing depend on system state that the measurements never visited. A real answer has three layers. **Static analysis** (aiT, OTAWA) produces an upper bound from a control-flow graph plus a cache model of the processor - guaranteed but pessimistic, typically 2-3x larger than the real maximum. **Hybrid** combines static structure analysis with measurement timings of individual basic blocks. **Measurement-based** is acceptable only for soft RT and as a sanity check for the static result.
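The gap between 'take the maximum of the runs' and a static bound can be shown on a toy control-flow graph. This is a sketch of the structural part only: it assumes an acyclic graph with a fixed worst-case cost per basic block, whereas real tools such as aiT also bound loops and model caches and pipelines. The graph, block names, and cycle counts are hypothetical.

```python
from functools import lru_cache

# Hypothetical control-flow graph: basic block -> successor blocks (acyclic).
CFG = {
    "entry": ["check"],
    "check": ["fast", "slow"],
    "fast": ["exit"],
    "slow": ["filter"],
    "filter": ["exit"],
    "exit": [],
}

# Hypothetical worst-case cost of each basic block, in CPU cycles.
COST = {"entry": 2, "check": 1, "fast": 3, "slow": 10, "filter": 7, "exit": 1}


@lru_cache(maxsize=None)
def worst_case_from(block):
    """Longest-path cost from `block` to any exit of the acyclic CFG."""
    tail = max((worst_case_from(s) for s in CFG[block]), default=0)
    return COST[block] + tail


def path_cost(path):
    """Cost of one concrete execution path (what a single measurement sees)."""
    return sum(COST[b] for b in path)


# Suppose the test inputs only ever exercised the fast path.
measured_paths = [["entry", "check", "fast", "exit"]]
print("measured max:", max(path_cost(p) for p in measured_paths))  # 7
print("static bound:", worst_case_from("entry"))                   # 21
```

The measured maximum is 7 cycles because the slow branch was never taken; the longest path through the graph is 21. Neither branch coverage nor a large run count fixes this unless the worst path is actually executed.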
The trap of the question: the interviewer expects you to mention **execution time variability** on modern out-of-order processors with caches. On ARM Cortex-A72 the same function runs anywhere from 12 to 47 microseconds depending on L1 cache state - roughly 4x variation. That kills classical WCET analysis, which was designed for deterministic Cortex-M (no out-of-order, no L2). The fix: either **cache partitioning** (reserve an L2 way for the critical task), or move to **Probabilistic WCET** (a bound stated with a confidence level, for example the 99.99th percentile over $10^5$ runs with a randomized cache). Same approach in ML when p99 latency matters more than the mean: the SLO is a distribution, not a point.
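Reading a high percentile off a set of timing measurements can be sketched as follows. This is only the empirical step: a full probabilistic WCET analysis typically fits an extreme-value model to the tail, which is not shown here. The helper name is illustrative, and the timing samples are synthetic, generated only to exercise the function.

```python
import random


def percentile_nearest_rank(samples, numer, denom):
    """Nearest-rank percentile, with the percentile given as numer/denom.

    Example: numer=9999, denom=10000 is the 99.99th percentile.
    Integer arithmetic avoids float rounding when computing the rank.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = -(-len(ordered) * numer // denom)  # ceil(n * numer / denom)
    return ordered[max(rank, 1) - 1]


# Synthetic execution times in microseconds (NOT real measurements):
# mostly near 12 us with a rare tail towards 47 us.
random.seed(0)
samples = [12 + 35 * random.random() ** 4 for _ in range(100_000)]

print("observed max:", round(max(samples), 2))
print("p99.99:      ", round(percentile_nearest_rank(samples, 9999, 10000), 2))
```

With $10^5$ samples the 99.99th percentile sits at rank 99,990, so only ten observations lie above it. That is why the number of runs has to be stated together with the percentile.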
A classic trap question: *"What if WCET includes garbage collection time?"* On JVM or V8 that sounds absurd for hard RT, and the instinct is correct. But **Real-Time Java** (RTSJ, JSR-1) has existed since 2002 and runs in military systems where legacy Java code cannot be rewritten. The fix: **Realtime GC** (Metronome, IBM J9) gives bounded pause times via incremental collection. The cost is 30-50% throughput overhead. Same story in Go: the stop-the-world GC shrank from 300 ms in Go 1.4 to <1 ms in Go 1.8 via concurrent collection - and now Go is usable for multimedia soft RT, which was not true ten years ago.
An interviewer asks you to compute WCET for an image-processing function on Cortex-A72. How do you start your answer?
RT system design: 'design an autopilot'
At the staff/principal level you get an open-ended problem: *"Design the control system for an EV active suspension that adapts to road surface."* This is RT system design, and the approach is as structured as Grokking System Design Interview, only with timing constraints instead of QPS. Five steps. **Step 1: latency budget**. Ask the interviewer about the requirements - typically 10 ms sensor-to-actuator for suspension, 100 ms for steering, 1 ms for airbag deployment. **Step 2: criticality classification**. ASIL D for airbag, ASIL B for adaptive suspension, QM for infotainment. **Step 3: hardware architecture**. Debate centralized ECU (the Tesla approach) versus distributed ECUs (the classic automotive setup). **Step 4: software architecture**. AUTOSAR Classic for ASIL D, AUTOSAR Adaptive or Linux PREEMPT_RT for infotainment. **Step 5: failure modes**. What happens if one sensor fails, one ECU fails, the CAN bus drops.
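Step 1 usually ends with the budget split across pipeline stages. A minimal sketch of that bookkeeping, assuming a 10 ms sensor-to-actuator budget and a 20% safety margin; the stage names and per-stage worst-case figures are hypothetical placeholders, not measurements from any vehicle.

```python
BUDGET_US = 10_000  # 10 ms sensor-to-actuator budget, in microseconds

# Hypothetical worst-case latency per pipeline stage, in microseconds.
STAGES = {
    "sensor_sampling": 1_000,
    "can_transfer": 1_500,
    "state_estimation": 2_000,
    "control_law": 2_000,
    "actuator_command": 1_000,
}


def check_budget(stages, budget_us, margin_pct=20):
    """Return (total, limit, ok) where limit is the budget minus a safety margin."""
    total = sum(stages.values())
    limit = budget_us * (100 - margin_pct) // 100
    return total, limit, total <= limit


total, limit, ok = check_budget(STAGES, BUDGET_US)
print(f"worst-case chain: {total} us, limit with margin: {limit} us, ok: {ok}")
```

The per-stage numbers are worst-case figures, not averages, so each entry is itself the output of a WCET or bus-latency analysis.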
The main difference from a regular backend system design: in RT, **failure is part of normal operation**, not an exception. The CAN bus may drop a frame; the lidar may return data with 20 ms latency instead of 10; the main MCU may go into a watchdog reset. The architecture must survive that without stopping service. Hence patterns like **graceful degradation** (if active suspension fails, fall back to passive settings), **lockstep CPU** (two cores run the same program cycle by cycle, and a comparator flags a fault on any disagreement; two copies can detect an error but cannot outvote it), **TMR - Triple Modular Redundancy** (three copies with majority voting, used in DAL A avionics). At the interview the candidate is expected to propose these patterns before the interviewer asks *"What if...?"*.
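The difference between two-copy comparison and three-copy voting fits in a few lines. This is a sketch that assumes replica outputs are exact, hashable values such as integers. Real voters for analog sensor data usually compare within a tolerance, which is not shown. The function names are illustrative.

```python
from collections import Counter


def lockstep_check(a, b):
    """Two replicas: a mismatch is detected, but the faulty copy is unknown."""
    return a == b


def tmr_vote(a, b, c):
    """Three replicas: return the majority value, masking one faulty copy."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all three replicas disagree")
    return value


print(lockstep_check(42, 42))  # True
print(lockstep_check(42, 40))  # False
print(tmr_vote(42, 40, 42))    # 42
```

Lockstep gives detection and therefore needs a safe state to fall back to; TMR gives masking, so service continues through a single fault.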
At principal/distinguished level the question shifts to **mixed-criticality systems**: one SoC running ASIL D, ASIL B, and QM tasks at the same time. This is current for Tesla FSD and NVIDIA Drive: 8 GPUs plus 4 ARM cores executing safety-critical autopilot and non-safety infotainment on one board. The fix is **hypervisor partitioning** (PikeOS, QNX Hypervisor) plus **time partitioning** for CPU plus **cache partitioning** for shared L3. Knowing these patterns separates the candidate for staff engineer in Tesla Autopilot or Cruise from a general-purpose embedded developer.
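Time partitioning is commonly described as a fixed, repeating schedule table in the ARINC 653 style. A minimal sketch of such a table and the lookup a partition scheduler would perform; the partition names and window lengths are hypothetical, and a real hypervisor enforces the windows with a hardware timer rather than a function call.

```python
# Hypothetical major frame: (partition, window length in ms), repeated forever.
MAJOR_FRAME = [
    ("asil_d_control", 4),
    ("asil_b_suspension", 3),
    ("qm_infotainment", 3),
]


def partition_at(t_ms):
    """Return the partition that owns the CPU at time t_ms (integer ms)."""
    frame_len = sum(window for _, window in MAJOR_FRAME)
    offset = t_ms % frame_len
    for name, window in MAJOR_FRAME:
        if offset < window:
            return name
        offset -= window
    raise AssertionError("unreachable: offset is always inside the major frame")


for t in (0, 4, 9, 10):
    print(t, partition_at(t))
```

Because the table is static, the QM partition cannot take CPU time from the ASIL D window regardless of its load. That guarantee is what priorities alone do not provide.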
The principal challenge of a mixed-criticality system (ASIL D + QM on one SoC) - what should you stress at the interview?
Tradeoffs: what to pick and when to walk away from RT
The most undervalued question at a staff interview: *"When should you NOT build a hard real-time system?"* Candidates love showing off, so they propose RT for everything. But the honest engineering stance is that hard RT is expensive, and applying it without need is technical debt. The price of hard RT: specialised RTOS, MISRA C, formal methods, certification, a limited engineer pool, poor compatibility with modern ML frameworks. If the task tolerates a p99 latency target instead of a worst-case guarantee, choosing plain Linux with PREEMPT_RT plus latency-distribution modelling is a valid answer the interviewer respects more than the reflex 'let's use VxWorks'. Same tension in ML: deterministic inference versus throughput-optimised batching - two different strategies for two different SLOs.
The canonical tradeoff: **VxWorks vs Linux PREEMPT_RT**. VxWorks - certified RTOS under DO-178C/ISO 26262, deterministic worst-case, USD 50K+ project license plus USD 50K runtime royalty per million units. Linux PREEMPT_RT - free, flexible, supports Docker and the ML stack, but best-effort latency can hit 200-500 us worst-case under load. Tesla picked PREEMPT_RT for FSD (ML pipeline flexibility was non-negotiable), Airbus kept VxWorks 653 for the A350 FCC (no alternative for DAL A). Knowing this historical context is a strong signal of engineering maturity.
The final maturity signal at the interview is the ability to **admit uncertainty**. The question: *"Which RTOS is better for our project - VxWorks or QNX?"* A weak answer picks one with the rationale 'I have worked with it'. A strong answer lists the factors (regulatory, hardware support, existing team, license cost, driver ecosystem) and says *"Without knowing these details I cannot give an honest answer; here are the questions I would ask the customer."* Interviewers at Tesla or SpaceX look for exactly that - not confidence for its own sake, but structural thinking under uncertainty. Same behaviour is valued at Anthropic during LLM safety design review - not *'I know the right answer'*, but *'here are my hypotheses and the experiments that would test them'*.
Misconception: a real-time system is just a 'fast system', and knowing the terminology is enough for an interview
A real-time system is one with a mathematically (or statistically) guaranteed upper bound on latency. At the interview the interviewer checks not terminology but the ability to work with tradeoffs: hard vs soft RT, schedulability tests, WCET analysis, mixed-criticality, picking the stack to match requirements. Knowing formulas without context is junior level.
Real RT systems are built under enormous pressure from cost, time, regulatory framework, and available team expertise. The candidate who can honestly discuss these constraints and propose a structural compromise is what staff and principal engineers are paid for. A candidate who reflexively answers 'VxWorks' or 'PREEMPT_RT' without context analysis is not architect material.
At a staff interview you are asked: *"When would you pick Linux PREEMPT_RT over VxWorks?"* Which answer shows maturity?
Related topics
RT interviews intersect with several disciplines:
- Schedulability analysis — Liu-Layland, response-time analysis, EDF bounds - the mathematical core of the RT section
- Formal methods and model checking — UPPAAL, TLA+ - tools for staff+ levels at DAL A/ASIL D companies
- Industry standards — ISO 26262, DO-178C, IEC 62304 - knowing the standards is critical for automotive/avionics/medical interviews
- Realtime backend — Latency budgets in realtime web (WebSocket, SSE) use the same logic as hard RT, at different orders of magnitude
Key ideas
- **Schedulability questions**: Liu-Layland is sufficient, response-time analysis is exact; know the RM vs EDF tradeoff and behaviour under overload
- **WCET analysis**: static (aiT), measurement-based, hybrid; on out-of-order CPUs (Cortex-A72) classical WCET is pessimistic - use cache partitioning and Probabilistic WCET
- **System design**: 5 steps - latency budget, criticality, hardware, software stack, failure modes; mixed-criticality requires hypervisor + cache partitioning, not priorities
- **Tradeoffs**: VxWorks for DAL A/ASIL D, Linux PREEMPT_RT for soft RT + ML; candidate maturity shows up as context-awareness, not reflex choice
Questions for reflection
- Mars Pathfinder priority inversion - the famous case. Which modern systems (Tesla FSD, SpaceX Dragon) are theoretically vulnerable to the same class of problems, and which architectural defences are in use?
- A staff engineer at Tesla makes 2x what a regular embedded company pays. For which concrete skills is that premium paid - and which interview questions probe them?
- If PREEMPT_RT can replace VxWorks for most soft RT tasks, why do regulated industries (avionics, medical) keep VxWorks/QNX around for decades?
Related lessons
- rts-01 — Without distinguishing hard/soft RT, schedulability questions are unanswerable
- rts-12 — Formal methods is a common deep-dive topic in the RT section at staff+ levels
- rts-13 — Automotive/avionics/medical architectures come up in the architecture round
- ml-28-optimizers — Online learning also runs on a latency budget, just like soft RT
- rt-01-what-is-realtime — Realtime backend shares the same conceptual framework around latency
- ds-04-consistent-hashing — Sharding work across cores reuses the same idea as in distributed systems
- os-01-intro