RTOS Response Times

Real time and determinism. Many RTOS manuals, and much of the literature, define the term "real-time" as the property of responding to certain events (usually interrupts and task switches) within a guaranteed maximum period of time. While this is a requirement in many applications, that maximum latency is very hard to determine precisely, because most modern CPUs are themselves non-deterministic. An RTOS vendor may measure response times and document them, but the figures hold only under the strict circumstances in which they were measured, and will not necessarily hold under the different circumstances created by the customer's code.
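A practical consequence is that any measured latency figure is a lower bound on the true worst case, not the worst case itself. The sketch below is a minimal, hypothetical sample recorder (the type and function names are made up for illustration) that keeps the minimum, maximum and mean of latencies measured in CPU cycles. Its `max` field can only report the worst latency that happened to occur during the test run.

```c
#include <stdint.h>

/* Running statistics over latency samples, in CPU cycles (hypothetical helper). */
typedef struct {
    uint64_t count;
    uint64_t min;
    uint64_t max;   /* largest latency OBSERVED, not the true worst case */
    uint64_t sum;
} latency_stats;

void latency_stats_init(latency_stats *s)
{
    s->count = 0;
    s->min = UINT64_MAX;
    s->max = 0;
    s->sum = 0;
}

/* Record one measured latency. */
void latency_stats_add(latency_stats *s, uint64_t cycles)
{
    s->count++;
    s->sum += cycles;
    if (cycles < s->min) s->min = cycles;
    if (cycles > s->max) s->max = cycles;
}

/* Integer (truncated) mean; 0 if no samples were recorded. */
uint64_t latency_stats_mean(const latency_stats *s)
{
    return s->count ? s->sum / s->count : 0;
}
```

The spread between `min` and `max` under a given test load says something about jitter, but a different load (other interrupts, other cache contents) can push the real maximum past anything recorded here.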

CPU features that impact OS determinism are:

Caches. Most modern CPUs have at least separate L1 caches for code and data, the I-cache and the D-cache respectively. Many also have a unified or split L2 cache, and some have an L3 cache. Cache performance is non-deterministic: caches introduce delays on cache misses (the data must be fetched from external memory) and even longer delays on cache line replacement (a new datum or instruction has the bad luck of arriving when all suitable cache lines are filled; if the line chosen for eviction holds modified data, that data must first be written back to external memory, and only then can the new data be fetched from external memory into the freed line).
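To make these cases concrete, here is a minimal model of a direct-mapped, write-back, write-allocate data cache. The geometry and cycle costs (`NUM_LINES`, `LINE_SIZE`, `HIT_CYCLES`, `MEM_CYCLES`) are hypothetical numbers chosen for illustration, not figures for any real CPU. With them, the same load costs 1 cycle on a hit, 101 cycles on a miss into an empty or clean line, and 201 cycles when the victim line is dirty and must be written back first.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters, chosen only for illustration. */
#define NUM_LINES   8     /* direct-mapped: one line per index        */
#define LINE_SIZE   32    /* bytes per cache line                     */
#define HIT_CYCLES  1     /* cost of a cache hit                      */
#define MEM_CYCLES  100   /* cost of one line transfer to/from memory */

typedef struct {
    int      valid;
    int      dirty;   /* line modified since it was filled */
    uint32_t tag;
} cache_line;

typedef struct {
    cache_line lines[NUM_LINES];
} cache_model;

void cache_init(cache_model *c)
{
    memset(c, 0, sizeof *c);
}

/* Simulate one access to 'addr'; returns the number of cycles it costs. */
unsigned cache_access(cache_model *c, uint32_t addr, int is_write)
{
    uint32_t block = addr / LINE_SIZE;
    uint32_t index = block % NUM_LINES;
    uint32_t tag   = block / NUM_LINES;
    cache_line *l  = &c->lines[index];
    unsigned cycles = HIT_CYCLES;

    if (!l->valid || l->tag != tag) {       /* miss */
        if (l->valid && l->dirty)
            cycles += MEM_CYCLES;           /* write back the dirty victim first */
        cycles += MEM_CYCLES;               /* then fetch the new line */
        l->valid = 1;
        l->dirty = 0;
        l->tag   = tag;
    }
    if (is_write)
        l->dirty = 1;                       /* write-back: memory updated only on eviction */
    return cycles;
}
```

For example, reading address 0x0000 on a cold cache costs 101 cycles, reading 0x0004 right after costs 1, and after a write to that line, reading 0x0100 (which maps to the same index) costs 201.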

Write buffers. When code writes to external memory, the time the write takes may depend on the state of the write buffers. Cache write-backs may be affected by the state of the write buffers too.
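A write buffer can be modelled in the same spirit. In this sketch (the depth, drain time and all names are hypothetical), a store normally completes at once because the buffer absorbs it, but when the buffer is full the CPU stalls until the oldest pending write reaches memory. The cost of an identical store therefore depends on how many writes were issued shortly before it.

```c
#include <stdint.h>

/* Hypothetical write-buffer parameters, for illustration only. */
#define WB_DEPTH        4    /* number of pending writes the buffer holds     */
#define WB_DRAIN_CYCLES 10   /* cycles to retire one buffered write to memory */

typedef struct {
    uint64_t done[WB_DEPTH]; /* retirement time of each pending write (ring) */
    unsigned head;           /* index of the oldest pending write            */
    unsigned count;          /* number of pending writes                     */
    uint64_t last_done;      /* retirement time of the newest pending write  */
} write_buffer;

void wb_init(write_buffer *wb)
{
    wb->head = 0;
    wb->count = 0;
    wb->last_done = 0;
}

/* Drop entries that have already reached memory by cycle 'now'. */
static void wb_retire(write_buffer *wb, uint64_t now)
{
    while (wb->count > 0 && wb->done[wb->head] <= now) {
        wb->head = (wb->head + 1) % WB_DEPTH;
        wb->count--;
    }
}

/* Issue a store at cycle 'now'; returns how many cycles the CPU stalls. */
uint64_t wb_store(write_buffer *wb, uint64_t now)
{
    uint64_t stall = 0;

    wb_retire(wb, now);
    if (wb->count == WB_DEPTH) {            /* full: wait for the oldest write */
        stall = wb->done[wb->head] - now;
        now += stall;
        wb_retire(wb, now);
    }
    /* Buffered writes drain one at a time, in order. */
    uint64_t start = now > wb->last_done ? now : wb->last_done;
    unsigned tail = (wb->head + wb->count) % WB_DEPTH;
    wb->done[tail] = start + WB_DRAIN_CYCLES;
    wb->last_done = wb->done[tail];
    wb->count++;
    return stall;
}
```

Four back-to-back stores at cycle 0 cost nothing extra, a fifth one at cycle 0 stalls for 10 cycles, and the same store issued at cycle 1000, after the buffer has drained, again costs nothing.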

Internal pipeline state and out-of-order execution. Modern CPUs have long pipelines, and some high-performance CPUs execute instructions out of order internally. This is done to optimize instruction throughput, but the same features prevent deterministic instruction execution: the same instructions take a different number of CPU cycles depending on the state of the internal pipeline and execution units at the time they enter the CPU. (Pipelines affect software task-switching procedures more than interrupt response, since a CPU with a long pipeline completes all outstanding instructions and writes back their results before it starts executing the interrupt handler.)
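The effect can be observed on a host machine by timing identical code repeatedly. This sketch assumes a POSIX system providing `clock_gettime(CLOCK_MONOTONIC, ...)`; on an embedded target one would read a hardware cycle counter instead. `workload()` and `measure_spread()` are hypothetical names. Part of the spread reported comes from timer resolution and from the host OS itself, so the numbers illustrate variability rather than isolate pipeline effects.

```c
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime() */
#include <stdint.h>
#include <time.h>

static volatile uint32_t sink;    /* volatile store keeps the loop from being optimized away */

/* Hypothetical workload: executes the same instructions on every call. */
static void workload(void)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < 1000; i++)
        acc = acc * 31u + i;
    sink = acc;
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Time 'runs' executions of workload(); report the fastest and slowest. */
void measure_spread(unsigned runs, uint64_t *min_ns, uint64_t *max_ns)
{
    *min_ns = UINT64_MAX;
    *max_ns = 0;
    for (unsigned r = 0; r < runs; r++) {
        uint64_t t0 = now_ns();
        workload();
        uint64_t dt = now_ns() - t0;
        if (dt < *min_ns) *min_ns = dt;
        if (dt > *max_ns) *max_ns = dt;
    }
}
```

On a typical desktop CPU the first runs (cold caches and branch predictors) and runs interrupted by the host OS tend to stand out, even though every run executes exactly the same instructions.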

MMU context switch. When an OS uses the MMU to map different address spaces for different tasks, it has to partially or completely flush the MMU state on a task switch. Complete flushing hurts performance too much, and on-demand flushing introduces another, delayed, form of non-determinism: entries in the MMU translation cache (TLB) are flushed as needed, and the number of flushes depends on the previously cached translations and on the hit pattern of the new translations.
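The cost of flushing can be illustrated with a toy direct-mapped TLB. Everything here (`TLB_ENTRIES`, `tlb_lookup`, `run_task`) is hypothetical and ignores physical addresses; it only counts misses, each of which would cost a page-table walk on real hardware. Task A's second pass is all hits, but after a switch to task B and back, with a full flush on each switch, A pays the same misses again.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical direct-mapped TLB, for illustration only. */
#define TLB_ENTRIES 8

typedef struct {
    int      valid;
    uint32_t vpn;      /* virtual page number cached in this entry */
} tlb_entry;

typedef struct {
    tlb_entry e[TLB_ENTRIES];
    unsigned  misses;  /* each miss would cost a page-table walk */
} tlb;

void tlb_init(tlb *t)
{
    memset(t, 0, sizeof *t);
}

/* Full flush, as done on a task switch between address spaces. */
void tlb_flush_all(tlb *t)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        t->e[i].valid = 0;
}

/* Translate one page; on a miss, count it and refill the entry. */
void tlb_lookup(tlb *t, uint32_t vpn)
{
    tlb_entry *en = &t->e[vpn % TLB_ENTRIES];
    if (!en->valid || en->vpn != vpn) {
        t->misses++;
        en->valid = 1;
        en->vpn = vpn;
    }
}

/* A task touching 'n' consecutive virtual pages starting at 'first'. */
void run_task(tlb *t, uint32_t first, uint32_t n)
{
    for (uint32_t p = first; p < first + n; p++)
        tlb_lookup(t, p);
}
```

In the scenario above, task B uses the same virtual page numbers 0 to 3 in its own address space, which is why A's cached entries cannot simply be reused; after A, B and A again, the counter stands at 12 misses instead of 4.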

Many dedicated embedded RTOSes try to address CPU non-determinism by at least not using the features that are not vital for an embedded application. The first and most obvious candidate is the MMU: many embedded RTOSes don't use MMU translation at all (SeptemberOS is among them). Caches, however, improve overall performance too much to be sacrificed for determinism and disabled. Other internal CPU mechanisms cannot be disabled at all.

We have seen that it is very hard to determine precisely the "guaranteed response time" of an RTOS. From a theoretical point of view, the "guaranteed" times may be derived by adding up the maximum possible delay of every contributing factor on a given CPU. However, such figures are "guaranteed" to look bad on marketing sheets. Usually both the documented "response times" and the documented "response CPU clock counts" are taken from measurements of a particular application, which means they say almost nothing about the numbers the customer's application will show for the same parameters.
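The theoretical bound described above is a sum, over all delay sources, of each source's worst-case cost times the maximum number of times it can occur on the measured path. The sketch below computes that sum; the type, function, term names and cycle counts are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One source of delay and its worst-case contribution (hypothetical). */
typedef struct {
    const char *source;
    uint32_t    worst_cycles;   /* worst case for ONE occurrence          */
    uint32_t    max_count;      /* how many times it can occur on the path */
} delay_term;

/* Pessimistic bound: every term hits its worst case, every time. */
uint64_t worst_case_bound(const delay_term *t, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint64_t)t[i].worst_cycles * t[i].max_count;
    return total;
}
```

With hypothetical figures of 300 base cycles, a 20-cycle pipeline drain, four I-cache misses at 100 cycles each and four dirty D-cache evictions at 200 cycles each, the bound is 1520 cycles, about five times the 300 base cycles. That ratio is the reason honest worst-case figures look bad on marketing sheets.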

On the other hand, this whole discussion is relevant only to an RTOS with very fast response times, comparable to dozens of CPU cycles. Some OSes have extremely long latencies (relative to CPU cycles); if their figures state guaranteed response times in the thousands of CPU cycles (typically more than 3 microseconds on today's mid-range CPU), those figures may be taken at face value, because the non-determinism discussed above becomes negligible by comparison.
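The conversion behind the 3 microsecond figure is simply time = cycles / clock frequency. Assuming a core clock of about 1 GHz (an assumption, not a figure from this text), 3000 cycles is 3 microseconds. The second hypothetical helper expresses hardware jitter as a fraction of the documented latency: 50 cycles of jitter is 1% of a 5000-cycle latency but most of a 60-cycle one.

```c
#include <stdint.h>

/* Convert a cycle count to microseconds at a given core clock (Hz). */
double cycles_to_us(uint64_t cycles, double clock_hz)
{
    return (double)cycles * 1e6 / clock_hz;
}

/* Fraction of the documented latency that hardware jitter could add. */
double jitter_fraction(uint64_t jitter_cycles, uint64_t latency_cycles)
{
    return (double)jitter_cycles / (double)latency_cycles;
}
```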

Copyright (c) Daniel Drubin, 2010