A different approach to state machines – part 2

In the last part we looked at a different approach for implementing state machines. In this part we will refine the implementation with the goal of decreasing the code size. In an earlier post, I discussed the advantage of using a switch statement over writing a jump table yourself. Applying this to the state machine implementation decreases the code size by replacing ‘costly’ function pointers with bytes. A further decrease is possible since the compiler can reduce the number of ‘actual’ (non-inlined) functions. The guard and effect functions will each be replaced by a single function containing a switch statement to select the correct guard/effect.
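To get a feel for the table-size part of this saving, the sketch below compares the size of one rule in both layouts. The type names pointerRule_t and byteRule_t and the helper functions are made up for this illustration; the exact numbers depend on pointer width and padding on your platform, so treat it as a measuring aid rather than a result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rule layout as in part 1: guard and effect are function pointers. */
typedef struct {
    uint8_t trigger;
    bool  (*guard)(void);
    void  (*effect)(void);
    uint8_t state;
} pointerRule_t;

/* Rule layout as in this part: guard and effect are one-byte identifiers. */
typedef struct {
    uint8_t trigger;
    uint8_t guard;
    uint8_t effect;
    uint8_t state;
} byteRule_t;

/* Size in bytes of a rule table with the given number of rows. */
size_t pointerTableSize(size_t nrOfRules) {
    return nrOfRules * sizeof(pointerRule_t);
}

size_t byteTableSize(size_t nrOfRules) {
    return nrOfRules * sizeof(byteRule_t);
}

/* Sanity check (assumption: four one-byte members are not padded,
 * which holds on common ABIs). */
void checkRuleLayout(void) {
    assert(sizeof(byteRule_t) == 4u);
    assert(sizeof(pointerRule_t) > sizeof(byteRule_t));
}
```

The dimmer table from part 1 has 11 rows (3 state markers and 8 transitions), so calling pointerTableSize(11) and byteTableSize(11) on your own target shows how much of the difference comes from the table alone. The rest of the saving has to come from the generated code.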

// Updates to types. Note that the role of guardHandler_t and effectHandler_t
// has changed. Members of these types have been added to statemachine_t.
// These functions are responsible for handling ALL guards and effects for
// this statemachine.
typedef bool (*guardHandler_t)(uint8_t guard);
typedef void (*effectHandler_t)(uint8_t effect);

typedef struct statemachineRule_t {
    uint8_t trigger;
    uint8_t guard;
    uint8_t effect;
    uint8_t state;
} statemachineRule_t;

typedef struct statemachine_t {
    guardHandler_t           guardHandler;
    effectHandler_t          effectHandler;
    const statemachineRule_t *pRules;
    const uint8_t            nrOfRules;
    uint8_t                  currentState;
} statemachine_t;

// Example guard and effect handler. Note that the guard handler always returns
// true for unknown cases. This makes the implementation of the NONE and ELSE
// guards trivial. It will also help if we start introducing state entry and
// exit functions. Note that NONE and ELSE are identical from an implementation
// perspective, but mean something different from intent. NONE indicates there
// are no conditions to be met for applying the transition. ELSE indicates the
// transition should be taken if all the alternative conditions are false.
bool guardHandler(uint8_t guard) {
    switch( guard ) {
        case gLevelIsZero : return levelIsZero();
        default : return true;
    }
}

void effectHandler(uint8_t effect) {
    switch( effect ) {
        case eSetDefaultOnLevel : setDefaultOnLevel(); break;
        case eTurnOff           : turnOff()          ; break;
        case eDimUp             : dimUp()            ; break;
        case eDimDown           : dimDown()          ; break;
        case eStopDimming       : stopDimming()      ; break;
        default                 : break; // unknown effects are ignored
    }
}

// Updates to state machine handler. effectHandler and guardHandler are now
// members of statemachine and are allowed to be NULL. The guards and effects
// are identified by a byte and fed as a parameter to these functions.
static void statemachine_ApplyTrigger(
    statemachine_t *pStatemachine,
    trigger_t       trigger      ) {

    uint8_t internalState = INVALID;

    for ( size_t i = 0u; i < pStatemachine->nrOfRules; ++i ) {
        statemachineRule_t rule = pStatemachine->pRules[i];
        // A STATE() row marks the state the following rules belong to.
        if ( rule.trigger == INVALID ) {
            internalState = rule.state;
        }

        if (   (internalState == pStatemachine->currentState)
            && (trigger == rule.trigger)
            && (   (pStatemachine->guardHandler == NULL)
                || pStatemachine->guardHandler(rule.guard)) ) {
            if ( pStatemachine->effectHandler ) {
                pStatemachine->effectHandler( rule.effect );
            }
            pStatemachine->currentState = rule.state;
            break; // take at most one transition per trigger
        }
    }
}

As with the previous post, you can find a complete compilable example here. A quick test with ‘gcc’ and ‘size’ shows a code size decrease of 480 bytes on my machine, which is roughly 10% of the text segment. Pretty spectacular if you consider that the complete code already contains quite some overhead from linking to the standard libraries. I will provide some more in-depth analysis of code size when we proceed with refining the implementation in the next part of this series.

$ gcc -o statemachine_part1 statemachine_part1.c -Os -std=c99
$ gcc -o statemachine_part2 statemachine_part2.c -Os -std=c99

$ size statemachine_part1 
   text    data     bss     dec     hex filename
   4561    2132     480    7173    1c05 statemachine_part1

$ size statemachine_part2 
   text    data     bss     dec     hex filename
   4081    2132     480    6693    1a25 statemachine_part2
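
For readers who want to experiment without the linked file, here is a minimal self-contained sketch of the switch-based variant. It is not the file measured above. The dimmer model (the variables level and direction), the instance name dimmer, the enum values for NONE and ELSE and the use of INVALID for the unused fields of a STATE() row are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID 0xFFu
#define STATE( s ) { INVALID, INVALID, INVALID, (s) }

enum states   { sOff, sOn, sDimming };
enum triggers { tPressOn, tPressOff, tPressUp, tPressDown, tReleaseUp, tReleaseDown };
enum guards   { gLevelIsZero, NONE, ELSE };  /* NONE and ELSE hit 'default' below */
enum effects  { eSetDefaultOnLevel, eTurnOff, eDimUp, eDimDown, eStopDimming };

typedef bool (*guardHandler_t)(uint8_t guard);
typedef void (*effectHandler_t)(uint8_t effect);

typedef struct { uint8_t trigger, guard, effect, state; } statemachineRule_t;

typedef struct {
    guardHandler_t            guardHandler;
    effectHandler_t           effectHandler;
    const statemachineRule_t *pRules;
    uint8_t                   nrOfRules;
    uint8_t                   currentState;
} statemachine_t;

/* Hypothetical dimmer model: a light level and a dimming direction. */
uint8_t level     = 0u;
int8_t  direction = 0;

static bool guardHandler(uint8_t guard) {
    switch ( guard ) {
        case gLevelIsZero : return level == 0u;
        default           : return true;
    }
}

static void effectHandler(uint8_t effect) {
    switch ( effect ) {
        case eSetDefaultOnLevel : level = 50u;               break;
        case eTurnOff           : level = 0u; direction = 0; break;
        case eDimUp             : direction = 1;             break;
        case eDimDown           : direction = -1;            break;
        case eStopDimming       : direction = 0;             break;
        default                 : break;
    }
}

static const statemachineRule_t rules[] = {
//    trigger     , guard       , effect            , next state
STATE( sOff ),
    { tPressOn    , NONE        , eSetDefaultOnLevel, sOn        },
STATE( sOn ),
    { tPressOff   , NONE        , eTurnOff          , sOff       },
    { tPressUp    , NONE        , eDimUp            , sDimming   },
    { tPressDown  , NONE        , eDimDown          , sDimming   },
STATE( sDimming ),
    { tPressOff   , NONE        , eTurnOff          , sOff       },
    { tReleaseUp  , NONE        , eStopDimming      , sOn        },
    { tReleaseDown, gLevelIsZero, eStopDimming      , sOff       },
    { tReleaseDown, ELSE        , eStopDimming      , sOn        }
};

statemachine_t dimmer = {
    guardHandler, effectHandler, rules,
    (uint8_t)(sizeof rules / sizeof rules[0]), sOff
};

void statemachine_ApplyTrigger(statemachine_t *pStatemachine, uint8_t trigger) {
    assert(pStatemachine != NULL);
    uint8_t internalState = INVALID;

    for ( size_t i = 0u; i < pStatemachine->nrOfRules; ++i ) {
        const statemachineRule_t rule = pStatemachine->pRules[i];
        if ( rule.trigger == INVALID ) {   /* STATE() row: remember the state */
            internalState = rule.state;
            continue;
        }
        if (   (internalState == pStatemachine->currentState)
            && (trigger == rule.trigger)
            && (   (pStatemachine->guardHandler == NULL)
                || pStatemachine->guardHandler(rule.guard)) ) {
            if ( pStatemachine->effectHandler ) {
                pStatemachine->effectHandler(rule.effect);
            }
            pStatemachine->currentState = rule.state;
            return;                        /* at most one transition per trigger */
        }
    }
}
```

Calling statemachine_ApplyTrigger(&dimmer, tPressOn) moves the dimmer from sOff to sOn and sets the level. A tReleaseDown in sDimming ends in sOff or sOn depending on the guard. Because the ELSE rule comes after the gLevelIsZero rule, it is only reached when that guard returned false.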

A different approach to state machines

Especially useful for embedded control systems, state machines are an intuitive and effective tool for design and documentation. In this post I am going to describe an approach to implementing state machines I came up with a couple of years ago. Shortly after, I discovered very similar ideas have been in use in the industry for quite some time. Still, the approach does not seem to be commonly known. This is a real shame; the proposed implementation of state machines can lead to a roughly 2.5 times smaller code size, while being extremely easy to read and maintain.

Readability is actually the starting point of this exercise. What if we could just execute our state machine design? As an example we will use the state diagram below. I use Hungarian notation to differentiate (s)tates, (t)riggers, (g)uards and (e)ffects. This will become more helpful once we refine the state machine implementation later on [1].


Jotting down circles and arrows is not very convenient in text [2], but state machines can be easily expressed in text as well, see below.

sOff :
    tPressOn                    / eSetDefaultOnLevel -> sOn
sOn :
    tPressOff                   / eTurnOff           -> sOff
    tPressUp                    / eDimUp             -> sDimming
    tPressDown                  / eDimDown           -> sDimming
sDimming :
    tPressOff                   / eTurnOff           -> sOff
    tReleaseUp                  / eStopDimming       -> sOn
    tReleaseDown [gLevelIsZero] / eStopDimming       -> sOff
    tReleaseDown [gELSE]        / eStopDimming       -> sOn

Theoretically, we could already write this format as a multi-line string in a source file, then parse and interpret it. This, however, would be very inefficient in terms of code size and execution speed. As an alternative, we can write the state machine in tabular format as an array of structs:

typedef bool (*guardHandler_t)(void);
typedef void (*effectHandler_t)(void);

typedef struct statemachineRule_t {
    uint8_t         trigger;
    guardHandler_t  guard;
    effectHandler_t effect;
    uint8_t         state;
} statemachineRule_t;

#define INVALID 0xFFu
#define STATE( s ) { INVALID, NULL, NULL, s }

enum states {
    sOff    ,
    sOn     ,
    sDimming
};

enum triggers {
    tPressOn    ,
    tPressOff   ,
    tPressUp    ,
    tPressDown  ,
    tReleaseUp  ,
    tReleaseDown
};

const statemachineRule_t rules[] = {
//    trigger     , guard       , effect            , next state
STATE( sOff ),
    { tPressOn    , NONE        , eSetDefaultOnLevel, sOn        },
STATE( sOn ),
    { tPressOff   , NONE        , eTurnOff          , sOff       },
    { tPressUp    , NONE        , eDimUp            , sDimming   },
    { tPressDown  , NONE        , eDimDown          , sDimming   },
STATE( sDimming ),
    { tPressOff   , NONE        , eTurnOff          , sOff       },
    { tReleaseUp  , NONE        , eStopDimming      , sOn        },
    { tReleaseDown, gLevelIsZero, eStopDimming      , sOff       },
    { tReleaseDown, ELSE        , eStopDimming      , sOn        }
};

Note how close the implementation is to the original design! There is one thing left to do: implement the engine for the state machine. Luckily, this is surprisingly simple for this first iteration.

typedef struct statemachine_t {
    const statemachineRule_t *rules;
    const uint8_t            nrOfRules;
    uint8_t                  currentState;
} statemachine_t;

void ProcessTrigger(
    statemachine_t  *pStateMachine,
    const trigger_t trigger       ) {

    uint8_t internalState = INVALID;

    for ( size_t i = 0u; i < pStateMachine->nrOfRules; ++i ) {
        statemachineRule_t rule = pStateMachine->rules[i];
        // A STATE() row marks the state the following rules belong to.
        if ( rule.trigger == INVALID ) {
            internalState = rule.state;
        }

        if (   (internalState == pStateMachine->currentState)
            && (trigger == rule.trigger)
            && rule.guard() ) {
            rule.effect();
            pStateMachine->currentState = rule.state;
            break; // take at most one transition per trigger
        }
    }
}

I left out some minor details such as the guard and effect functions. Here is a completely self-contained .c file that can be compiled into an executable example. In practice you will want to put the state machine engine definitions and implementation in a separate component. This is one of the refinements we will make in a later post.
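
Because the snippets above leave out the guards and effects, here is a minimal self-contained sketch of how the pieces could fit together. It is not the linked file. The dimmer model (level and direction), the instance name dimmer and the choice to implement NONE and ELSE as functions that simply return true are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool (*guardHandler_t)(void);
typedef void (*effectHandler_t)(void);

typedef struct statemachineRule_t {
    uint8_t         trigger;
    guardHandler_t  guard;
    effectHandler_t effect;
    uint8_t         state;
} statemachineRule_t;

typedef struct statemachine_t {
    const statemachineRule_t *rules;
    uint8_t                   nrOfRules;
    uint8_t                   currentState;
} statemachine_t;

#define INVALID 0xFFu
#define STATE( s ) { INVALID, NULL, NULL, (s) }

enum states   { sOff, sOn, sDimming };
enum triggers { tPressOn, tPressOff, tPressUp, tPressDown, tReleaseUp, tReleaseDown };

/* Hypothetical dimmer model: a light level and a dimming direction. */
uint8_t level     = 0u;
int8_t  direction = 0;

/* Guards. NONE and ELSE behave identically, they only differ in intent. */
static bool NONE(void)         { return true; }
static bool ELSE(void)         { return true; }
static bool gLevelIsZero(void) { return level == 0u; }

/* Effects. */
static void eSetDefaultOnLevel(void) { level = 50u; }
static void eTurnOff(void)           { level = 0u; direction = 0; }
static void eDimUp(void)             { direction = 1; }
static void eDimDown(void)           { direction = -1; }
static void eStopDimming(void)       { direction = 0; }

static const statemachineRule_t rules[] = {
//    trigger     , guard       , effect            , next state
STATE( sOff ),
    { tPressOn    , NONE        , eSetDefaultOnLevel, sOn        },
STATE( sOn ),
    { tPressOff   , NONE        , eTurnOff          , sOff       },
    { tPressUp    , NONE        , eDimUp            , sDimming   },
    { tPressDown  , NONE        , eDimDown          , sDimming   },
STATE( sDimming ),
    { tPressOff   , NONE        , eTurnOff          , sOff       },
    { tReleaseUp  , NONE        , eStopDimming      , sOn        },
    { tReleaseDown, gLevelIsZero, eStopDimming      , sOff       },
    { tReleaseDown, ELSE        , eStopDimming      , sOn        }
};

statemachine_t dimmer = {
    rules, (uint8_t)(sizeof rules / sizeof rules[0]), sOff
};

void ProcessTrigger(statemachine_t *pStateMachine, uint8_t trigger) {
    assert(pStateMachine != NULL);
    uint8_t internalState = INVALID;

    for ( size_t i = 0u; i < pStateMachine->nrOfRules; ++i ) {
        const statemachineRule_t rule = pStateMachine->rules[i];
        if ( rule.trigger == INVALID ) {   /* STATE() row: remember the state */
            internalState = rule.state;
            continue;
        }
        if (   (internalState == pStateMachine->currentState)
            && (trigger == rule.trigger)
            && rule.guard() ) {
            rule.effect();
            pStateMachine->currentState = rule.state;
            return;                        /* at most one transition per trigger */
        }
    }
}
```

A call such as ProcessTrigger(&dimmer, tPressOn) walks the table from the top, tracks which state the current rows belong to and takes the first rule whose state, trigger and guard all match. Triggers that have no rule in the current state are silently ignored.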

1. I usually avoid Hungarian notation for indicating types. Here, however, it serves a dual purpose. The descriptive names of state, triggers, guards and effects are likely to clash with existing function or variable names. The prefix prevents most of the clashes. It al

To switch or not to switch…

Nigel Jones makes a compelling argument against the use of switch statements. A compiler has roughly two options for implementing a switch statement: 1) the equivalent of a number of if – else if – else statements or 2) a jump table. For a small number of cases option 1) may be more efficient. If you have a large number of cases option 2) is likely to be more efficient. Some compilers only support one option, others apply heuristics to choose between the alternatives. The bottom line of Nigel’s argument is that the actual code size of a switch statement is rather unpredictable. This can be especially confusing if you remove a case and find the code size increases due to the compiler now favoring option 1) over option 2).

However, the code size for a switch statement is always comparable to or less than that of an equivalent series of if – else if – else statements. For me, that is good enough to generously apply switch statements in my code. Furthermore, I find the syntax of the statement aesthetically pleasing and thus easy to read. Forgetting an ‘else’ in a sequence of if – else if – else statements can be a nasty bug to find. I find the equivalent in a switch statement, a missing break, much harder to overlook.

But hold on a second, what if you implement a jump table yourself? As the following snippet shows, a jump table is also very easy to read.

typedef enum {
    action_turnOn      = 0,
    action_turnOff     ,
    action_dimUp       ,
    action_dimDown     ,
    action_stopDimming
} action_t;

static void TurnOn(void);
static void TurnOff(void);
static void DimUp(void);
static void DimDown(void);
static void StopDimming(void);

// Alternative 1: a switch statement.
void ExecuteAction(action_t action) {
    switch( action ) {
        case action_turnOn      : TurnOn()     ; break;
        case action_turnOff     : TurnOff()    ; break;
        case action_dimUp       : DimUp()      ; break;
        case action_dimDown     : DimDown()    ; break;
        case action_stopDimming : StopDimming(); break;
    }
}

// Alternative 2: a sequence of if - else if - else statements.
void ExecuteAction(action_t action) {
    if ( action == action_turnOn ) {
        TurnOn();
    } else if ( action == action_turnOff ) {
        TurnOff();
    } else if ( action == action_dimUp ) {
        DimUp();
    } else if ( action == action_dimDown ) {
        DimDown();
    } else if ( action == action_stopDimming ) {
        StopDimming();
    }
}

// Alternative 3: a hand-written jump table, indexed by the action value.
void ExecuteAction(action_t action) {
    void (*jumpTable[])(void) = {
        TurnOn         ,
        TurnOff        ,
        DimUp          ,
        DimDown        ,
        StopDimming
    };
    jumpTable[action]();
}

Unfortunately, the compiler has more freedom in implementing a jump table than you have at the source level. You will have to create functions and reference them by pointers, which prevents the compiler from inlining them. The functions themselves may cause some administrative overhead. If the functions are inlined, which can be the case in a switch statement, it is easier for the compiler to exploit common parts and do a better job at optimizing for small code size. Not all jumps are equal: a local jump within a function may be less costly than a function call. The result is that a jump table is probably more costly in terms of code size than a switch statement or an if – else if – else sequence.

That said, jump tables CAN be a good alternative. Switch statements only work with integer constants as case labels. With some modifications jump tables can work with other data types as well. Measure what works best for you and your platform. At least you now have the choice between a sequence of if – else if – else statements, a switch statement or a jump table.
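
As one possible reading of "other data types", the sketch below dispatches on a string, which a switch statement cannot do. Strictly speaking it is a lookup table that is searched linearly and not an indexed jump. The command names, the commandEntry_t type, ExecuteCommand and the lastAction variable are all made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handlers; each one records which action ran last. */
const char *lastAction = "none";

static void TurnOn(void)  { lastAction = "on";  }
static void TurnOff(void) { lastAction = "off"; }
static void DimUp(void)   { lastAction = "up";  }

/* One table row: the key to match and the function to call. */
typedef struct {
    const char *command;
    void      (*handler)(void);
} commandEntry_t;

static const commandEntry_t commandTable[] = {
    { "on",  TurnOn  },
    { "off", TurnOff },
    { "up",  DimUp   },
};

/* Calls the handler registered for 'command'.
 * Returns 1 if a handler was found, 0 if the command is unknown. */
int ExecuteCommand(const char *command) {
    assert(command != NULL);
    for ( size_t i = 0u; i < sizeof commandTable / sizeof commandTable[0]; ++i ) {
        if ( strcmp(command, commandTable[i].command) == 0 ) {
            commandTable[i].handler();
            return 1;
        }
    }
    return 0;
}
```

The trade-off is the same as described above: every handler is a real function reached through a pointer, and the search adds code and execution time of its own. Whether this beats a chain of strcmp() calls in if – else if – else form is something to measure on your platform.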

Code size optimization

Code size (ROM) should generally not be the first thing on your mind. If you are in a position to pick a platform where you are not likely to run into code size issues, do so. Optimizing for smaller code size is a time-consuming (read: expensive) task, which does not add value to your project. Smaller code size is often traded against longer execution times. Longer execution times also mean more power consumption. Even if your device is mains-powered, it does not hurt to mind our planet a little, does it?

It is good practice to start your project with compiler settings that are a healthy mix of optimization for speed and the ability to debug. If code size becomes an issue, turn off settings that buy speed at the expense of code size (e.g. loop unrolling). If compiler output is already optimized for smallest code size, you will need some additional tricks up your sleeve. Unfortunately there are no common patterns for code size optimization at the source level. Code size is not directly proportional to physical lines of code. Results are largely dependent on the platform used. You could lose a lot of the effort spent on reducing code size if you switch to a different platform.

That said, there will be times in which your application just does not fit in ROM and you do not have the luxury of switching platforms. Having a set of tools to decrease code size will definitely come in handy. I intend to create a series of posts here to describe some techniques you can apply to decrease code size, but I should start with a warning: your mileage may vary. Do not blindly depend on my advice or your intuition, but remember the most important engineering principle: measure! Always compare the code size of the various options in your situation.