2016-03-16

Optimized Builds

Up until now, we've always compiled our program in debug mode. This means it contains debug symbols to identify which compiled code originates from which source function. This would be useful for a normal Rust program, as it enables human-readable output for debugging purposes, but it's of no use to us right now.

In addition, and perhaps more importantly, it means that our compiled code is not optimized. If we add a layer of abstraction to make our code more understandable, that increases the size of the compiled binary, even if the abstraction is written in a way that requires no runtime information. If we ever want to write serious embedded programs in Rust, we need to compile with optimizations, otherwise our programs might not even fit into the memory of more limited devices.

Before we can do that, though, we need to make some preparations. First and foremost, we need to teach the compiler about the importance of the vector table[1]. The vector table is the entry point into our program. Without it, the program wouldn't even run. Too bad the compiler doesn't know that, and thinks it can just throw it away during optimization[2].

I'm not sure why, but the solution to that problem is to add the #[no_mangle] attribute to the vector table. We would normally use that to prevent name mangling[3], but in this case it also causes the compiler to not throw the vector table away during optimization. Here's what that looks like in the code.

#[link_section=".vectors"]
#[no_mangle]
pub static VECTOR_TABLE: VectorTable = VectorTable {
    initial_stack_pointer: &_estack,

    on_reset: on_reset,

    // The rest of the vector table has been omitted, for the sake of brevity.
};
            

While that takes care of our entry point, it's not all we have to do before we can optimize the program. Remember that the microcontroller uses a technique called memory-mapped I/O[4]. This is not something that the compiler can make sense of without our help. To the compiler, it looks as if we're writing data to a random memory address that we never read from again. If the address is never read from, why not throw away the write operation, too? From the compiler's perspective, it doesn't seem to do anything useful.
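To make the problem concrete, here's a small sketch of the kind of code the optimizer is free to mangle. The register address and the function name are made up purely for illustration; real addresses come from the microcontroller's datasheet.

// Made-up register address, for illustration only.
const SOME_REGISTER: *mut u32 = 0x40000000 as *mut u32;

fn configure_peripheral() {
    unsafe {
        // To the optimizer, this is a store to an address that is never
        // read again, so it may remove the write entirely.
        *SOME_REGISTER = 0x00000020;
    }
}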

The answer to this problem is something called a volatile read or write. By doing a volatile read or write, we're basically telling the compiler, "trust us, this has meaning, don't throw it away during optimization". Rust has functions for doing volatile reads and writes[5], but it would be a pain, and quite error-prone, if we always had to remember to use those functions. Let's create a type that wraps volatile data, and can only be accessed via volatile reads or writes.

use core::ptr;


pub struct Volatile<T> {
    value: T,
}

impl<T> Volatile<T> {
    pub unsafe fn read(&self) -> T {
        ptr::read_volatile(&self.value)
    }

    pub unsafe fn write(&mut self, value: T) {
        ptr::write_volatile(&mut self.value, value)
    }
}
            

That's quite straightforward. As value is private, there's no way to access it from outside the module, except through the two functions, which do volatile reads/writes. Here's how we use that type in the real-time timer (RTT) interface.

use volatile::Volatile;


pub struct Rtt {
    pub mode  : Volatile<u32>,
    pub alarm : Volatile<u32>,
    pub value : Volatile<u32>,
    pub status: Volatile<u32>,
}
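
The snippets below dereference an RTT constant that isn't shown in this post. As a rough sketch, and assuming the peripheral's base address has been looked up in the datasheet, it could be defined as a constant raw pointer like this (the address here is only a placeholder, not a verified value):

// Sketch only: RTT as a raw pointer to the peripheral's memory-mapped
// registers. The address is a placeholder; check the datasheet.
pub const RTT: *mut Rtt = 0x400E1A30 as *mut Rtt;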
            

And of course we have to change the code that uses that interface, too. Here's how we would read and write before:

(*RTT).mode = 0x00000020; // write
(*RTT).value              // read
            

And here's how it looks after:

(*RTT).mode.write(0x00000020) // write
(*RTT).value.read()           // read
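
Both the raw-pointer dereference and the new read/write methods are unsafe, so in practice these calls live inside an unsafe block. A minimal sketch, using the hypothetical RTT constant from above:

// Minimal sketch: the same read and write as above, in context.
let value = unsafe {
    (*RTT).mode.write(0x00000020); // write to the mode register
    (*RTT).value.read()            // read the current timer value
};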
            

Now we can tell the compiler to optimize the program, which is as easy as passing --release to the cargo build command. This is the updated compile script:

#!/usr/bin/env bash

cd blink
cargo build --release --target=target.json
            

And that's it! The only thing left to do is to check the results of the optimization. Here's the output from the uploader before the change:

Wrote 1412 bytes (6 pages)
            

And after:

Wrote 504 bytes (2 pages)
            

A nice improvement!

And that wraps it up for today. As always, the full code is available on GitHub. See you next time!