Microcontroller memory layout
Recently a few people have asked me about memory layouts in microcontrollers, and over the years I’ve run into a few issues caused by bad linker files (the files that tell the linker where to place code and data in memory), so I thought it might be worth a little blog article. This is aimed at those getting started and at junior engineers (answering the questions they asked me recently).
(If you’re a Linux/Windows programmer this post probably won’t be of use or interest to you.)
I’m going to cover a pretty standard layout of memory on a microcontroller, how that memory is used, the usual mistakes/pitfalls, and how to manage your memory footprint.
First, we have two different types of memory: Flash (non-volatile) and RAM (volatile).
Let’s start with a picture of our memory.
Looking at RAM first
Variables fall into several categories
- Global Variables Uninitialised and Zero Initialised.
- Global variables Initialised.
- Local Variables (We’ll look at these in the Stacks section)
Global Variables Uninitialised and Zero Initialised
These are variables that are defined either with no assigned value or with an assigned value of zero.
They can be:
– Global (defined outside a function).
– Static global (defined outside a function and only visible within that file).
– Static function variables (these persist like globals but are only accessible within the function).
static uint32_t global2 = 0U;
static uint32_t counter = 0U;
Global variables Initialised.
These are any global variables (as above) that are assigned a non-zero value when they are defined.
uint32_t global1 = 1U;
static uint32_t global2 = 2U;
static uint32_t counter = 3U;
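As a rough sketch of the categories above in one file (all of the names here are made up for illustration, and where each variable actually lands depends on your compiler and linker file):

```c
#include <stdint.h>

uint32_t bootCount;               /* no value assigned: zeroed at startup */
static uint32_t errorCount = 0U;  /* zero initialised, only visible in this file */
uint32_t retryLimit = 3U;         /* non-zero: initial value is copied in from flash */

/* Returns how many times it has been called so far. */
uint32_t countCalls(void)
{
    static uint32_t calls = 0U;   /* keeps its value between calls, only visible here */
    calls++;
    return calls;
}

/* Records an error and returns the running total. */
uint32_t recordError(void)
{
    errorCount++;
    return errorCount;
}
```

The static variable inside countCalls() behaves like a global in terms of memory (it is not on the stack), it is just hidden from the rest of the program.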
Stacks are literally that: a contiguous block of memory in which you hold a current position. You put things on the stack and remove your things from it again at the end of your function.
Delving down a little deeper – what happens when you go into a function (register and stack use).
When you call a function a number of things happen (these are defined in procedure call standards).
The registers that need to be preserved are pushed onto the stack (saved); depending on the architecture this typically includes the return location (the Program Counter or the Link Register) and the working registers.
The Link Register is loaded with the address of the instruction after the call (so we know where to return to).
The Program Counter is loaded with the function address (jumping to the function).
Variables that the function uses within it are allocated on the stack.
As seen in the example below.
void exampleFunction(void)
{                           //Registers are copied to the stack
    uint32_t idx = 0U;      //Variable on the stack
    uint32_t tries = 3U;    //Another variable on the stack
}
In the above example we can (typically) expect half a dozen registers to go onto the stack and at least 8 bytes of stack to be used for variables (maybe more if there are memory alignment requirements).
If another function is called from within it, that function’s registers and variables are pushed onto the stack too.
The stack is popped back at the end of each function (the memory is effectively freed).
Local variables only exist for the duration of the function (they get overwritten when something else uses the stack after them).
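Another example of the stack in use with 4 functions [fn1, fn2, fn3, fn4]: fn1 calls fn2, fn2 calls fn3, both return, and then fn1 calls fn4. Each line shows the stack after the next call or return.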
[saved registers][fn1 Variables]
[saved registers][fn1 Variables][saved registers][fn2 Variables]
[saved registers][fn1 Variables][saved registers][fn2 Variables][saved registers][fn3 Variables]
[saved registers][fn1 Variables][saved registers][fn2 Variables]
[saved registers][fn1 Variables]
[saved registers][fn1 Variables][saved registers][fn4 Variables]
Important Stack Fact
In most systems (e.g. ARM) stacks descend!
So when we have an empty stack we’re sitting right at the top of the memory picture (at the end of our memory).
Thus for each function we nest into we’re heading toward the bottom of the picture (toward the heap).
There’s more info on monitoring and positioning stacks later…
On PCs we regularly use malloc/free and new/delete to allocate memory when we need it.
On PCs that’s fine, we have memory management units and large quantities of RAM; on microcontrollers we don’t!
Because of this, if we want to allocate memory dynamically we need to handle the allocation cleanly ourselves. We do this using a heap structure; there are libraries for this, but there are consequences to using heaps.
On PCs your memory is measured in GB (1,000,000,000s of bytes); in microcontrollers your RAM is measured in 10s to 100,000s of bytes, and a heap may only be 10K big, if you even decide to use one.
This all has consequences, such as fragmentation. In an oversimplified example, say we have 1000 bytes of heap:
we take out 500 bytes, then 300 bytes, free the first 500 bytes, and then ask for 600 bytes.
//Empty heap [##########]
uint32_t * a = (uint32_t*)malloc(500U);
//Memory Allocated [aaaaa#####]
uint32_t * b = (uint32_t*)malloc(300U);
//Memory Allocated [aaaaabbb##]
free(a);
//Memory Freed [#####bbb##]
uint32_t * c = (uint32_t*)malloc(600U);
//Allocation failed NULL returned.
The above example shows that although there is enough space overall for the c allocation (700 bytes are free), the free memory is broken up and no single contiguous run is big enough. This leaves us with a dilemma of what to do: at the very least we must check each malloc and fail safely if we can’t have the memory we requested, but in some scenarios we may not be able to fail cleanly.
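To make the fragmentation example repeatable, here is a minimal sketch of a toy first-fit allocator over a 1000 byte heap split into 100 byte blocks. It is only a model for illustration (toyAlloc, toyFree and toyFreeBytes are made-up names, and a real malloc is more sophisticated), but it fails in the same way.

```c
#include <stdint.h>

#define HEAP_BLOCKS 10U   /* 10 blocks of 100 bytes = 1000 byte heap */
#define BLOCK_SIZE  100U

static uint8_t used[HEAP_BLOCKS];   /* 1 = block taken, 0 = block free */

/* First fit: returns the index of the first block of a contiguous free run
   big enough for 'bytes', or -1 if there is no such run. */
int toyAlloc(uint32_t bytes)
{
    uint32_t needed = (bytes + BLOCK_SIZE - 1U) / BLOCK_SIZE;
    uint32_t run = 0U;

    if (needed == 0U) {
        return -1;
    }
    for (uint32_t i = 0U; i < HEAP_BLOCKS; i++) {
        run = (used[i] == 0U) ? (run + 1U) : 0U;
        if (run == needed) {
            uint32_t start = (i + 1U) - needed;
            for (uint32_t j = start; j <= i; j++) {
                used[j] = 1U;
            }
            return (int)start;
        }
    }
    return -1;   /* enough bytes may be free overall, but not in one piece */
}

/* Frees an allocation previously returned by toyAlloc. */
void toyFree(int start, uint32_t bytes)
{
    uint32_t count = (bytes + BLOCK_SIZE - 1U) / BLOCK_SIZE;

    if (start < 0) {
        return;
    }
    for (uint32_t j = 0U; (j < count) && (((uint32_t)start + j) < HEAP_BLOCKS); j++) {
        used[(uint32_t)start + j] = 0U;
    }
}

/* Total free bytes, regardless of whether they are contiguous. */
uint32_t toyFreeBytes(void)
{
    uint32_t freeBlocks = 0U;

    for (uint32_t i = 0U; i < HEAP_BLOCKS; i++) {
        if (used[i] == 0U) {
            freeBlocks++;
        }
    }
    return freeBlocks * BLOCK_SIZE;
}
```

Allocating 500 then 300 bytes, freeing the 500, and asking for 600 returns -1 even though toyFreeBytes() reports 700 bytes free.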
Another potential pitfall is not knowing how much memory your application is going to use in all scenarios.
In medical, automotive and other safety-critical systems, coding standards typically forbid dynamic memory for these very reasons, and in the embedded industry in general it is widely considered unsafe and a poor design decision. Memory structures such as small static global circular buffers that don’t use the heap are far safer, and it’s far easier to cope safely with running out of buffer as they tend to be used for only one purpose.
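As an illustration of the kind of static circular buffer meant here, a minimal sketch might look like the following. The names and the size are made up, and this version is not interrupt safe; it simply shows that "buffer full" becomes an ordinary return value you can handle.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8U   /* hypothetical size; one slot is always left empty */

static uint8_t ringData[RING_SIZE];
static uint32_t ringHead = 0U;   /* index of the next write */
static uint32_t ringTail = 0U;   /* index of the next read */

/* Adds a byte; returns false (and stores nothing) if the buffer is full. */
bool ringPut(uint8_t value)
{
    uint32_t next = (ringHead + 1U) % RING_SIZE;

    if (next == ringTail) {
        return false;
    }
    ringData[ringHead] = value;
    ringHead = next;
    return true;
}

/* Removes the oldest byte; returns false if the buffer is empty. */
bool ringGet(uint8_t *value)
{
    if (ringTail == ringHead) {
        return false;
    }
    *value = ringData[ringTail];
    ringTail = (ringTail + 1U) % RING_SIZE;
    return true;
}
```

The memory cost is fixed and known at link time, and the caller decides what running out of space means (drop the byte, raise an error, and so on).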
I usually set heap sizes to 0 for this reason.
It’s worth noting that, despite this, in some systems you are forced to use a heap by external libraries (avoid these if you can).
Monitoring and positioning your stack
As stack usage changes with how many functions deep you are, it is useful to ensure you can:
1) track how much stack you are using
2) position your stack in the correct place, so that if you do overrun your stack you don’t corrupt your data variables.
Tracking how much stack you are using
This may seem daunting but it’s worth the effort.
Put simply, we set the whole stack to a known sequence of bytes (i.e. set it all to 0xAA, NOT 0 or 0xFF as these occur commonly in local array declarations).
Then, starting at the far end of the stack (the end a descending stack reaches last), we count toward the start until we see the sequence change. That count is the free size, and the stack size minus the free size gives us our maximum used stack depth (note this is not the current usage, which goes up and down and can be identified using the stack pointer register).
To achieve this we need to change the startup code (often assembler in a .s file, NOT the main function, as by that point we’re already using the stack) to fill the stack itself. There’s often code you can crib, but if not it’s normally only about 7 lines of assembler.
The size and location can be accessed externally from the linker file (more on linker files later).
We now need to implement, and regularly or occasionally call, a function that iterates through the stack from the end toward the start (again it’s useful to get the position and size from the linker file using export and extern).
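The idea can be sketched in C against an ordinary array standing in for the stack. This is only a model: on a real target the fill happens in the startup code, and the address and size would come from symbols exported by your linker file rather than being passed in like this. The names stackPaint, stackMaxUsed and STACK_FILL are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xAAU   /* the known byte pattern */

/* Fill the whole stack region with the known pattern.
   stackBottom is the lowest address of the region. */
void stackPaint(uint8_t *stackBottom, size_t stackSize)
{
    for (size_t i = 0U; i < stackSize; i++) {
        stackBottom[i] = STACK_FILL;
    }
}

/* For a descending stack the lowest addresses are used last, so count the
   untouched pattern bytes from the bottom upward. The maximum used depth is
   the total size minus the untouched (free) size. */
size_t stackMaxUsed(const uint8_t *stackBottom, size_t stackSize)
{
    size_t untouched = 0U;

    while ((untouched < stackSize) && (stackBottom[untouched] == STACK_FILL)) {
        untouched++;
    }
    return stackSize - untouched;
}
```

If stackMaxUsed() ever returns the full stack size, the pattern has been completely overwritten and the stack has very likely overrun.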
Position your stack in the correct place.
As we know from above, stacks are descending.
From the diagram we can see that if we put the stack at the end of our memory, then even if we overrun the end of our stack we only change unused memory; we don’t corrupt our data.
To achieve a memory layout we describe our memory in linker files (.ld in GCC, .icf in IAR). Unfortunately, whereas C is standardised, every compiler manufacturer and his dog has their own format of linker file, so I’m not going to paste an example, just describe roughly how to check your stack is in the correct section. Often it isn’t: many examples put their stacks directly at the end of the heap!
- Ensure there is a section in the linker file for STACK_SECTION.
- Position it at the end of memory.
- STACK_SECTION = (MEMORY_START + MEMORY_SIZE) - (STACK_SIZE + 4U)
Note the extra 4U, which moves the section back a word while keeping it word aligned; this is for 32 bit systems, you’d use 2U for 16 bit systems etc.
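As a quick worked example of that formula, here it is as a small C function so the arithmetic can be checked. The function name and the example values are hypothetical; use the figures from your own part’s datasheet and linker file.

```c
#include <stdint.h>

/* Start address of the stack section:
   (MEMORY_START + MEMORY_SIZE) - (STACK_SIZE + word size). */
uint32_t stackSectionStart(uint32_t memoryStart, uint32_t memorySize,
                           uint32_t stackSize, uint32_t wordSize)
{
    return (memoryStart + memorySize) - (stackSize + wordSize);
}
```

For example, with RAM starting at 0x20000000, 20K (0x5000) of RAM, a 1K (0x400) stack and a 4 byte word, the section starts at 0x20004BFC.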
Make sure you put the stack on its own in its new section.
Be mindful of what’s using your stack. Even functions such as printf can easily end up using hundreds of bytes of stack for a single call, and things such as unrestricted recursion can blow your stack incredibly quickly.
It’s worth noting that some IDEs have stack usage tracking built into them.
An additional note for ARM Cortex-M: the initial stack pointer is loaded from the first position in the vector table.
Flash Memory Layout
Flash is the non volatile memory that persists when power goes away.
Most modern microcontrollers use flash to store their vector table, program code, initialised data values and user storage.
Vector Tables and interrupts and startup.
Microcontrollers have interrupts that are triggered by various events, from power up to communications activity. Each interrupt has a function associated with it, and the addresses of these functions are stored as a predefined list in a table called a ‘vector table’. Often the first element in this table is the reset vector (the code that runs when we power up); on ARM Cortex-M it is the second element, after the initial stack pointer.
This reset vector triggers our startup code, which sets up our memory:
- zero out the BSS (zero initialised data)
- copy the initialised DATA values from flash to RAM
- set up the stack (this is where we’d fill it with our known values)
- jump to the main() function (in the program code section of the diagram)
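The first two steps can be sketched in C using small arrays as stand-ins for the real memory regions. This is an assumption-heavy model: in a real startup file the start and end addresses come from symbols defined in the linker file (their names vary between toolchains), and the code is often written in assembler.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for regions that the linker file would normally describe. */
static const uint8_t flashInitValues[4] = { 1U, 2U, 3U, 4U }; /* initial values kept in flash */
static uint8_t ramData[4];   /* where the initialised variables live in RAM */
static uint8_t ramBss[8];    /* where the zero initialised variables live in RAM */

void startupInitMemory(void)
{
    /* Zero out the BSS. */
    for (size_t i = 0U; i < sizeof ramBss; i++) {
        ramBss[i] = 0U;
    }
    /* Copy the initialised data values from flash to RAM. */
    for (size_t i = 0U; i < sizeof ramData; i++) {
        ramData[i] = flashInitValues[i];
    }
    /* A real startup file would now set up the stack and jump to main(). */
}
```

This is also why initialised globals cost you twice: once in flash for the initial value and once in RAM for the live variable.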
The program code section contains your main and all your user functions.
As your RAM does not persist through power cycles, global variables that are assigned a value must have their initial values stored in non-volatile memory and copied in on power up.
We often have to store settings on our microcontrollers, and often this is in flash. When it is, we should give some consideration to certain facts:
- Flash has a limited number of ‘writes’ (actually bank erases), typically 10,000 to 100,000 bank erases.
- We can only erase whole banks and not single bytes.
- Flash writes can only change bits from 1 to 0, i.e. you can write 0x05 to an empty address (which would contain 0xFF), but you could not then write 0x06 to it without first erasing the entire bank.
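The "bits only go from 1 to 0" rule can be modelled in a few lines of C. This is a simplified simulation using a RAM array (the names and the tiny bank size are made up), not a driver for any real flash controller, and some real parts are stricter still about re-writing a location.

```c
#include <stdint.h>

#define BANK_SIZE 16U   /* hypothetical, real banks are far larger */

static uint8_t bank[BANK_SIZE];

/* Erasing sets every bit in the whole bank back to 1. */
void flashEraseBank(void)
{
    for (uint32_t i = 0U; i < BANK_SIZE; i++) {
        bank[i] = 0xFFU;
    }
}

/* Writing can only clear bits, so the stored result is (old AND new).
   Returns the value that actually ends up in the cell. */
uint8_t flashWriteByte(uint32_t offset, uint8_t value)
{
    bank[offset] &= value;
    return bank[offset];
}
```

After an erase, writing 0x05 stores 0x05, but then writing 0x06 to the same address leaves 0x04 (0x05 AND 0x06), which is neither value. Only another bank erase lets 0x06 be stored correctly.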
Due to this we often:
– shadow banks (have 2 banks that we work between, erasing, copying and swapping)
– consider the rate at which we’re writing data and whether we need wear levelling or a separate external memory that is better suited (FeRAM, MRAM, PCRAM). Often, if we’re only writing settings and we write them as a whole block, we won’t see wear-out situations; it is however necessary to do the maths to work out how many writes per year we make.
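The maths is simple enough to sketch as a function. This assumes each save costs one bank erase and that saves rotate evenly across the banks you use; the function name and the example numbers are illustrative only, so check the endurance figure in your own datasheet.

```c
#include <stdint.h>

/* Whole years until the erase budget is used up.
   eraseCycles  : rated erases per bank (e.g. 10,000)
   banks        : number of banks the saves are spread across
   erasesPerDay : how many saves (bank erases) happen per day */
uint64_t yearsUntilWornOut(uint32_t eraseCycles, uint32_t banks, uint32_t erasesPerDay)
{
    if (erasesPerDay == 0U) {
        return UINT64_MAX;   /* never written, never wears out */
    }
    uint64_t totalErases = (uint64_t)eraseCycles * banks;
    return totalErases / ((uint64_t)erasesPerDay * 365U);
}
```

With 10,000 cycles, one bank and a save every hour (24 per day) the budget lasts just over a year, whereas 100,000 cycles across two banks with one save a day lasts over 500 years, so the write rate matters far more than the headline endurance figure.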