NanoOs

1-Dec-2024 - Dynamic Memory

Good news! I now have dynamic memory allocation and deallocation with automatic memory compaction on my OS! Bad news: This nearly drove me crazy getting it to work.

OK, so a few years ago, I wrote out the logic for implementing all four C memory functions (malloc, calloc, realloc, and free) with a custom bump allocator. I needed this at the time so that I could guarantee that I could still allocate memory to print out a stack trace if the system memory functions were in a bad state at the time the stack was printed.

For a simple bump allocator to work, you need three pieces of information: A pointer to the first byte of memory that you’ll be allocating from, a pointer to the next free address within that memory, and the number of outstanding memory allocations. On an allocation, the current value of the next pointer is taken as the value to return and then the next pointer is incremented (“bumped”) by the number of bytes requested. The number of outstanding memory allocations is also increased by one. On free, the number of outstanding memory allocations is decremented and, if the resulting number is non-zero, that’s all that’s done. If the number is zero, then the next pointer is reset back to the beginning of the memory. What this means is that no space is reclaimed until all outstanding memory allocations are deallocated.

The benefit of a bump allocator is that it’s incredibly fast. Both allocations and deallocations do almost nothing. The downside is that it’s incredibly wasteful. Also, the fact that nothing gets reclaimed until everything is freed means that you run the risk of running out of memory if you’re not careful.

Another drawback is that the simple approach doesn’t allow for a way for reallocation to work. The realloc call is supposed to guarantee that if the memory is reallocated, the new block contains all the data of the old block. That would mean that the data in the old block would have to be copied into the new block and, by itself, the simple approach doesn’t provide a way to know this. It was critical to me that all four functions work, so I had to come up with a solution for this.

My solution was to store the size of the allocation right before the pointer that’s returned to the user. So, if the user requested X bytes, I would actually allocate, say, X + 4 bytes, write the number of bytes that the user requested into the first four bytes, then return the pointer to the 5th byte allocated to the user. If I got a realloc call, I could check the number of bytes in the previously-allocated memory.

In addition to telling me how many bytes to copy on a reallocation, this extra bit of information allowed me to determine two more things: (1) whether or not the new number of bytes being requested was smaller or equal to the number of bytes already allocated (in which case I can just do nothing) and (2) whether or not the pointer being reallocated or freed was the last pointer allocated. (If the address of the pointer plus its size is equal to the next free pointer, then I know I’m working with the last value that was allocated.)

For the second case, if I detected this situation on a free, I could reset the next pointer back to the beginning of the memory being freed because I would know that nothing was allocated after that. What this meant is that, if I freed memory in the reverse order of allocations, I could reclaim space that a normal bump allocator wouldn’t reclaim. Cool, huh?

Well, kind of, yes. This was fine for the purposes I was using it for at the time because I knew that everything allocated would always be freed by the end of the stack trace function. It’s not good enough for an operating system, even an embedded one.

I was fine with only reclaiming space when I detect that the pointer being freed was the last one in the list. Even enterprise-grade operating systems don’t reclaim absolutely everything on every free. Doing it when I detect that I’m freeing the last pointer is as good a time as any. But, I needed to do better than just roll back the next pointer to the pointer being freed. I needed to roll it back to the end of the last block that was allocated. So, if I had an array of 5 elements, I could get back to the second pointer if I freed the elements in the order 2, 3, 4, 5, not only if I freed them in the order 5, 4, 3, 2. Basic high-level goal: reclaim as much memory as possible any time we’re freeing the last pointer allocated.

I realized I could achieve this if I stored one more piece of information before the pointer and did one additional operation. The extra piece of information I needed was a pointer to the previous block allocated and the extra operation was setting a pointer’s size to zero on free. Then, when I detect that I’m freeing the last pointer in the chain, I just walk the prev pointers backward for as long as the size is zero and stop once I get to one that is non-zero. At that point, I just set the next pointer to the end of the block. I don’t even have to track the number of outstanding memory allocations anymore. Huzzah!!!

So, a few days ago, I set about implementing my improvement in my existing logging library. After ironing out the kinks and adding some new unit tests to prove that it was working as expected, I was ready to move on to implementing the live version for the OS. That’s where I started yesterday morning.

At this point, I need to pause and explain something about the way this OS works. All the memory trickery relies on stack segmentation. The stacks for the coroutines are created by setting a jmp_buf, calling a function that declares a 512 byte character buffer that then calls the coroutine main function where the actual coroutine structure is declared and initialized. At that point, longjmp is called with the jmp_buf that was set earlier and the coroutine is ready to be resumed from the place on the stack that it left. (The full algorithm is A LOT more complicated than that, but that’s the simple version of how it works.)

So, when I set about implementing dynamic memory for the Arduino, my goal was to do something similar. I wanted to use as much of the remaining RAM as possible, so I started the memory manager coroutine as the last one allocated. I then set a jmp_buf, called a function with a small stack buffer (128 bytes this time instead of 512), called a function that setup all the metadata, and then returned back to the main coroutine process where I waited for incoming messages to do memory allocation and deallocation.

Seems reasonable, right? Well, evidently not because I kept crashing my Arduino. This is where things started to get ugly. I spent a few hours pulling my hair out trying to figure out why I was crashing things no matter what I did. Finally, I noticed that the addresses of the variables declared in the metadata initialization function were lower values than the addresses of the variables declared in the main coroutine function. Then it dawned on me: The stack pointer decrements with each value pushed onto it, but my memory allocators assumed that all values were strictly increasing. In my logging library, memory is allocated out of a static buffer where the base pointer is the lowest value of the buffer. But, to allocate memory off the stack, my starting value was going to be the highest value available. So my algorithms were backward!!!

Oh, but if only it were that simple… Because not everything is backward, just the order of the addresses. The orientation of the metadata relative to the pointers still has to be the same (with the metadata coming at a lower address than the actual pointer). So, some logic had to be reversed and others didn’t. To make matters worse, instead of all allocations being additions and all deallocations being subtractions, now both operations were a mixture of addition and subtraction. I can’t tell you how many times I screwed that up.

So, yeah, here we are at the end of day two of stack segmentation and revised memory algorithms and automatic compaction and message passing to memory management processes and and and and… AAAAAAAAAAHHHHHHHHH!!!!!!!!!

Anyway, it all seems to work at the moment. I’m calling it a day before anything breaks again. Huzzah!!!

To be continued…

Table of Contents

This site is open source. Improve this page.