If there’s one thing I should take away from the past week, it’s that I should absolutely under no circumstances ever write code while I’m sick…
Since my soldering pencil still hadn’t arrived, I decided to do something quick and easy: Add support for pipes between processes. Really, the design for this was pretty straightforward:
This should have taken 3 or 4 days tops. It’s taken me 7. The problem was that I started coming down with a cold the same day I started this work. I was thinking, “I’m a little tired, but I can still program, right?” Wrong. Writing code when I’m sick is like writing code at 4:00 AM: Technically possible, but not in a good way and I’m going to regret it.
So, yes, technically I did all of the above in a few days… and then I hit bugs. Not simple bugs, either. No, coding when you’re sick results in really terrible bugs. And, debugging when you’re sick is a much, much worse idea than coding when you’re sick. I eventually had to stop entirely. If I had done that to start with, I would have saved myself time overall.
So, (a) piping wasn’t working right, (b) I had a pretty severe memory leak, and (c) the entire system would hang after a few commands. I knew immediately that (c) meant that I had a message leak in addition to a memory leak. Where either was, though, was a complete mystery.
If this were in application space, I’d use valgrind to track things down. This, of course, is not in application space. On the plus side, however, I did have complete control over the infrastructure. Also on the plus side, I knew the problems had to be limited to the code related to pipes, which narrowed things down quite a bit.
I started with the memory leak since that seemed like the biggest issue. I modified the memory allocation and deallocation functions to take a function name as the first parameter and made macros around them that passed the __func__ variable as that first argument. On every allocation and deallocation, I logged the pointer value, the process ID, and the name of the function making the call. Then I ran a piped command, grabbed the serial output, pasted it into a text editor, and removed every allocation that had a matching deallocation.
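As a rough sketch, the wrappers look something like this; kmalloc_real, kfree_real, klog, and current_pid are made-up stand-ins for the kernel’s actual allocator, logger, and process helpers, not the real names:

```c
#include <stddef.h>   /* size_t */

/* Stand-ins for the kernel's real allocator, logger, and PID helper. */
void *kmalloc_real(size_t size);
void  kfree_real(void *ptr);
void  klog(const char *fmt, ...);
int   current_pid(void);

/* Log every allocation and deallocation with the pointer, the calling
 * process, and the calling function. */
void *kmalloc_traced(const char *caller, size_t size)
{
    void *ptr = kmalloc_real(size);
    klog("alloc %p pid=%d %s\n", ptr, current_pid(), caller);
    return ptr;
}

void kfree_traced(const char *caller, void *ptr)
{
    klog("free  %p pid=%d %s\n", ptr, current_pid(), caller);
    kfree_real(ptr);
}

/* Existing call sites keep their normal spelling; the macros quietly add
 * __func__ as the first argument. */
#define kmalloc(size) kmalloc_traced(__func__, (size))
#define kfree(ptr)    kfree_traced(__func__, (ptr))
```

With that in place, matching up the “alloc” and “free” lines in the log is just text editing.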
It turned out that there was only one value that wasn’t being freed: The array of file descriptors for the foreground process. Now, the free for that array was conditional because there were two possible values: The statically-allocated global array that all processes get by default, or a custom array in the case of piped commands. The call didn’t attempt to free the array if it detected that it was pointing to the statically-allocated array, and that’s what was happening in this case. But, why? All the code looked correct. Everything appeared to be cleaning up after itself. The function was clearly being called. So what was it?
Multitasking is what it was. See, being a foreground task, the process replaced the shell process. The scheduler is written to automatically restart the shell processes any time it detects that they’re no longer running. The call to make the scheduler deallocate the array was done with a non-blocking message. So, the message was sent, the process exited, and then the scheduler detected that the process had exited and reset the shell and all of its metadata, including the file descriptors. Only after all of that did the scheduler handle the message and examine the array pointer, which by that point had been reset to the default, so the pointer to the allocated array had already been lost.
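Put into code terms, the shape of the problem was roughly this (made-up names again; I’m sketching the structure, not quoting the kernel):

```c
/* The exiting foreground process asks the scheduler to clean up its custom
 * file descriptor array, but the request is non-blocking: */
msg_send(scheduler_pid, MSG_RELEASE_FD_ARRAY, /* blocking = */ 0);
process_exit();   /* the scheduler spots the dead shell and resets its
                     metadata (fd_array included) before the message above
                     is ever handled */

/* ...so by the time the scheduler's handler finally runs, this check sees
 * the static default array, skips the free, and the pointer to the custom
 * array is already gone: */
if (shell->fd_array != default_fd_array)
    kfree(shell->fd_array);
```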
So, memory leak taken care of, it was time to address the message leak. Same as before, I changed the allocators and deallocators to print out every action, grabbed the log, then removed the allocations that had deallocations. Only one message didn’t get released: The one the scheduler sent to the process waiting for input from an output file descriptor. Now, I knew it was possible that the destination process might not be in a state to receive the message. However, this shouldn’t matter because the process would be ending. And, when a process ends, its message queue is destroyed. All the messages for the process should be released, right? What gives?
Well, it turned out that I had outsmarted myself. Yes, when a message queue is destroyed, any messages are released… if there’s nothing waiting on the message. If there is something waiting on the message, then it’s marked as done and it’s up to the process that’s waiting on it to release it. And, I had intended for nothing to be waiting on the message… but that’s not what I had actually coded. When I sent the message from the scheduler, I accidentally set the waiting flag to true. So, it never got released when the process exited. OK, simple enough. Change the silly flag to false like it’s supposed to be.
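The corrected send boils down to one argument, sketched here with a made-up message API rather than my actual one:

```c
/* Hypothetical API: send the "input is ready" message fire-and-forget.  With
 * waiting set to false, nothing is ever left waiting on it, so if the
 * destination process has already exited, destroying its queue releases the
 * message along with everything else. */
message_send(dest_pid, msg, /* waiting = */ false);   /* was accidentally true */
```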
OK, now why wasn’t I seeing the output I was expecting to see? I had done a very simple echo piped through a grep-like command, but some of the expected output was missing; I was basically seeing every other message come through. Well, it was pretty clear what was happening when I thought about it. Output is sent to the console by checking out an output buffer, populating it, and sending it to the console process, and that path did NOT wait for the message to be processed by the console before returning to the caller. So a buffer was being checked out, sent, then checked out again and overwritten before the console had ever written out the first batch of output. I put in a wait for the message to be processed and all was well.
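Conceptually, the fixed output path looks something like this (the buffer and console calls are hypothetical names, not the real ones):

```c
/* Sketch of the fixed console output path: check out a shared output buffer,
 * fill it, hand it to the console process, then block until the console has
 * actually consumed it so the next write can't overwrite it in flight. */
console_buf_t *buf = console_buffer_checkout();   /* grab a shared output buffer */
fill_buffer(buf, text);                           /* populate it with the output */
console_send(buf);                                /* hand it to the console process */
console_wait_processed(buf);                      /* NEW: wait before reusing it */
```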
While I was at it, I also improved the way fgets worked. Since it was now possible for text to come spread across several messages (as was happening with the echo command), I changed it to aggregate several messages into the buffer being read into. This allowed it to work like a proper fgets command instead of returning the contents of each message.
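A sketch of that aggregation loop is below; the message type and the read/release calls are stand-ins for my kernel’s actual API:

```c
#include <stddef.h>   /* NULL */
#include <string.h>   /* memcpy, memchr */

/* Stand-ins for the kernel's message API (not the real names). */
typedef struct { char *data; int len; } msg_t;
msg_t *fd_read_message(int fd);   /* blocks until the next message for fd */
void   msg_release(msg_t *msg);

/* fgets-style read: keep appending incoming messages to the caller's buffer
 * until a newline shows up or the buffer fills, instead of handing back one
 * message's worth of text per call. */
char *fgets_from_fd(char *buf, int size, int fd)
{
    int used = 0;

    while (used < size - 1) {
        msg_t *msg = fd_read_message(fd);
        if (msg == NULL)
            break;                                   /* stream closed */

        /* Copy up to and including the first newline in this message.
         * (Anything after the newline is dropped in this sketch.) */
        char *nl = memchr(msg->data, '\n', msg->len);
        int n = nl ? (int)(nl - msg->data) + 1 : msg->len;
        if (n > size - 1 - used)
            n = size - 1 - used;                     /* don't overrun the buffer */

        memcpy(buf + used, msg->data, n);
        used += n;
        msg_release(msg);

        if (nl != NULL)
            break;                                   /* a full line is collected */
    }

    if (used == 0)
        return NULL;                                 /* nothing was read */
    buf[used] = '\0';
    return buf;
}
```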
So, with all of that in place, I FINALLY have working pipes. I have to say, it was pretty cool to see a single echo get piped through two greps and work correctly. I don’t know that anything will ever use that functionality, but it was fun to watch.
Anyway, my soldering pencil has finally arrived, so I can now look into getting the MicroSD card reader working and experimenting with a file system. We shall see! To be continued…