NanoOs

15-Dec-2024 - Shell Games

I spent about a day and a half after my last post developing a shell process that could launch a new process. The first thing I had to figure out was how the shell would spawn a new process in such a way that it could take ownership of the console that the shell was using. It obviously didn’t make any sense for more than one process to be connected to stdin at a time, so working out this transfer was the first major hill to get over.

At the time, there were three (3) “kernel” processes that provided services to the commands: the scheduler, the console, and the memory manager. Up until the introduction of the shell, the console had directly parsed and launched commands. With the shell, that would no longer be true. The shell would be a user process, not a kernel process. So, now there would be a user process - which would consume one of the user process slots - launching another user process.

Even with the console directly parsing command lines, it still didn’t actually launch processes. Only the scheduler can do that. So, the console would send a message to the scheduler with the information about how to launch the process. The shell would do the same thing. However, the console’s message was non-blocking. The shell would need to assign ownership of the console port it was using to the launched process and then wait for that process to complete before resuming reading input from the port.
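
In rough C terms, the difference between the two launch styles looks something like this (all of the names here are illustrative, not the actual NanoOs API):

```c
// Sketch of the two launch styles. Everything here is illustrative;
// the real NanoOs message types and calls are not shown in this post.
typedef enum { SCHEDULER_RUN_PROCESS, PROCESS_COMPLETE } MessageType;

typedef struct Message {
  MessageType type;
  const char *commandLine; // the command to launch
  int consolePort;         // the port the launched process should own
} Message;

// Stubs standing in for the real messaging primitives.
static void sendToScheduler(Message *msg) { (void) msg; }
static void waitForMessageOfType(MessageType type) { (void) type; }

// Console-style launch: fire-and-forget.
void consoleLaunch(const char *commandLine, int port) {
  Message msg = { SCHEDULER_RUN_PROCESS, commandLine, port };
  sendToScheduler(&msg); // non-blocking; the console keeps reading input
}

// Shell-style launch: hand off the console port, then block.
void shellLaunch(const char *commandLine, int port) {
  Message msg = { SCHEDULER_RUN_PROCESS, commandLine, port };
  sendToScheduler(&msg);
  // Block until the scheduler reports that the child has exited; only
  // then does the shell resume reading from its console port.
  waitForMessageOfType(PROCESS_COMPLETE);
}
```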

In modern operating systems, the shell always launches a new process when a command is run (unless it’s a built-in command). Early versions of UNIX, however, didn’t work that way. Due to memory constraints, what they did was to replace the running shell with the process being launched. They basically wrote a small bootloader that would load in the next executable. When that program completed, the bootloader would run again and reload the shell. It took the authors a while to figure out how to do that and still maintain the environmental state that needed to persist across program loads.

With the console parsing command lines and launching processes, it could always launch the command as a new process. However, there was a problem I ran into early on about what happens when all available process slots are consumed. In that case, there’s no way to launch process management commands like ps and kill. The way I had dealt with this was to always have one process slot reserved for these kinds of commands.

So, there was a question about the best way for my shell to launch processes. This is an embedded OS and process slots are extremely limited. As I’ve mentioned before, it can currently only support up to eight (8) processes and three (3) of those are taken by kernel processes. With the one reserved process slot, that left four (4) slots open for general user commands and one (1) for special commands. The shell is considered a user process and my ultimate goal is to support multiple concurrent shells, so I was faced with ultimately consuming at least two (2) process slots for shell processes.

If the shell continued to launch a new process for all user commands, then, with two shells running in parallel, there would only be enough slots to run one general user command per shell. Not much of a multitasking OS environment in that kind of situation. Also, the fact that there was only one reserved slot for process management commands meant that only one user could run such a command at a time, which is a poor - if not outright dangerous - design.

I clearly needed a better design. One possibility would be to go back to the way early versions of UNIX did things and always replace the shell with the new command to run. However, I didn’t like that idea for two reasons: (1) it’s not realistic and not representative of how modern operating systems work and (2) it removes the possibility of having background processes running. It would also essentially mean that all other process slots were wasted, although I could potentially just use the memory for the extra slots for the dynamic memory manager.

Then I had a realization: I could use a hybrid approach. I could make process management commands replace the running shell and all other commands be launched as new processes. This approach had the additional benefit that it eliminated the need to reserve a process slot for the process management functions. Since they would replace the running shell process, they would always be guaranteed to have an available process slot to run in. So, in five (5) user process slots with two (2) concurrent shells, there would be three (3) available process slots for any general user commands to run in. Still not super, but better, and not terrible for an embedded OS.
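
As a minimal sketch, assuming a compiled-in command table (the table contents and names below are hypothetical):

```c
// Minimal sketch of the hybrid dispatch decision, with a hypothetical
// compiled-in command table. None of these names are the real ones.
#include <stdbool.h>
#include <string.h>

typedef struct CommandEntry {
  const char *name;
  bool shellCommand; // true: replaces the shell; false: new process
} CommandEntry;

static const CommandEntry commands[] = {
  { "ps",   true  }, // process management replaces the shell...
  { "kill", true  },
  { "echo", false }, // ...everything else gets its own slot
};

// Stubs standing in for the real scheduler requests.
static void replaceCurrentProcess(const char *name) { (void) name; }
static void spawnNewProcess(const char *name) { (void) name; }

static bool isShellCommand(const char *name) {
  for (size_t ii = 0; ii < sizeof(commands) / sizeof(commands[0]); ii++) {
    if (strcmp(commands[ii].name, name) == 0) {
      return commands[ii].shellCommand;
    }
  }
  return false;
}

void dispatchCommand(const char *name) {
  if (isShellCommand(name)) {
    replaceCurrentProcess(name); // runs in the shell's own slot
  } else {
    spawnNewProcess(name);       // consumes a free user slot
  }
}
```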

So, that’s ultimately the design I went with. As you can probably imagine, there was quite a bit of rework that had to be done to support this kind of hybrid system. I decided that the simplest thing to do would be to designate certain process slots as shell slots and have the scheduler check each iteration of the loop to make sure they were all running. A shell that wasn’t running would indicate that it had been replaced by a special command and that command had exited, so the scheduler simply needed to restart the shell in that case.
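
The check itself is simple; something like this, with made-up slot numbers and helper names:

```c
// Sketch of the shell-slot check the scheduler performs on each pass
// through its loop. Slot numbers and helpers are illustrative.
#include <stdbool.h>

#define NUM_PROCESSES    8
#define FIRST_SHELL_SLOT 3 // hypothetical: slots 3 and 4 are shell slots
#define NUM_SHELL_SLOTS  2

typedef struct ProcessDescriptor {
  bool running;
} ProcessDescriptor;

static ProcessDescriptor processes[NUM_PROCESSES];

// Stub standing in for relaunching the shell in a slot.
static void restartShell(int slot) { processes[slot].running = true; }

// Called once per scheduler iteration: a shell slot that is not
// running means a shell command ran in its place and has now exited,
// so the shell needs to be relaunched there.
void checkShellSlots(void) {
  for (int slot = FIRST_SHELL_SLOT;
       slot < FIRST_SHELL_SLOT + NUM_SHELL_SLOTS; slot++) {
    if (!processes[slot].running) {
      restartShell(slot);
    }
  }
}
```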

Getting the shell to block waiting for a spawned process that took over the console port to complete, however, turned out to be a bit of a challenge. The raw parser in the console never had any concept of this. The shell needed to wait for a message from another process to tell it that the spawned process had completed. Also, ownership of the console port would have to be passed back and forth between the shell process and the process that it spawned. There was work in the console that had to be done to support that.
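
Conceptually, the console ends up doing some per-port ownership bookkeeping along these lines (again, hypothetical names):

```c
// Hypothetical per-port ownership bookkeeping in the console. The
// real structures in NanoOs are not shown in this post.
#define NUM_PORTS 2

typedef int ProcessId;

typedef struct ConsolePort {
  ProcessId owner; // the process whose I/O is wired to this port
} ConsolePort;

static ConsolePort ports[NUM_PORTS];

// On launch, the shell hands its port to the spawned process...
void assignPortOwner(int port, ProcessId newOwner) {
  ports[port].owner = newOwner;
}

// ...and on completion, ownership passes back to the shell, which is
// then unblocked by a completion message.
void returnPortToShell(int port, ProcessId shellProcessId) {
  ports[port].owner = shellProcessId;
}
```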

It took me about a day and a half to get all that working. At that point, I had some other chores to take care of and I didn’t get back to this until two days ago.

Now that I had the basics of the shell process running, the next step was to add process ownership so that a user could log in, take ownership of the shell, and launch processes under their personal ownership. One user should be forbidden from killing another user’s processes (unless the first user is the root user). When the user logs out, the shell should return to an unowned state.

Since the goal here is to be UNIX-like, I decided that permissions and ownership would be determined by a numeric user ID. That meant defining a new UserId integer type and adding that to the descriptor structure for running processes. I defined two well-known user ID values: ROOT_USER_ID (which I set to 0) and NO_USER_ID (which I set to -1). The fact that the NO_USER_ID value was negative meant that the UserId type had to be signed. I used an int16_t. This allowed me to use user ID values that were similar to what’s used on UNIX-like operating systems (i.e. starting normal user IDs at 1000).
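
In C terms, that works out to something like this (the exact definitions in the real code may differ slightly):

```c
// The type and the two well-known values, as described above.
#include <stdint.h>

typedef int16_t UserId; // signed so that NO_USER_ID can be -1

#define ROOT_USER_ID ((UserId) 0)
#define NO_USER_ID   ((UserId) -1)

// Normal user IDs start at 1000, mirroring UNIX-like conventions.
#define FIRST_NORMAL_USER_ID ((UserId) 1000)
```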

Getting the user ID of the running process and setting it both had to be scheduler operations since only the scheduler has access to the data structures needed. Getting the ID was pretty straightforward, but I debated what to do about setting one. It seemed pretty clear to me that a running process should be able to set its own user ID if its current user ID wasn’t set, but what if the user ID was set or what if a request was being made from a different process?

I couldn’t come up with a use case for one user process assigning ownership of another. Also, I concluded that the use case of a process assigning itself to a user other than the one it was currently assigned to would be a security risk since that potentially means that the process could elevate its own permissions. So, the only two cases I allowed were a process going from unassigned to assigned or from assigned to unassigned. This seemed like both the simplest and most-secure model to me.
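
The permission check boils down to just a few lines (sketched here with illustrative names; the scheduler also rejects a request that comes from any process other than the one being modified):

```c
// Sketch of the rule the scheduler enforces on a set-user-ID request.
#include <stdbool.h>
#include <stdint.h>

typedef int16_t UserId;
#define NO_USER_ID ((UserId) -1)

bool setUserIdAllowed(UserId currentOwner, UserId requestedOwner) {
  if (currentOwner == NO_USER_ID) {
    return true;  // unassigned -> assigned (login)
  } else if (requestedOwner == NO_USER_ID) {
    return true;  // assigned -> unassigned (logout)
  }
  // assigned -> a different user would be privilege escalation
  return false;
}
```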

Even though it wouldn’t be valid for one user process to assign ownership of another one, ownership does have to propagate from a launching process (the shell) to a spawned process. This is the responsibility of the scheduler. Strictly speaking, the scheduler doesn’t care if the spawning process is a shell (which may be a good thing down the road), it simply copies the ownership from the process that launches to the process that’s launched. When a launched process completes, it unassigns itself again.

Well, usually, anyway. For a process that’s taken over the shell’s slot, it doesn’t make sense to unassign itself when it completes. When the scheduler detects that the process is no longer running, it will launch the shell again. The process slot needs to maintain the ownership it had when the shell launched the process in that case. So, process ownership is only undone if the command that was launched was not a shell command and, therefore, did not replace the shell in its process slot.
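
Sketched out, the launch and exit rules look something like this (the descriptor layout and helper names are hypothetical):

```c
// Sketch of how ownership flows at launch and at exit. The descriptor
// layout here is illustrative.
#include <stdbool.h>
#include <stdint.h>

typedef int16_t UserId;
#define NO_USER_ID ((UserId) -1)

typedef struct ProcessDescriptor {
  UserId owner;
  bool shellSlot; // true if this is one of the designated shell slots
} ProcessDescriptor;

// At launch: the child inherits the launcher's owner, whoever the
// launcher happens to be.
void propagateOwnership(ProcessDescriptor *launcher,
    ProcessDescriptor *launched) {
  launched->owner = launcher->owner;
}

// At exit: only a process in a regular slot unassigns itself. A
// command that replaced a shell leaves the ownership in place so the
// restarted shell can skip the login prompt.
void onProcessExit(ProcessDescriptor *process) {
  if (!process->shellSlot) {
    process->owner = NO_USER_ID;
  }
}
```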

This allows the shell process to be a little intelligent about requesting a login from the user. The process can query itself to get the ID of the user that currently owns it. If it detects that it’s already owned by a valid user on startup, it bypasses the login prompt. If it’s not owned, it displays a one-line summary of the OS and prompts the user for login credentials until a login is successful.
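
The startup decision is then roughly the following (the stubs stand in for the scheduler query and the console prompts):

```c
// Sketch of the shell's startup decision.
#include <stdbool.h>
#include <stdint.h>

typedef int16_t UserId;
#define NO_USER_ID ((UserId) -1)

static UserId getOwnUserId(void) { return NO_USER_ID; } // stub
static void printOsSummary(void) { }                    // stub
static bool promptForLogin(void) { return true; }       // stub

void shellStartup(void) {
  if (getOwnUserId() != NO_USER_ID) {
    // Already owned: a shell command just exited in this slot, so the
    // user is still logged in. Skip the login prompt.
    return;
  }
  printOsSummary();
  while (!promptForLogin()) {
    // Keep prompting until a login succeeds.
  }
}
```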

That brings me to authentication. There’s currently no file system available here, so user information had to be stored in the source code. I could have just stored each user’s password in plaintext, but I didn’t want to do that, even if the passwords were documented in comments in the code. I wanted there to be at least SOME rudimentary level of security at this stage. What I chose was to use a SHA1 digest as the storage mechanism for passwords. (Yes, I know that’s not considered secure these days, but the source code for the algorithm is easy to come by. I’m not going for bulletproof at this stage.)
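
The compiled-in user records end up looking something like this (the entries and digest bytes below are placeholders, not the real ones):

```c
// Hypothetical compiled-in user records. The digest bytes below are
// placeholders, not real hashes.
#include <stdint.h>

typedef int16_t UserId;

typedef struct UserEntry {
  UserId id;
  const char *username;
  uint8_t passwordSha1[20]; // a SHA1 digest is always 20 bytes
} UserEntry;

static const UserEntry users[] = {
  { 0,    "root", { 0x00 /* ...remaining digest bytes... */ } },
  { 1000, "user", { 0x00 /* ...remaining digest bytes... */ } },
};
```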

I have a SHA1 algorithm that I use in my own code, and I intended to use it here. There was a problem with this, though. The algorithm declares quite a bit of stuff on the stack as local variables. When I added it up, it was around 450 bytes. That’s dangerously close to the stack size limit of a single process. Add a few more bytes for the overhead of the process being called as a function, plus the local variables of the shell itself, and I could very well have overflowed the stack. However, the dynamic memory manager still had over 1 KB available. So, I changed the algorithm to require that its working variables be passed in and made the calling code allocate and deallocate memory around it. After a few minor glitches with sprintf, this worked flawlessly.
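
The reshaped interface looks roughly like this (the context layout and names are illustrative, and the prototypes stand in for the existing SHA1 implementation):

```c
// Sketch of the refactor: the roughly 450 bytes of SHA1 working state
// move from the stack into a caller-allocated block.
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct Sha1Context {
  uint32_t state[5];     // hash state
  uint64_t length;       // total message length in bits
  uint8_t  block[64];    // current 512-bit block
  uint32_t schedule[80]; // expanded message schedule (the big one)
} Sha1Context;           // too much to put on one process's stack

// Prototypes standing in for the existing SHA1 implementation.
void sha1Init(Sha1Context *ctx);
void sha1Update(Sha1Context *ctx, const void *data, size_t length);
void sha1Final(Sha1Context *ctx, uint8_t digest[20]);

int hashPassword(const char *password, uint8_t digest[20]) {
  // Allocate the working state from the dynamic memory manager
  // instead of declaring it as local variables.
  Sha1Context *ctx = (Sha1Context *) malloc(sizeof(Sha1Context));
  if (ctx == NULL) {
    return -1;
  }
  sha1Init(ctx);
  sha1Update(ctx, password, strlen(password));
  sha1Final(ctx, digest);
  free(ctx);
  return 0;
}
```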

So!!! I now have a working shell that allows for login and logout, execution of both shell commands and regular user commands, and protection mechanisms in the scheduler to ensure that only a process’s owner or the root user can kill it. I can log in as one user, log out, log in as a different user, and verify that I can’t kill the process that the other user started unless the second user is root.

The software part of being able to support multiple concurrent users is now in place. I’ve ordered a breadboard kit to start playing with the GPIO pins on the Arduino that I plan to connect to a Raspberry Pi that I have. My goal is to be able to run a command in one shell that hangs the shell, then kill it from the other one and resume work in both shells. That should be up in pretty short order at this point.

The next software portion I plan to work on after this is the scheduler. Apart from handling interprocess messages, all it does right now is run processes in a round-robin manner. It doesn’t even do any state checking on them; it just iterates over every element in the process array. One thing I want to do is to have a proper set of queues and move processes among them depending on their states. Another thing I want to do is change the way that user processes are multitasked. I don’t like that it’s still mandatory for user processes to explicitly call yield. So, I plan to switch user processes to preemptive multitasking. I think cooperative multitasking is still the most-efficient way to manage the kernel processes, so I will leave that in place.

Things are getting more functional (and exciting) now! To be continued…

Table of Contents