Design Choices of my OS

SoulofDeity · **Joined:** Wed Jan 11, 2012 6:10 pm **Posts:** 193

While I mostly want to keep it proprietary, I figured that I'd create a topic to share some of the design choices I've made for the OS I've been working on. It's heavily Unix-inspired, but there are a lot of differences.

First up is the structure of the filing system. I really wanted to clean up that mess:

Code:

app           applications                  (same purpose as /opt)
com           common files                  (same purpose as /home and /root)
  <username>
dev           devices
env           environments                  (same purpose as /usr)
  <envname>                                 (same purpose as /usr/local and /boot)
    lib           libraries
    man           manuals
    obj           objects                   (same purpose as /bin and /sbin)
    reg           registry data             (same purpose as /etc)
    res           resources                 (same purpose as /share)
    var           variable data
mac           machine                       (same purpose as /proc)
net           network
sys           system files
vol           volumes                       (same purpose as /mnt and /media)

Note that 'same purpose' here does not mean 'identical'. Take, for example, the 'obj' directory. While it has the same purpose as 'bin' on Unix systems, it doesn't necessarily have to contain executable files. There is really no difference between a normal object file and an executable file aside from the fact that the latter has fewer relocations most of the time. Part of the reason I chose the name 'obj' instead of 'bin' was that I really don't like the word 'binary' being used instead of 'program' for technical reasons. There's also no difference between static and non-static object files, so having 2 separate directories for them is rather foolish imo. The modern specification tries to re-define 'sbin' as 'system binaries', but let's be honest, that's what permissions are for. Having a dedicated 'games' directory is also kind of stupid imo.

The 'app' directory will contain applications. Each application is a subdirectory or *.a archive file. The benefits of this approach are that each application can contain metainformation and embedded resources regardless of the format of the object files. The self-contained archive files are similar to how Mac uses *.dmg files. One idea I had in mind is that you can execute a subdirectory like a program if it contains the metainformation for an application.

The 'env' directory will contain environments. It's like having multiple /usr/local directories on Unix systems. One example use for it is to create a separate environment for each user, but it's not limited to that alone. It may also be used to have both 32-bit and 64-bit environments, which may be used selectively depending on the application being executed. With the addition of environments, the need for a 'boot' directory is nullified (since a tmpfs can simply be another environment). The 'run' directory can be put back in 'var' where it belongs. Additionally, I'd prefer that 'var' be used instead of 'srv'. They literally have the exact same purpose; having both only confuses people about where to put their stuff and in some cases leads them to break things.

The 'mac' directory replaces the 'proc' directory. The name 'proc' is misleading imo because the directory contains a lot more information than just data about processes.

The 'net' directory was added as a mount-point for machines on the local network.

The 'vol' directory is a merge of the '/mnt', '/cdrom', '/floppy', '/media', etc. directories. Having this crap spread all over the filing system is annoying to me. The two reasons I chose not to stick with '/mnt' were the lack of defined structure and the fact that I'd like to get rid of sexual joke names like 'touch', 'mount', 'finger', 'fsck', etc.

The 'reg' directory will replace 'etc' as a place specifically for registry data, and nothing else. This is essentially what the 'etc' directory has become over the years anyhow, and its purpose is more apparent to people coming from a Windows background.

The 'com' directory will replace the 'home' and 'root' directories. Why? Because having homes inside my home is an inception I can do without and as far as I can tell, the 'root' directory is unneeded.

UPDATE #1:

I've been thinking about creating a 'streaming system' to manage how audio streams are processed. Like in windowing systems, where you normally render windows across multiple or individual displays, the streaming system would let you render streams across multiple or individual channels. It'd basically work the same way. You'd have a Channel Server, Stream Manager, and Filter Toolkit. To get audio input or output, you'd first open a channel and create a stream to use. Then you'd just use the filter toolkit to create filters that manipulate samples of the stream. Like with widgets, while you can create custom filters, there would be a lot of preset filter types for things like echo and reverb.
[EDIT] Now that I think about it, this setup would actually be a really good way of handling video input from cameras as well.

UPDATE #2

In addition to a 'public' directory, I think each user should have a 'mobile' directory that's synchronized across devices. Perhaps the underlying mechanism here would be that each user is automatically assigned a GUID, but it may be manually specified. When 2 or more devices have a home directory for a user with the same GUID, the timestamps for each file in the mobile directory on each device will be compared and the most recently updated files will overwrite the older ones on each device.
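The newest-wins rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `sync_mobile` name and the in-memory `{path: (mtime, data)}` layout are hypothetical, and the sketch ignores deletions and simultaneous edits, which a real mechanism would have to deal with.

```python
def sync_mobile(devices):
    """Newest-wins merge of the 'mobile' directories of one user (same GUID).

    `devices` maps a device name to {relative path: (mtime, data)}.
    Both the name and the layout are assumptions made for this sketch.
    """
    newest = {}
    # Pass 1: find the most recently modified copy of every file.
    for files in devices.values():
        for path, (mtime, data) in files.items():
            if path not in newest or mtime > newest[path][0]:
                newest[path] = (mtime, data)
    # Pass 2: overwrite older (or missing) copies on every device.
    for files in devices.values():
        files.update(newest)
    return newest
```

Comparing raw timestamps assumes the device clocks roughly agree; if they don't, an older file could win.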

UPDATE #3

Decided on a name. Originally, I was just using '?' as a placeholder. When I asked myself what I wanted to call it, the Yoda-speak translator in my brain responded, "'What', I want to call it." The name is thus 'Que', the Spanish word for 'what'. Just a funny coincidence, 'Que' (pronounced 'kay') is based on Unix systems which are often labelled as 'X'. A single letter connotation. 'K' looks and sounds like 'X', but it's not! Just for the added humor, I think I'll use '¿?' as the logo. One advantage to this is that it can be displayed in the terminal since both characters are part of the extended ASCII character set.

UPDATE #4

I've been thinking a lot about the object file format. While I do want to add PC support, my primary focus is embedded systems. Since you may have a small amount of RAM and lack an MMU, I want to enforce that all libraries are linked as PIC and all programs are relocatable.

SpyderTL · **Joined:** Sun Sep 19, 2010 10:05 pm **Posts:** 1074

I like the way you are headed, but then again, I'm more of a Windows guy.

Have you considered just naming your folders what they actually are, instead of the traditional 3 character convention? I'm just not sure what the point is, nowadays. Are you just trying to save hard drive space? Or save keystrokes? Or maintain some sort of "link" to traditional Unix naming conventions?

I would just consider using more descriptive folder names, just to make it a bit easier on the new users. But, other than that, it looks good.

SoulofDeity · **Joined:** Wed Jan 11, 2012 6:10 pm **Posts:** 193

SpyderTL wrote:

I like the way you are headed, but then again, I'm more of a Windows guy.

Have you considered just naming your folders what they actually are, instead of the traditional 3 character convention? I'm just not sure what the point is, nowadays. Are you just trying to save hard drive space? Or save keystrokes? Or maintain some sort of "link" to traditional Unix naming conventions?

I would just consider using more descriptive folder names, just to make it a bit easier on the new users. But, other than that, it looks good.

Mostly, it's to save keystrokes when working from the terminal (the same reason Unix originally had mostly 2 and 3 letter commands and directory names). One thing I put a lot of time into was making the names somewhat more intelligible and making sure that the purpose of each change wouldn't reuse names defined in the FHS unless they were truly identical in structure. Portmanteaus may sound out nicely, but they kind of mess with your brain a little.

When working from a GUI, these directory names will be abstracted away and you'll just see the more descriptive names.

thomasloven · **Posted:** Fri Feb 20, 2015 3:21 am

If you want to save keystrokes, give them intelligible names, but make sure they have different starting letters, and use tab completion.

iansjack · **Posted:** Fri Feb 20, 2015 3:29 am

thomasloven wrote:

If you want to save keystrokes, give them intelligible names, but make sure they have different starting letters, and use tab completion.

That makes a lot of sense. It would be, IMO, a poor design choice, with a lot of potential confusion, to have different names for directories depending upon whether you were using a terminal or a GUI. TBH, I'm not convinced that people worry too much about directory names.

thomasloven · **Posted:** Fri Feb 20, 2015 10:52 am

My mac does that since the OS is translated to Swedish.
I solve it by never leaving the terminal...

SoulofDeity · **Joined:** Wed Jan 11, 2012 6:10 pm **Posts:** 193

Tab completion won't save keystrokes in cases where you have directory names like 'computer' and 'commands' in the same place. Best case scenario, there are only 2 directories starting with the same letters and you only have to make 3 keypresses to get one of the two. Worst case scenario, you have 3 or more directories starting with the same letters and you have to make 5 or more keystrokes. It's best just to keep the names short.

Also bear in mind, I'm mostly targeting embedded systems where reading from ROM may be a very costly operation.

thomasloven · **Posted:** Fri Feb 20, 2015 6:03 pm

thomasloven wrote:

If you want to save keystrokes, give them intelligible names, but make sure they have different starting letters, and use tab completion.

SoulofDeity · **Joined:** Wed Jan 11, 2012 6:10 pm **Posts:** 193

thomasloven wrote:

If you want to save keystrokes, give them intelligible names, but make sure they have different starting letters, and use tab completion.

The problem I'm talking about is when the user creates directories. and...

SoulofDeity wrote:

Also bear in mind, I'm mostly targeting embedded systems where reading from ROM may be a very costly operation.

There was an embedded platform I worked on before that had no access to external memory, only flash ROM. On an 8MHz CPU, it was very expensive (time-wise) to access flash memory, or even to perform normal operations for that matter. The default kernel for the device used a flat-structured filing system with 8-letter names to conserve the meager 64KiB of RAM the device had (divided into 2 32KiB banks, but the first 8KiB of each bank was taken by the kernel, so you really only had 24KiB of contiguous memory). If you add in the ability to have directories, storing the entire FAT in RAM is just unreasonable. You can cache a part of it, but when there's a cache miss you're looking at waiting about 10 or more seconds to copy a part of the FAT to RAM. I'm not saying tab completion is bad, I just want something soft and fuzzy to fall back on.

bluemoon · **Posted:** Fri Feb 20, 2015 11:14 pm

SoulofDeity wrote:

Also bear in mind, I'm mostly targeting embedded systems where reading from ROM may be a very costly operation.
There was an embedded platform I worked on before that had no access to external memory, only flash ROM. On an 8MHz cpu, it was very expensive (time-wise) to access flash memory. Or even perform normal operations for that matter.

Have you considered upgrading the hardware, which may eventually solve the problems, instead of working around them with special software designs? I think getting a higher-clocked chip is even cheaper than finding an 8MHz CPU now, due to production volume.

Or are you talking about writing an entire OS for MCU, which is much more limited?

SoulofDeity · **Joined:** Wed Jan 11, 2012 6:10 pm **Posts:** 193

bluemoon wrote:

Or are you talking about writing an entire OS for MCU, which is much more limited?

^ Bingo.

An update on the topic of a 'streaming system', I've been planning out the API and had this peculiar idea. A brief description...

A channel is an object consisting of an input stream, an output stream, and a set of filters. At least one of the 2 streams must not be null, and they may reference the same stream. When the input stream is null but the output stream is not, the channel is write-only. When the output stream is null but the input stream is not, the channel is read-only. All channels have 5 basic operations: open, close, flush, attach filter, and detach filter.

A stream is a pointer to data that can be either incremented or decremented, representing the flow of data. Streams can be in either a volatile or nonvolatile state. When a stream is volatile, it means that it may change direction at any point in time. When a stream is non-volatile, it means that it always flows the same direction. You cannot operate on a stream directly, you can only operate on samples of it. All streams have 4 basic operations: seek position, tell position, read samples, and write samples. The first 2 functions, seek and tell, only operate on volatile streams.

A filter is a function that operates on samples of a stream.

So this is a rough idea, but basically the API for operating on audio/video channels is nearly identical to how you'd operate on files.

EDIT:
Another thing: to abstract away the need to operate on streams after creating a channel, it'd probably be best if channels also had 'send data' and 'receive data' operations. These would in turn call the proper sampling and filtering instructions.
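To make the channel/stream/filter description more concrete, here is a rough sketch of how the pieces could fit together. All class and method names are hypothetical; the post only names the operations, not an API. Filters are modelled as plain functions over a list of samples, and 'flush' is a no-op because the sketch does no buffering.

```python
class Stream:
    """A cursor over sample data. Only volatile streams may seek/tell."""

    def __init__(self, samples=None, volatile=False):
        self.samples = list(samples or [])
        self.pos = 0
        self.volatile = volatile

    def seek(self, pos):
        if not self.volatile:
            raise OSError("seek only operates on volatile streams")
        self.pos = pos

    def tell(self):
        if not self.volatile:
            raise OSError("tell only operates on volatile streams")
        return self.pos

    def read(self, count):
        chunk = self.samples[self.pos:self.pos + count]
        self.pos += len(chunk)
        return chunk

    def write(self, chunk):
        self.samples[self.pos:self.pos + len(chunk)] = chunk
        self.pos += len(chunk)


class Channel:
    """An input stream, an output stream (at least one non-null) and filters."""

    def __init__(self, input_stream=None, output_stream=None):
        if input_stream is None and output_stream is None:
            raise ValueError("at least one of the two streams must not be null")
        self.input, self.output = input_stream, output_stream
        self.filters = []

    def flush(self):
        pass  # nothing is buffered in this sketch

    def close(self):
        self.flush()

    def attach_filter(self, f):
        self.filters.append(f)

    def detach_filter(self, f):
        self.filters.remove(f)

    def _apply(self, chunk):
        for f in self.filters:  # filters run in attachment order
            chunk = f(chunk)
        return chunk

    def receive(self, count):
        if self.input is None:
            raise OSError("channel is write-only")
        return self._apply(self.input.read(count))

    def send(self, chunk):
        if self.output is None:
            raise OSError("channel is read-only")
        self.output.write(self._apply(chunk))
```

A gain filter, for example, would just be `lambda chunk: [s * 2 for s in chunk]` attached with `attach_filter`.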

SoulofDeity · **Joined:** Wed Jan 11, 2012 6:10 pm **Posts:** 193

So, to simplify the filing system a bit, I'm taking both approaches. Using 3 letter names for the actual directories, and creating human-readable symlinks to them. The root directory will have a '.hidden' file with the same format used by nautilus which lists app, dev, env, etc. directories to be hidden. This introduces a problem though in that symlinks don't work on several filing systems like FAT32. So, to work around this limitation, I want to add a new '.symbolic' file that lists symbolic files. These are just normal text files with a path stored in them. When you try to access a file that doesn't exist, the system will look for a '.symbolic' file and check if the name of that file is listed in it before it decides to toss back an error.

When you try to copy or move symlinks from one filesystem to another, the conversion will be done automatically.
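The '.symbolic' fallback could look something like the sketch below. It assumes one name per line in '.symbolic' (mirroring the Nautilus '.hidden' format) and that each listed file contains nothing but the target path; the `resolve_symbolic` helper is a hypothetical name, not part of any existing system.

```python
import os


def resolve_symbolic(directory, name):
    """Return the target path if `name` is listed in `directory/.symbolic`.

    Returns None when there is no '.symbolic' file or the name is not
    listed, in which case the caller would report the usual error.
    """
    index = os.path.join(directory, ".symbolic")
    try:
        with open(index) as f:
            listed = {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        return None
    if name not in listed:
        return None
    # The "symlink" is a normal text file holding the target path.
    with open(os.path.join(directory, name)) as f:
        target = f.read().strip()
    # Relative targets are taken relative to the containing directory.
    return os.path.normpath(os.path.join(directory, target))
```

With a root containing 'app', a '.symbolic' file listing 'applications', and a text file 'applications' containing 'app', resolving 'applications' yields the path of 'app'.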

In other news, I'm highly considering the use of PE files as my executable format. The reasons being:

They're much faster to load when they don't need to be rebased.
They allow the use of ordinal numbers for symbols, which can be used to make the files smaller and load faster
The files have DOS stubs located at the start of the file; on 8/16-bit embedded systems, I can just use this instead of the whole PE file, resulting in much more compact binaries. On the other hand, ELF files do not support 8/16-bit architectures (and no, simply ignoring the upper 16 bits of all addresses is not 'support', it's wasteful)
It is very easy to embed metainformation and resources into PE files. While you can do this for ELF files, there's not really a strict standard about this (that I know of), and exposing those resources to the program itself is tedious.
PE is like, the 'official' executable format of .NET and Mono. As far as I know, ELF does not currently support this.

EDIT:
On second thought, I've decided to stick with ELF, but with an exception. There will be a 'tiny-elf' file prepended to each file that is used for 8 and 16-bit architectures, like how the MZ executable is prepended to PE/COFF.

This 'tiny-elf' format was made specifically with 8/16-bit architectures in mind, both with and without a MMU. For that reason, the format was designed to be very conscious of size. Each file begins with the following 16-byte header:

Code:

u16  signature          ; "‰e" (0x8965)
 u8  headersize         ; headersize + 1 = header size, max = 256 octets
 u8  endianness         ; endianness of the machine (and file)
u16  machinetype
 u4  protection         ; link only and allowed rwx page protection
 u4  extrasections      ; which extra optional sections to use
 u8  emptysections      ; bitfields for which sections are empty
u16  entrypoint         ; entry point for the program
u16  checksum
u32  filesize           ; size of the file in octets

The endianness can be unknown (0x00), big (0x01), or little (0x02). All other fields in the file will use this endianness.
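As a rough illustration of the 16-byte header, here is how a loader might unpack it. Several details are assumptions, not part of the description above: the signature is read as the raw bytes 0x89 0x65 in file order, 'protection' is taken to be the high nibble of the shared byte and 'extrasections' the low nibble, and an unknown endianness (0x00) is simply rejected.

```python
import struct

# endianness field value -> struct byte-order prefix
BYTE_ORDER = {0x01: ">", 0x02: "<"}  # big, little


def parse_header(data):
    """Unpack the fixed 16-byte tiny-elf header into a dict (sketch only)."""
    if len(data) < 16:
        raise ValueError("header is 16 bytes")
    if data[0:2] != b"\x89\x65":  # assumed byte order of the signature
        raise ValueError("bad signature")
    order = BYTE_ORDER.get(data[3])
    if order is None:
        raise ValueError("unknown endianness")  # 0x00 is not handled here
    (machinetype, flags, emptysections,
     entrypoint, checksum, filesize) = struct.unpack_from(order + "HBBHHI", data, 4)
    return {
        "headersize": data[2] + 1,    # stored value + 1, so at most 256
        "order": order,
        "machinetype": machinetype,
        "protection": flags >> 4,     # assumed: high nibble
        "extrasections": flags & 0xF,  # assumed: low nibble
        "emptysections": emptysections,
        "entrypoint": entrypoint,
        "checksum": checksum,
        "filesize": filesize,
    }
```

The field sizes add up to exactly 16 bytes (2+1+1+2+1+1+2+2+4), so no padding is involved.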

The machine type must be an 8 or 16-bit machine id. Two special id's are reserved for unknown (0x00) and custom (0x01). For custom id's, how the files are identified is up to the implementation. It may be used in the case where the number of unique 16-bit id's has run out, so that a field in an extended header may be used.

The uppermost bit of the 'protection' field specifies whether the file is for linking only. For programs and overlays, this bit should be set to 0. For libraries and relocatable object files, it should be set to 1. The meanings of the allowed protection flags are as follows:

Code:

r--  read protection allowed, has .rwtext
rw-  read and write protections allowed, has .rwtext and .text
r-x  read and execute protections allowed, has .rwtext and .data
rwx  all protections allowed, has .text + .rodata + .rwtext + .data
-w-  same as 'rw-'
-wx  same as 'rwx'
--x  same as 'r-x'
---  same as 'r--'

The 'r' flag pretty much exists only for aesthetic reasons and is ignored. The 'extrasections' field specifies which optional extra sections are used:

Code:

bit 3 - .bss
bit 2 - .rel.*
bit 1 - .rela.*
bit 0 - .symtab

The 'emptysections' field tells whether each section is empty (1) or not empty (0). Each bit represents the following sections:

Code:

bit 7 = index 0 = .text       (read-only text section)
bit 6 = index 1 = .rodata     (read-only data section)
bit 5 = index 2 = .rwtext     (writable text section)
bit 4 = index 3 = .data       (writable data section)
bit 3 = index 4 = .bss
bit 2 = index 5 = .rel.*      (static relocation table)
bit 1 = index 6 = .rela.*     (dynamic relocation table)
bit 0 = index 7 = .symtab     (symbol table)

The purpose of this is apparent for the '.bss' section, but it also has other uses. For example, someone using a JIT compiler might want to allocate space for a '.rwtext' section at run time while keeping their main code in the '.text' section.
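Decoding the 'emptysections' bitfield is straightforward given the table above (bit 7 is section index 0). A small sketch, with the helper name being an assumption:

```python
SECTIONS = (".text", ".rodata", ".rwtext", ".data",
            ".bss", ".rel.*", ".rela.*", ".symtab")


def empty_sections(bitfield):
    """Return the names of sections marked empty (bit set to 1)."""
    return [name for index, name in enumerate(SECTIONS)
            if bitfield & (0x80 >> index)]
```

For the JIT example, a file that reserves '.rwtext' and '.bss' at run time would set bits 5 and 3, i.e. 0b00101000.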

The checksum is the sum of all 16-bit values in the file (within the bounds of the 'filesize' field).
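One possible reading of the checksum rule is sketched below. The description leaves a few things open, so the following are assumptions: the sum wraps modulo 2^16, the checksum field itself (offset 10 in the header) counts as zero while summing, and an odd trailing byte is padded with a zero byte. The caller is expected to pass only the first 'filesize' bytes.

```python
import struct


def tiny_elf_checksum(data, little_endian=True):
    """Sum all 16-bit words of `data`, modulo 2**16 (sketch only)."""
    if len(data) < 16:
        raise ValueError("need at least the 16-byte header")
    buf = bytearray(data)
    buf[10:12] = b"\x00\x00"  # assumed: checksum field is excluded
    if len(buf) % 2:
        buf.append(0)         # assumed: pad an odd-length file
    order = "<" if little_endian else ">"
    words = struct.unpack("%s%dH" % (order, len(buf) // 2), buf)
    return sum(words) & 0xFFFF
```

A loader would compare the result against the stored 'checksum' field and refuse the file on a mismatch.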

After the header comes an array of section info entries. These entries always appear in the indexed order for the sections and have the following format:

Code:

u16  loadaddress        ; address to load the section at
u16  size               ; size of the section in octets

After the array of section info entries is the raw data for each section in the same order.

Each relocation in the relocation tables has the following format:

Code:

 u8  sectionindex       ; index of the section that the relocation is in
 u8  type               ; the type of relocation
u16  offset             ; offset of the relocation

The only currently specified relocation type is null (0x00) for entries removed from the table.
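Reading a relocation table in this 4-byte entry format might look like the following sketch. The function name is hypothetical, and since null (0x00) is the only relocation type specified so far, any other type value used with it is purely illustrative.

```python
import struct


def parse_relocations(table, order="<"):
    """Return (sectionindex, type, offset) tuples, skipping null entries.

    `order` is the struct byte-order prefix taken from the header's
    endianness field ("<" little, ">" big).
    """
    entries = []
    for sectionindex, rtype, offset in struct.iter_unpack(order + "BBH", table):
        if rtype == 0x00:  # null: entry was removed from the table
            continue
        entries.append((sectionindex, rtype, offset))
    return entries
```

`struct.iter_unpack` raises `struct.error` if the table length is not a multiple of 4, which doubles as a cheap sanity check.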

The '.symtab' section contains symbols for all of the relocations in the format of null-terminated strings.

Anything after the offset pointed to by the 'filesize' field is ignored by 8/16-bit loaders. This way, a normal ELF or ELF64 file can immediately follow the stub.

---------------------------

EDIT#2:
Many people may think this is a dumb idea, but I'm going to redefine the *.c extension to mean 'compilable file' instead of 'C code'. There are well over 100 different programming languages that spam the use of several file extensions I need like '.f', '.m', and '.r'. It makes far more sense in my eyes to just specify all of them as generic 'compilable files' and select the proper language when passing them to the compiler.

SoulofDeity · **Joined:** Wed Jan 11, 2012 6:10 pm **Posts:** 193

So I tried running a .NET application in a Windows 8 VM today and it threw a fit. After digging around for a few hours, I discovered that it was because, apparently, the drivers were only supporting OpenGL 1.1. I'm not the first to have this problem; there are people all over the internet talking about how Win8 broke OpenGL drivers, and they're having to modify old unsigned drivers to get things working again.

Then I got to thinking, what's the point in using a virtual machine if the application is still platform dependent? It makes no sense. You're just sacrificing speed. This applies to Java as well. So, what I want to do is create a virtual machine/platform abstraction layer. It'll have a single interface for devices and absolutely will not allow peeking at hardware or executing/linking against programs and libraries not built for it. By doing this, all of the software written for it will be 100% platform independent. Porting the VM/PAL to a different platform would mean that anything ever written for it is guaranteed to work on that platform; thus the software never becomes obsolete. The only factor that matters is performance.

It's a very ambitious project though. It requires defining new standards for pretty much everything. And they're not something that can just be changed on a whim; once they're written, they're set in stone (otherwise they'd become the cause of what they're seeking to fix). The good outweighs the bad in the long run though.

cmdrcoriander · **Joined:** Tue Jan 20, 2015 8:33 pm **Posts:** 29

SoulofDeity wrote:

Then I got to thinking, what's the point in using a virtual machine if the application is still platform dependent? It makes no sense. You're just sacrificing speed. This applies to Java as well. So, what I want to do is create a virtual machine/platform abstraction layer. It'll have a single interface for devices and absolutely will not allow peeking at hardware or executing/linking against programs and libraries not built for it. By doing this, all of the software written for it will be 100% platform independent. Porting the VM/PAL to a different platform would mean that anything ever written for it is guaranteed to work on that platform; thus the software never becomes obsolete. The only factor that matters is performance.

Uhhh... how is this not Java? Or at least conceptually really, really similar?

Rusky · **Joined:** Wed Jan 06, 2010 7:07 pm **Posts:** 792

As soon as you start versioning the VM/PAL you're back where you were with some people stuck on the equivalent of GL 1.1.
