This argument has a serious problem, however: segmentation was not (originally) designed for memory protection, either. The 8086 had no memory protection, and all x86 CPUs since then retain this when operating in real mode. In the original design, segmentation was a means of simplifying addressing, by allowing a 20-bit memory address space to be reached through 16-bit registers. It was designed that way in part to make it easier to port 8-bit 8080/Z80/8080A code, by allowing 16-bit segment-local addressing, but the primary purpose was to save four address pins on the CPU's DIP, with the understanding that you'd just take the hit from double-dipping when you needed a wider pointer. No matter how you handled the segments, a program could always use a FAR address to get anywhere in the address space, and the only penalties were in performance and memory use.
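That original addressing scheme is small enough to sketch in C (a minimal model of the 8086's translation, not from either poster; `real_mode_addr` is an illustrative name). Because segments start every 16 bytes, many segment:offset pairs alias the same physical byte, which is exactly why a FAR pointer could always reach anywhere:

```c
#include <stdint.h>

/* 8086 real-mode translation: the 16-bit segment is shifted left
 * four bits and added to the 16-bit offset, then wrapped to the
 * CPU's 20-bit address bus. */
uint32_t real_mode_addr(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}
```

For instance, 0x1234:0x0005 and 0x1000:0x2345 both name physical address 0x12345 - the mechanism provides reach and aliasing, not protection.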
There is no requirement that it was designed for memory protection. The 32-bit offsets used in RIP-relative x86-64 code were not intended for protection either, but they can still be used that way. By putting code out of scope (except for specific references), we narrow down the effect of bugs, which makes them easier to find. If you have problems with the network driver, you can be pretty sure the bug originates there, and not in the video driver. That is something the packed flat-memory model cannot guarantee.
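As a rough illustration of the reach limit being appealed to here (a sketch, not from the post; `rel32_reachable` is a hypothetical helper): an x86-64 RIP-relative reference encodes a signed 32-bit displacement, so such code can only name targets within roughly a 2 GiB window of itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `target` lies within a signed 32-bit displacement of
 * `next_rip` (the address just after the referencing instruction),
 * i.e. within reach of x86-64 RIP-relative addressing. */
bool rel32_reachable(uint64_t next_rip, uint64_t target)
{
    int64_t disp = (int64_t)(target - next_rip);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```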
While memory protection based on segments was added in the 80286, then redesigned and expanded in the 80386, memory protection was no more the purpose of segmentation than it was of paging - and unlike segmentation, a certain amount of memory protection has been part of the paging system from the start.
Paging alone is not much better than using physical memory directly, and using physical memory directly would boost performance considerably, especially with the four paging levels in x86-64. Sure, totally fraudulent pointers are likely to generate page faults, but limit violations or miscalculated addresses are not.
Paging is NOT usable for this kind of protection because data structures in programs are neither page-aligned nor page-sized. This is where segmentation comes in.
I am unclear on just what it is you are arguing for at this point. Is the issue that
- a page is too large for most individual data structures, making it wasteful to give each one a separate page (or, conversely, forcing multiple data structures to be packed into each page, which might expose them to unwanted access if more than one process shares a page holding data that shouldn't be shared - though I can't see why anyone would do that);
- that a page is too small for some structures, and thus introduces hidden breaks in the data which could affect locality;
- that having to set each page's permissions, rather than setting them once for a whole segment, is inefficient compared to segment selectors (except that you can group pages by page-directory entry, so that makes no sense), and/or that the system is too likely to handle page permissions incorrectly (except that you said you are using both pages and multiple segments, which actually adds complexity over a flat layout); or
- something I have missed entirely?
If Intel had designed for 32-bit selectors, it would be natural to map every object to its own selector. Unfortunately, when Intel designed their 32-bit architecture, they didn't extend selectors to 32 bits; they kept them at 16 bits. That means a segmented OS must compromise in how it uses segmentation so as not to run out of selectors. That's why I have protection at the level of the driver, and then also map some major objects, like thread descriptors and other objects that are not likely to be produced in large quantities, to selectors. Things like FS buffers use flat addresses because they cannot all be mapped to selectors.
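The shortage is visible in the selector layout itself (a sketch for illustration; `decode_selector` is a hypothetical name, not from the OS being discussed): only 13 of the 16 bits index a descriptor table, so the GDT and each LDT top out at 8192 entries - far too few to give every heap object its own selector.

```c
#include <stdint.h>

/* An x86 segment selector packs a 13-bit descriptor-table index,
 * a table indicator (0 = GDT, 1 = LDT) and a 2-bit requested
 * privilege level into 16 bits. */
struct selector {
    uint16_t index; /* bits 15..3: at most 8192 descriptors per table */
    uint8_t  ti;    /* bit 2 */
    uint8_t  rpl;   /* bits 1..0 */
};

struct selector decode_selector(uint16_t sel)
{
    struct selector s = {
        (uint16_t)(sel >> 3),
        (uint8_t)((sel >> 2) & 1),
        (uint8_t)(sel & 3)
    };
    return s;
}
```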
All page-level protection operates at page granularity, and thus never maps naturally onto software structures. Paging therefore cannot provide the exact limit checking that segmentation supports.
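To put a number on that granularity mismatch (a minimal sketch, assuming 4 KiB pages; `page_slack` is a hypothetical helper, not from either poster): whatever doesn't fill an object's last page is slack that page-level permissions must treat exactly like the object, whereas a segment limit can end at the precise byte.

```c
#include <stddef.h>

#define PAGE_SIZE 4096u

/* Bytes left over in the last page covering an object of the given
 * size: protection can only change at page boundaries, so this slack
 * shares the object's permissions whether you want it to or not. */
size_t page_slack(size_t object_size)
{
    size_t rem = object_size % PAGE_SIZE;
    return rem ? PAGE_SIZE - rem : 0;
}
```

A 100-byte descriptor guarded with pages drags 3996 unrelated bytes into the same protection domain; a byte-granular segment limit would fault on the very first out-of-bounds access.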
OK, I am lost here. I was assuming that your argument was that segmentation is more fine-grained than paging, but not only is this extremely coarse-grained (if I am reading this right, you aren't even separating the heap and the stack!), it also isn't actually memory protection at all.
Each thread has its own kernel stack that is mapped to a selector (and thus gets exact limit checking). The heap can allocate both selectors and linear addresses, and the choice between them follows from the compromises that 16-bit selectors force, as described above.
Each userland process can still read and write any other process' userland segments; you are simply enforcing non-access by convention, a pure honor system. True, a process would have to be handed a 48-bit pointer to know where another process' memory is ahead of time - but that does nothing to stop it from trying to scribble over another process' memory by generating a random segment selector and either segfaulting or trashing something, with some parent process endlessly restarting it when it faults.
Currently, userland uses C/C++ and a flat memory model. Processes are separated by paging, just as they are in Windows and Linux. I even implemented the fork and exec functionality of Unix recently. Only the kernel uses segmentation. It would certainly be possible to add a segmented userland executable format, since all references passed to the kernel use 48-bit addresses; I once had such a format, but I no longer use it.
Now, I am not going to say you are wrong, or even argue specifically against segmentation when discussing an OS that is explicitly meant to run only on x86-32. I am trying to understand why you disagree with the conventional wisdom on this, and what you see as the problem with a flat memory model.
I think I answered that above and in another post. The flat memory model packs code and data together in such a way that there is no separation between drivers; in fact, all drivers operate in a shared context.
Hell, I agree that it is tempting to use segmentation in some ways, and that the flat-model requirement in GCC is one of the main reasons it isn't used more. However, even if I were thinking of using segmentation - which is an option if I decide to do a 32-bit x86 implementation of Kether - I would not bake it into the design, but bury it in the runtime synthesis. Since I only know of one other OS that uses runtime synthesis, and it was written in 68000 assembly language, it is safe to say that trying to abstract away segmentation in a design meant for multiple implementations would be pretty problematic.
Probably. I have ported a few C drivers to my OS (ACPI, FreeType), and it wasn't a big problem, but there was a need for an assembly layer to interface with the OS. Today, I sometimes decide to use C for a driver, but I almost always end up with an assembly layer for the interface, so the code is not portable.