Paging and Dynamic Loading/Relocation Questions

~ · Post by ~ » Wed Jun 21, 2017 7:08 am

I'm currently looking for the best way to start using paging. Since I need to load modules dynamically into my kernel core, I think I have worked out some of the details, but I don't know whether it's a complete picture on its own.

It seems that paging was never intended as a relocation mechanism. The only thing paging does is map virtual addresses to physical memory addresses, but since you can only map a virtual address once in a process, you wouldn't be able to map all of the dynamically loaded libraries to, say, address 0x100000 for the same process, so you would really have a limited number of libraries loaded simultaneously depending on the size of the memory.

It probably means that if a Linux shared library or a DLL is loaded, for example, at virtual address 0x100000, then no other code or data in the whole system could use that address in its own virtual address space, unless there's some API that allows using that virtual address on the condition of becoming unable to use that library.

Maybe that's a reason for the higher-half kernel (to have one half for programs and some user libraries and another half for the kernel and system libraries), but a system could probably be loaded cleanly without that, especially seeing how splitting the space in half might not be practical on 64-bit systems, but I don't know enough about this point.

I say that paging isn't really for relocation because we could recompile at run time, write position-independent code, use protected-mode or real-mode segmentation, instead of paging. We could even use the same physical address for all libraries, keep track of which one is currently loaded, and load back and forth the one we need, but obviously this last thing would produce a very unclean and confusing system.

It seems that if we want to relocate something, the only mechanism really intended for it is to know, at the time the code is loaded, which parts of the instructions to patch depending on where the program was actually loaded, and to patch them manually. Then we will have a relocation system portable to flat memory, real mode, paging, protected-mode segmentation...
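A minimal sketch of that kind of load-time patching, written in Python for brevity. It assumes the module was linked for a known base and ships with a table of byte offsets at which 32-bit absolute addresses are stored; `relocate`, `link_base`, `load_base` and `fixups` are hypothetical names, not taken from any real file format.

```python
import struct

def relocate(image, link_base, load_base, fixups):
    """Return a copy of image with every recorded 32-bit address adjusted.

    fixups is a list of byte offsets into image (hypothetical table format).
    """
    delta = load_base - link_base
    out = bytearray(image)
    for off in fixups:
        (addr,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, (addr + delta) & 0xFFFFFFFF)
    return bytes(out)

# Toy module linked at 0x100000: "mov eax, [0x100010]" (opcode A1 + imm32), then "ret".
image = bytes([0xA1]) + struct.pack("<I", 0x100010) + bytes([0xC3])
patched = relocate(image, link_base=0x100000, load_base=0x400000, fixups=[1])
```

With a table like this the loader only patches the recorded offsets; the table is normally produced by the linker, so the loader itself does not have to decode any instructions.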

Shared libraries, dynamic link libraries, drivers, etc., are in reality like kernel modules, so they are only loaded once and mapped for any processes that want to use them. They need to be relocated first of course, except maybe for very few cases. The only detail is that all those libraries or drivers can be loaded with different privilege levels, but they are still basically kernel modules.

______________________________________________________________________
So, I'm figuring out that if I want to use paging for dynamically loading modules into my kernel and programs, I'd actually be better off developing a relocation system first.

But for that it seems that I would need my relocator to know all of the x86 opcodes in existence for 16, 32 and 64-bit mode, then provide it with a table with the start address of each instruction, so that it relocates as indicated.

It seems to me that I cannot do this with paging alone. If I were to load dynamic libraries in their own virtual address spaces using multitasking capabilities for it, I would probably need some inter-process calling functions to make use of such libraries, but then I would need some way to distinguish between libraries, if I load them to the same virtual address in different virtual spaces. Then I would need to indirectly call those libraries via a position-independent mechanism like an interrupt that selected between them. But it seems too unclean of a code base.

It seems that whichever method I choose to use to implement dynamic loading, it has its own different yet equally complex details to implement, but a generic relocator for code and data seems to come before memory management. It seems to be portable to any memory system/layout.

The only other option would be to link modules statically into the kernel, but even in my OS, which is extremely basic, the need to dynamically load stuff at any address can no longer be escaped without becoming stuck with lack of system capabilities.

I thought that I could have the kernel load a binary, program or library, at any address, and then return me the base load address probably in EBP/WIDEBP, but then I would probably need to rewrite all of the code I've ever created to support this position independence, and if I load more than one library it would also become very difficult and very unclean to keep a base address initially in EBP for each one and switch between them, and being able to store and find it repeatedly.

Using relocations without paging, just flat memory, would probably be like having a purely single-tasking system. Paging only makes it easier and more secure to implement multiple process states but still is nothing more than a facility to disguise physical addresses with virtual ones, with all of the implicit protective effect that might have, but it's nothing more than that.

We could still implement a multitasking system with a good relocator where we could switch tasks and only use physical addresses ever, and we could still implement good memory protection, but then we would rely much much more on disk to swap out memory regions or purely physical pages that we need to use by another process mapped to the very same regions, which are taken up by a currently inactive task or set of data.

Implementing such a system with paging-like functions, multitasking and memory swapping capabilities, but without actually using paging, could probably prove useful: it would make for a more robust system that is fundamentally independent from any hardware paging or special CPU protection, but which could later benefit from them as an additional module. We could debug a system in this way, and when all functions are mature, we could add actual paging as a trivial additional step, given that we would have been managing things with the same structure but purely in software. The only things that finally enabling paging would achieve would be less swapping of memory to and from disk, additional hardware-based protection from the CPU, and, as a side benefit of virtual address mapping, avoiding unmanageable fragmentation, but frankly not much more than that.

So how do you support dynamic loading of kernel modules, library code, and then program code, with or without a relocator and with or without paging?

zaval · Post by **zaval** » Wed Jun 21, 2017 7:53 am

I think you have heard of the PE/COFF file format. It uses a relocation mechanism, and that is all you need to do base relocations of your dynamically loaded binaries. The ELF format has such a mechanism as well; I mean, it is possible to preserve relocation information for every piece of code requiring it. I don't know much about ELF, but it seems to produce either non-relocatable executables or PIC libraries.

Assuming address randomization is not considered (because with the existence of NX-like bits, it is for paranoids), it's good to have the same DLL loaded physically once (in non-writable pages, of course); this saves RAM and makes loading programs faster. But then there is the problem of address collisions. It's possible to link DLLs at 0 and then choose the base dynamically. If process A loads a DLL first and it goes at base Base, then every process should have this DLL at the same Base, so as not to duplicate the code in RAM. Generally that's not easy. System DLLs can be guaranteed not to overlap, but with a custom DLL it's somewhat tricky. I think a mechanism similar to that of system DLLs should be deployed, where every DLL from a package that is going to be shared between possibly unrelated packages registers itself for an address and range in VA. It would be a system-wide load optimisation database, and with its help it would be possible to avoid address space clashes and code duplication.
Then again, there is the PIC approach. I don't know anything about it. It doesn't look worth trying to me.
So for dynamically loaded modules I am going to use the PE base relocation mechanism. It fits my needs and preferences well.
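For reference, a rough sketch of how PE-style base relocation blocks can be walked, in Python for brevity. Each block starts with a page RVA and a block size, followed by 16-bit entries whose top 4 bits are the type and low 12 bits the offset within the page. The function name `apply_base_relocs` and the assumption that `image` is already laid out by RVA are mine; only three relocation types are handled here.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, nothing to patch
IMAGE_REL_BASED_HIGHLOW = 3    # 32-bit absolute address
IMAGE_REL_BASED_DIR64 = 10     # 64-bit absolute address

def apply_base_relocs(image, reloc, delta):
    """Patch image (a bytearray indexed by RVA) in place; delta = new base - preferred base."""
    pos = 0
    while pos + 8 <= len(reloc):
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8:
            break  # malformed or terminating block
        for i in range(pos + 8, pos + block_size, 2):
            (entry,) = struct.unpack_from("<H", reloc, i)
            kind, offset = entry >> 12, entry & 0x0FFF
            target = page_rva + offset
            if kind == IMAGE_REL_BASED_HIGHLOW:
                (v,) = struct.unpack_from("<I", image, target)
                struct.pack_into("<I", image, target, (v + delta) & 0xFFFFFFFF)
            elif kind == IMAGE_REL_BASED_DIR64:
                (v,) = struct.unpack_from("<Q", image, target)
                struct.pack_into("<Q", image, target, (v + delta) & 0xFFFFFFFFFFFFFFFF)
            # ABSOLUTE entries and any other types are skipped in this sketch
        pos += block_size

# Toy image: one pointer at RVA 0x1004, linked for a preferred base of 0x10000000.
image = bytearray(0x2000)
struct.pack_into("<I", image, 0x1004, 0x10001234)
reloc = struct.pack("<IIHH", 0x1000, 12, (IMAGE_REL_BASED_HIGHLOW << 12) | 0x004, 0)
apply_base_relocs(image, reloc, delta=0x20000000 - 0x10000000)
```

A real loader would also locate the relocation directory through the PE headers and validate every offset before writing; that part is left out here.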

~ · Post by ~ » Wed Jun 21, 2017 8:03 am

zaval wrote: I think you have heard of the PE/COFF file format. It uses a relocation mechanism, and that is all you need to do base relocations of your dynamically loaded binaries. The ELF format has such a mechanism as well; I mean, it is possible to preserve relocation information for every piece of code requiring it. I don't know much about ELF, but it seems to produce either non-relocatable executables or PIC libraries.

Assuming address randomization is not considered (because with the existence of NX-like bits, it is for paranoids), it's good to have the same DLL loaded physically once (in non-writable pages, of course); this saves RAM and makes loading programs faster. But then there is the problem of address collisions. It's possible to link DLLs at 0 and then choose the base dynamically. If process A loads a DLL first and it goes at base Base, then every process should have this DLL at the same Base, so as not to duplicate the code in RAM. Generally that's not easy. System DLLs can be guaranteed not to overlap, but with a custom DLL it's somewhat tricky. I think a mechanism similar to that of system DLLs should be deployed, where every DLL from a package that is going to be shared between possibly unrelated packages registers itself for an address and range in VA. It would be a system-wide load optimisation database, and with its help it would be possible to avoid address space clashes and code duplication.
Then again, there is the PIC approach. I don't know anything about it. It doesn't look worth trying to me.
So for dynamically loaded modules I am going to use the PE base relocation mechanism. It fits my needs and preferences well.

But in any case, it seems to me that for example if I write a tiny DLL, the virtual address it's loaded into will no longer be usable for anything at all in the system, or that DLL will become inaccessible.

So there should also be a way to tell the loader whether a library is system-wide or local to a program, and ideally a way to balance the two. It could be indicated in the binary, or it could be chosen/overridden with a different API call. The problem I still see at this point is that the more system libraries are loaded, the fewer free virtual addresses will be available system-wide.

I could be wrong but even with paging, it would seem difficult to swap in and out libraries to the same virtual address.

It still seems that paging would only be useful to save on actual physical RAM usage, disguise addresses and avoid fragmentation, not so much to overlap different system-wide libraries to the same address, which seems unclean but could probably be done in a very advanced system, so it might not apply for the first iterations of a kernel.

zaval · Post by **zaval** » Wed Jun 21, 2017 8:12 am

For 64-bit, the VA space is huge, so no worry. Allocation of bases and ranges for both system-supplied and package-derived DLLs should be maintained on a per-installation basis (some kind of service keeping its database in the registry, which the loader consults when loading modules). System DLLs might be prelinked at their allocated bases with relocations stripped.
For a 32-bit space, I still think there would be enough room to fit them. After all, if some DLL cannot be placed at its registered base, it can be base relocated as a last resort.
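A rough sketch of such a base/range database with the relocation fallback, in Python for brevity. Everything here is hypothetical: the class name `BaseRegistry`, the 64 KiB alignment, the reserved window, and the assumption that preferred bases are already aligned.

```python
class BaseRegistry:
    """Hypothetical system-wide table of reserved library address ranges."""

    def __init__(self, lo, hi, align=0x10000):
        self.lo, self.hi, self.align = lo, hi, align
        self.ranges = {}  # library name -> (base, size)

    def _free(self, base, size):
        if base < self.lo or base + size > self.hi:
            return False
        return all(base + size <= b or b + s <= base
                   for b, s in self.ranges.values())

    def register(self, name, size, preferred):
        """Return (base, needs_relocation) for the named library."""
        if name in self.ranges:                     # already placed: reuse it
            base = self.ranges[name][0]
            return base, base != preferred
        size = -(-size // self.align) * self.align  # round up to the alignment
        base = preferred
        if not self._free(base, size):
            base = self.lo                          # first-fit scan of the window
            while not self._free(base, size):
                base += self.align
                if base + size > self.hi:
                    raise MemoryError("no free range for " + name)
        self.ranges[name] = (base, size)
        return base, base != preferred

reg = BaseRegistry(lo=0x60000000, hi=0x70000000)
a = reg.register("a.dll", 0x12000, preferred=0x60000000)
b = reg.register("b.dll", 0x8000, preferred=0x60010000)   # overlaps a.dll
```

In this run `a.dll` gets its preferred base, while `b.dll` collides with it and is moved to the next free slot, which is the case where the loader would have to apply base relocations.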

iansjack · Post by **iansjack** » Wed Jun 21, 2017 8:16 am

Shared libraries are supposed to be position independent, so they don't need to be loaded at the same address in all processes.

If process A and process B both use shared library X, and load it at 0x100000 (for example) then it is true that those processes cannot use that address for another purpose. Why would they even want to? But any process that doesn't use that shared library is free to map that address as it sees fit.

The easy answer is to use long mode and forget about limitations of logical address space (for the time being, anyway). Unless you are writing an OS for a very specialist application it is reasonable to assume that 64-bit addressing is available (OK - 48-bit, but you know what I mean.)
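To illustrate the point about per-process address spaces, here is a toy model in Python (not real page-table code). `AddressSpace`, the frame numbers and the addresses are all made up for the example; mapping the same library at different addresses, as process B does, assumes the library is position independent.

```python
PAGE = 0x1000

class AddressSpace:
    """Toy per-process page table: virtual page number -> physical frame number."""

    def __init__(self):
        self.pages = {}

    def map(self, vaddr, frames):
        vpn = vaddr // PAGE
        if any(vpn + i in self.pages for i in range(len(frames))):
            raise ValueError("virtual range already in use in this process")
        for i, frame in enumerate(frames):
            self.pages[vpn + i] = frame

    def translate(self, vaddr):
        return self.pages[vaddr // PAGE] * PAGE + vaddr % PAGE

lib_frames = [0x200, 0x201]      # physical frames holding library X, loaded once
a, b, c = AddressSpace(), AddressSpace(), AddressSpace()
a.map(0x100000, lib_frames)      # process A: library X at 0x100000
b.map(0x7F0000, lib_frames)      # process B: same frames, different virtual address
c.map(0x100000, [0x300])         # process C: does not use X, reuses the address
```

Processes A and B end up sharing the same physical frames, while process C uses 0x100000 for unrelated data; a virtual range is only "taken" inside the processes that actually map something there.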

~ · Post by ~ » Wed Jun 21, 2017 8:51 am

iansjack wrote: Shared libraries are supposed to be position independent, so they don't need to be loaded at the same address in all processes.

If process A and process B both use shared library X, and load it at 0x100000 (for example) then it is true that those processes cannot use that address for another purpose. Why would they even want to? But any process that doesn't use that shared library is free to map that address as it sees fit.

The easy answer is to use long mode and forget about limitations of logical address space (for the time being, anyway). Unless you are writing an OS for a very specialist application it is reasonable to assume that 64-bit addressing is available (OK - 48-bit, but you know what I mean.)

And what if it maps that address to something else, then loads library X, but doesn't know that it's loaded there too?

If it's a library mapped system-wide, the process won't be able to relocate it any more.

It sounds to me that mapping anywhere every time is possible only with locally-loaded libraries.

It also sounds to me that for system-wide libraries, even if a process doesn't specifically map an address for a given system-wide library, that address will not be used by anything in any process but by that library, or just left unallocated in a given process. Is that true?

iansjack · Post by **iansjack** » Wed Jun 21, 2017 11:42 am

I don't know about PE executables, but ELF executables contain a list of the shared libraries they depend upon. An executable can reserve the address space it needs. This need have no relation to the address space that another process will use, as the libraries are position-independent. But it does make life a little easier if you reserve the same address space for shared libraries in all processes. With 48-bit logical addresses this is not a restriction (with current amounts of physical RAM).

zaval · Post by **zaval** » Wed Jun 21, 2017 11:57 am

It also sounds to me that for system-wide libraries, even if a process doesn't specifically map an address for a given system-wide library, that address will not be used by anything in any process but by that library, or just left unallocated in a given process. Is that true?

There is such a thing as dynamic loading, when a DLL gets loaded "on demand" at some point at run time. There is no way to know ahead of time whether this will be the case, so it's better not to touch that range. Of course, an application could make its own life harder by deliberately allocating pages at these addresses and then trying to load the DLLs supposed to reside there. But the system will survive this perversion: it will copy those DLLs and relocate them.

I wouldn't bother myself with "space starvation"; it's hardly possible. But maintaining this base/range allocation really is a burden. It becomes yet another memory management duty.

PS. PIC adds an overhead, right? Or brings some brain damaging complexity into this loading system.

iansjack · Post by **iansjack** » Wed Jun 21, 2017 12:42 pm

Position independent code does add a slight overhead, but it is much better from a security point of view as it allows executables and libraries to be loaded at random addresses. Which tradeoff is more important is a matter of choice.

simeonz · Post by **simeonz** » Wed Jun 21, 2017 2:04 pm

There are multiple issues involved. There are self-references and external references. There are different memory models.

Self-references can be handled by PC-relative addressing, but due to x86 ISA limitations this is cumbersome on 32-bit systems (for references to data), where there is no instruction-pointer-relative data addressing. On 64-bit systems it is comparatively trivial thanks to RIP-relative addressing. The compiler has little incentive to avoid generating RIP-relative code for all self-references on x64 on both Windows and Linux, and the result usually requires no fix-up.

For x86 (i.e. 32-bit), Windows will fix up the self-references in DLL code once and try to keep the memory layout consistent between processes using a global DLL layout plan. This layout plan is then randomized once per boot. Linux shared objects are by default built as position-independent code (even on 32-bit, where PC-relative addressing has to be emulated) at the cost of some performance overhead, and thus have greater flexibility for per-process randomization. Whether a shared object is position-independent can be changed at compile time, however, which I believe will result in copy-on-write fix-ups at load time.

For external references, both Windows and Linux use indirection and trampolines in dedicated sections. Since the address fix-up of those is isolated to a compact memory range, no significant memory duplication is necessary.
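The indirection idea can be modelled in a few lines of Python. This is only a toy: `bind_imports`, `call_site` and the addresses are hypothetical, and a real GOT/PLT or import address table has more machinery (lazy binding, data imports and so on).

```python
def bind_imports(import_names, exports_by_name):
    """Fill a per-process import table; this table is the only thing written at load time."""
    return [exports_by_name[name] for name in import_names]

# The shared, read-only "code" refers to imports by slot index, never by address.
IMPORTS = ("puts", "exit")

def call_site(table, slot):
    return table[slot]           # models an indirect call through table[slot]

# Two processes that have the library at different bases get different tables,
# while the code that indexes the table is identical in both.
proc_a = bind_imports(IMPORTS, {"puts": 0x7F0000001000, "exit": 0x7F0000002000})
proc_b = bind_imports(IMPORTS, {"puts": 0x7E0000001000, "exit": 0x7E0000002000})
```

Because only the small table differs between processes, the code pages themselves can stay shared and unmodified.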

Again, there are issues related to the memory model, such as 64-bit vs 32-bit maximal offsets, but those usually relate to additional performance overhead.

LtG · Post by **LtG** » Thu Jun 22, 2017 11:47 am

iansjack wrote: Position independent code does add a slight overhead, but it is much better from a security point of view as it allows executables and libraries to be loaded at random addresses. Which tradeoff is more important is a matter of choice.

Never been a fan of ASLR, and especially on 32-bit there are ways to defeat it because the address space is relatively small to begin with, especially with 4 KiB page alignment. So I wouldn't say _much_ better security.
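A back-of-the-envelope version of that argument, as a small Python calculation. The window sizes are assumptions chosen for illustration (a 256 MiB window on 32-bit, a 2^47-byte user half on 64-bit), not the values any particular OS uses, and the base is assumed to be picked uniformly among page-aligned slots.

```python
def aslr_bits(window_bytes, alignment=4096):
    """Bits of entropy when a base is chosen uniformly among aligned slots."""
    slots = window_bytes // alignment
    return slots.bit_length() - 1          # floor(log2(slots))

def expected_guesses(bits):
    """Average number of tries to hit one slot when guessing without repeats."""
    return (2 ** bits + 1) / 2

bits_32 = aslr_bits(256 * 1024 * 1024)     # hypothetical 256 MiB window
bits_64 = aslr_bits(1 << 47)               # hypothetical 2^47-byte user half
```

Under these assumptions the 32-bit case has 16 bits of entropy, so a brute-force search needs about 32 thousand attempts on average, whereas the 64-bit case has 35 bits.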

IIRC ASLR was never intended to stay; it was a stopgap, but like many such things it has unfortunately stayed.
