OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Rate my object format and ABI
Posted: Sat Mar 23, 2019 7:30 pm

Joined: Sat Oct 16, 2010 3:38 pm
Posts: 636
I have recently designed an object file format for Glidix. Here is the specification:

https://glidix.madd-games.org/Gxo

I also devised an ABI for x86_64. The calling convention is mostly the same as System V, but passing of structs and unions is slightly simplified. The entry environment is also slightly different. Full spec here:

https://glidix.madd-games.org/Gxo_x86_64_supplement

I can already link files in this format. I'm looking for any comments and suggestions for improvement :)

_________________
Glidix: An x86_64 POSIX-compliant operating system, aiming to be as optimized as possible, especially in graphics.
https://glidix.madd-games.org/


 
 Post subject: Re: Rate my object format and ABI
Posted: Sun Mar 24, 2019 2:43 am

Joined: Thu May 17, 2007 1:27 pm
Posts: 607
Looks mostly good. I did not read the entire relocation process though. Some comments:
  • You're using the same hash function as ELF. That function performs notoriously badly. In my compiler stack, I noticed that sometimes the chain lengths are extremely skewed (e.g. 50% of all symbols are in a single chain).
  • The major bottleneck of ELF relocation is the fact that it needs to search O(n) DSOs to find a symbol (compare that to the O(1) operations that are required to search for the symbol inside a DSO). If I were to design a new binary format, I'd make the providing DSO part of the symbol so that only one DSO needs to be searched.
  • Why did you choose to use a __init() function instead of a more flexible .init_array mechanism? IIRC, GCC (for example) does not even support __init() on x86_64 (and unconditionally sets --enable-initfini-array in configure.ac unless you're cross-compiling) because of stack alignment issues: it's not enough to concatenate multiple functions into a single __init(); a modified prologue would need to be emitted to realign the stack. I'd do it the other way around -- get rid of __init and only use .init_array.
  • I'd consider renaming your SYM_SCOPE_WEAK to something else (SYM_SCOPE_OVERRIDABLE?). In the context of ELF, a weak symbol is a symbol that is allowed to be missing; it is not a symbol that can be overridden by a global one. For ELF, if symbol X is defined weakly in DSO A and globally in DSO B and A < B in the resolution order, the weak definition in A will override the global definition in B. (This is the source of a lot of confusion for people encountering ELF for the first time, partially due to buggy implementations in early glibc versions.)
  • Also be aware that your definition of weak symbols requires Omega(n) DSO lookups -- all DSOs need to be searched for an overriding global symbol. (Copy relocations prevent this issue for ELF.)
  • Can the ABI support varargs?
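
To make the first point concrete, here is a minimal Python sketch of the classic System V ELF symbol hash, plus a helper that counts how many symbols fall into each bucket. The names elf_hash and chain_lengths and the synthetic symbol list are made up for illustration; this is not taken from the Gxo spec, and real symbol tables may distribute very differently.

```python
from collections import Counter

def elf_hash(name: str) -> int:
    """Classic System V ELF symbol hash, truncated to 32 bits."""
    h = 0
    for byte in name.encode("utf-8"):
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000          # top nibble
        if g:
            h ^= g >> 24            # fold the top nibble back in
        h &= ~g & 0xFFFFFFFF        # and clear it
    return h

def chain_lengths(names, nbuckets):
    """Map bucket index -> number of symbols hashing into that bucket."""
    return Counter(elf_hash(n) % nbuckets for n in names)

print(hex(elf_hash("printf")))      # 0x77905a6

# Synthetic symbol names (an assumption, not real libc symbols).
names = [f"sym_{i}" for i in range(700)]
lengths = chain_lengths(names, 512)
print("longest chain:", max(lengths.values()))
```

Running chain_lengths over the real symbol table of a library is the quickest way to see whether the skew described above shows up in practice.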

EDIT: To demonstrate why the second point is an issue, let's look at the seemingly lightweight gedit application, check how many DSOs it requires and how many relocations need to be done:
Code:
$ ldd /usr/bin/gedit | wc -l
78
# Looking at GLOB_DAT should be enough, although this misses some other relocation types that also refer to symbols.
$ ldd /usr/bin/gedit | cut '-d ' -f3 | xargs readelf -r | grep R_X86_64_GLOB_DAT | wc -l
8745

In the worst case, each of those 8745 relocations needs to be looked up in all 78 DSOs! If the providing DSO is part of the symbol name, only one DSO has to be searched per relocation, so this cost is decreased considerably.
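
A rough Python sketch of the difference, counting DSO probes only. The library names, the exports table, and both resolver functions are hypothetical and do not model any real dynamic linker; the numbers 78 and 8745 are the ones from the gedit measurement above.

```python
def resolve_by_search(symbol, search_order, exports):
    """ELF-style lookup: probe each DSO in order until one defines the symbol."""
    probes = 0
    for dso in search_order:
        probes += 1
        if symbol in exports[dso]:
            return dso, probes
    return None, probes

def resolve_direct(symbol, provider, exports):
    """Lookup when the providing DSO is recorded with the symbol: one probe."""
    found = provider if symbol in exports[provider] else None
    return found, 1

# Toy setup: 78 DSOs, and the symbol lives in the last one searched.
exports = {f"lib{i}.so": set() for i in range(78)}
exports["lib77.so"].add("foo")
order = list(exports)

print(resolve_by_search("foo", order, exports))    # ('lib77.so', 78)
print(resolve_direct("foo", "lib77.so", exports))  # ('lib77.so', 1)

# Worst-case probe count if every relocation walks every DSO.
print(8745 * 78)  # 682110
```

Each probe is itself an O(1) hash lookup, so the saving is in the number of hash tables touched, not in the cost of a single lookup.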

_________________
managarm: A microkernel-based OS that is capable of running a Wayland desktop


 
 Post subject: Re: Rate my object format and ABI
Posted: Sun Mar 24, 2019 6:46 am

Joined: Sat Oct 16, 2010 3:38 pm
Posts: 636
Ah, good call, I should probably clarify how varargs work. The idea would be the same as in System V: the arguments are passed as normal and the variadic function spills them.

As for DSOs: I want to keep the functionality that symbols may be defined in a DSO not known at compile-time, or a DSO may even reference symbols which the executable is expected to define, etc.

I was in fact considering whether or not I should implement .init_array. I figured that the __init() function could instead be constructed as a list of 'call' instructions. I forgot about stack alignment, but this could be fixed by a dummy push/pop at the start and end. Would there be disadvantages to this approach?
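
The alignment arithmetic behind the dummy push can be checked with a small Python sketch. It assumes the System V rule that the stack pointer is 16-byte aligned immediately before a call instruction, so a function sees rsp % 16 == 8 on entry; whether the Gxo supplement uses exactly this rule is an assumption here. The helper name and the example address are made up.

```python
WORD = 8  # bytes pushed by one `push` or by the return address of a `call`

def rsp_at_callee_entry(rsp_at_init_entry: int, dummy_push: bool) -> int:
    """Stack pointer seen by a constructor that __init() reaches via `call`."""
    rsp = rsp_at_init_entry
    if dummy_push:
        rsp -= WORD   # dummy `push` at the start of __init()
    rsp -= WORD       # return address pushed by the `call` instruction
    return rsp

# Hypothetical entry value: 8 mod 16, as right after the `call` into __init().
entry = 0x7FFF_FFFF_E008

print(rsp_at_callee_entry(entry, dummy_push=False) % 16)  # 0
print(rsp_at_callee_entry(entry, dummy_push=True) % 16)   # 8
```

Without the push, each constructor is entered with rsp % 16 == 0, which is not the state it expects under the assumed rule; with the push it sees 8, the same as any normally called function. The matching pop before the final ret restores the original stack pointer.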

As for the hashing function: I ran it on the symbols in my libc, and with 512 buckets, the largest chain I got was 10 symbols. Is there a better-performing hash function you could recommend?

 
 Post subject: Re: Rate my object format and ABI
Posted: Sun Mar 24, 2019 9:34 am

Joined: Thu May 17, 2007 1:27 pm
Posts: 607
If you generate the prologues etc. correctly, __init() should work of course. I think I'd still consider .init_array slightly cleaner as it requires no linker magic or correct order of input files. But that might be mostly personal taste.

For the hash function: this StackOverflow post has some comparison. AFAIK Python used FNV in the past with reasonably good results. DJB2 probably offers a good tradeoff between simplicity and quality. SipHash (SSE/AVX implementation by Google) is a high-quality hash function that still performs surprisingly well (~3 cycles per byte for 64-byte strings, according to the original paper). However, its resistance against denial-of-service (by producing strings that intentionally hash to the same value) requires storing a key and is not needed in this context.
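
For reference, here are minimal 32-bit Python sketches of two of the simpler candidates, DJB2 and FNV-1a, with a helper that reports the longest chain for a given bucket count. The helper name and the synthetic symbol list are assumptions made for illustration; results on real symbol names will differ.

```python
def djb2(name: str) -> int:
    """DJB2: h = h * 33 + byte, starting from 5381, truncated to 32 bits."""
    h = 5381
    for byte in name.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h

def fnv1a_32(name: str) -> int:
    """32-bit FNV-1a: xor in each byte, then multiply by the FNV prime."""
    h = 0x811C9DC5                      # FNV offset basis
    for byte in name.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h

def longest_chain(hash_fn, names, nbuckets):
    """Length of the fullest bucket when names are hashed into nbuckets."""
    counts = {}
    for n in names:
        bucket = hash_fn(n) % nbuckets
        counts[bucket] = counts.get(bucket, 0) + 1
    return max(counts.values(), default=0)

# Synthetic symbol names (an assumption, not a real symbol table).
names = [f"sym_{i}" for i in range(700)]
for fn in (djb2, fnv1a_32):
    print(fn.__name__, longest_chain(fn, names, 512))
```

Feeding the same helper a real symbol list and the current hash gives a like-for-like comparison before committing to a change in the format.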

Out of curiosity, how many symbols does your libc contain in this example with 512 buckets?

 
 Post subject: Re: Rate my object format and ABI
Posted: Sun Mar 24, 2019 3:33 pm

Joined: Sat Oct 16, 2010 3:38 pm
Posts: 636
Korona wrote:
Out of curiosity, how many symbols does your libc contain in this example with 512 buckets?

There were around 700 symbols.
