Why is a GOT needed for x86_64?

rianquinn · **Joined:** Thu Jan 21, 2010 9:31 pm **Posts:** 16

I was hoping that someone could clear up a couple of questions for me.

I understand why a GOT is needed on 32bit, but why is it needed on 64bit? When I compile a simple library with a global variable, I see a relocation entry for that variable. Since x86_64 can do relative addressing, why doesn't the compiler just directly address the variable?

Also, when doing the relocation I see conflicting information for R_X86_64_GLOB_DAT and R_X86_64_JUMP_SLOT. The spec says that the relocation is "S", which "represents the value of the symbol whose index resides in the relocation entry". I read that as sym->st_value, which doesn't make much sense to me because the compiler could just use that address itself since it clearly has it. In some of the code I read, I see mem + sym->st_value, where mem is the starting virtual address that the executable is loaded at. That makes more sense to me, as it would mean you're likely filling in an absolute address. I'm still not sure why that's needed on x86_64, but if that is the case, why does the documentation say "S"?

Thanks in advance,
- Rian

BASICFreak · **Posted:** Sat Oct 24, 2015 11:14 am

Quote:

I was hoping that someone could clear up a couple of questions for me.

I will attempt one of them:

Quote:

I understand why a GOT is needed on 32bit, but why is it needed on 64bit? When I compile a simple library with a global variable, I see a relocation entry for that variable. Since x86_64 can do relative addressing, why doesn't the compiler just directly address the variable?

A GOT is not needed by x86 or x86_64 as such; it is needed by the ELF format (and maybe other formats). The reason it is used on x86_64, even though relative addressing is available, is simple. Say we want to load three shared objects. Unless these shared objects end up at the exact same location every time they are loaded, which is usually not the case, you still need to keep track of where each global (function or variable) is located in the address space.

If I want to use fprintf, where am I going to find the function? If you give me a hard-coded offset, how am I to trust that the position-independent code is always loaded at the same position? Remember, shared objects are loaded at run time, not at compile time.

That said, the GOT can be removed from ELF files completely, though it is a mess to do so: you need relocation tables in every EXEC, and you still need a table. Unlike with a GOT, though, you can relocate the actual references in the code instead of retrieving the GOT value on every access to a global. (I'm actually about to reverse this so I have shared object support, the right way.)

I have not dealt with x86_64 ELF relocations yet, so I cannot answer your question there (plus I'm too lazy to read the spec sheet to reply to it).

Hope this helped some, and hopefully someone here has more (better) information on the topic. (The odds are high)

rianquinn · **Joined:** Thu Jan 21, 2010 9:31 pm **Posts:** 16

My understanding of ELF is that for PIC, the start location for the library can change, but the relative offsets between the different sections remain the same, which is why the compiler should be able to figure out where each global symbol is (excluding external symbols of course). That is, the offset between sections like .text and .data is the same, even though the overall position of the library is different.
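The arithmetic behind this point can be written out as a minimal sketch. It assumes a symbol that is defined in the same object and cannot be overridden from outside; `rip_disp` and `rip_target` are hypothetical helper names, and the offsets are measured from the start of the loaded image.

```c
#include <assert.h>
#include <stdint.h>

/* Link time: the displacement stored in a RIP-relative instruction is the
 * distance from the end of that instruction to the variable. Both offsets
 * are relative to the start of the image, so the load base cancels out. */
int32_t rip_disp(uint64_t next_insn_off, uint64_t var_off)
{
    int64_t d = (int64_t)var_off - (int64_t)next_insn_off;
    /* A RIP-relative operand only has room for a signed 32-bit value. */
    assert(d >= INT32_MIN && d <= INT32_MAX);
    return (int32_t)d;
}

/* Run time: the CPU adds the sign-extended displacement to the address of
 * the next instruction, wherever the image happened to be loaded. */
uint64_t rip_target(uint64_t load_base, uint64_t next_insn_off, int32_t disp)
{
    return load_base + next_insn_off + (uint64_t)(int64_t)disp;
}
```

Whatever value is passed as `load_base`, `rip_target` lands on `load_base + var_off`, so no load-time fixup is needed for this kind of reference. It does not cover symbols that live in another object or that another object is allowed to override.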

If you were to move the different sections within the library, then that would make sense to me, but how the code finds its GOT would not, as I'm not sure how you would go about doing that without modifying the code itself.

Owen · **Posted:** Sat Oct 24, 2015 1:04 pm

There are four classes of ELF binary:

Statically linked, position dependent (ET_EXEC with no DYNAMIC segment). These have no GOT and no shared library dependencies but are also not relocatable.
Statically linked, position independent (ET_DYN, with a DYNAMIC segment, no DT_NEEDED entries; some tools may misidentify these as shared objects). These have no GOT and no shared library dependencies, but are relocatable.
Dynamically linked, position dependent (ET_EXEC with a DYNAMIC segment, with DT_NEEDED entries). These have a GOT and shared library dependencies.
Dynamically linked, position independent (ET_DYN, with a DYNAMIC segment, with DT_NEEDED entries). These have a GOT and shared library dependencies and can be relocated.
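
The four classes above can be told apart with three facts from the file. The following is a rough sketch, assuming the caller has already scanned the program headers for PT_DYNAMIC and the dynamic section for DT_NEEDED; `struct elf_summary`, `classify_elf` and the enum names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* e_type values from the ELF specification (ET_EXEC = 2, ET_DYN = 3). */
enum { ELF_TYPE_EXEC = 2, ELF_TYPE_DYN = 3 };

enum elf_kind {
    KIND_STATIC_POSITION_DEPENDENT,
    KIND_STATIC_POSITION_INDEPENDENT,
    KIND_DYNAMIC_POSITION_DEPENDENT,
    KIND_DYNAMIC_POSITION_INDEPENDENT,
    KIND_OTHER
};

/* Hypothetical summary filled in by whatever code parsed the headers. */
struct elf_summary {
    uint16_t e_type;      /* from the ELF header */
    bool     has_dynamic; /* a PT_DYNAMIC program header exists */
    bool     has_needed;  /* at least one DT_NEEDED entry exists */
};

enum elf_kind classify_elf(const struct elf_summary *s)
{
    assert(s != NULL);
    if (s->e_type == ELF_TYPE_EXEC && !s->has_dynamic)
        return KIND_STATIC_POSITION_DEPENDENT;
    if (s->e_type == ELF_TYPE_DYN && s->has_dynamic && !s->has_needed)
        return KIND_STATIC_POSITION_INDEPENDENT;
    if (s->e_type == ELF_TYPE_EXEC && s->has_dynamic && s->has_needed)
        return KIND_DYNAMIC_POSITION_DEPENDENT;
    if (s->e_type == ELF_TYPE_DYN && s->has_dynamic && s->has_needed)
        return KIND_DYNAMIC_POSITION_INDEPENDENT;
    return KIND_OTHER; /* e.g. ET_REL, or a combination not listed above */
}
```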

The GOT is primarily used for symbols from external libraries. The compiler cannot hardcode offsets into those libraries, because those offsets would be invalid upon a recompile of the library. Instead, it creates a GOT entry for the symbol and sets up a relocation to fill in the GOT entry with the address of the symbol at load time.
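
As a rough illustration of that load-time step, here is a sketch of how a loader might apply the three most common x86_64 dynamic relocations. It assumes an image linked at address 0 and mapped contiguously; `struct image`, `struct rela`, `apply_rela` and `symbol_runtime_address` are illustrative names, not part of any real loader. It also suggests one reading of "S" in the ABI document: the symbol's value as seen at run time, which for a symbol defined in a shared object is the defining object's load base plus its st_value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Relocation type numbers from the x86-64 psABI. */
enum { RELOC_GLOB_DAT = 6, RELOC_JUMP_SLOT = 7, RELOC_RELATIVE = 8 };

/* Simplified stand-in for Elf64_Rela. */
struct rela { uint64_t r_offset; uint64_t r_info; int64_t r_addend; };

/* Hypothetical view of one loaded object: mem is where the loader can write
 * to it, base is the address it occupies in the target address space. */
struct image { uint8_t *mem; size_t size; uint64_t base; };

/* S for a symbol defined in a shared object loaded at defining_base. */
uint64_t symbol_runtime_address(uint64_t defining_base, uint64_t st_value)
{
    return defining_base + st_value;
}

/* Apply one relocation; sym_value is S, already resolved by the caller.
 * Returns 0 on success, -1 for an unsupported type or a bad offset. */
int apply_rela(struct image *img, const struct rela *r, uint64_t sym_value)
{
    uint64_t value;

    assert(img != NULL && r != NULL);
    switch ((uint32_t)(r->r_info & 0xffffffffu)) { /* low 32 bits: type */
    case RELOC_GLOB_DAT:
    case RELOC_JUMP_SLOT:
        value = sym_value;                          /* S */
        break;
    case RELOC_RELATIVE:
        value = img->base + (uint64_t)r->r_addend;  /* B + A */
        break;
    default:
        return -1;
    }
    if (r->r_offset > img->size || img->size - r->r_offset < sizeof value)
        return -1;
    memcpy(img->mem + r->r_offset, &value, sizeof value); /* fill the slot */
    return 0;
}
```

A real dynamic linker also has to look the symbol up by name across all loaded objects before it knows which base to add; that lookup is left to the caller here.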

The GOT is also often used for intra-library calls and data references (e.g. calls of a function in a library from another function in the same library) because of traditional Unix symbol pre-emption rules (If your application defines a function "write", then everywhere a call to "write" is made that call should go to your application's version, even if another loaded library defines that symbol).
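
The pre-emption rule can be modelled as a first-match search over the loaded objects, with the application searched first. This is only a sketch of the ordering: `struct module`, `struct sym_def` and `resolve_symbol` are hypothetical, and real dynamic linkers add hash tables, weak symbols, versioning and scoping on top.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sym_def { const char *name; unsigned long addr; };

struct module {
    const char *name;
    const struct sym_def *syms;
    size_t nsyms;
};

/* Search the modules in scope order (application first, then libraries in
 * load order). The first definition found wins, so an application-level
 * "write" shadows the one in a library. Returns the defining module, or
 * NULL if nobody defines the symbol. */
const struct module *resolve_symbol(const struct module *scope, size_t nmods,
                                    const char *name, unsigned long *addr_out)
{
    assert(scope != NULL && name != NULL && addr_out != NULL);
    for (size_t m = 0; m < nmods; m++) {
        for (size_t s = 0; s < scope[m].nsyms; s++) {
            if (strcmp(scope[m].syms[s].name, name) == 0) {
                *addr_out = scope[m].syms[s].addr;
                return &scope[m];
            }
        }
    }
    return NULL;
}
```

Because the result depends on what else is loaded, a library cannot know at compile time whether its own "write" is the one that will be used, which is why such references go through the GOT.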

BASICFreak · **Posted:** Sat Oct 24, 2015 1:09 pm

rianquinn wrote:

My understanding of ELF is that for PIC, the start location for the library can change, but the relative offsets between the different sections remain the same, which is why the compiler should be able to figure out where each global symbol is (excluding external symbols of course). That is, the offset between sections like .text and .data is the same, even though the overall position of the library is different.

If you were to move the different sections within the library, then that would make sense to me, but how the code finds its GOT would not, as I'm not sure how you would go about doing that without modifying the code itself.

OK, then what happens when the creator of the shared object adds functions and/or bug fixes, changing the offset of every function after the change?

You could go through the shared object's symbol table and link all calls to (SO_OFFSET + SYM_OFFSET), or continue using the GOT, which is (partly) set up when the SO is compiled.

Without a GOT you have relocation, which is slower at load time but quicker at run time.
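
A toy model of that trade-off, assuming each reference site is just a 64-bit slot holding an address (real code patches instruction operands, and writing into code pages also stops them being shared between processes). All three function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load-time patching: write the resolved address into every place that
 * refers to the symbol. Returns the number of writes the loader performed. */
size_t patch_reference_sites(uint64_t *sites, size_t nsites, uint64_t target)
{
    assert(sites != NULL || nsites == 0);
    for (size_t i = 0; i < nsites; i++)
        sites[i] = target;
    return nsites;
}

/* GOT style: one write at load time, however many references exist. */
size_t fill_got_slot(uint64_t *slot, uint64_t target)
{
    assert(slot != NULL);
    *slot = target;
    return 1;
}

/* ...at the price of one extra memory load on every access at run time. */
uint64_t load_via_got(const uint64_t *slot)
{
    assert(slot != NULL);
    return *slot;
}
```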

Now, do not get me wrong - I understand your point. But the only way to avoid the GOT is to avoid ELF, or to resort to hackish ways.
I do dislike the GOT myself, which is why I do not support it (yet) - but again this is a hackish way, and the output of linking files is something along the lines of:

Code:

LINKING BIN/TEST.ELF...
SRC/test.o: In function `init':
test.c:(.text+0x8): undefined reference to `initHeap'
test.c:(.text+0x74): undefined reference to `Bochs_puts'
SRC/test.o: In function `_JustATest':
test.c:(.text+0x88): undefined reference to `calloc'
test.c:(.text+0xa6): undefined reference to `Bochs_printf'
test.c:(.text+0xd3): undefined reference to `PCI_findByClass'
test.c:(.text+0xee): undefined reference to `PCI_getConfig'
test.c:(.text+0x10a): undefined reference to `Bochs_printf'
test.c:(.text+0x134): undefined reference to `Bochs_printf'
test.c:(.text+0x149): undefined reference to `Bochs_putch'
SRC/test.o: In function `_AnotherThread':
test.c:(.text+0x1a7): undefined reference to `calloc'
SRC/test.o: In function `_PIT_Test':
test.c:(.text+0x27e): undefined reference to `calloc'

Which looks like an error (well, it is, but I told the linker to ignore it)

jnc100 · **Posted:** Sun Oct 25, 2015 6:46 am

rianquinn wrote:

My understanding of ELF is that for PIC, the start location for the library can change, but the relative offsets between the different sections remain the same, which is why the compiler should be able to figure out where each global symbol is (excluding external symbols of course). That is, the offset between sections like .text and .data is the same, even though the overall position of the library is different.

If you were to move the different sections within the library, then that would make sense to me, but how the code finds its GOT would not, as I'm not sure how you would go about doing that without modifying the code itself.

The issue is that gcc doesn't know whether a function/object external to the current compilation unit (.c file) will eventually be defined in a different compilation unit in the same object or whether it will be in a separate library. Thus, if you ask it to produce position-independent code, it assumes you are writing a shared library, which will have a GOT. The opcodes it chooses for each access are therefore already fixed to be ones that go through the GOT. When you come to link, ld may spot that what you are actually accessing is within the same output file, but by this point it is too late: gcc has already emitted code to access the variable via the GOT, so it has to be used.

BASICFreak wrote:

I do dislike GOT myself, which is why I do not support it (yet) - but again this is a hackish way and the output of linking files is something along the lines of:

You may want to investigate the '-r' option to ld - it essentially lets you combine lots of object files together into one large relocatable object file - i.e. it is an object file rather than an executable, so to load it as a kernel module (for example), you pretend to be ld, i.e. parse the section table (rather than the segment one), allocate memory for each section and patch up the relocations.
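
To give an idea of the "patch up the relocations" step for a 32-bit ET_REL file, here is a minimal sketch covering the two most common i386 relocation types. It assumes the loader has already copied the section into memory and computed the symbol address as the load address of the symbol's section plus st_value; `apply_rel_i386` and its parameters are made-up names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Relocation type numbers from the i386 psABI. */
enum { RELOC_386_32 = 1, RELOC_386_PC32 = 2 };

/* section      : loader's pointer to the section contents being patched
 * section_size : size of that section in bytes
 * section_addr : address the section will have when the code runs
 * r_offset     : offset of the field to patch, relative to the section
 * sym_addr     : S, the run-time address of the referenced symbol
 * Returns 0 on success, -1 for an unsupported type or a bad offset. */
int apply_rel_i386(uint8_t *section, uint32_t section_size,
                   uint32_t section_addr, uint32_t r_offset,
                   unsigned type, uint32_t sym_addr)
{
    uint32_t addend, value;

    assert(section != NULL);
    if (r_offset > section_size || section_size - r_offset < sizeof addend)
        return -1;

    /* i386 uses REL entries: the addend A is stored in the field itself. */
    memcpy(&addend, section + r_offset, sizeof addend);

    switch (type) {
    case RELOC_386_32:   /* S + A */
        value = sym_addr + addend;
        break;
    case RELOC_386_PC32: /* S + A - P, where P is the field's own address */
        value = sym_addr + addend - (section_addr + r_offset);
        break;
    default:
        return -1;
    }
    memcpy(section + r_offset, &value, sizeof value);
    return 0;
}
```

Undefined symbols (st_shndx of SHN_UNDEF) would need to be looked up in whatever the kernel exports before this function is called.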

Regards,
John.

BASICFreak · **Posted:** Sun Oct 25, 2015 12:11 pm

jnc100 wrote:

BASICFreak wrote:

I do dislike GOT myself, which is why I do not support it (yet) - but again this is a hackish way and the output of linking files is something along the lines of:

You may want to investigate the '-r' option to ld - it essentially lets you combine lots of object files together into one large relocatable object file - i.e. it is an object file rather than an executable, so to load it as a kernel module (for example), you pretend to be ld, i.e. parse the section table (rather than the segment one), allocate memory for each section and patch up the relocations.

Well, that's almost what I was (still am) doing:
LD flags = "-melf_i386 -r -T../../linkLib.ld" for library (not shared object)
LD flags = "-melf_i386 -q --noinhibit-exec -T../../linkExec.ld" for EXEC (which creates the reloc table and allows undefined symbols)
Then when reading the head of the elf:

Code:

if(Head->e_type == ET_EXEC) {
   _VMM_newDIR(ELFPDir); // Create new PDIR
   if(ELFPDir)
      _LoadExecElf(ELFPDir, ELFLocation); // Starts a new process
} else if(Head->e_type == ET_REL) {
   //Relocatable
   _LoadRelocElf(ELFLocation); // Loads into shared memory
}

Either way, I will be supporting shared objects (with GOTs) soon; I am still in the experimentation phase.

rianquinn · **Joined:** Thu Jan 21, 2010 9:31 pm **Posts:** 16

Quote:

The issue is that gcc doesn't know whether a function/object external to the current compilation unit (.c file) will eventually be defined in a different compilation unit in the same object or whether it will be in a separate library. Thus, if you ask it to produce position-independent code it assumes you are writing a shared library, and will thus have a GOT.

That was the answer I was looking for, and it makes sense to me. To test the theory, I compiled the code with "static" in front of the variables / functions and watched them get removed from the GOT, which fits your explanation.
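
For anyone who wants to repeat the experiment, something like the following file can be compiled with `gcc -O2 -fPIC -S` and the assembly compared. The names are made up, the hidden-visibility attribute is a GCC/Clang extension, and the exact instructions depend on compiler version and flags; the comments describe what GCC typically emits on x86_64.

```c
/* Default visibility: another object could override this symbol, so
 * accesses typically go through exported_counter@GOTPCREL(%rip). */
int exported_counter = 1;

/* Internal linkage: nothing outside this file can name it, so accesses are
 * typically plain RIP-relative, private_counter(%rip), with no GOT entry. */
static int private_counter = 1;

/* Hidden visibility: still shared between the files of this library, but
 * not exported from it, so it can also be addressed RIP-relative. */
__attribute__((visibility("hidden"))) int hidden_counter = 1;

int read_exported(void) { return exported_counter; }
int read_private(void)  { return private_counter; }
int read_hidden(void)   { return hidden_counter; }

/* Writing to the variables keeps the optimizer from folding the reads
 * above into constants. */
void bump_all(void)
{
    exported_counter++;
    private_counter++;
    hidden_counter++;
}
```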

OSDev.org
