Building 64-bit GCC target for RDOS

rdos · Post by **rdos** » Sun Dec 16, 2012 4:56 am

Owen wrote:You should be able to avoid having to build gcc -> newlib -> gcc by copying the contents of the newlib archive into the GCC directory. You can likewise overlay Binutils in this manner, to just have one (very large) build to do.

It's best to use the files from the latest package; the maintainers keep them in sync between binutils/gcc/newlib/gdb/etc

Maybe, but I think at least initially I want to build the projects separately, especially so I know I build everything from scratch. Later, when I know the builds are ok, I might try experimenting with faster builds.

I just noticed that both binutils and GCC now have synced their config.sub with the config repository so they contain the correct settings for RDOS. Seems like this works a lot better than it did 6 years ago.

I also filed a bug report for the medium memory model problem in libgcc in bugzilla. I hope somebody who has access to the repository provides a patch for this. This could be a relevant fix for other OS developers who don't want their code to be located in either the top or the bottom 2G.

Regarding default memory models, I made a new post on the GCC list in hope of getting better responses.

It might be that -fpie and small memory model on x86-64 will provide the correct relocations, but that still doesn't solve the issue of the map-file for the executable which would contain the wrong positions. Before I have a working remote-gdb solution (which will be one of the last stages), I will need to match RIP with offsets in map-file in order to know where in the code I am.

rdos · Post by **rdos** » Wed Dec 19, 2012 1:57 pm

Still no progress with GCC.

The bug report in bugzilla has been edited and the keyword "ra" (register allocation) has been added. Nobody has proposed any solution, and I still have no idea what is wrong. Link to bug-report in case anybody knows: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55712

There also is no progress with the patch to be able to default the compiler to medium memory model.

I think I'll recompile the source with small memory model instead, and try to relocate it to the address I want it at.

jnc100 · Post by **jnc100** » Wed Dec 19, 2012 3:15 pm

I cannot explain this. I've managed to track it down to the definition of __cpuid_count in cpuid.h. There is a series of #if statements there to select the appropriate definition. Essentially there are four different architectures to support: 32 bit without PIC, 32 bit with PIC, 64 bit without PIC and 64 bit with PIC. The implementation for 32 bit with PIC preserves the ebx register; all the other implementations do not (which is correct as AFAIK 64 bit PIC doesn't rely on rbx). If you try and say that ebx is set by an asm block when PIC is being used in 32 bit mode then you will get the 'inconsistent operands constraints' error. Unfortunately, what I can't explain is that my stock ubuntu gcc (4.6.2 for x86_64-linux-gnu) also errors on the same statement when -fPIC or -fpic is used, suggesting that the PIC code in gcc is broken as it appears to erroneously rely on rbx in 64 bit mode (or at least error if it is used in the results of an asm block).

The following demonstrates the issue:

Code: Select all

#include <stdio.h>

int main()
{
    unsigned int a, b;
    a = 42;

    __asm__ ("movl %1, %0" : "=b" (b) : "a" (a));

    printf("Result: %d\n", b);
    return 0;
}

If compiled on a stock linux x86_64 gcc with -fPIC (or -fpic) -mcmodel=(anything but small) this again gives the same error. Changing the "=b" to "=c" (or any other register) seems to avoid the problem. I think gcc is at fault here as the x86_64 ABI does not mention relying on rbx but does state that with code models of medium and above then r15 is used to hold the GOT address.

As a work-around for your problem with cpuinfo.c then compiling that particular file with -D__i386__ should force it to use the 32 bit PIC code which doesn't write out to ebx.

Regards,
John.
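As a rough illustration of the #if selection described above, here is a minimal sketch of such a ladder. It is not the actual contents of cpuid.h: the function name and the labels are made up, and only __i386__, __x86_64__ and __PIC__ are real GCC predefined macros. Because the 32 bit tests come first in this sketch, defining __i386__ by hand on a 64 bit build would select the 32 bit PIC branch, which is the idea behind the -D__i386__ work-around.

```c
#include <stddef.h>

/* Returns a label for the branch a cpuid.h-style #if ladder would pick.
   The function name and the label texts are hypothetical; only the
   tested macros are predefined by GCC. */
static const char *cpuid_variant(void)
{
#if defined(__i386__) && defined(__PIC__)
    return "32-bit PIC: save and restore ebx around cpuid";
#elif defined(__i386__)
    return "32-bit non-PIC: ebx is a plain output";
#elif defined(__x86_64__) && defined(__PIC__)
    return "64-bit PIC: ebx is a plain output";
#elif defined(__x86_64__)
    return "64-bit non-PIC: ebx is a plain output";
#else
    return "not an x86 target";
#endif
}
```

Printing cpuid_variant() from a small test program built with and without -fPIC shows which branch a given set of flags would take.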

rdos · Post by **rdos** » Thu Dec 20, 2012 2:53 pm

jnc100 wrote:I cannot explain this. I've managed to track it down to the definition of __cpuid_count in cpuid.h. There is a series of #if statements there to select the appropriate definition. Essentially there are four different architectures to support: 32 bit without PIC, 32 bit with PIC, 64 bit without PIC and 64 bit with PIC. The implementation for 32 bit with PIC preserves the ebx register; all the other implementations do not (which is correct as AFAIK 64 bit PIC doesn't rely on rbx). If you try and say that ebx is set by an asm block when PIC is being used in 32 bit mode then you will get the 'inconsistent operands constraints' error. Unfortunately, what I can't explain is that my stock ubuntu gcc (4.6.2 for x86_64-linux-gnu) also errors on the same statement when -fPIC or -fpic is used, suggesting that the PIC code in gcc is broken as it appears to erroneously rely on rbx in 64 bit mode (or at least error if it is used in the results of an asm block).

The following demonstrates the issue:
Code: Select all
#include <stdio.h>

int main()
{
    unsigned int a, b;
    a = 42;

    __asm__ ("movl %1, %0" : "=b" (b) : "a" (a));

    printf("Result: %d\n", b);
    return 0;
}
If compiled on a stock linux x86_64 gcc with -fPIC (or -fpic) -mcmodel=(anything but small) this again gives the same error. Changing the "=b" to "=c" (or any other register) seems to avoid the problem. I think gcc is at fault here as the x86_64 ABI does not mention relying on rbx but does state that with code models of medium and above then r15 is used to hold the GOT address.

As a work-around for your problem with cpuinfo.c then compiling that particular file with -D__i386__ should force it to use the 32 bit PIC code which doesn't write out to ebx.

Regards,
John.

Thanks to your analysis of the problem, I've come up with a solution. It is possible to add a new section in cpuid.h (using __x86_64__ and PIC), and then change "=b" to "=r". With that change in place, libgcc compiles, and so does newlib.

However, there is still some other problem, because when linking my test.c file I now get unresolved-symbol errors (which are reported a little strangely).

Code: Select all

/usr/local/rdos/lib/gcc/rdos/4.8.0/../../../../rdos/bin/ld: error in /usr/local/rdos/lib/gcc/rdos/4.8.0/crtend.o(.eh_frame); no .eh_frame_hdr table will be created.
/usr/local/rdos/lib/gcc/rdos/4.8.0/crtbegin.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0xda): relocation truncated to fit: R_X86_64_PLT32 against undefined symbol `__deregister_frame_info'
/usr/local/rdos/lib/gcc/rdos/4.8.0/crtbegin.o: In function `frame_dummy':
crtstuff.c:(.text+0x10d): relocation truncated to fit: R_X86_64_PLT32 against undefined symbol `__register_frame_info'
/usr/local/rdos/lib/gcc/rdos/4.8.0/../../../../rdos/lib/libc.a(lib_a-__call_atexit.o): In function `__call_exitprocs':
/usr/src/build-newlib/rdos/newlib/libc/stdlib/../../../../../newlib-1.20.0/newlib/libc/stdlib/__call_atexit.c:147:(.text+0x144): relocation truncated to fit: R_X86_64_PLT32 against undefined symbol `free'
collect2: error: ld returned 1 exit status

jnc100 · Post by **jnc100** » Thu Dec 20, 2012 3:26 pm

I hope your new section preserves rbx across the cpuid call as gcc won't know you've trashed it. I'm reasonably convinced this shouldn't be necessary though and I've submitted my test case as a new bug to gcc (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55744).

As regards your new problem, are you compiling everything (i.e. libgcc, newlib and your test program) with the medium code model? If you are then could you post the output of objdump -drS crtstuff.o (particularly the '__do_global_dtors_aux' function) to see if the failing relocation is 'if (__deregister_frame_info)' or '__deregister_frame_info (__EH_FRAME_BEGIN__);'.

Regards,
John.

rdos · Post by **rdos** » Thu Dec 20, 2012 3:42 pm

jnc100 wrote:I hope your new section preserves rbx across the cpuid call as gcc won't know you've trashed it. I'm reasonably convinced this shouldn't be necessary though and I've submitted my test case as a new bug to gcc (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55744).

As regards your new problem, are you compiling everything (i.e. libgcc, newlib and your test program) with the medium code model? If you are then could you post the output of objdump -drS crtstuff.o (particularly the '__do_global_dtors_aux' function) to see if the failing relocation is 'if (__deregister_frame_info)' or '__deregister_frame_info (__EH_FRAME_BEGIN__);'.

Regards,
John.

Here is the whole output (crtbegin.o):

Code: Select all

0000000000000000 <deregister_tm_clones>:
   0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 7 <deregister_tm_clones+0x7>
                        3: R_X86_64_PC32        __TMC_END__+0x3
   7:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # e <deregister_tm_clones+0xe>
                        a: R_X86_64_PC32        .tm_clone_table-0x4
   e:   48 29 f8                sub    %rdi,%rax
  11:   48 83 f8 0e             cmp    $0xe,%rax
  15:   77 02                   ja     19 <deregister_tm_clones+0x19>
  17:   f3 c3                   repz retq
  19:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 20 <deregister_tm_clones+0x20>
                        1c: R_X86_64_GOTPCREL   _ITM_deregisterTMCloneTable-0x4
  20:   48 85 c0                test   %rax,%rax
  23:   74 f2                   je     17 <deregister_tm_clones+0x17>
  25:   ff e0                   jmpq   *%rax
  27:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  2e:   00 00

0000000000000030 <register_tm_clones>:
  30:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 37 <register_tm_clones+0x7>
                        33: R_X86_64_PC32       __TMC_END__-0x4
  37:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 3e <register_tm_clones+0xe>
                        3a: R_X86_64_PC32       .tm_clone_table-0x4
  3e:   48 29 f8                sub    %rdi,%rax
  41:   48 c1 f8 03             sar    $0x3,%rax
  45:   48 89 c2                mov    %rax,%rdx
  48:   48 c1 ea 3f             shr    $0x3f,%rdx
  4c:   48 01 d0                add    %rdx,%rax
  4f:   48 d1 f8                sar    %rax
  52:   48 89 c6                mov    %rax,%rsi
  55:   75 02                   jne    59 <register_tm_clones+0x29>
  57:   f3 c3                   repz retq
  59:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 60 <register_tm_clones+0x30>
                        5c: R_X86_64_GOTPCREL   _ITM_registerTMCloneTable-0x4
  60:   48 85 d2                test   %rdx,%rdx
  63:   74 f2                   je     57 <register_tm_clones+0x27>
  65:   ff e2                   jmpq   *%rdx
  67:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  6e:   00 00

0000000000000070 <__do_global_dtors_aux>:
  70:   80 3d 00 00 00 00 00    cmpb   $0x0,0x0(%rip)        # 77 <__do_global_dtors_aux+0x7>
                        72: R_X86_64_PC32       .bss-0x5
  77:   75 73                   jne    ec <__do_global_dtors_aux+0x7c>
  79:   41 54                   push   %r12
  7b:   4c 8d 25 00 00 00 00    lea    0x0(%rip),%r12        # 82 <__do_global_dtors_aux+0x12>
                        7e: R_X86_64_PC32       .dtors-0x4
  82:   55                      push   %rbp
  83:   48 8d 2d 00 00 00 00    lea    0x0(%rip),%rbp        # 8a <__do_global_dtors_aux+0x1a>
                        86: R_X86_64_PC32       __DTOR_END__-0x4
  8a:   48 83 ec 08             sub    $0x8,%rsp
  8e:   4c 29 e5                sub    %r12,%rbp
  91:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 98 <__do_global_dtors_aux+0x28>
                        94: R_X86_64_PC32       .bss+0x4
  98:   48 c1 fd 03             sar    $0x3,%rbp
  9c:   48 83 ed 01             sub    $0x1,%rbp
  a0:   48 39 e8                cmp    %rbp,%rax
  a3:   73 1e                   jae    c3 <__do_global_dtors_aux+0x53>
  a5:   0f 1f 00                nopl   (%rax)
  a8:   48 83 c0 01             add    $0x1,%rax
  ac:   48 89 05 00 00 00 00    mov    %rax,0x0(%rip)        # b3 <__do_global_dtors_aux+0x43>
                        af: R_X86_64_PC32       .bss+0x4
  b3:   41 ff 14 c4             callq  *(%r12,%rax,8)
  b7:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # be <__do_global_dtors_aux+0x4e>
                        ba: R_X86_64_PC32       .bss+0x4
  be:   48 39 e8                cmp    %rbp,%rax
  c1:   72 e5                   jb     a8 <__do_global_dtors_aux+0x38>
  c3:   e8 38 ff ff ff          callq  0 <deregister_tm_clones>
  c8:   48 83 3d 00 00 00 00    cmpq   $0x0,0x0(%rip)        # d0 <__do_global_dtors_aux+0x60>
  cf:   00
                        cb: R_X86_64_GOTPCREL   __deregister_frame_info-0x5
  d0:   74 0c                   je     de <__do_global_dtors_aux+0x6e>
  d2:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # d9 <__do_global_dtors_aux+0x69>
                        d5: R_X86_64_PC32       .eh_frame-0x4
  d9:   e8 00 00 00 00          callq  de <__do_global_dtors_aux+0x6e>
                        da: R_X86_64_PLT32      __deregister_frame_info-0x4
  de:   c6 05 00 00 00 00 01    movb   $0x1,0x0(%rip)        # e5 <__do_global_dtors_aux+0x75>
                        e0: R_X86_64_PC32       .bss-0x5
  e5:   48 83 c4 08             add    $0x8,%rsp
  e9:   5d                      pop    %rbp
  ea:   41 5c                   pop    %r12
  ec:   c3                      retq
  ed:   0f 1f 00                nopl   (%rax)

00000000000000f0 <frame_dummy>:
  f0:   48 83 ec 08             sub    $0x8,%rsp
  f4:   48 83 3d 00 00 00 00    cmpq   $0x0,0x0(%rip)        # fc <frame_dummy+0xc>
  fb:   00
                        f7: R_X86_64_GOTPCREL   __register_frame_info-0x5
  fc:   74 13                   je     111 <frame_dummy+0x21>
  fe:   48 8d 35 00 00 00 00    lea    0x0(%rip),%rsi        # 105 <frame_dummy+0x15>
                        101: R_X86_64_PC32      .bss+0x1c
 105:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 10c <frame_dummy+0x1c>
                        108: R_X86_64_PC32      .eh_frame-0x4
 10c:   e8 00 00 00 00          callq  111 <frame_dummy+0x21>
                        10d: R_X86_64_PLT32     __register_frame_info-0x4
 111:   48 83 3d 00 00 00 00    cmpq   $0x0,0x0(%rip)        # 119 <frame_dummy+0x29>
 118:   00
                        114: R_X86_64_PC32      .jcr-0x5
 119:   74 15                   je     130 <frame_dummy+0x40>
 11b:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 122 <frame_dummy+0x32>
                        11e: R_X86_64_GOTPCREL  _Jv_RegisterClasses-0x4
 122:   48 85 c0                test   %rax,%rax
 125:   74 09                   je     130 <frame_dummy+0x40>
 127:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 12e <frame_dummy+0x3e>
                        12a: R_X86_64_PC32      .jcr-0x4
 12e:   ff d0                   callq  *%rax
 130:   48 83 c4 08             add    $0x8,%rsp
 134:   e9 f7 fe ff ff          jmpq   30 <register_tm_clones>

Disassembly of section .fini:

00000000000014f8 <.fini>:
    14f8:       e8 00 00 00 00          callq  14fd <frame_dummy+0x140d>
                        14f9: R_X86_64_PC32     .text+0x6c

Disassembly of section .init:

00000000000029f5 <.init>:
    29f5:       e8 00 00 00 00          callq  29fa <frame_dummy+0x290a>
                        29f6: R_X86_64_PC32     .text+0xec

jnc100 · Post by **jnc100** » Thu Dec 20, 2012 3:46 pm

Thanks, the error is in the call to the function (rather than checking it exists).

Thinking about it a bit more I think the clue is in the 'undefined symbol' bit. If a symbol is undefined then the linker will try to substitute the value 0 for that symbol whenever it is referenced. As you are using PIC, gcc by default tries to reference the PLT for function addresses. In the medium model, the PLT is always within 2 GiB of any function from where it is referenced, thus all references to the PLT can be 32 bit in size. If you set your program to be linked to a virtual address greater than 2 GiB away from address 0, then it cannot fit a relocation to 0 within the 32 bit relocation size and will error out (remember that all references are RIP relative i.e. relative to the current virtual address).

Workaround: define the requested functions (__deregister_frame_info, __register_frame_info and free).

Regards,
John.
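The range argument above can be checked with a little arithmetic. The following is only a sketch: the helper name is made up, and the call-site addresses are illustrative (the 0x180E0000000 base is the text address quoted later in this thread, 0x400000 is the usual low Linux base, and 0xda is the offset from the linker error, assuming crtbegin.o's .text sits at the very start of the text segment).

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a 32-bit PC-relative field located at `place` can reach `target`.
   The CPU adds the displacement to the address of the next instruction,
   which for a call is 4 bytes past the start of the displacement field. */
static bool fits_pc32(uint64_t place, uint64_t target)
{
    int64_t disp = (int64_t)(target - (place + 4));
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With these assumptions, fits_pc32(0x180E00000DA, 0) is false, because address 0 is far more than 2 GiB away, while fits_pc32(0x4000DA, 0) is true. That matches the observation that the same undefined symbols only produce "relocation truncated to fit" at a high link address.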

rdos · Post by **rdos** » Thu Dec 20, 2012 4:05 pm

jnc100 wrote:Thanks, the error is in the call to the function (rather than checking it exists).

Thinking about it a bit more I think the clue is in the 'undefined symbol' bit. If a symbol is undefined then the linker will try to substitute the value 0 for that symbol whenever it is referenced. As you are using PIC, gcc by default tries to reference the PLT for function addresses. In the medium model, the PLT is always within 2 GiB of any function from where it is referenced, thus all references to the PLT can be 32 bit in size. If you set your program to be linked to a virtual address greater than 2 GiB away from address 0, then it cannot fit a relocation to 0 within the 32 bit relocation size and will error out (remember that all references are RIP relative i.e. relative to the current virtual address).

Workaround: define the requested functions (__deregister_frame_info, __register_frame_info and free).

Regards,
John.

Yes. I suspected that at first too. I just defined place-holders for these in cr0.S, and now it links as it should. The *frame_info functions are part of exception handling, so I suppose I need to define/write them in order to support exceptions. I'm just a little bit puzzled why these appear as I changed memory model.

BTW, the executable now shows the correct addresses when I disassemble it.

jnc100 · Post by **jnc100** » Thu Dec 20, 2012 4:19 pm

rdos wrote:I'm just a little bit puzzled why these appear as I changed memory model.

I suspect it's a combination of your load address and the memory model. I'd guess if you linked at a low address e.g. 0x400000 as per linux it would be fine even with the medium memory model (which is probably why no-one using other OSes has this problem). Similarly if you used the large model and linked at your current address you should have no problem. In addition, if the symbols were not defined as weak then the error you'd get would be 'undefined symbol' rather than the more obscure one you're currently getting. In summary I think the problem is the unique combination of the medium code model, your high load address (> 2GiB away from 0) and these three undefined weak symbols, which is why this probably hasn't cropped up on fora, mailing lists etc before.

Regards,
John.
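For readers unfamiliar with weak symbols, the check-then-call pattern visible in the crtbegin.o disassembly (a compare against 0 followed by a call) looks roughly like the sketch below. The hook name is hypothetical and is deliberately left undefined; this assumes GCC or a compatible compiler on an ELF target, where an undefined weak symbol resolves to address 0.

```c
#include <stddef.h>

/* Hypothetical optional hook, declared weak and never defined anywhere.
   On an ELF target the linker resolves it to address 0 instead of
   reporting an undefined symbol. */
extern void rdos_demo_hook(const void *frame) __attribute__((weak));

/* Same shape as the guarded calls in crtstuff.c: test the address first,
   call only if the symbol was actually provided.
   Returns 1 if the hook was called, 0 if it was absent. */
static int call_hook_if_present(const void *frame)
{
    if (rdos_demo_hook) {
        rdos_demo_hook(frame);
        return 1;
    }
    return 0;
}
```

Even though the call is never executed when the hook is absent, the linker still has to resolve the call instruction's relocation against address 0, which is where the high load address becomes a problem.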

rdos · Post by **rdos** » Thu Dec 20, 2012 4:23 pm

jnc100 wrote:I hope your new section preserves rbx across the cpuid call as gcc won't know you've trashed it.

The first attempt was wrong. Now I took the 386-PIC code instead, which uses two exchanges with ebx register to preserve it. That won't preserve upper part of rbx, but gcc shouldn't care anyway. But I need to learn how to write inline assembly for GAS. My next attempt might be to do push rbx / mov r,ebx and pop rbx instead.

The next step would be to figure out how to do syscalls, and implementing inline assembly macros for that.

jnc100 · Post by **jnc100** » Thu Dec 20, 2012 4:44 pm

The upper part of rbx should really be preserved as if you don't tell it otherwise gcc will assume the whole of rbx is unchanged by your asm block. It may have stored a 64 bit value in rbx which it expects to be there afterwards.

Regards,
John.
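One way to keep all 64 bits of rbx intact is to exchange it with a 64 bit scratch register around the cpuid instruction. The following is a sketch of that idea and not the code from cpuid.h; the function name is made up, and on anything other than an x86-64 GCC-compatible compiler it simply returns 0.

```c
#include <stdint.h>

/* Runs cpuid for leaf/subleaf and stores eax, ebx, ecx, edx in regs[0..3].
   rbx holds the same 64-bit value after the asm block as before it.
   Returns 1 if cpuid was executed, 0 on unsupported targets. */
static int cpuid_keep_rbx(uint32_t leaf, uint32_t subleaf, uint32_t regs[4])
{
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t tmp; /* holds the caller's rbx during cpuid, then cpuid's ebx */
    __asm__ volatile(
        "xchgq %%rbx, %1\n\t"   /* park the caller's rbx in tmp */
        "cpuid\n\t"
        "xchgq %%rbx, %1\n\t"   /* restore rbx, move the result into tmp */
        : "=a"(regs[0]), "=&r"(tmp), "=c"(regs[2]), "=d"(regs[3])
        : "0"(leaf), "2"(subleaf));
    regs[1] = (uint32_t)tmp;
    return 1;
#else
    (void)leaf; (void)subleaf; (void)regs;
    return 0;
#endif
}
```

The early-clobber "=&r" constraint keeps the scratch register away from the eax and ecx inputs. A push/pop of rbx would also work, but it changes rsp inside the asm block, which can be awkward if the compiler addresses operands relative to the stack pointer.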

rdos · Post by **rdos** » Fri Dec 21, 2012 4:32 pm

In order to separate code better from data, I've now redefined both the base for code and for data like this:

Code: Select all


TEXT_START_ADDR=0x180E0000000
DATA_ADDR=0x18120000000

I actually would have liked to separate them much more, but it seems like the medium memory model doesn't allow more than +/- 2G difference, so I spaced them 1G apart. That puts a limit of 1G on the code section, but I don't think that is a problem.

It seems like the file is now 2M large, mostly because of an alignment of 2^21 between code and data!

rdos · Post by **rdos** » Sun Dec 23, 2012 4:58 am

It seems like in the x86-64 system V ABI, rbx must be saved by callee. This might be the reason why the cpuid-code fails when ebx is used as an output. Reference: http://www.classes.cs.uchicago.edu/arch ... out-03.pdf

jnc100 · Post by **jnc100** » Sun Dec 23, 2012 5:12 am

Yes, I saw this, but it doesn't explain why it only fails with PIC with code model >= medium. Besides, if you state it as being specifically used by the asm block there's nothing stopping gcc saving and reloading it on either side of the asm block (similar to the saving that occurs with clobbered registers).

I suggest you try the "-z max-page-size=0x1000" linker option to avoid the 2 MiB alignment issues (by default ld expects 2 MiB pages in x86_64 mode).

Regards,
John.
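The effect of the page size on file size can be sketched with a small calculation. ELF loadable segments need a file offset that is congruent to their virtual address modulo the page size, so the data segment has to start at the next suitable offset after the code. The helper name and the 0x3000 "end of code" offset below are made up for illustration; the data address is the DATA_ADDR value from the earlier post.

```c
#include <stdint.h>

/* Smallest file offset >= cur_off that is congruent to vaddr modulo page.
   `page` must be a power of two. */
static uint64_t next_segment_offset(uint64_t cur_off, uint64_t vaddr, uint64_t page)
{
    uint64_t want = vaddr & (page - 1);    /* required offset within a page */
    uint64_t have = cur_off & (page - 1);  /* current offset within a page */
    uint64_t base = cur_off - have;        /* cur_off rounded down to a page */
    return (have <= want) ? base + want : base + page + want;
}
```

With a 2 MiB page size (0x200000), code ending at file offset 0x3000 pushes data at 0x18120000000 out to offset 0x200000, which accounts for a roughly 2M file. With 0x1000 pages the data segment can start right at 0x3000.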

rdos · Post by **rdos** » Sun Dec 23, 2012 3:45 pm

jnc100 wrote: I suggest you try the "-z max-page-size=0x1000" linker option to avoid the 2 MiB alignment issues (by default ld expects 2 MiB pages in x86_64 mode).

Yes, it is the -z max-page-size option, but I've not yet found out how to change the default. There is a MAXPAGESIZE option for .sh-files, but I've not yet managed to get it to work.
