devc1 wrote:
The code generated by the compiler without optimizations is generally a piece of trash: it contains a lot of bloat. Enabling compiler optimizations in other apps removes this bloat and makes the app faster, but in the kernel it just causes undefined behaviour. Is this normal? And why?
The reason for that is a long, complex and controversial story. I'd prefer to avoid restarting that controversial topic here.
The only thing I can say to help you practically is: learn what UB is and how to avoid it, especially with modern compilers, which brutally take advantage of it. With the right combination of compiler options, "syntactic approaches" to certain constructs, and runtime testing with UBSAN, you can write kernel code and compile it with -O3 without problems. My operating system compiles and runs at all the optimization levels, from -O0 -fno-inline-functions to -O3.
The list of what is actually UB in C is very long; even if plenty of people tried to enumerate UB cases here, non-trivial ones would still slip through. The general idea for avoiding UB is to think of your code as running on the "abstract machine" described by the standard, NOT on your target architecture. For example, if signed integers wrap around on your architecture (e.g. x86), don't assume that's the case for the abstract machine. If you assume that, you're introducing UB, so the compiler reserves the right to do whatever it likes. Specifically:
Code:
int a = INT_MAX;
a++; // undefined behavior: signed integer overflow. Don't make assumptions about the assembly instructions the compiler will generate here.
For some UB cases, there are compiler-specific options that define the behavior. For signed overflow, both GCC and Clang provide -fwrapv.
Another example: dereferencing a null pointer. With -O0, the compiler just does what you asked: it emits the load, which at runtime typically causes some kind of CPU exception. With -O3, the compiler assumes that a dereferenced pointer `ptr` cannot be NULL (even when it could statically prove that it might be NULL!!). Therefore, it might generate code that does not really dereference the pointer in the NULL case, but does something else entirely (obviously useless, but possibly faster for some reason in the common case where the pointer is not NULL).
In summary, when you turn on optimizations, the compiler feels entitled to assume that certain things will NEVER happen, and it generates code accordingly. You have to forget the notion of "I told you exactly what to do". When you really need to force the compiler to do something, as I said above, compiler options, qualifiers (e.g. volatile), extensions or inline assembly are required. Note that even inline assembly can be moved around or removed unless it's marked as volatile inline assembly.
Arm yourself with a lot of patience while learning how to avoid UB. Good luck!