Finding an UB

bzt · **Joined:** Thu Oct 13, 2016 4:55 pm **Posts:** 1584

@Solar: was it not misaligned?

Code:

<bochs:7> c
01991983900e[CPU0  ] write_linear_xmmword_aligned(): #GP misaligned access

Cheers,
bzt

bzt · **Joined:** Thu Oct 13, 2016 4:55 pm **Posts:** 1584

Hah, took me several hours, but I've figured it out! Forget UB, forget misaligned stack. It was a plain old bug, a very nasty one for that matter. Not unlike Russian Roulette a'la OSDev.

TL;DR
For those who interested, you should know two things: one, my kernel has a 32 bytes random seed in 4 ints. Second, I test parts of my code, one unit at a time, therefore the random seed is usually not properly set up, it's initialized to zeros. To make it more unpredictable, I quite often shuffle the seed bits like "seed[ somevariable%4 ] ^= anothervariable;" or "seed [ seed[1]%4 ] *= 16807;". I also use bytes from the text segment for that, so even though the seed is constant and consistent in the test environment, it's always different for -O0 and -O2 (regarding to the same test).

The bug: at one place in the source I forgot to add the "%4" modulus. Because my kernel is in the higher half near the top the address space, the seed address almost every time overflowed and turned around to the identity mapped lower half. The beginning of the lower half is mostly the territory of the boot loader, so the chance to change something that can cause trouble is very slim. Therefore most of the time, for almost every tests the bug remained hidden and undetectable. But sometimes the index had larger values, and it changed bytes on pages that were mapped in the kernel's text or data segment...

Now how could this bug clear a list without triggering a watchpoint in gdb? That's a complicated one. First, the overflowed address ended up in the boot loader's data area. Namely inside the paging tables, on a page that was used to map the lfb into the higher half for kprintf. Not only that, but it had the value of the address of the free memory list, and among others present bit set and read-only bit clear. You may think the chance of this is astronomical, but not: the very first thing I allocate is the free memory list, so it has a low base address, so most of the bits in the page table entry are zeros. So one page among many, right in the middle of the screen, did not map to the framebuffer but instead to the physical page which happened to hold the free memory list. The kprintf is written in a way that when it receives a newline character, it clears the entire line. So when the debug messages printed enough newline characters (about two dozen), the clearline routine did not put black pixels on the screen, but filled up the free memory list with zeros instead. Naturally as lfb was mapped in higher half, gdb watchpoint did not noticed the write.

After hours of debugging and singlestepping, I've narrowed the problem down to the clearline routine. Because that's a very basic function, nothing special in it, I dumped the relevant page table for the framebuffer to see where is it writing to. Once I saw the entry which were different than the others, I put a watch on the entry's address and bam, that's how I've found this nasty little baggar. After that I went back to the original lang_init test, this time looked for that misindexed seed[] write and surprise, surprise. In a way I think I was very lucky, because it was easy to spot the altered lfb page table entry (lot of incremental addresses starting from 0x3C100000, and one that's below 0x80000). Otherwise I would have never suspected that the code changed in run-time, specially because one of the first things I do is to remap everything either writable and non-executable or read-only (but not before that particular seed[] took place as it turned out).

So, mistery solved! :-D

Cheers,
bzt

alexfru · **Joined:** Tue Mar 04, 2014 5:27 am **Posts:** 1108

Congrats!

Octocontrabass · **Joined:** Mon Mar 25, 2013 7:01 pm **Posts:** 5143

bzt wrote:

Forget UB,

Accessing beyond the end of an array is undefined behavior. :wink:

eekee · **Posted:** Wed Mar 06, 2019 3:03 am

Wow! Congrats on finding that! I would have suspected the compiler too, because...

Octocontrabass wrote:

bzt wrote:

Forget UB,

Accessing beyond the end of an array is undefined behavior. :wink:

A few years ago, a friend of mine tried to access an array in an undefined way. He found gcc produced code to copy the entire array out of the way, make the write, and copy the array back! Granted, my friend had the unnecessary (i hope!) expectation that two arrays would be consecutive in memory, but the incident rather put me off using gcc and, by association, llvm. Not that I particularly wanted to use them before that.

Solar · **Posted:** Wed Mar 06, 2019 7:20 am

Undefined behavior is undefined. To reiterate, it is not the job of a compiler to identify UB in your source. To the contrary: The standard leaves certain things undefined on purpose, so that compiler implementations do not need to take those cases into account and are by definition free to do whatever "happens to happen" then. This makes compilers and the binaries they produce more efficient.

The solution here is not "picking a compiler that works with my source", but making sure that your source does not invoke UB.

Using all the warnings a compiler provides is just a first step. Double-check your source on different compilers, use static and dynamic code analysers (CLang's static analyzer, Valgrind etc.), write tests shining light into all the dark corners of your code.

Just, don't blame the compiler.

eekee · **Posted:** Wed Mar 06, 2019 9:42 am

Solar wrote:

This makes compilers and the binaries they produce more efficient.

I believe this, and I'm sorry, my comment was rather off-topic. To try to tidy up by way of explaining myself: when a C compiler shows it can generate code to copy an entire array under the surface, it seems shockingly un-C-like altogether! But, the more I think about it, the more my attitude seems like an artifact of a somewhat unrealistic view of C: that it does the absolute minimum under the surface. It might be "minimal" relative to many other high-level languages, but I have to admit it was producing 'invisible' code for type casting right from its inception. I suspect a lot of UB-creep-haters object to the addition of more such 'invisible' code.

Solar · **Posted:** Thu Mar 07, 2019 2:06 am

There is probably a very good explanation for why that compiler did what it did; because yes, C is all about the idea of not doing unnecessary or wasteful things "under the hood". (Likewise for C++, by the way, regardless of what Linus in his lack of understanding might claim...)

nullplan · **Joined:** Wed Aug 30, 2017 8:24 am **Posts:** 1604

Solar wrote:

There is probably a very good explanation for why that compiler did what it did; because yes, C is all about the idea of not doing unnecessary or wasteful things "under the hood". (Likewise for C++, by the way, regardless of what Linus in his lack of understanding might claim...)

Constructors, destructors, references, overloaded operators, .... the list of ways in which C++ can perform unexpectedly much work is staggering.

References especially, I mean, they had to patch the standard to allow you to return a reference to a local variable, because people just kept making that mistake.

C's way of achieving its goal is to be simple enough to be standardized in 150 pages (the rest of the standard talks about the library and other stuff).

C++'s way of achieving is goal is to throw enough toys at the programmer to be able to solve any problem. More toys mean more solvable problems, right? The result is a standards document that is rivalled in complexity only by the US tax code.

Truth be told, C isn't as low-level as it likes to think of itself. For instance there is no way to tell the compiler (without extensions) to do something on overflow, when every microcontroller I have ever used had an overflow flag. And the type model works incredibly poor regarding the keyword "const". If I have function that takes a string argument, and doesn't want to change that string, but won't care if the string is writable or not, I can declare that argument "const char*", because a "char*" is implicitly convertible to a "const char*". Now, what about string lists?

Solar · **Posted:** Thu Mar 07, 2019 4:39 pm

Ah, another Linus.

Quote:

Constructors, destructors, references, overloaded operators...

If any of those surprise you, it's not the fault of the language... what I was talking about is stuff like garbage collection or paying for features you don't use. Note that none of the features you mentioned actually introduce additional code or run-time cost when compared to doing the same things with C functions, or assembly. In the case of references, they save you the bother of checking for NULL...

eekee · **Posted:** Mon Mar 11, 2019 7:20 am

Solar wrote:

There is probably a very good explanation for why that compiler did what it did; because yes, C is all about the idea of not doing unnecessary or wasteful things "under the hood".

Ugh!

I'm the sort of person who wants to know all the "very good reasons". There's 4 things I can do with this:

Try to forget about it
Find out what the reason is
Assume gcc maintainers have bad attitudes
Assume the standards developers no longer believe C is all about the idea of not doing unnecessary or wasteful things "under the hood"

Given that #1 goes against my nature and I expect #2 would stress or even exhaust me, (I'm not expecting to get straight simple answers,) it's awfully tempting to assume #3, #4, or both.

Actively doing so has got me into some embarrassment, so I've worked out a different plan, choosing my design to avoid any need to keep to standards and, between major versions, locking the version of whatever compiler I use (perhaps apart from small fixes).

Solar · **Posted:** Mon Mar 11, 2019 9:03 am

No, the point is that invoking UB by you means you shouldn't be looking for sense in what the compiler does in return. It does not have to make sense, that's the point of UB!

It can be somewhat interesting to figure out what particular trap you've walked into, but as the compiler might do Something Else Entirely next code line, next compile run, or next minor compiler update (because the behavior is undefined), it's also an exercise in futility. You don't learn anything from it that you might rely on, or expect to see repeated.

If you positively want to know, prepare a MCVE and ask, e.g. on StackOverflow, or the mailing list of the compiler in question. Just remember, the code is broken, and whatever the compiler did or did not do, you have no guarantee that it won't be something else entirely tomorrow.

Don't invoke UB. That's the standard you absolutely, positively have to adhere to: ISO/IEC 9899 (Programming Language C).

My favorite "tales from the wilderness" example for this, if you excuse the C++:

Code:

#include <string>
#include <cassert>
#include <cstdarg>
 
using namespace std;
 
void foo( std::string & parmN, ... )
{
    va_list ap;
    va_start( ap, parmN );
    int i = va_arg( ap, int );
    va_end( ap );
    assert( i == 42 );
}
 
int main()
{
    string broken;
    foo( broken, 42 );
    return 0;
}

The variable i will (most likely...) not be set to 42, because you may not mix C style varargs with C++ references. I could explain what, exactly, will happen here for a platform that keeps parameters in stack memory. But it doesn't "teach" you anything, other than "don't mix C style varargs with C++ references".

nullplan · **Joined:** Wed Aug 30, 2017 8:24 am **Posts:** 1604

Your example fails to break for me... if I add the include file you forgot (<cstdarg>), and the assert() you forgot (assert(i == 42)), then I get neither warning nor error, nor runtime error. It builds and runs without issue, both for AMD64 and for i386, so both for a parameter stack architecture and a fastcall architecture.

I would have assumed that if it really was not allowed to mix references with varargs, that GCC would complain at least, but no.

Oh, I tried a few more things, and clang actually warns of this. The problem is passing a reference into va_start(). Which works for GCC and clang, because va_start() is defined in terms of compiler magic. But the "normal" definition of that macro, before the compiler magic came along, was in terms of the address-of operator, which has a completely different meaning for a reference. So that makes sense.

And indeed, if I add another parameter to foo(), so that the last fixed argument is no longer a reference, clang will also no longer complain. Not about the undefined behavior, anyway.

bzt · **Joined:** Thu Oct 13, 2016 4:55 pm **Posts:** 1584

Hi,

Another interesting gcc optimization issue. I've reorganized the timer queue check before the time shared queue check in my scheduler, and my kernel started to behave inconsistently, randomly throwing page faults in the sched_awake() function. Only if compiled with gcc, not with Clang, that worked as it should. After hours and hours of debugging, I've found the very unlikely and unexpected reason, and if it's an UB, I really would like to know what. As far as I know using "continue" in a loop should not cause any UB.

Here's the relevant part of the code (yes, the algorithm is a bit unortodox, but perfectly correct and does exactly what I want):

Code:

void sched_pick()
{
    tcb_t *tcba = (tcb_t*)LDYN_tcbalarm;
    ccb_t *ccb = (ccb_t*)LDYN_ccb;
    uint i, nonempty;

    do {
        for(nonempty=false,i=PRI_SYS; i<PRI_IDLE; i++) {
            if(ccb->hd_timerq && i==tcba->priority && tcba->alarmusec <= ccb->sched_ticks) {
                sched_awake(tcba);
                goto found;
            }
            if(ccb->hd_active[i]) {
                nonempty = true;
                if(ccb->cr_active[i] == 0) {
                    ccb->cr_active[i] = ccb->hd_active[i];
                    continue;
                } else
                    goto found;
            }
        }
    } while(nonempty);
    ...
    found:
    ...

And this is what "gcc -ansi -Wall -Wextra -Wpedantic -O2 -fno-delete-null-pointer-checks -fno-stack-protector" compiled of it:

Code:

ffffffffffe0ef40 <sched_pick>:
ffffffffffe0ef40:       31 c0                   xor    %eax,%eax
ffffffffffe0ef42:       e9 59 fa ff ff          jmpq   ffffffffffe0e9a0 <sched_awake+0x290>
ffffffffffe0ef47:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
ffffffffffe0ef4e:       00 00

WTF, hah? All the loops and queue head checks are gone! This is definitely changed semantics! No wonder that poor sched_awake() got random input! Even worse, this jumps into the sched_awake after the input verification!

I've tried all the loop related "-fno-*" command line arguments, but nothing changed. Interestingly, when I finally tried "-fno-partial-inlining", then gcc generated the correct code. Also, if I remove the "continue" (which is not needed any more, yet I think should not cause any trouble), that solves the issue, and the generated code now contains the necessary and very important checks and loops, along with setting the proper argument for sched_awake():

Code:

ffffffffffe0ec80 <sched_pick>:
ffffffffffe0ec80:       55                      push   %rbp
ffffffffffe0ec81:       53                      push   %rbx
ffffffffffe0ec82:       48 83 ec 08             sub    $0x8,%rsp
ffffffffffe0ec86:       48 8b 34 25 78 00 00    mov    0xffffffff80000078,%rsi
ffffffffffe0ec8d:       80
ffffffffffe0ec8e:       31 d2                   xor    %edx,%edx
ffffffffffe0ec90:       31 db                   xor    %ebx,%ebx
ffffffffffe0ec92:       eb 40                   jmp    ffffffffffe0ecd4 <sched_pick+0x54>
ffffffffffe0ec94:       0f 1f 40 00             nopl   0x0(%rax)
ffffffffffe0ec98:       48 8b 04 dd 88 00 00    mov    -0x7fffff78(,%rbx,8),%rax
ffffffffffe0ec9f:       80
ffffffffffe0eca0:       48 8d 0c dd 00 00 00    lea    0x0(,%rbx,8),%rcx
ffffffffffe0eca7:       00
ffffffffffe0eca8:       48 85 c0                test   %rax,%rax
ffffffffffe0ecab:       74 1d                   je     ffffffffffe0ecca <sched_pick+0x4a>
ffffffffffe0ecad:       48 8b 14 dd c8 00 00    mov    -0x7fffff38(,%rbx,8),%rdx
...

My question is, does any of you know of an UB regarding to "continue" in a loop? I have never seen anything like this before, and I'm a professional developer for quite some time now. If this is a developer error on my part, I'd like to learn from it, but right know I fail to see the UB.

Cheers,
bzt

linuxyne · **Joined:** Sat Jul 02, 2016 7:02 am **Posts:** 210

bzt wrote:

WTF, hah? All the loops and queue head checks are gone! This is definitely changed semantics! No wonder that poor sched_awake() got random input! Even worse, this jumps into the sched_awake after the input verification!

Did the code allow gcc to prove certain properties about tcba and ccb as always true (or false)?
If so, the compiler is within its rights (under the appropriate optimization-levels) to exploit them.

If gcc can conclude that sched_awake needs to be called even before it has a chance to check
ccb->hd_active[i], then it can eradicate the loop and checks (including similar checks in sched_awake).

The compiler is also within its rights to move the checks about 'ccb->hd_timerq' and 'tcba->alarmusec <= ccb->sched_ticks' out of the for and the while loops, or conclude them as always true/false (unless volatile or similar restrictions are imposed).

OSDev.org

Finding an UB

Who is online