Post undefined behaviour horror stories here

Mikumiku747 · **Joined:** Thu Apr 16, 2015 7:37 am **Posts:** 64

Hey all,

After having a bit of a read through this thread on pointer safety and compiler optimisation, I was curious as to what kinds of undefined behaviour people have encountered and the kinds of results they got. I've seen this article on MSDN before, but apart from that, not much has tipped me off as to why undefined behaviour is feared so much by everyone. I mean, people get mad at computers for always doing exactly what they're told, to the letter, even though it's not what they meant, but I'm curious as to what happens when they forgo even that restriction and do things they really aren't supposed to.

Especially in the world of Operating Systems development, many of the safety nets that are brought in by the OS are gone, and we're often left with a sandbox far deeper than first anticipated, so I'm sure this would be a good place to ask. Anyways, I just wanted a few more interesting stories to learn from / be entertained by, and I'm sure some of the veterans and newbies alike have something to share.

- Mikumiku747

Solar · **Posted:** Wed Jun 14, 2017 8:48 am

In my nine-to-five job, I was looking at crashes with a certain input of several thousand data records.

I tried to find the data record responsible for the crash. Divide in half, check which half crashes.

At some point, I was left with a set of ~200 data records that crashed. No further subdivision crashed. Reshuffling the records meant it no longer crashed. Having so many data records meant that debugger breakpoints got triggered all the time, with little to guide me as to which call it was that triggered the crash. (Hint: It wasn't the last data record that crashed...)

I think I repressed most of the subsequent bughunt.

I know I was frantically juggling printf()'s, gdb, and valgrind. In the end, it turned out to be this:

An internal string class used memory pools for allocation. A certain off-by-one error gave a crash only if the string in question resided at the very start of the memory pool, which happened only in very specific circumstances (and very deep into the process, those strings were not the complete data records, and actually it was still non-trivial to figure out which data record I was looking at when I found the point of crash).

A subsequent off-by-one error covered the data corruption of the first error, so if the process didn't crash, everything appeared to be correct.
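Why the position inside the pool mattered can be shown with a small model. This is only a sketch under assumptions: the names `WriteResult` and `classify_write` are hypothetical, and it assumes the first off-by-one was a write one byte before the start of the string, which the post does not actually say.

```cpp
#include <cstddef>

// Hypothetical model: strings live at byte offsets inside one pool block.
enum class WriteResult { InsideString, CorruptsNeighbour, OutsidePool };

// Classify a write at `index` relative to a string that starts at
// `string_offset` and is `length` bytes long, in a pool of `pool_size` bytes.
// `index` is signed so that an off-by-one such as -1 can be expressed.
WriteResult classify_write(std::size_t pool_size, std::size_t string_offset,
                           std::size_t length, long index) {
    const long absolute = static_cast<long>(string_offset) + index;
    if (absolute < 0 || absolute >= static_cast<long>(pool_size))
        return WriteResult::OutsidePool;        // outside the pool: may crash
    if (index < 0 || index >= static_cast<long>(length))
        return WriteResult::CorruptsNeighbour;  // still pool memory: silent damage
    return WriteResult::InsideString;
}
```

In this model, `classify_write(4096, 0, 16, -1)` is the only case that leaves the pool. The same mistake at any other offset just damages a neighbouring string, which a second bug can then hide.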

Eventually I refactored the whole string class to, basically, std::string. ;-)

(The internal class was written, reputedly, "because not all target platforms had good-enough std::string back then". Go figure. The Linux and Windows versions of the software were actually faster after the refactoring; the AIX version ran at half speed, but I figured, to hell with AIX. :twisted:

And no, AIX's std::string was not the culprit that "inspired" the homegrown string handling, it was HP-UX -- which we'd stopped supporting over a decade ago.)

Mikumiku747 wrote:

...not much has tipped me off as to why undefined behaviour is feared so much by everyone.

UB can (and usually does...) give you heisenbugs, breakage that cannot be readily reproduced. Crashes that happen only on production, or that happen only once in a thousand calls, and when you make the exact same call a second time, nothing happens. Breakage that happens all the time except when run in a debugger. Stuff like that, where most of the debugging procedures you might be familiar with stop working.

(Sidenote, I found a bug in IBM's debugger for AIX in the process, as that was exactly what was happening -- the error vanished when run in the debugger. Turned out the IBM debugger, in that version, did not honor the library search path of the AIX host, i.e. the debugger loaded a different set of libraries than the original process. The IBM lady on the phone was rather stricken when she told me that, indeed, their product was at fault for that particular muckup. :twisted:)

Korona · **Joined:** Thu May 17, 2007 1:27 pm **Posts:** 999

As a side note: The sanitizers that are featured in modern versions of gcc and clang are really great tools to find UB bugs. Today I rarely use a debugger at all; if I'm under the impression that I have a UB bug I just compile with asan and ubsan enabled. It gives much more precise error information than debuggers ("Store of size S is X bytes past the buffer of size B that was allocated in function F in thread T") while being far faster than valgrind and it is sufficient to find 95% of my UB bugs.
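For anyone who has not tried this, here is a minimal sketch of the kind of code the sanitizers catch. The helper `sum_first` is hypothetical; the flags in the first comment are the usual way to enable AddressSanitizer and UndefinedBehaviorSanitizer in both gcc and clang.

```cpp
// Build with sanitizers, e.g.: g++ -g -fsanitize=address,undefined example.cpp
#include <cstddef>
#include <vector>

// Sums the first `count` elements through a raw pointer, with no bounds check.
// Hypothetical helper: the caller is trusted to keep count <= values.size().
int sum_first(const std::vector<int>& values, std::size_t count) {
    const int* data = values.data();
    int total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += data[i];  // reads past the buffer if count > values.size()
    return total;
}
```

Calling `sum_first(v, v.size() + 1)` is undefined behaviour and will often appear to work in a normal build. In a sanitized build, asan would typically abort with a heap-buffer-overflow report that names the faulting read and where the buffer was allocated.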

ronsor · **Joined:** Wed Jan 25, 2017 5:31 pm **Posts:** 27

Code:

p[i++] = ++i + i++;

Boris · **Joined:** Sat Nov 07, 2015 3:12 pm **Posts:** 145

ronsor wrote:

Code:

p[i++] = ++i + i++;

Some good compilers actually define the output of this.
I wish GCC had a warning flag -Whywouldyoudothat

ronsor · **Joined:** Wed Jan 25, 2017 5:31 pm **Posts:** 27

Boris wrote:

ronsor wrote:

Code:

p[i++] = ++i + i++;

Some good compilers actually define the output of this.
I wish GCC had a warning flag -Whywouldyoudothat

More like -Wstupid (find warnings in code that is ridiculous and stupid)
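For the record, the expression modifies `i` three times with no sequencing between the modifications, which is undefined behaviour in both C and C++. GCC does have a real flag for this, `-Wsequence-point` (enabled by `-Wall`), and Clang has `-Wunsequenced`. Below is a sketch of one way to write it with a defined result. The evaluation order is an assumption, since the original has none, and `assign_sequenced` is a hypothetical name.

```cpp
// One possible reading of `p[i++] = ++i + i++;`, spelled out step by step.
// The order chosen here is an assumption; the original defines no order.
void assign_sequenced(int* p, int& i) {
    const int slot = i;  // subscript uses the old value of i ...
    ++i;                 // ... then i is incremented (the `i++` in `p[i++]`)
    ++i;                 // the `++i` operand: increment first
    const int lhs = i;   // then use the new value
    const int rhs = i;   // the trailing `i++`: use the current value ...
    ++i;                 // ... then increment
    p[slot] = lhs + rhs;
}
```

Starting from `i == 0`, this version stores 4 into `p[0]` and leaves `i == 3`. A compiler is free to produce something else entirely for the one-line original.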

Solar · **Posted:** Mon Jun 19, 2017 4:02 am

Not exactly UB, but a very "interesting" bug...

I had a file, Latin-1 encoded, which made the processing bomb out with a segfault: The (XML-based) protocol module tried to write an error message containing a line from the input file, but croaked on a "conversion error".

That was strange, because there was no conversion involved, was there? The input was in Latin-1, the output was in Latin-1, and the protocol was in Latin-1 as well...

The offending data record that ought to be written to the protocol contained an address in "Μünchen" (Munich, Germany), which looked innocent enough. I poked the "ü" in there a bit, as it seemed to be the only non-ASCII-7 character in the data record, but it was not the offender.

(The more alert reader may have noticed the "seemed" in that last sentence...)

Eventually I wanted to isolate the input record.

It read "µünchen". (That's the Greek letter mu, for "micro".)

What had happened?

Well, internally the software uses Unicode. It took the Latin-1 input "µünchen" (probably someone hitting AltGr instead of Shift by accident when entering the data), then capitalized it (using Unicode rules). That gives "Μünchen". That first letter in there is U+039C (Greek Capital Letter Mu). Looks just like an "M" (Latin Capital Letter M), doesn't it?

Well, "µ" is available in Latin-1, "Μ" isn't. So when converting the (internal) Unicode data back to Latin-1 for output, a conversion error happened, which triggered a dump to the XML protocol, which also ran into a conversion error but handled it less than gracefully.

The bug was simple enough to fix (adding an exception handler), and I could have done that without actually figuring out how the error came to be, but it was a cunningly disguised offender that triggered it. ;-)
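The round trip can be sketched in a few lines of C++17. This is an illustration only, not the code from the software in question: `to_upper_sketch` and `to_latin1` are hypothetical names, and the case mapping is hard-coded for the micro sign and ASCII letters instead of using real Unicode case tables.

```cpp
#include <optional>

// Uppercase mapping for the micro sign and ASCII letters only.
// Assumption: a real implementation would use full Unicode case tables.
char32_t to_upper_sketch(char32_t cp) {
    if (cp == U'\u00B5') return U'\u039C';  // MICRO SIGN -> GREEK CAPITAL LETTER MU
    if (cp >= U'a' && cp <= U'z') return cp - (U'a' - U'A');
    return cp;
}

// Latin-1 covers exactly U+0000..U+00FF; anything above cannot be encoded.
std::optional<unsigned char> to_latin1(char32_t cp) {
    if (cp <= 0xFF) return static_cast<unsigned char>(cp);
    return std::nullopt;  // the "conversion error" case
}
```

Here `to_latin1(U'\u00B5')` succeeds, while `to_latin1(to_upper_sketch(U'\u00B5'))` returns an empty optional. Returning `std::optional` makes the caller deal with the failure explicitly, where an unhandled exception in the error path was what caused the crash in the story.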
