OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 5:19 am 
Member

Joined: Thu May 17, 2007 1:27 pm
Posts: 954
vvaltchev wrote:
The first time some "modern UB" problems started to emerge was around 2006-2007, several years after C99 was released. But that was just a "slow start". Most of the UB bugs came out in the 2010s, more than 10 years after C99. That's why UBSAN was introduced. Think about it: if UB was such a problem from the '80s, why did we have to wait 30-40 years for such a tool to be widely available? UB sanitizers were created around the same time as (or a little after) compilers started to aggressively take advantage of UB, in a way that had never been done before in C or C++. GCC shipped a UB sanitizer in version 4.9 (2014). Before that, I remember compiling the same software with clang, just to be able to use its sanitizers, which had been released a few years earlier.

That's just not true. Compilers that exploited UB existed in the 90s (and to some extent in the 80s; see the paper on optimizations in Andrew Tanenbaum's compiler kit that I linked earlier). GCC certainly took advantage of UB in the mid 90s, before C99.

vvaltchev wrote:
But, as nullplan pointed out, generally compilers didn't assume that UB will never happen, as they do today

That's something we can agree on, but you got the reasoning wrong: the reason for this change is that compilers were generally less sophisticated, and not that the understanding of what "undefined behavior" means somehow changed fundamentally. The optimization techniques that compilers use to exploit UB existed in the 80s (which is generally just dead code elimination and strength reduction, really), but *no single compiler implemented all techniques at the same time*, so it was less visible. For example, the dragon book mentions lots of these techniques, and it's from 1987. The techniques themselves were invented in the 70s.

_________________
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].


 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 6:16 am 
Member

Joined: Fri May 11, 2018 6:51 am
Posts: 198
Korona wrote:
That's just not true. Compilers that exploited UB existed in the 90s (and to some extent in the 80s; see the paper on optimizations in Andrew Tanenbaum's compiler kit that I linked earlier).
I still have to look at that.

Korona wrote:
GCC certainly took advantage of UB in the mid 90s, before C99.
In my previous answer to Solar, I prepared a fair challenge. Please take a look at it. If you show me a '90s compiler that treats one of the first two points (unaligned access, type punning with casts) as UB, OR all three of the others, I will agree with you.

Korona wrote:
vvaltchev wrote:
But, as nullplan pointed out, generally compilers didn't assume that UB will never happen, as they do today
That's something we can agree on, but you got the reasoning wrong: the reason for this change is that compilers were generally less sophisticated, and not that the understanding of what "undefined behavior" means somehow changed fundamentally.
That looks correct, but I have good reasons to doubt it. Is "x/0" so hard to catch? I don't think so. Why did GCC start to disallow this trivial kind of UB only with version 7.x? Also, there's no need for the understanding of UB to have changed fundamentally: changing the wording of the standard a little is enough to have a significant effect. In other words, a slight shift in interpretation makes a big difference.

Korona wrote:
The optimization techniques that compilers use to exploit UB existed in the 80s (which is generally just dead code elimination and strength reduction, really)
I believe that dead code elimination and strength reduction do not require UB to work. "if (3 > 4) { ... }" can easily be proven to be dead code. Also, "if (INT_MAX + 1 > 0) { ... }" can be optimized without relying on UB: the compiler, unless it's a cross-compiler, just needs to represent integer literals with the same type used by the C code. So INT_MAX+1 will naturally end up being < 0 in the compiler (on architectures that behave this way), and the code will be optimized away. For INT_MAX+1 > 0 to be always true (the modern assumption), the compiler has to do more work: either use bignums, or realize that the expression INT_MAX+1 overflows the integer and explicitly treat it as UB.

For cross-compilers, things are different. I agree that even 30 years ago, if we had tried to cross-compile code like that, we might have had some problems: the dead code optimization might not be performed, or the compiler might assume that INT_MAX+1 is still > 0 because that's true on the host architecture, while on the target architecture it's not. In theory, compilers should use very portable code that takes the target architecture into account when evaluating such expressions, so it's possible that even in that case everything worked as expected. I have no personal experience with cross-compiling software for very different architectures more than 20 years ago, so I have no strong opinion here.
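A minimal sketch of the distinction being drawn here (all function names are hypothetical). The first condition folds by plain constant evaluation. Folding the second to "always true" is only valid because signed overflow is UB, which is exactly the kind of reasoning under debate. The third is a well-defined way to ask the same question.

```c
#include <limits.h>
#include <stdbool.h>

/* Constant folding: both operands are known, no UB reasoning needed.
 * An if-body guarded by this condition is plainly dead code. */
bool constant_condition(void)
{
    return 3 > 4;
}

/* A compiler that assumes signed overflow never happens may fold this to
 * `return true;`. Calling it with x == INT_MAX is undefined behavior. */
bool successor_is_larger_ub(int x)
{
    return x + 1 > x;
}

/* Well-defined spelling of the same question: INT_MAX is the only int
 * without a larger int successor. */
bool successor_is_larger(int x)
{
    return x != INT_MAX;
}
```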

Korona wrote:
but *no single compiler implemented all techniques at the same time* so it was less visible.
I'd be curious to see whether you guys can find a '90s compiler that treats even one of the UB types I've mentioned as UB.

Korona wrote:
For example, the dragon book mentions lots of these techniques, and it's from 1987. The techniques themselves were invented in the 70s.
Unfortunately, I haven't read the "dragon book" (yet), but I believe that most optimization techniques have nothing to do with UB. Assuming that UB never happens allows compilers to use the same techniques in more cases. Maybe that's not true for aliasing assumptions; I'm not sure.

Again, I'm not a compiler expert, but it's obvious to me that the modern assumption of UB opened the door to many more optimization opportunities. And it's definitely possible that some UB types weren't treated as UB because of the lack of optimizations that took advantage of them, but in other cases, it looks to me like compilers didn't want to treat some expressions as UB. In some cases it was the intention that was lacking, not the technology. Other cases, like "i++ + ++i", have been UB since C's day 1, because it didn't feel right to force a particular evaluation order in such cases. But what happened in those UB cases? Well, the whole program wasn't considered invalid because of them. The expression just didn't have a defined value.
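For reference, in standard terms (C89 §3.3) "i++ + ++i" modifies i twice between sequence points, which makes it undefined behavior rather than merely an unspecified value. A well-defined program has to spell out an order explicitly; the left-to-right order in this minimal sketch (hypothetical function name) is an arbitrary choice for illustration.

```c
/* `i++ + ++i` modifies i twice with no sequence point in between, so the
 * Standard leaves the behavior undefined. Splitting it into separate
 * statements gives it one defined meaning (left operand first, by choice). */
int post_plus_pre(int i)
{
    int left = i++;   /* left gets the old value; i is then incremented */
    int right = ++i;  /* i is incremented again; right gets the new value */
    return left + right;
}
```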

_________________
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck


 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 6:55 am 
Member

Joined: Fri May 11, 2018 6:51 am
Posts: 198
alexfru wrote:
vvaltchev wrote:
Unaligned access is done using very natural expressions in C. What is unnatural is using memcpy() for that.

You are entitled to your opinions. And I'm partially sympathetic because it's really a painful area of the language.
Thanks.
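For reference, a minimal sketch of the two styles being argued about, assuming a 32-bit load (the function name is hypothetical). The memcpy() version is well-defined for any address; mainstream compilers typically lower it to a single load on targets that permit unaligned access. The cast version is kept in a comment because it is undefined when the pointer is misaligned and can also run afoul of the aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from a possibly unaligned address.
 * memcpy() imposes no alignment requirement on its arguments. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* The "natural" cast version, shown for contrast only:
 *   uint32_t load_u32_cast(const void *p) { return *(const uint32_t *)p; }
 * Undefined if p is not suitably aligned for uint32_t. */
```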

alexfru wrote:
But regardless of the opinions and feelings, the language standard says what should work correctly and what isn't guaranteed to. And that's the common denominator to live by.
I agree that we have to live with that. In practice, I'm very careful about avoiding UB.

alexfru wrote:
vvaltchev wrote:
Again, no matter what the standard technically says it's OK to do.

Like I just said, we can be emotional about it, but the standard and its implementations don't exist in the emotional space.
Well, from any practical point of view, when I have to get the job done, I totally agree with that. But that doesn't mean it isn't worth discussing, right? I expressed my views on the history behind UB and how it has been treated by compilers, showing that things changed. No need to be emotional about this, except maybe when people claim that the "UNIX hackers" of the time didn't really understand C. That feels like the wrong narrative to me, and unfair to those people.

vvaltchev wrote:
Let's forget for a moment what is supposed to be "right" and what is supposed to be "wrong" and just observe what developers did: as long as compilers allowed something, they took advantage of it.


alexfru wrote:
You seem to be echoing it...
alexfru wrote:
Lots of bad code used to be disallowed de jure but allowed de facto (because the guard wasn't employed or born yet). And then gradually de facto started to line up with de jure.


Well, I seemed to be echoing that, but I also stated that, in many cases, "de facto" != "de jure" because the compiler engineers didn't want that to happen, not only because of technological limitations, but also because it wasn't clear enough what "de jure" allowed. Compilers preferred to be more conservative about many kinds of UB that are aggressively optimized today. Again, reading C89 with today's eyes makes it feel like there has always been a single, clear interpretation of what compilers are allowed to do in case of UB, but I'm not sure that was true.

Just to be precise: I'm not sure whether the interpretation of the standard changed or not. I suspect it did change, but I have no hard, absolute proof of that. I do, however, have plenty of clues supporting that theory.
What I'm absolutely sure of is that compilers didn't treat UB in the past as they do today. That's a simple, easy-to-prove fact, and I believe I've proven it already. If you're not convinced, check my response to Solar's challenge.
As for why compilers didn't treat UB in the past the same way as they do today, again, I have no definitive answer or proof. I suspect it happened because of:
- the uncertain interpretation of C89 OR
- some sort of "common sense", fear of breaking existing code OR
- technological limitations (we didn't have optimizations sophisticated enough), OR
- all these reasons combined

_________________
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck


 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 7:21 am 
Member

Joined: Fri May 11, 2018 6:51 am
Posts: 198
alexfru wrote:
Btw, it's perfectly fine to blame the literature on C of the distant past. It exceedingly rarely introduced or properly explained the concept of undefined behavior. I guess, because it felt unnecessary ("unnatural", if you will :) ) at the time. I did see a number of books ignoring the topic altogether and I didn't have access to the standard back then, so I couldn't know any better. And many others didn't either and very often their code "just worked". Where we are today is just as natural a consequence of those events of the past.
Mmm... that's interesting :-) I'd like to blame the technical literature, but... I'm not absolutely sure it's the only culprit.

Doesn't the fact that you "did see a number of books ignoring the topic", while now things are completely different, make you question why all of that happened? Isn't it weird that multiple credible authors with the best intentions of teaching C didn't consider warning people about how dangerous UB is? Can you at least theoretically consider the possibility (before discarding it) that what we considered to be allowed in C changed over time? Would it be that crazy to believe that C was born in the '70s as a "do what I say" language and was gradually pushed to become a "do what I mean" language with the same syntax, because that offered better opportunities for optimization? Putting technology aside for a moment, how many times have we already seen text written in the past being interpreted in different ways over time?

I mean, my theory might be completely wrong. But doesn't it at least raise some doubts in you about the history of this language? On my side, I seriously consider the opposite theory; I just have some unresolved doubts about it.

Meta-note: due to the limitations of written communication, it's probably worth noting that none of the rhetorical questions in this post are meant to be provocative or aggressive. It's a very friendly discussion for me.

_________________
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck


 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 7:46 am 
Member

Joined: Thu May 17, 2007 1:27 pm
Posts: 954
I don't have time to respond to all your pointers, but about the integer overflow: the reason that INT_MAX + 1 is UB is not that it enables some tricky optimizations. It is because on early CPU architectures (those that don't use two's complement), different instructions had to be emitted for signed and unsigned arithmetic, and the signed ones just didn't work in case of overflow. Early C compilers targeted multiple architectures and were written in lower-level languages than C, and it was deemed too complicated to make this behavior architecture-dependent in the compiler's middle end. Note that the committee decided to make this UB and *not* implementation-defined behavior *specifically* to make the middle end easier to write, i.e., it specifically allowed compiler writers to do *anything* instead of demanding a consistent behavior *on each individual platform*.
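Whatever the historical motivation, the practical upshot for portable C is an asymmetry between unsigned and signed arithmetic. A minimal sketch (function names are hypothetical): unsigned addition is guaranteed to wrap, while a signed addition has to be guarded before it happens, because an overflowed result cannot be inspected after the fact.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic is defined to wrap modulo 2^N on every platform. */
unsigned wrap_add(unsigned a, unsigned b)
{
    return a + b;
}

/* Signed overflow is UB, so a portable checked add must test the operands
 * before adding; inspecting an already-overflowed result is too late. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;           /* would overflow: report it, compute nothing */
    *out = a + b;
    return true;
}
```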

_________________
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].


 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 8:06 am 
Member

Joined: Thu May 17, 2007 1:27 pm
Posts: 954
I just took a look at my copy of the C89 standard and saw that even C89 distinguishes "unspecified behavior" (= any value can be returned but the program continues as usual; applies to operations such as looking at the bitwise representation of floats, evaluation order, etc.) from UB in Appendix A.6.1, so I don't really get the entire point of the blog post in the OP. In fact, the sentence directly above the definition of UB reads: "Unspecified behavior --- behavior, for a correct program construct and correct data, for which the Standard imposes no requirements." Could it be that the author of the blog post doesn't know the difference between unspecified behavior and undefined behavior in the C standard?

The standard also lists examples for its definitions:
Quote:
An example of unspecified behavior is the order in which the arguments to a function are evaluated.
An example of undefined behavior is the behavior on integer overflow.
An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.
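The unspecified and implementation-defined examples can be observed directly in a short sketch (function names are hypothetical). Signed overflow is deliberately left out, since a test that actually overflowed would itself be undefined. The argument-order result may legitimately be either "ab" or "ba"; the shift result is whatever the implementation documents (GCC, for instance, documents sign extension).

```c
/* Unspecified: the order in which function arguments are evaluated.
 * log_arg() appends its tag to a log so the order can be observed. */
static char order_log[3];
static int order_len;

static int log_arg(char tag)
{
    order_log[order_len++] = tag;
    return 0;
}

static int two_args(int a, int b) { return a + b; }

/* Returns "ab" or "ba"; the Standard permits either. */
const char *argument_order(void)
{
    order_len = 0;
    (void)two_args(log_arg('a'), log_arg('b'));
    order_log[order_len] = '\0';
    return order_log;
}

/* Implementation-defined: right shift of a negative signed value.
 * The implementation must document which result it produces. */
int shift_negative(void)
{
    return -8 >> 1;
}
```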


And compilers are certainly within their rights to assume that no UB happens per the C89 standard, because the only requirement for them is to accept "strictly conforming programs" and strictly conforming programs are:
Quote:
A strictly conforming program shall use only those features of the language and library specified in this Standard. It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit.


Note that the provisions here are much stronger than what compilers actually do: in principle, a conforming implementation can refuse to compile "printf("%zu\n", sizeof(int));" because the output of this statement depends on implementation-defined behavior.

EDIT: the blog post also conveniently ignores this example from the C89 Rationale document, which should settle the question once and for all:
Quote:
Another common optimization is to pre-compute common subexpressions. In this loop:

volatile short *ttyport;
short mask1, mask2;
/* ... */
for (i = 0; i < N; ++i)
    *ttyport = a[i] & mask1 & mask2;
evaluation of the subexpression mask1 & mask2 could be performed prior to the loop in the real implementation, assuming that neither mask1 nor mask2 appear as an operand of the address-of (&) operator anywhere in the function. In the abstract machine, of course, this subexpression is re-evaluated at each loop iteration, but the real implementation is not required to mimic this repetitiveness, because the variables mask1 and mask2 are not volatile and the same results are obtained either way.

The previous example shows that a subexpression can be pre-computed in the real implementation. A question sometimes asked regarding optimization is, ``Is the rearrangement still conforming if the pre-computed expression might raise a signal (such as division by zero)?'' Fortunately for optimizers, the answer is ``Yes,'' because any evaluation that raises a computational signal has fallen into an undefined behavior (§3.3), for which any action is allowable.

Emphasis mine.
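Written out by hand, the transformation the Rationale describes looks roughly like this minimal sketch. The function wrapper and its parameter list are hypothetical; ttyport, a, mask1 and mask2 come from the quoted example. It shows what the real implementation may effectively run: the mask computed once, while the volatile store stays inside the loop.

```c
/* mask1 & mask2 is computed once instead of on every iteration. This is
 * valid because the masks are not volatile (and, per the Rationale, their
 * addresses are not taken in the function). The store through ttyport
 * stays in the loop because it points to volatile data. */
void write_masked(volatile short *ttyport, const short *a, int n,
                  short mask1, short mask2)
{
    short both = (short)(mask1 & mask2);   /* hoisted common subexpression */
    for (int i = 0; i < n; ++i)
        *ttyport = (short)(a[i] & both);   /* one volatile store per iteration */
}
```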

Also:
Quote:
The bitwise logical operators can be arbitrarily regrouped, since any regrouping gives the same result as if the expression had not been regrouped. This is also true of integer addition and multiplication in implementations with twos-complement arithmetic and silent wraparound on overflow. Indeed, in any implementation, regroupings which do not introduce overflows behave as if no regrouping had occurred. (Results may also differ in such an implementation if the expression as written results in overflows: in such a case the behavior is undefined, so any regrouping couldn't be any worse.)
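A minimal sketch of the regrouping argument (function names are hypothetical). With wraparound arithmetic both groupings always agree, so a compiler may regroup freely. For int, the comment spells out the case the Rationale has in mind.

```c
/* Unsigned arithmetic wraps modulo 2^N, so both groupings always agree
 * and a compiler may regroup freely. */
unsigned sum_left(unsigned a, unsigned b, unsigned c)
{
    return (a + b) + c;
}

unsigned sum_right(unsigned a, unsigned b, unsigned c)
{
    return a + (b + c);
}

/* For int the picture differs: with a = INT_MAX, b = 1, c = -1, the
 * grouping (a + b) + c overflows (undefined behavior), while a + (b + c)
 * is simply INT_MAX. Because the overflowing grouping was already
 * undefined, a compiler that regroups cannot make that case worse. */
```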

_________________
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].


 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 9:14 am 
Member

Joined: Fri May 11, 2018 6:51 am
Posts: 198
Korona wrote:
I don't have time to respond to all your pointers
Don't worry, it's nothing urgent. Take your time and respond, if you want, when you have enough time for this conversation.

Korona wrote:
but about the integer overflow: the reason that INT_MAX + 1 is UB is not that it enables some tricky optimizations.
Of course the main reason for making integer overflow UB is not optimizations, but portability. That's pretty obvious. Please stop assuming I don't know this stuff; it's starting to be offensive.

Also, UB on integer overflow DOES enable some fancy optimizations. Watch this CppCon 2016 talk by Chandler Carruth from minute 39:17: https://youtu.be/yG1OZ69H_-o?t=2359
Btw, I was lucky to personally attend that conference and it was great.
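A commonly cited illustration of such an optimization (not necessarily the example used in the talk), as a minimal sketch with hypothetical function names. Because signed overflow is UB, a compiler may assume x * 2 does not overflow and fold the signed version to plain x. The unsigned version cannot be folded that way, because wraparound is defined: its result is x with the top bit cleared.

```c
/* Signed: overflow of x * 2 is UB, so the compiler may simplify the
 * whole expression to x. */
int double_then_halve(int x)
{
    return x * 2 / 2;
}

/* Unsigned: x * 2u wraps, so (x * 2u) / 2u equals x with its top bit
 * cleared, not x itself. */
unsigned double_then_halve_u(unsigned x)
{
    return x * 2u / 2u;
}
```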

_________________
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck


 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 10:34 am 
Member

Joined: Fri May 11, 2018 6:51 am
Posts: 198
Korona wrote:
The standard also lists examples for its definitions:
Quote:
An example of unspecified behavior is the order in which the arguments to a function are evaluated.
An example of undefined behavior is the behavior on integer overflow.
An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.
That should be pretty clear to everybody here. BUT:

Korona wrote:
in principle, a conforming implementation can refuse to compile "printf("%zu\n", sizeof(int));" because the output of this statement depends on implementation-defined behavior.
If you're fine with that, I really see no point in discussing further. You're misinterpreting "the Standard imposes no requirements" at will. Fortunately, not even modern compiler engineers take such extreme positions. I'm trying to keep a balanced position between "real world needs" and what is "technically allowed by the standard", but... if that's what's on the other side, I might really give up and go full "Torvalds mode": https://lkml.org/lkml/2018/6/5/769

Korona wrote:
this example of the C89 rationale document, which should settle the question once and for all:
Sorry, I wasn't able to find this document online. Can you share a link?

Also, how does that example "settle everything once and for all"? I offered a challenge: find '90s compilers that treated UB like modern ones do. If you cannot do that, my theory gets a strong point; otherwise, yours does. I exposed the practical problem that, no matter what the standard allowed, modern compilers broke a TON of software because of types of UB that nobody was worried about. Even if you prove definitively (and I'm not sure you can) that ANSI C was almost the same as C99 from the UB point of view, the fact that neither compilers nor developers nor the technical literature cared about it for decades remains, as does the legitimate question of why that happened. My theory answers that question with plenty of facts and clues.

Your position instead seems to be (and forgive me if I'm wrong): no change ever happened. UB has always been that way. The standard allowed everything we see today, from the beginning. C compilers wanted to enforce UB much earlier and did that in many cases [proof?], while in others they didn't have the technology at the time. Most developers and people who wrote technical literature had no clue about what ANSI C was really about, and neither did the UNIX developers who invented the Berkeley sockets. The language lawyers are always right. Whatever is written on "the paper" beats ANY other kind of argument. If there's a mistake or a stupid idea on the paper, we have to change the world to make it right according to the paper and NEVER the other way around. Breaking legacy code, no matter how expensive the damage might be, is ALWAYS fine because "the paper" says so. The purpose of C compilers is not to serve people who write C code, but to worship the "almighty paper", at the expense of everything else.

Forgive me for the irony in the last sentences, but that's the feeling I'm getting. I meant no offense towards you. It's the position you're taking that looks unacceptable to me. It feels like the complete opposite of Torvalds' position: papers are garbage, inferior to toilet paper, written by brain-damaged people. We don't need one and we don't care what's written there. Portability across compilers means nothing to us.

BOTH those extreme positions seem unacceptable to me. So, at the end, I'm a moderate and everybody will hate me for that.

_________________
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck


 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 10:54 am 
Member

Joined: Thu May 17, 2007 1:27 pm
Posts: 954
I couldn't find the PDF of the rationale online (well, maybe some stores still sell it but I didn't bother searching those). But here is an auto-generated HTML transcription (which I did not check for errors against the original).

I do not propose that implementations should only accept programs that fall under ISO's "strictly conforming" definition, but these are the "rules as written". (While the blog post claims that the rules as written changed in an incompatible way.)

Btw, the liberal use of bold text makes text quite hard to read; that's why printed books usually use italics for emphasis. (And regarding your "I meant no offense" message: don't worry, I am not offended :D. EDIT: I don't know why you think that my personal position is that compilers should only respect the rules as written. On the contrary, my position is that compilers should warn about obvious cases of UB, add sanitizers for the non-obvious ones, insert UD2 into dead code paths, ... But I certainly don't think that their provisions changed with C99.)

vvaltchev wrote:
I exposed the practical problem that, no matter what the standard allowed, modern compilers broke a TON of software because of types of UB that nobody was worried about.

I can easily agree with that statement. But, no, that's not what you claimed, at least not originally. Your original claim (and the one by the blog author) is that the reason for this fact is that the understanding of what UB is changed. And I think that latter claim is just wrong.

We don't need to argue about whether newer compilers often broke previously working code due to exploiting UB more aggressively, that is pretty much a fact. But that is also not the point of this thread, as far as I understood it.

EDIT:
vvaltchev wrote:
Even if you prove definitively (and I'm not sure you can) that ANSI C was almost the same as C99 from the UB point of view, the fact that neither compilers nor developers nor the technical literature cared about it for decades remains, as does the legitimate question of why that happened.

I think that's easy to answer: compilers were simple in the 70s (to the point of happily accepting broken syntax etc) and became much more sophisticated over time. But even the people drafting the first standard allowed these optimizations to happen, being aware of the implications (see the rationale document). They probably did not expect the extent of compiler optimizations that became possible with LTO and what not, but they were certainly OK with the compiler formatting your hard drive when it runs into UB. (And in terms of the ISO standard: not formatting your hard drive on UB is a quality of implementation issue that is outside of the standard.)

_________________
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].


 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 11:21 am 
Member

Joined: Fri May 11, 2018 6:51 am
Posts: 198
Korona wrote:
vvaltchev wrote:
I exposed the practical problem that, no matter what the standard allowed, modern compilers broke a TON of software because of types of UB that nobody was worried about.

I can easily agree with that statement.
I'm happy to hear that :-)

Korona wrote:
But, no, that's not what you claimed, at least not originally. Your original claim (and the one by the blog author) is that the reason for this fact is that the understanding of what UB is changed.
Yes, I agree. That was my original claim, and it's the only thing I'm not sure about. It is just a theory; the rest are facts and clues that could be interpreted in favor of my theory. Later, the discussion changed when Solar asked what hard evidence could convince me that such a theory does not make any sense, so I started talking about concrete examples of how compilers treated UB.

Korona wrote:
And I think that latter claim is just wrong.
Ah OK. I have no problem with that position per se. I'm not sure about it, but you might be absolutely right as well. If we assume your position on the paper is "correct" (= no substantial change in how UB can be treated), the following question remains: why didn't compilers seem to care about that at all? (If we agree that compilers didn't care back then, of course. That's where the discussion moved in the latest posts with Solar.) Also, why didn't the literature keep warning about how dangerous UB is? Why were evil type-punning casts used in examples? I mean, in my POV, the theory about the change in the standard answers those questions in an easy and convenient way. Other explanations are definitely possible, but they seem too convoluted to me.

Korona wrote:
We don't need to argue about whether newer compilers often broke previously working code due to exploiting UB more aggressively, that is pretty much a fact.
I feel much better now. We're converging on the hard facts.

Korona wrote:
But that is also not the point of this thread, as far as I understood it.
True, it wasn't the point of the thread. We just got there recently.

_________________
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck


 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 11:46 am 
Member

Joined: Thu Nov 16, 2006 12:01 pm
Posts: 7530
Location: Germany
I think that the discussion has moved on to the point where a reply by me is unnecessary.

I just couldn't resist adding in some history here though:

vvaltchev wrote:
I'd say: show me (compiler name, version & code snippet) at least one mainstream C compiler in the '90s that did weird stuff, taking advantage of UB, in the case of:

  • De-referencing of a NULL pointer. Not a "hard proof", but still a point in your favor. The compiler must be released before 1999. Code snippet (other snippets are accepted as well):
    Code:
    int foo(void) {
        char *p = 0;
        *p = 'a';
        return 1;
    }

Not so much the compiler "doing weird stuff".

But on AmigaOS (first version released 1985), IntuitionBase -- the address to the jump table allowing access to the basic OS functions -- resides at $00000004. If that hadn't been a char you wrote there, but e.g. a double, you'd just have shot your OS in the head. Any access to Intuition functions (like, starting a new executable) would now actually do undefined things, because it would interpret whatever you wrote to $00000004 as IntuitionBase, and execute basically random areas of memory.

8)

_________________
Every good solution is obvious once you've found it.


 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 11:53 am 
Member

Joined: Fri May 11, 2018 6:51 am
Posts: 198
Korona wrote:
compilers were simple in the 70s (to the point of happily accepting broken syntax etc) and became much more sophisticated over time.
OK, I agree with that.

However, in this thread I've learned that Fortran had noalias pointers by default and enforced UB, while at the same time C compilers did not. Actually, it took many years for C compilers to catch up with that. Putting any standard upgrades aside, only around 2010, when -fstrict-aliasing was introduced, did we get a feature apparently similar to what Fortran had 20 years earlier. How do you explain that? Again, we're assuming your theory is correct here, so C99 did not impact this feature. The restrict keyword is necessary only when we want to tell the compiler that pointers of the same type do not alias. Pointers of different types are always assumed not to alias (character types being the exception).
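A minimal sketch of the two aliasing rules mentioned here (function names are hypothetical; restrict requires C99). With strict aliasing in effect, a compiler may return the constant 1 from the first function without reloading *ip, because an int lvalue and a float lvalue are assumed not to refer to the same object. For two pointers of the same type, only restrict grants the equivalent permission.

```c
/* Different types: under the effective-type ("strict aliasing") rules,
 * *ip and *fp are assumed not to overlap, so the final load of *ip may
 * be replaced by the constant 1. Character types are the exception:
 * a char pointer may alias any object. */
int store_and_reload(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;
    return *ip;
}

/* Same type: only restrict tells the compiler that dst and src do not
 * overlap, so it may keep values in registers or vectorize freely.
 * The caller must guarantee the arrays are disjoint. */
void add_arrays(int *restrict dst, const int *restrict src, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] += src[i];
}
```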

Korona wrote:
They probably did not expect the extent of compiler optimizations that became possible with LTO and what not
Yeah, for sure.

Korona wrote:
but they were certainly OK with the compiler formatting your hard drive when it runs into UB. (And in terms of the ISO standard: not formatting your hard drive on UB is a quality of implementation issue that is outside of the standard.)
Well, that's evil. And, jokes about "formatting your hard drive" aside, garbage code generated because of UB has caused security bugs as well.
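One well-known way this happens: a pointer is dereferenced before it is checked for NULL, and the compiler, reasoning that the dereference would already have been UB for a NULL pointer, deletes the check. This is the pattern behind a widely reported 2009 Linux kernel bug; the kernel has since been built with GCC's -fno-delete-null-pointer-checks. A minimal sketch (the struct and function names are hypothetical):

```c
#include <stddef.h>

struct device { int flags; };   /* hypothetical type, for illustration only */

/* Problematic order: d is dereferenced before the NULL check. Since a NULL
 * dereference is UB, the compiler may conclude d != NULL and remove the
 * check, so a NULL argument is never caught by it. */
int get_flags_unsafe(const struct device *d)
{
    int flags = d->flags;
    if (d == NULL)
        return -1;
    return flags;
}

/* Safe order: check first, then dereference. */
int get_flags(const struct device *d)
{
    if (d == NULL)
        return -1;
    return d->flags;
}
```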

_________________
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck


 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 12:09 pm 
Member

Joined: Fri May 11, 2018 6:51 am
Posts: 198
Solar wrote:
I think that the discussion has moved on to the point where a reply by me is unnecessary.
Ahahaa I don't think so.. you just wanna escape the challenge :-)


Solar wrote:
But on AmigaOS (first version released 1985), ExecBase -- the base address of exec.library, whose jump table gives access to the basic OS functions -- resides at $00000004. If that hadn't been a char you wrote there, but e.g. a double, you'd just have shot your OS in the head. Any call into the OS (like starting a new executable) would now actually do undefined things, because it would interpret whatever you wrote to $00000004 as ExecBase, and execute basically random areas of memory.


So, if I've understood correctly, you're saying that if I wrote a double to $00000004, the program would execute "random areas of memory" and crash. Well, to me that does not mean the compiler treated it as UB (in the sense we mean today). It simply means the compiler did what I told it to do. Whether that causes a crash or any other behavior doesn't matter to me. That's not modern UB at all. If we looked at the generated instructions (assuming I've understood your description correctly), we'd see something like this (translated into x86):

Code:
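MOV RAX, 0x11223344aabbccdd         ; load the double's bit pattern (x86-64 cannot store a 64-bit immediate directly to memory)
MOV QWORD PTR [0x00000004], RAX     ; write the double to the evil location
MOV EAX, 1                          ; return 1
RET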


From my POV, whatever the result of those instructions might be, I accept it, because the compiler did exactly what I told it to. If instead the generated code were something like:
Code:
UD2                                 ; undefined instruction: trap instead of performing the store

Then it would mean the compiler intentionally refused to generate the code I wanted, because it's UB. That's a BIG difference.

So, if I did something WRONG (UB) and the compiler emitted code doing exactly what I told it to, and that caused my hard drive to be formatted, I'm fine with that. But if I wrote UB code and the compiler emitted arbitrary instructions that formatted my drive, just because in theory that could have been the actual result of my wrong code, that's not fine with me. That's an abuse of UB, from my POV.
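A minimal sketch of the kind of transformation being objected to here, assuming GCC or Clang with optimization; the function names are hypothetical. Because *p is read before the NULL check, the compiler may conclude that p cannot be NULL and delete the check as dead code, so a NULL argument crashes (or worse, on a system where page 0 is mapped) instead of returning -1. GCC's -fno-delete-null-pointer-checks, which the Linux kernel builds with, disables this particular inference.

```c
#include <stddef.h>

/* The dereference comes first, so the optimizer may infer p != NULL and
   remove the check below as unreachable. */
int read_then_check(const int *p)
{
    int v = *p;
    if (p == NULL)
        return -1;
    return v;
}

/* Checking before the dereference keeps the function well-defined for NULL. */
int check_then_read(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```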

_________________
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck


 
 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 12:42 pm 
Offline
Member

Joined: Thu May 17, 2007 1:27 pm
Posts: 954
This is a bit off-topic regarding the original premise of this thread but: one issue with letting the compiler emit "what you wrote" is that you now have to tell the middle end a lot of target-specific information. You need to pass information like "divisions can have side effects", "memory access can fault", "signed integers wrap" etc. to prevent the optimization passes from "exploiting" the UB.
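Some of this information can already be supplied today, either per translation unit or per access. In GCC, for instance, -fwrapv makes signed overflow wrap, -fno-strict-aliasing disables type-based alias analysis, and -fno-delete-null-pointer-checks keeps null checks; in the source itself, volatile marks an individual memory access as observable. A small sketch of the last point (function names are hypothetical):

```c
/* An ordinary load whose result is never used is dead code: the optimizer
   may delete it, even if on the target it would fault or read a device
   register. */
void touch_plain(const int *p)
{
    int v = *p;
    (void)v;
}

/* A volatile access is part of the program's observable behavior, so the
   compiler must perform this load as written. */
int touch_volatile(const volatile int *p)
{
    return *p;
}
```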

_________________
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].


 
 Post subject: Re: C and undefined behavior
PostPosted: Tue Jun 15, 2021 12:53 pm 
Offline
Member

Joined: Fri May 11, 2018 6:51 am
Posts: 198
Korona wrote:
This is a bit off-topic regarding the original premise of this thread but: one issue with letting the compiler emit "what you wrote" is that you now have to tell the middle end a lot of target-specific information. You need to pass information like "divisions can have side effects", "memory access can fault", "signed integers wrap" etc. to prevent the optimization passes from "exploiting" the UB.
Yes, that's absolutely true.

Also, having all that working would mean having what I'd call a "portable assembler". Today, instead, we have an "abstract compiled language". In the past, because of the lack of advanced optimizations and/or other reasons (which is what we've mostly been discussing here), we had that "portable assembler".

_________________
Tilck, a Tiny Linux-Compatible Kernel: https://github.com/vvaltchev/tilck


 
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group