Framebuffer: Draw a PSF character

Korona · **Joined:** Thu May 17, 2007 1:27 pm **Posts:** 999

You can use unaligned access on x86 (obviously; it's clear that this works at the assembly level) but you have to tell the compiler that you're doing that (e.g., via __attribute__(packed), or you can access the ptr via memcpy; GCC will translate the memcpy call to a usual load/store and this is well-defined according to the standard). GCC is certainly allowed to assume that accesses are aligned, for example when auto vectorizing code (as alignment is demanded by the C standard and __alignof__(int) == 4).

For an example where GCC indeed assumes aligned access on x86 and "miscompiles" code that uses non-aligned pointers, see this stack overflow answer: https://stackoverflow.com/a/47512025 (note that the question is misleading in this context -- the issue is unrelated to mmap):

Quote:

gcc4.8 makes a loop prologue that tries to reach an alignment boundary, but it assumes that uint16_t *p is 2-byte aligned, i.e. that some number of scalar iterations will make the pointer 16-byte aligned.

Additionally, here is a minimal Godbolt example where GCC assumes that pointers are aligned: https://godbolt.org/z/ezx4oo. GCC optimizes this function to always return 1 (see the assembly code) but if pointers are unaligned, this function can return other results, such as 257.

EDIT: a feature request ("When the underlying ISA does not force pointer alignment, option to make GCC not assume it") to make unaligned access defined behavior via a GCC command line flag was closed as WONTFIX last year.

vvaltchev · **Joined:** Fri May 11, 2018 6:51 am **Posts:** 274

Strict aliasing is really mandatory in C99 and later. The problem is well-explained on stackoverflow:
https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule

But it has no direct relationship with unaligned access, AFAIK. For example:

Code:

int *arr = (void *) ( (char *)malloc(sizeof(int) * 16) + 1 );
*arr = 42;

It's safe, in my understanding. It does not break the strict aliasing rule, because
everything can alias a "char *" pointer and vice-versa.

Back to the strict aliasing rule: I agree that sometimes it's convenient to alias pointers of different types (!= char *).
One workaround is to use a union, the other is just to compile with: -fno-strict-aliasing. That has some price
because some optimizations will be avoided, BUT the compiler will generate correct code. Maybe, for low-level
kernel projects, -fno-strict-aliasing is OK. Another flag like that is -fwrapv, which makes the wrapping of signed
integers to be machine-dependent instead of being UB. The Linux kernel uses both -fwrapv and -fno-strict-aliasing,
so does my project.

In conclusion, I prefer, when possible, to write the safest and most portable code possible. I've experience writing
C and C++ code that had to be compiled by a plethora of compilers and had to run on 4 different operating systems,
and 2 architectures, behaving exactly the same way. But, sometimes, it makes sense to bend some rules: compiling
a kernel with -fwrapv and -fno-strict-aliasing is safe and fine in my opinion, BUT "breaking the rules" without telling
the compiler we're doing that is a big "NO NO". And yes, we're writing code for the "C abstract machine",
no matter if we like that or not. That's allows the compilers to do smarter optimizations. To write more imperative code,
we need to tell that to the compiler with flags, attributes etc. Cheating will only cause people an unbelievable amount of
headaches while debugging weird bugs.

Korona · **Joined:** Thu May 17, 2007 1:27 pm **Posts:** 999

strict aliasing != pointer alignment.

(You are right that aliasing is also important but this is a separate issue.)

vvaltchev · **Joined:** Fri May 11, 2018 6:51 am **Posts:** 274

OK, we all agree that "strict aliasing != pointer alignment", but I was partially wrong (see below) about the unaligned access.
In the C11 standard (draft N1570) there's the following statement in the "Undefined behavior" section:

Quote:

Conversion between two pointer types produces a result that is incorrectly aligned
(6.3.2.3).

In the Linux kernel there are two macros: put_unaligned and get_unaligned:
https://www.kernel.org/doc/html/latest/core-api/unaligned-memory-access.html

And there is a generic implementation as well:
https://elixir.bootlin.com/linux/latest/source/include/linux/unaligned/generic.h#L29

But, things seem to be tricky. On architectures where unaliagned access is supported by
the HW (like x86), the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS flag is defined.

Therefore, unaligned access is just used. I believe that -fno-strict-aliasing makes the compiler to
never cause UB. Let's look at this code, from: https://elixir.bootlin.com/linux/latest/source/lib/strncpy_from_user.c#L15

Code:

#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
#define IS_UNALIGNED(src, dst)   0
#else
#define IS_UNALIGNED(src, dst)   \
   (((long) dst | (long) src) & (sizeof(long) - 1))
#endif

/*
 * Do a strncpy, return length of string without final '\0'.
 * 'count' is the user-supplied count (return 'count' if we
 * hit it), 'max' is the address space maximum (and we return
 * -EFAULT if we hit it).
 */
static inline long do_strncpy_from_user(char *dst, const char __user *src,
               unsigned long count, unsigned long max)
{
   const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
   unsigned long res = 0;

   if (IS_UNALIGNED(src, dst))
      goto byte_at_a_time;

   while (max >= sizeof(unsigned long)) {
      unsigned long c, data, mask;

      /* Fall back to byte-at-a-time if we get a page fault */
      unsafe_get_user(c, (unsigned long __user *)(src+res), byte_at_a_time);

      /*
       * Note that we mask out the bytes following the NUL. This is
       * important to do because string oblivious code may read past
       * the NUL. For those routines, we don't want to give them
       * potentially random bytes after the NUL in `src`.
       *
       * One example of such code is BPF map keys. BPF treats map keys
       * as an opaque set of bytes. Without the post-NUL mask, any BPF
       * maps keyed by strings returned from strncpy_from_user() may
       * have multiple entries for semantically identical strings.
       */
      if (has_zero(c, &data, &constants)) {
         data = prep_zero_mask(c, data, &constants);
         data = create_zero_mask(data);
         mask = zero_bytemask(data);
         *(unsigned long *)(dst+res) = c & mask;
         return res + find_zero(data);
      }

      *(unsigned long *)(dst+res) = c;

      res += sizeof(unsigned long);
      max -= sizeof(unsigned long);
   }

byte_at_a_time:
   while (max) {
      char c;

      unsafe_get_user(c,src+res, efault);
      dst[res] = c;
      if (!c)
         return res;
      res++;
      max--;
   }

   /*
    * Uhhuh. We hit 'max'. But was that the user-specified maximum
    * too? If so, that's ok - we got as much as the user asked for.
    */
   if (res >= count)
      return res;

   /*
    * Nope: we hit the address space limit, and we still had more
    * characters the caller would have wanted. That's an EFAULT.
    */
efault:
   return -EFAULT;
}

IS_UNALIGNED() is always 0, on x86 for example. Therefore yeah, I wasn't totally wrong saying that unaligned access is kind of safe.
Probably, the C11 standard is stricter, just to stay generic and include architectures where that is not supported natively.

bzt · **Joined:** Thu Oct 13, 2016 4:55 pm **Posts:** 1584

@Korona: that bug only applies to a specific case when vector instructions are enabled (via an optimization level, not choosing MOVAPS/MOVUPS correctly, not about using MOV vs. MOVxPS).

I agree that you can't always use pointer casting, and it could be misinterpreted under some circumstances. The question is, when you can use it safely and it results in a more compact and effective code then why shouldn't you use it? My point of view is you should (with an #ifdef arch or #ifdef compiler guard if you want the code to be portable).

Take a look at my RLE implementation, it's a perfect example for that. There's an rle_libc.h version which is less effective but portable using memcpy, and there's rle_opt.h version, totally dependency-free and optimized with pointer casts (I could have used one header file with ifdefs too). It is up to the developer to decide which one to use. If rle_opt.h works on the target architecture, I see no reason why to use the less effective implementation.

Another perfect example is my optimized SLERP. That uses pointer casts to avoid struct packing and make it compatible with any programming language and registers (GPR/SIMD). There's a crossplatform but not so efficient ANSI C version, slerp_cross.c, and there's a x86-only slerp_simd.c (it uses a define to select between MOVAPS and MOVUPS, because I'm aware that the compiler can't possibly choose the correct one). It is up to the developer to decide which one to use. If he knows that he's only compiling for x86, I see absolutely no reason why to use the portable version over the more effective version. And if he makes sure of it that only aligned floats are passed to the function, I see no reason why not to use MOVAPS. (Again, since both function's prototype are the same, I could have used a single file with #ifdef __X86_64__ instead of two files.)

My point is, just because not all architectures support unaligned access and SIMD intrinsics is not a reason to have a less effective ANSI C version only. Nope, I think if you can, then you should have the most optimized version possible.

(And again, these kind of optimizations are way beyond the scope of a wiki page in the "Tutorials" category.)

Cheers,
bzt

Octocontrabass · **Joined:** Mon Mar 25, 2013 7:01 pm **Posts:** 5144

bzt wrote:

Because it's a code example on a newcomer's tutorial page, not a fully blown, ready to be compiled kernel source tree.

Shouldn't example code prioritize clarity over speed? Especially if it's meant for newcomers!

bzt wrote:

I haven't seen you complaining about Babystep tutorial not being portable for example.

Drawing pixels on a framebuffer works exactly the same on every CPU.

bzt wrote:

And is a properly working compiler allowed to emit any of those instructions for this C code?

Yes. It probably won't, because it would be a very strange choice, but it is allowed to.

bzt wrote:

Actually I had a lot of trouble to make gcc emit MOVAPS even when I explicitly wanted it in my optimized SLERP code. I ended up using intrinsics instead because no, no optimizer smart enough to figure out using packed scalars on its own.

Clang's optimizer is smart enough to figure it out. Keep in mind two pointers to the same type might alias one another, so stores to one of the pointers might affect later loads from the other. That can prevent the optimizer from reordering and parallelizing the code. Try using the restrict keyword.

bzt wrote:

C89 specification won't change in the future that's for sure and it's very well known how a compiler should compile it, with all the side effects and quircks.

Unaligned pointers and aliasing of incompatible types are undefined behavior in C89 too, those aren't new. What has changed since 1989 is that now, compilers are smart enough to optimize in ways that will break your code if you try to rely on undefined behavior.

bzt wrote:

And I must ask again, how would adding 3 bytes at the end help you when you need to read 4 bytes from the middle of bitchunk?

Reading past the end of an array is undefined behavior. If you add 3 bytes to the end of the PSF array, then there is no problem if you do this:

Code:

      uint32_t mask=1<<31;
      uint32_t bitmap = (glyph[0] << 24) | (glyph[1] << 16) | (glyph[2] << 8) | glyph[3];
        for(x=0;x<font->width;x++){
            fb[offset] = bitmap & mask ? fg : bg;
            /* adjust to the next pixel */
            mask >>= 1;
            offset++;
        }

bzt wrote:

You make absolutely no sense. How is that an answer to my question at all? Which C compiler doesn't support them? GCC definitely does.

GCC requires you to tell it when a pointer will not be correctly aligned. It does not support arbitrary misaligned pointers.

bzt wrote:

Nope, you made it pretty clear that your priority is the language, with imaginary restrictions that no actual x86 compiler was restricted to in the last 30 years (and won't be in the future).

Korona already posted an example of an existing compiler with that restriction.

vvaltchev wrote:

Code:

*(unsigned long *)(dst+res) = c;

I'm pretty sure the GCC developers would tell you that's a Linux bug. I don't know if the Linux developers care, though.

bzt wrote:

@Korona: that bug only applies to a specific case when vector instructions are enabled (via an optimization level, not choosing MOVAPS/MOVUPS correctly, not about using MOV vs. MOVxPS).

Vector instructions are not enabled or disabled by optimization settings.

bzt wrote:

The question is, when you can use it safely and it results in a more compact and effective code then why shouldn't you use it?

Because there is no way to use it safely. Even if it works with all current compilers, you can't say that it will work with all future compilers too.

bzt wrote:

Take a look at my RLE implementation, it's a perfect example for that. There's an rle_libc.h version which is less effective but portable using memcpy, and there's rle_opt.h version, totally dependency-free and optimized with pointer casts (I could have used one header file with ifdefs too). It is up to the developer to decide which one to use. If rle_opt.h works on the target architecture, I see no reason why to use the less effective implementation.

Take rle_opt.h, replace the pointer casts with equivalent operations on individual bytes, add the restrict keyword to pointers that shouldn't overlap, and you'll have an optimized version with no undefined behavior. There's no need for memcpy() here.

bzt wrote:

Another perfect example is my optimized SLERP. That uses pointer casts to avoid struct packing and make it compatible with any programming language and registers (GPR/SIMD).

Neither GCC nor Clang have any trouble optimizing "__m128 ta = {qa->w,qa->x,qa->y,qa->z};" into a single MOVUPS instruction. How do the pointer casts help with that?

bzt wrote:

(it uses a define to select between MOVAPS and MOVUPS, because I'm aware that the compiler can't possibly choose the correct one)

Tell the compiler the pointer will be aligned (e.g. "qa = __builtin_assume_aligned(qa,16);") and the above example will compile into MOVAPS instead of MOVUPS. It might not be worth the trouble, though: on most recent CPUs, MOVAPS is never faster than MOVUPS.

bzt wrote:

My point is, just because not all architectures support unaligned access and SIMD intrinsics is not a reason to have a less effective ANSI C version only.

But you can still make the C version better without relying on undefined behavior.

bzt · **Joined:** Thu Oct 13, 2016 4:55 pm **Posts:** 1584

Octocontrabass wrote:

Shouldn't example code prioritize clarity over speed? Especially if it's meant for newcomers!

For a code that's going to be executed about a million times? Definitely not. If newcomers are interested in a plain stupid implementation, they can take a look at the code on the VGA Fonts page. There's no point in duplicating that code on this page.

Octocontrabass wrote:

bzt wrote:

And is a properly working compiler allowed to emit any of those instructions for this C code?

Yes. It probably won't, because it would be a very strange choice, but it is allowed to.

Let me rephrase: would you call that a "properly working compiler"? I wonder how could that be vectorized correctly in the first place.

Octocontrabass wrote:

bzt wrote:

And I must ask again, how would adding 3 bytes at the end help you when you need to read 4 bytes from the middle of bitchunk?

Reading past the end of an array is undefined behavior.

Okay, seriously, I'm asking again: "how would adding 3 bytes at the end help you when you need to read 4 bytes from the middle of bitchunk?" See, the emphasis is on reading from the middle.

BTW, with a valid PSF this code will NEVER read past the end of the array, and I'll leave it to you as homework to figure out why.

Octocontrabass wrote:

bzt wrote:

You make absolutely no sense. How is that an answer to my question at all? Which C compiler doesn't support them? GCC definitely does.

GCC requires you to tell it when a pointer will not be correctly aligned. It does not support arbitrary misaligned pointers.

You did not answer my question, and have you actually read that page? Where does that page say that "GCC requires you to tell it when a pointer will not be correctly aligned" or that "It does not support arbitrary misaligned pointers"? Citation needed!

Octocontrabass wrote:

Vector instructions are not enabled or disabled by optimization settings.

Are you serious? What does "-ftree-vectorize" or "-fno-tree-vectorize" do then? You should actually read the GCC documentation. This is the second time you're wrong about GCC in the same post.

Octocontrabass wrote:

Even if it works with all current compilers, you can't say that it will work with all future compilers too.

Yes I can say because compilers are not allowed to change ANSI C89 behaviour, not now, and not in the future. But if one of them do, it's always temporary as that's a compiler bug which gets fixed sooner or later.

Octocontrabass wrote:

Take rle_opt.h, replace the pointer casts with equivalent operations on individual bytes, add the restrict keyword to pointers that shouldn't overlap, and you'll have an optimized version

Optimized? A far worse and slower code! As I've said, you have absolutely no idea how the CPU actually works. This isn't your fault. (Hint: using individual bytes would require multiple MOVs, a temporary variable (could be a register, but still) and shifting and ORing, hell lot more CPU cycles than with a single MOV.)

Octocontrabass wrote:

Neither GCC nor Clang have any trouble optimizing "__m128 ta = {qa->w,qa->x,qa->y,qa->z};" into a single MOVUPS instruction. How do the pointer casts help with that?

Never said they have trouble with that. They have trouble with using MOVAPS instead, which is pretty obvious since there's no guarantee that the input will be aligned. And no, the solution isn't pointer cast (where on earth did you get that?). If you would take a glimpse of the code (which I am 100% sure you haven't) you'd know what the solution is and why.

Octocontrabass wrote:

But you can still make the C version better without relying on undefined behavior.

Your proposed modification compiles into a lot worse and less effective Assembly code. Try it out if you don't believe me. Longer Assembly, more variable requirement, uses more CPU cycles, I would most definitely call that a worse version, not a better one.

Cheers,
bzt

sham1 · **Joined:** Wed Nov 30, 2011 12:44 pm **Posts:** 16

bzt wrote:

Octocontrabass wrote:

Yes. It probably won't, because it would be a very strange choice, but it is allowed to.

Let me rephrase: would you call that a "properly working compiler"? I wonder how could that be vectorized correctly in the first place.

That is very much a "properly working compiler". It'd be an adversarial implementation, yes, but it would go according to the standard.

bzt wrote:

Octocontrabass wrote:

Even if it works with all current compilers, you can't say that it will work with all future compilers too.

Yes I can say because compilers are not allowed to change ANSI C89 behaviour, not now, and not in the future. But if one of them do, it's always temporary as that's a compiler bug which gets fixed sooner or later.

They're not allowed to change C99, C11, nor C17 behaviour either. But here's the thing, reads and writes through unaligned pointers is straight up invalid, even according to your oh-so-loved C89.

Meanwhile the later standards have introduced new capabilities to the language standard that can very much help you write better code. In this case notably C99's restrict-keyword, which allows you to assume that memory regions don't overlap and thus the compiler can be more aggressive when optimizing stuff.

bzt wrote:

Octocontrabass wrote:

Take rle_opt.h, replace the pointer casts with equivalent operations on individual bytes, add the restrict keyword to pointers that shouldn't overlap, and you'll have an optimized version

Optimized? A far worse and slower code! As I've said, you have absolutely no idea how the CPU actually works. This isn't your fault. (Hint: using individual bytes would require multiple MOVs, a temporary variable (could be a register, but still) and shifting and ORing, hell lot more CPU cycles than with a single MOV.)

You have absolutely no idea how the optimizer actually works. This isn't your fault either. Hint: despite using byte moves in the source code, the optimizer can, and very likely will, use larger moves if it notices that you're doing things with adjacent addresses. This includes, also, memcpy which is also why the compiler is allowed to optimize to and from using it.

bzt wrote:

Octocontrabass wrote:

Neither GCC nor Clang have any trouble optimizing "__m128 ta = {qa->w,qa->x,qa->y,qa->z};" into a single MOVUPS instruction. How do the pointer casts help with that?

Never said they have trouble with that. They have trouble with using MOVAPS instead, which is pretty obvious since there's no guarantee that the input will be aligned. And no, the solution isn't pointer cast (where on earth did you get that?). If you would take a glimpse of the code (which I am 100% sure you haven't) you'd know what the solution is and why.

Even if a pointer cast made the generated code contain MOVAPS instead of MOVUPS instructions, now you'd just made the CPU angry by trying to use MOVAPS on potentially unaligned data by using intrinsics to choose the wrong instruction. What was that about not knowing how the CPU works, huh?

Now granted, you did guard it with a preprocessor macro. But you know what might have also helped? Copying the data properly so the compiler knows that using aligned instructions would be alright. Copying 16 bytes of data is not going to be that bad.

bzt wrote:

Octocontrabass wrote:

But you can still make the C version better without relying on undefined behavior.

Your proposed modification compiles into a lot worse and less effective Assembly code. Try it out if you don't believe me. Longer Assembly, more variable requirement, uses more CPU cycles, I would most definitely call that a worse version, not a better one.

How do you know? Have you actually profiled it? Or are you just spreading FUD because you cannot fathom that maybe what you're doing is incorrect?

Octocontrabass · **Joined:** Mon Mar 25, 2013 7:01 pm **Posts:** 5144

bzt wrote:

Let me rephrase: would you call that a "properly working compiler"? I wonder how could that be vectorized correctly in the first place.

Yes. (Perhaps it could be vectorized to convert multiple pixels at once by broadcasting some of the bits into a mask to select between foreground and background pixel values.)

bzt wrote:

Okay, seriously, I'm asking again: "how would adding 3 bytes at the end help you when you need to read 4 bytes from the middle of bitchunk?" See, the emphasis is on reading from the middle.

BTW, with a valid PSF this code will NEVER read past the end of the array, and I'll leave it to you as homework to figure out why.

Because you're not always reading from the middle. The last glyph can be at the very end of the PSF file, and if each row is less than four bytes, you'll read past the end. Not every font includes a Unicode mapping table after the last glyph.

bzt wrote:

You did not answer my question, and have you actually read that page? Where does that page say that "GCC requires you to tell it when a pointer will not be correctly aligned" or that "It does not support arbitrary misaligned pointers"? Citation needed!

GCC implements the C language standard. All versions of the C language standard supported by GCC say misaligned pointers are undefined behavior. Unless specified elsewhere in the GCC documentation, everything in the C language standard applies to GCC's implementation. This means that, unless specified elsewhere, misaligned pointers are undefined behavior. The page on variable attributes is the only place in the GCC documentation that mentions reducing alignment below what's required by the C language standard (or the target ABI). Since this page does not specify that GCC allows all pointers to have less than the minimum alignment required by the C language standard, the C language standard requirements for pointer alignment apply by default.

bzt wrote:

Are you serious? What does "-ftree-vectorize" or "-fno-tree-vectorize" do then?

It enables and disables tree vectorization. It does not enable or disable vector instructions.

bzt wrote:

Yes I can say because compilers are not allowed to change ANSI C89 behaviour, not now, and not in the future. But if one of them do, it's always temporary as that's a compiler bug which gets fixed sooner or later.

ANSI C89 says unaligned pointers are undefined behavior. That means compilers can do whatever they want with unaligned pointers, including generating code that crashes when it encounters an unaligned pointer, while following ANSI C89 exactly as it's written:

ANSI C89 wrote:

Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, or the address of an object that has automatic storage duration when execution of the block in which the object is declared and of all enclosed blocks has terminated.

If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

bzt wrote:

Optimized? A far worse and slower code! As I've said, you have absolutely no idea how the CPU actually works. This isn't your fault. (Hint: using individual bytes would require multiple MOVs, a temporary variable (could be a register, but still) and shifting and ORing, hell lot more CPU cycles than with a single MOV.)

What does this have to do with the CPU? The compiler is free to merge multiple bytes into a single move, and with optimizations enabled, it does exactly that.

But, for some reason, both GCC and Clang fall back to moving individual bytes once they can't fill a SIMD register. Perhaps giving the optimizer better hints will help?

bzt wrote:

Never said they have trouble with that. They have trouble with using MOVAPS instead, which is pretty obvious since there's no guarantee that the input will be aligned. And no, the solution isn't pointer cast (where on earth did you get that?). If you would take a glimpse of the code (which I am 100% sure you haven't) you'd know what the solution is and why.

I have looked at it. I fixed the mistaken promotion to double and the incorrect SSE4.1 guard. And yes, if you look at the generated code, the compiler does use MOVAPS when it knows the pointer will be aligned. It works with the generic version too, although the optimizer is not smart enough to beat your SIMD optimizations.

So, what is the purpose of those pointer casts aside from begging for more undefined behavior? You can create a C struct in C++, and you can put a C struct inside a C++ struct. (I'll admit I don't know for sure if those particular casts would be undefined behavior; aliasing rules are complicated.)

vvaltchev · **Joined:** Fri May 11, 2018 6:51 am **Posts:** 274

Octocontrabass wrote:

vvaltchev wrote:

Code:

*(unsigned long *)(dst+res) = c;

I'm pretty sure the GCC developers would tell you that's a Linux bug. I don't know if the Linux developers care, though.

In my understanding, the -fno-strict-aliasing option has been added exactly to support such code.
I mean, of course "strict aliasing" != "unaligned access" but this option prevents GCC and Clang to
make a whole set of assumptions like that might potentially turn unaligned access into UB.
Therefore, with -fno-strict-aliasing I believe it's guaranteed that unaligned access is not UB. There
are several people involved both in GCC and in Linux's kernel, so I'm confident that such code
is not buggy.

In general, I have the perception that the whole strict aliasing thing (somehow related with unaligned access & friends)
is controversial and the fact that modern compilers started to enforce it, made unsafe a ton of old-school C code.
I did a little research about the topic, because I was curious.

1. An interesting essay by Dennis Ritchie back in 1988 about the proposal of "noalias" which was a stronger version
of modern's restrict keyword: http://www.lysator.liu.se/c/dmr-on-noalias.html Obviously, it's not the same
as restrict, but it's interesting to read about DMR's general concerns about the topic.

2. A question about the safety of type-punning with unions where the author concludes that "semantics here is not completely clear".
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2222.htm#TOC

3. Linus Torvalds ranting about a patch where type-punning with unions are replaced with something else:
https://lkml.org/lkml/2018/6/5/769

4. A blogger ranting against strict aliasing, making some interesting examples: https://blog.regehr.org/archives/1307

I'm not trying to prove anything other than the topic is controversial, no matter which part won in the C standard.
At the end, I personally understand both the sides. Just, for kernel development I'd say that probably is more convenient
to not follow strictly the C standard. In particular, about the strict aliasing, I've understood that the biggest reason for
enforcing it was to allow compilers to do auto-vectorization in more cases, but that doesn't apply to kernel code as there,
in 99.99% of the code, we don't allow FPU instructions anyway (because we don't want to save the FPU context etc.).

Another thing, completely unrelated with strict aliasing/unaligned access, but in topic with the problem of following or not
the C standard while writing a kernel is the lack of inline assembly support in the C language. In kernel development, I believe
it's essential and cannot be replaced by separated assembly files, unless we pay the price of function calls, which is not fine
in some cases. I totally understand why inline assembly is not part of the C standard: it's extremely hard or practically impossible
to define it in a generic, arch-agnostic way. All of that, makes perfect sense to me, but it won't stop me from not using inline assembly.
It's simply too convenient. Therefore, we write operating systems either for the C dialect offered by GCC+Clang compilers (very compatible),
or for the C dialect offered by Microsoft VC compiler or for the C dialect offered by the Intel C compiler. In any case, since the C standard
alone it's not just enough for kernel code and we have to live with that, I think it's fine bending some other minor rules here and there if that
will make our lives easier :-)

vvaltchev · **Joined:** Fri May 11, 2018 6:51 am **Posts:** 274

vvaltchev wrote:

In my understanding, the -fno-strict-aliasing option has been added exactly to support such code.
I mean, of course "strict aliasing" != "unaligned access" but this option prevents GCC and Clang to
make a whole set of assumptions like that might potentially turn unaligned access into UB.
Therefore, with -fno-strict-aliasing I believe it's guaranteed that unaligned access is not UB. There
are several people involved both in GCC and in Linux's kernel, so I'm confident that such code
is not buggy.

@Octocontrabass I'd like to add that while I believe that such code with -fno-strict-aliasing is safe with gcc and clang,
it doesn't mean I'm absolutely 100% sure of that. I wasn't able to find anything in the documentation explicitly said
that it is, nor that it isn't and I'm not a compiler expert to check gcc's source quickly and find it by myself. I believe in that
theory with 99.9% confidence, at the moment. But, if you have ANY kind of proof that, with gcc and/or clang -fno-strict-aliasing
is not enough for allowing unaligned access to be safe in any case, that would be a BIG DEAL. I would highly appreciate
such a proof and I would probably spend some time talking with the GCC guys and see what can be done. That would mean
that a TON of code in the Linux kernel and elsewhere just lives under the illusion that -fno-strict-aliasing is saving them from the
UB beast, while it's not always the case. If you have any ways to check that, it would be great.

Octocontrabass · **Joined:** Mon Mar 25, 2013 7:01 pm **Posts:** 5144

vvaltchev wrote:

I'd like to add that while I believe that such code with -fno-strict-aliasing is safe with gcc and clang,
it doesn't mean I'm absolutely 100% sure of that. I wasn't able to find anything in the documentation explicitly said
that it is, nor that it isn't and I'm not a compiler expert to check gcc's source quickly and find it by myself.

There's no relation between misaligned pointers and -fno-strict-aliasing.

I am not a compiler expert either, so I can't dig through GCC's source to figure out which optimizations are enabled or what they do. However, I do know that the most likely two causes of unwanted behavior as a result of misaligned pointers are (a) use of instructions that require aligned addresses, and (b) removal of code paths that lead to misaligned pointers on the assumption that the undefined behavior is unreachable.

Linux avoids (a) by disabling SIMD instructions. Presumably, no current version of GCC or Clang is capable of (b), but I don't see anything that guarantees they won't implement such optimizations in the future. The GCC devs have explicitly refused to allow misaligned pointers in the past.

With Linux being so big and influential, it's possible that future optimizations along the lines of (b) will get a flag to disable them, like how -fno-delete-null-pointer-checks can disable similar optimizations around null pointer dereferences.

Korona · **Joined:** Thu May 17, 2007 1:27 pm **Posts:** 999

The Godbolt snippet that I posted above optimizes based on alignment even if -fno-strict-aliasing is given (it's actually extracted from a blog post by the guy who reported your linked bug).

@vvaltchev btw, manually breaking your lines makes your text quite hard to read if line length do not sync up with window width.

vvaltchev · **Joined:** Fri May 11, 2018 6:51 am **Posts:** 274

Octocontrabass wrote:

I am not a compiler expert either, so I can't dig through GCC's source to figure out which optimizations are enabled or what they do. However, I do know that the most likely two causes of unwanted behavior as a result of misaligned pointers are (a) use of instructions that require aligned addresses, and (b) removal of code paths that lead to misaligned pointers on the assumption that the undefined behavior is unreachable.

Linux avoids (a) by disabling SIMD instructions. Presumably, no current version of GCC or Clang is capable of (b), but I don't see anything that guarantees they won't implement such optimizations in the future. The GCC devs have explicitly refused to allow misaligned pointers in the past.

Your theory sounds solid to me, in particular after reading the comments on gcc's bug #93031. I'd just add that it doesn't make any sense to me to not add a new option for supporting unaligned access. It feels like the compilers are pushing the UB boundaries "too far" without giving any hope to legacy code to work, not even with special options. Also, it feels unfair to me seeing C compilers "throw away" an ISA feature like unaligned access, available almost for free on modern CPUs. I would understand for C++, which targets more high-level code, but in C, if I do unaligned access, it usually means "I know what I'm doing". I'm not very happy with that, in particular because they refuse to even support an ad-hoc option. It's like I have no other choice other can using memcpy or memmove or writing assembly, when C code could have been just fine. (And yeah, I know what the C standard says, but I'm not happy with all of its limitations.)

Korona wrote:

The Godbolt snippet that I posted above optimizes based on alignment even if -fno-strict-aliasing is given (it's actually extracted from a blog post by the guy who reported your linked bug).

Wow, that's surprising! I was convinced that -fno-strict-aliasing allowed that code to work in any case, but I was wrong. That's another STRONG point in favor of Octocontrabass' theory.

I played a little with the example on Compiler Explorer you linked previously and I added another function, g():

Code:

int h(int *p, int *q){
  *p = 1;
  *q = 1;
  return *p;
}

int g(int *p) {
    int *p2 = (int *)(((char *)p)+1);
    *p = 1;
    *p2 = 1;
    return *p;
}

See on compiler explorer: https://gcc.godbolt.org/z/hY3aT3

I was shocked by what I've discovered: not only h() always returns 1, no matter -fno-strict-aliasing, but so does g()! And there's a big difference: in g() the compiler knows that p and p2 alias, while in h() it doesn't know and just assume they don't. In both the cases, the functions return always 1. Weird fact: we if change "*p2=1" to "*p2=2", g() returns 513 (as constant value), because the compiler figured out the aliasing. That's super-weird!!

Another weird fact. That optimization doesn't happen with even relatively recent GCC compilers like 7.x (version 7.5 was release on November 14, 2019). The first GCC compiler (among the ones in compiler explorer) that behaves this way is GCC 8.1.

Korona wrote:

@vvaltchev btw, manually breaking your lines makes your text quite hard to read if line length do not sync up with window width.

I'm sorry for that: sometimes the text looks so much more readable in my case (full-screen window) when I break the lines, that I forget that in the general case (smaller windows) the whole thing will become a mess. I'll try to avoid that in future posts.

Thanks guys for pointing out all of this!

EDIT: All of that does not happen with clang instead. Even the most recent version (clang 11.0.1) does not make such bold assumptions. Both g() and h() return the actual value at *p.

linguofreak · **Joined:** Wed Mar 09, 2011 3:55 am **Posts:** 509

bzt wrote:

Octocontrabass wrote:

And the compiler is free to silently generate code that does not work.

Interesting, but how does it generate non-working code if it forbids aliasing in the first place? I thought you were saying that. (And a compiler silently generating non-working code is a buggy compiler, period.)

I think Octocontrabass misspoke a bit here, because by definition, anywhere the C standard says "undefined behavior", the standard is explicitly refusing to say what it means for code using that construct to "work". So a compiler can't generate "working" or "non-working" code for UB, because neither "working" nor "non-working" code exist for UB. The only thing you can measure as far as how well the code "works" for UB is how well the actual behavior matches the behavior the programmer expected, and compilers are free to make changes that break any given programmer's mental model of how the compiler functions as long as those changes do not result in code being generated for defined behavior that does something other than what the standard defines.

Another way of saying it is that whenever the C standard says "undefined behavior", it is explicitly refusing to say what it means for a compiler to be "buggy" when handling that construct.

OSDev.org

Framebuffer: Draw a PSF character

Who is online