bzt wrote:
Because it's a code example on a newcomer's tutorial page, not a fully blown, ready to be compiled kernel source tree.
Shouldn't example code prioritize clarity over speed? Especially if it's meant for newcomers!
bzt wrote:
I haven't seen you complaining about Babystep tutorial not being portable for example.
Drawing pixels on a framebuffer works exactly the same on every CPU.
bzt wrote:
And is a properly working compiler allowed to emit any of those instructions for this C code?
Yes. It probably won't, because it would be a very strange choice, but it is allowed to.
bzt wrote:
Actually I had a lot of trouble getting gcc to emit MOVAPS even when I explicitly wanted it in my optimized SLERP code. I ended up using intrinsics instead because no, no optimizer is smart enough to figure out using packed scalars on its own.
Clang's optimizer is smart enough to figure it out. Keep in mind two pointers to the same type might alias one another, so stores to one of the pointers might affect later loads from the other. That can prevent the optimizer from reordering and parallelizing the code. Try using the restrict keyword.
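To make the aliasing point concrete, here is a minimal sketch. The function names scale_add and scale_add_restrict are made up for this post and have nothing to do with your SLERP code. restrict is C99; GCC and Clang also accept the __restrict spelling in older language modes.

```c
#include <stddef.h>

/* dst and src may point into the same array, so a store to dst[i]
   could change a later src[j]. The compiler has to keep the loads and
   stores in order, or add a runtime overlap check before vectorizing. */
void scale_add(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}

/* restrict is a promise from the caller that dst and src do not
   overlap, so the compiler is free to load several elements of src
   before storing to dst. Breaking the promise is undefined behavior. */
void scale_add_restrict(float *restrict dst, const float *restrict src,
                        float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

Both functions compute the same thing when the arrays don't overlap. Whether the second one actually gets vectorized still depends on the compiler and the flags you use.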
bzt wrote:
C89 specification won't change in the future that's for sure and it's very well known how a compiler should compile it, with all the side effects and quirks.
Unaligned pointers and aliasing of incompatible types are undefined behavior in C89 too, those aren't new. What has changed since 1989 is that now, compilers are smart enough to optimize in ways that will break your code if you try to rely on undefined behavior.
bzt wrote:
And I must ask again, how would adding 3 bytes at the end help you when you need to read 4 bytes from the middle of bitchunk?
Reading past the end of an array is undefined behavior. If you add 3 bytes to the end of the PSF array, then there is no problem if you do this:
Code:
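uint32_t mask = (uint32_t)1 << 31;
/* cast each byte first so the shifts happen in uint32_t, not signed int */
uint32_t bitmap = ((uint32_t)glyph[0] << 24) | ((uint32_t)glyph[1] << 16) | ((uint32_t)glyph[2] << 8) | glyph[3];
for (x = 0; x < font->width; x++) {
    fb[offset] = (bitmap & mask) ? fg : bg;
    /* adjust to the next pixel */
    mask >>= 1;
    offset++;
}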
bzt wrote:
You make absolutely no sense. How is that an answer to my question at all? Which C compiler doesn't support them? GCC definitely does.
GCC requires you to tell it when a pointer will not be correctly aligned. It does not support arbitrary misaligned pointers.
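One common way to tell it is a packed struct. This is only a sketch: the names unaligned_u32 and load_u32_unaligned are hypothetical, and the attributes are GCC/Clang extensions, not standard C. I've added may_alias on the assumption that the bytes being read weren't originally stored as a uint32_t.

```c
#include <stdint.h>

/* packed gives the struct an alignment of 1, so the compiler may not
   assume the address is a multiple of 4. may_alias exempts accesses
   through this type from the strict aliasing rules. */
struct unaligned_u32 {
    uint32_t value;
} __attribute__((packed, may_alias));

/* Hypothetical helper: read a native-endian uint32_t from any address. */
static inline uint32_t load_u32_unaligned(const void *p)
{
    return ((const struct unaligned_u32 *)p)->value;
}
```

With this, load_u32_unaligned(buf + 1) is fine for any byte buffer, and the compiler picks whatever instruction sequence is safe on the target.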
bzt wrote:
Nope, you made it pretty clear that your priority is the language, with imaginary restrictions that no actual x86 compiler was restricted to in the last 30 years (and won't be in the future).
Korona already posted an example of an existing compiler with that restriction.
vvaltchev wrote:
Code:
*(unsigned long *)(dst+res) = c;
I'm pretty sure the GCC developers would tell you that's a Linux bug. I don't know if the Linux developers care, though.
bzt wrote:
@Korona: that bug only applies to a specific case when vector instructions are enabled (via an optimization level, not choosing MOVAPS/MOVUPS correctly, not about using MOV vs. MOVxPS).
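Vector instructions are not enabled or disabled by optimization settings. Whether the compiler may emit them depends on the target options (for example -msse2 or -march), and on x86-64, SSE2 is always part of the baseline.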
bzt wrote:
The question is, when you can use it safely and it results in a more compact and effective code then why shouldn't you use it?
Because there is no way to use it safely. Even if it works with all current compilers, you can't say that it will work with all future compilers too.
bzt wrote:
Take a look at my RLE implementation, it's a perfect example for that. There's an rle_libc.h version which is less effective but portable using memcpy, and there's an rle_opt.h version, totally dependency-free and optimized with pointer casts (I could have used one header file with ifdefs too). It is up to the developer to decide which one to use. If rle_opt.h works on the target architecture, I see no reason why to use the less effective implementation.
Take rle_opt.h, replace the pointer casts with equivalent operations on individual bytes, add the restrict keyword to pointers that shouldn't overlap, and you'll have an optimized version with no undefined behavior. There's no need for memcpy() here.
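Something along these lines is what I mean. This is not taken from rle_opt.h; load_le32, store_le32 and copy_words are hypothetical names, and I'm assuming little-endian 32-bit words just for the sake of the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value one byte at a time. Works at any
   alignment and on any byte order, with no pointer cast. */
static inline uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Write a little-endian 32-bit value one byte at a time. */
static inline void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Copy nwords 32-bit words. restrict promises that the two buffers do
   not overlap, so the loads and stores can be reordered freely. */
void copy_words(unsigned char *restrict dst,
                const unsigned char *restrict src, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        store_le32(dst + 4 * i, load_le32(src + 4 * i));
}
```

Recent GCC and Clang usually recognize the shift-and-or pattern and turn it into a single load or store on targets where that is safe, so you shouldn't expect to lose anything compared to the cast.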
bzt wrote:
Another perfect example is my optimized SLERP. That uses pointer casts to avoid struct packing and make it compatible with any programming language and registers (GPR/SIMD).
Neither GCC nor Clang have any trouble optimizing "__m128 ta = {qa->w,qa->x,qa->y,qa->z};" into a single MOVUPS instruction. How do the pointer casts help with that?
bzt wrote:
(it uses a define to select between MOVAPS and MOVUPS, because I'm aware that the compiler can't possibly choose the correct one)
Tell the compiler the pointer will be aligned (e.g. "qa = __builtin_assume_aligned(qa,16);") and the above example will compile into MOVAPS instead of MOVUPS. It might not be worth the trouble, though: on most recent CPUs, MOVUPS is just as fast as MOVAPS when the address happens to be aligned.
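For reference, a minimal sketch of how the builtin is used. add4 is a made-up function, not your SLERP code, and __builtin_assume_aligned is a GCC/Clang builtin, not standard C. Which instructions come out depends on the compiler, the target, and the flags.

```c
/* Hypothetical example: add two groups of four floats. */
void add4(float *restrict out, const float *a, const float *b)
{
    /* Promise that all three pointers are 16-byte aligned. If the
       promise is false, the behavior is undefined. */
    const float *pa = __builtin_assume_aligned(a, 16);
    const float *pb = __builtin_assume_aligned(b, 16);
    float *po = __builtin_assume_aligned(out, 16);

    for (int i = 0; i < 4; i++)
        po[i] = pa[i] + pb[i];
}
```

The caller has to keep the promise, for example by declaring the arrays with __attribute__((aligned(16))).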
bzt wrote:
My point is, just because not all architectures support unaligned access and SIMD intrinsics is not a reason to have a less effective ANSI C version only.
But you can still make the C version better without relying on undefined behavior.