Dear OSDevers,
I have another interesting optimization issue, and if that's caused by an UB, I failed to see where and what.
The code is extremely simple, it's in my RAM allocator, when I clear the page(s) before I return them to the caller. Just a note, both
i and
pages are defined as uint64_t in the local stack, and they are 8 bytes aligned.
Code:
for (i=0; i<pages; i++) {
vmm_page(tmpmap1, base+i*__PAGESIZE, PG_CORE_RWNOCACHE);
memset((void*)tmpmap1, 0, __PAGESIZE);
}
Both vmm_map() and memset() are well tested, and they work fine. Just to be complete, here's the source for the former (both virt_t and phys_t are aliases to uint64_t, I use those to make the source more readable), the latter is written in assembly (hence not influenced by optimizations):
Code:
void vmm_page(virt_t virt, phys_t phys, uint16_t access) {
virt_t *ptr;
switch(virt) {
...
case tmpmap1: ptr = tmpmap1ptr; break;
...
}
*ptr = phys | access;
vmm_invalidate(virt); /* this is __asm__ __volatile__ ("invplg (%0)" : : "r"(virt) ); on x86 */
}
Here are the test results (-ansi -O2 -Wall -Wextra -Wpedantic -fno-delete-null-pointer-checks -fno-stack-protector -mno-red-zone):
- gcc aarch64 mtune=cortex-a53 OK
- clang aarch64 mtune=cortex-a53 OK
- gcc x86_64 mtune=pentium4 OK
- clang x86_64 mtune=pentium4 OK
- gcc x86_64 mtune=core2 BROKEN
- clang x86_64 mtune=core2 OK
If I add these:
Code:
kprintf("i=%d pages=%d", i, pages); /* goes after "for(i=0;...") */
kprintf("vmm_page(%x,%x)\n", virt, phys); /* goes before "switch(virt) {" */
then what I see on the console with the gcc core2 optimized code, is this:
Code:
i=0 pages=1
vmm_page(FFFFFFFF00006000, 3C1000)
i=-429492720 pages=1
vmm_page(FFFFFFFF00006000, FFF000063C1000)
...page fault naturally...
With debugging, I can confirm that the virtual address of tmpmap1 is correct, and the page is mapped properly there, so the memset does not mess up things on the first iteration for sure.
Clearly
i changed more than it should. Now the most interesting part, if I declare
i as
Code:
volatile uint64_t i;
then everything works as expected.
So, where is the UB?
Btw, I've used gcc 7.3.0 and clang 7.0.1. Right now I'm compiling the latest gcc 8.2.0 and I will repeat the test with that too.
EDIT: I've just tried (
p is another uint64_t local variable)
Code:
p = base;
for (i=0; i<pages; i++) {
vmm_page(tmpmap1, p, PG_CORE_RWNOCACHE);
memset((void*)tmpmap1, 0, __PAGESIZE);
p += __PAGESIZE;
}
And it runs perfectly without the volatile keyword on
i even when optimized for core2.
EDIT2: I take it back. If I remove the kprintf() from the loop, the error cames back, so back to square one. With "volatile i", it works either with or without kprintf and with multiplication or with addition.
Bests,
bzt