OSDev.org
https://forum.osdev.org/

Problems with memset() implementation on GCC 10.2.0
https://forum.osdev.org/viewtopic.php?f=13&t=38672

Author:  kzinti [ Tue Dec 15, 2020 7:27 pm ]
Post subject:  Problems with memset() implementation on GCC 10.2.0

I just upgraded my cross compiler to GCC 10.2.0 and my OS crashes early on memset().

I am sure I am doing something wrong, but GCC 10.2.0 compiles it into something unexpected:

Code:
void* memset(void* ptr, int value, size_t num)
{
    for (unsigned char* p = ptr; num; --num)
    {
        *p++ = (unsigned char)value;
    }

    return ptr;
}

Code:
ffffffff80006360 <memset>:
ffffffff80006360:   48 85 d2                test   %rdx,%rdx
ffffffff80006363:   74 13                   je     ffffffff80006378 <memset+0x18>
ffffffff80006365:   55                      push   %rbp
ffffffff80006366:   40 0f b6 f6             movzbl %sil,%esi
ffffffff8000636a:   48 89 e5                mov    %rsp,%rbp
ffffffff8000636d:   e8 ee ff ff ff          callq  ffffffff80006360 <memset>
ffffffff80006372:   5d                      pop    %rbp
ffffffff80006373:   c3                      retq   
ffffffff80006374:   0f 1f 40 00             nopl   0x0(%rax)
ffffffff80006378:   48 89 f8                mov    %rdi,%rax
ffffffff8000637b:   c3                      retq   
ffffffff8000637c:   0f 1f 40 00             nopl   0x0(%rax)

When I call memset() with a non-zero length (in %rdx), the code above calls memset() recursively at address ffffffff8000636d until it runs out of stack space.

Please help if you can. I refuse to believe the problem is with GCC, I must be missing something.

Author:  nexos [ Tue Dec 15, 2020 7:30 pm ]
Post subject:  Re: Problems with memset() implementation on GCC 10.2.0

It might be better just to use __builtin_memset IMO.

Author:  kzinti [ Tue Dec 15, 2020 7:33 pm ]
Post subject:  Re: Problems with memset() implementation on GCC 10.2.0

Agreed. I would still like to understand why it is broken though.

Author:  kzinti [ Tue Dec 15, 2020 7:35 pm ]
Post subject:  Re: Problems with memset() implementation on GCC 10.2.0

Well what do you know, I am not the first to run into this:

https://github.com/micropython/micropython/issues/6053

It looks like GCC detects that the loop is memset and optimizes the loop by calling... memset. Good times.
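Read back into C, the disassembly in the first post behaves roughly like the sketch below. The name memset_as_compiled is made up for illustration; in the real binary the callee is memset() itself, which is what makes the call recursive.

```c
#include <stddef.h>

/* Hypothetical name: a C rendering of what the compiled memset() from the
   first post does once the store loop has been replaced by a library call. */
void *memset_as_compiled(void *ptr, int value, size_t num)
{
    if (num == 0)
        return ptr;   /* the test/je path: nothing to do, return ptr */

    /* The whole loop became "call memset", i.e. a call to the function
       itself, so any non-zero num recurses until the stack is exhausted. */
    return memset_as_compiled(ptr, (unsigned char)value, num);
}
```

Only the zero-length path ever returns, which matches the crash described above.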

Author:  kzinti [ Tue Dec 15, 2020 7:53 pm ]
Post subject:  Re: Problems with memset() implementation on GCC 10.2.0

Adding "-fno-builtin" when compiling the kernel fixes the issue, but that is clearly not what I want.

Author:  Octocontrabass [ Tue Dec 15, 2020 8:16 pm ]
Post subject:  Re: Problems with memset() implementation on GCC 10.2.0

GCC assumes it can emit calls to memcpy(), memmove(), memset(), and memcmp() at any point - including inside your attempt at implementing one of those four functions. As the optimizer gets smarter, it will get better at creating endless recursion loops.

Various GCC bug reports suggest the following function attribute:
Code:
__attribute__((optimize("no-tree-loop-distribute-patterns")))
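For reference, the attribute goes on the function definition. A minimal sketch, assuming a byte-wise loop like the one in the first post: kmemset and NO_LOOP_PATTERNS are stand-in names, not something from the thread. The function is renamed only so that the sketch can be built and run next to a hosted libc; in the kernel it would be memset() itself.

```c
#include <stddef.h>

/* NO_LOOP_PATTERNS is a hypothetical helper macro. The optimize attribute
   is GCC-specific, so it expands to nothing on other compilers. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_PATTERNS \
    __attribute__((optimize("no-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_PATTERNS
#endif

/* Stand-in for the kernel's memset(): fills num bytes with (unsigned char)value.
   The attribute asks GCC not to turn the loop below into a memset() call. */
NO_LOOP_PATTERNS
void *kmemset(void *ptr, int value, size_t num)
{
    unsigned char *p = ptr;

    while (num--)
        *p++ = (unsigned char)value;

    return ptr;
}
```

The attribute only affects the function it is attached to, so the rest of the kernel keeps the optimization.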


You can also disable this optimization at a global level, although that seems like a poor choice.

You can also implement those four functions in assembly, to be sure GCC can never create an endless recursion loop.

You can also use Clang, which seems to automatically avoid infinite recursion and/or emitting C library calls in freestanding mode.

nexos wrote:
It might be better just to use __builtin_memset IMO.

No, __builtin_memset() is only an optimization hint. The optimizer may still translate __builtin_memset() into a memset() call, and then you'll have a link error due to the undefined function.
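To make that concrete, here is a small sketch. The page_table structure and page_table_clear() are hypothetical and only there to give the builtin something sizeable to clear.

```c
#include <stddef.h>

/* Hypothetical kernel structure: one 4 KiB page of 64-bit entries. */
struct page_table {
    unsigned long long entries[512];
};

void page_table_clear(struct page_table *pt)
{
    /* The builtin leaves the strategy to the compiler: it may expand the
       fill inline or emit a plain "call memset". If it chooses the call,
       the kernel still has to provide a memset symbol at link time. */
    __builtin_memset(pt, 0, sizeof *pt);
}
```

So using the builtin does not remove the need for a real memset() in a freestanding kernel.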

Author:  kzinti [ Wed Dec 16, 2020 12:56 am ]
Post subject:  Re: Problems with memset() implementation on GCC 10.2.0

Thanks, I went with the following at the top of my file:

Code:
#pragma GCC optimize "no-tree-loop-distribute-patterns"
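Unlike the attribute, the pragma applies to every function defined after it in the file, so one line can cover all of the memory routines. A rough sketch of such a file, written with the parenthesized form of the pragma from the GCC manual: kmemcpy and kmemmove are stand-in names so the code can be built next to a hosted libc, and the bodies are assumptions, not kzinti's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC-only pragma; guarded so other compilers do not warn about it. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC optimize ("no-tree-loop-distribute-patterns")
#endif

/* Stand-in for memcpy(): forward byte copy, regions must not overlap. */
void *kmemcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;

    return dst;
}

/* Stand-in for memmove(): picks the copy direction so overlap is safe. */
void *kmemmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        while (n--)              /* dst below src: copy forward */
            *d++ = *s++;
    } else {
        d += n;                  /* dst above src: copy backward */
        s += n;
        while (n--)
            *--d = *--s;
    }

    return dst;
}
```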

Author:  moonchild [ Fri Dec 18, 2020 5:31 pm ]
Post subject:  Re: Problems with memset() implementation on GCC 10.2.0

You can also implement the string functions in assembly; this also gives you a pretty easy perf boost, at least on x86. Here are a few:

Code:
memcpy:
mov rcx, rdx
mov rax, rdi
rep movs byte ptr [rdi], byte ptr [rsi]
ret

memmove:
cmp rdi, rsi
jbe memcpy
mov rax, rdi
mov rcx, rdx
lea rdi, [rdi + rdx - 1]
lea rsi, [rsi + rdx - 1]
std
rep movs byte ptr [rdi], byte ptr [rsi]
cld
ret

memset:
mov rcx, rdx
mov rdx, rdi
mov al, sil
rep stos byte ptr [rdi]
mov rax, rdx
ret

All times are UTC - 6 hours