Having a hard time with __atomic_xxx intrinsics on aarch64

Question about which tools to use, bugs, the best way to implement a function, etc should go here. Don't forget to see if your question is answered in the wiki first! When in doubt post here.
Post Reply
kzinti
Member
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Having a hard time with __atomic_xxx intrinsics on aarch64

Post by kzinti »

I have this very simple implementation of std::shared_ptr<> for my kernel. It's been working just fine in single-thread mode, now I want to support SMP. The way to support this is to basically use atomic operations on the reference counts.

OK, so I replaced the "++refCount" and "--refCount" with "__atomic_add_fetch()" and "__atomic_sub_fetch()". It still works just fine on qemu (x86_64 and aarch64), but on the Raspberry Pi 3 (aarch64), it doesn't work anymore. I have tried using memory barriers (dmb and dsb), compiler barrier, etc. Nothing seems to help. Not that it should, I am still running with a single CPU. The generated code looks fine. If I try to use these two intrinsics and print the value of variables before and after, everything looks fine. Using __sync_add_fetch() doesn't work either. Code is running in EL1 at virtual address 0xFFFFFFFF8000xxxx, I do wonder if that can be an issue here. The problem exists with clang-12 and also clang-15 (I just upgraded to it hoping it would solve the issue, it didn't).

I've spend a few days on this and I am out of idea.

You can see the code here:
https://github.com/kiznit/rainbow-os/bl ... tr.hpp#L57

If I comment line 59 and use the intrinsic on line 60 instead, it doesn't work anymore. I have a hard time getting any info out of it because I am using shared_ptr<> everywhere in my logging code :(
Octocontrabass
Member
Member
Posts: 5218
Joined: Mon Mar 25, 2013 7:01 pm

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by Octocontrabass »

kzinti wrote:The way to support this is to basically use atomic operations on the reference counts.
This is very easy to do: declare the reference count as an atomic type, e.g. using "std::atomic<long>" instead of "long". All operations on an atomic variable will be atomic unless you explicitly relax the memory order.
kzinti wrote:OK, so I replaced the "++refCount" and "--refCount"
You don't need to change how you access an atomic variable unless you want to relax the memory order.
kzinti wrote:with "__atomic_add_fetch()" and "__atomic_sub_fetch()".
Generally, you should use std::atomic instead of the __atomic builtins.
kzinti wrote:Using __sync_add_fetch() doesn't work either.
The __sync builtins are deprecated.
kzinti wrote:Code is running in EL1 at virtual address 0xFFFFFFFF8000xxxx, I do wonder if that can be an issue here.
As long as your memory types are correct, I don't see how this could be the problem.
kzinti wrote:The problem exists with clang-12 and also clang-15
How did you convince Clang to not call GCC?

You may need to pass "-mno-outline-atomics" to ensure lock-free atomic operations are inline.
kzinti
Member
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by kzinti »

Octocontrabass wrote:This is very easy to do: declare the reference count as an atomic type, e.g. using "std::atomic<long>" instead of "long". All operations on an atomic variable will be atomic unless you explicitly relax the memory order.
There is no such thing as std::atomic<> in bare metal. I have my own implementation which is what I am trying to debug here.
Octocontrabass wrote:You don't need to change how you access an atomic variable unless you want to relax the memory order.
I know all about memory orders. I am just trying to narrow down what the problem is without using my std::atomic<> implementation (see https://github.com/kiznit/rainbow-os/bl ... %2B/atomic).
Octocontrabass wrote:Generally, you should use std::atomic instead of the __atomic builtins.
That's what I am trying to do, debug my implementation of std::atomic<>.
Octocontrabass wrote:The __sync builtins are deprecated.
Yes they are. Was this not worth a try? They still compile just fine.
Octocontrabass wrote:How did you convince Clang to not call GCC?
That doesn't seem to make any sense. I am not using GCC. Why would clang call GCC? What am I missing here?
Octocontrabass wrote:You may need to pass "-mno-outline-atomics" to ensure lock-free atomic operations are inline.
The generated code is inlining the atomics. I don't see how that flag would do anything here. Using "-moutline-atomics" does produce link errors as expected (again, bare metal). I suppose I could implement the missing functions here and see if it helps with anything...

I appreciate you are trying to help, maybe I should have been more clear that this is kernel code (i.e. bare metal) and that I don't have an implementation of std::atomic<> provided by the compiler.
Last edited by kzinti on Sat Dec 17, 2022 11:05 pm, edited 1 time in total.
Octocontrabass
Member
Member
Posts: 5218
Joined: Mon Mar 25, 2013 7:01 pm

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by Octocontrabass »

kzinti wrote:There is no such thing as std::atomic<> in bare metal.
It's part of the freestanding <atomic> header. It's always available.
kzinti
Member
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by kzinti »

Octocontrabass wrote:It's part of the freestanding <atomic> header. It's always available.
Very interesting... I didn't know that. I found confirmation here: https://en.cppreference.com/w/cpp/freestanding.

I do see <stdatomic.h> and all the other C freestanding headers, but I don't see any of the C++ freestanding header.

This is how I configure clang: https://github.com/kiznit/rainbow-os/bl ... xt#L43-L48. I must be missing something there to enable C++ headers.

Thanks a lot!

edit: I can't find any file named "atomic" in the LLVM archive. Mmm.
kzinti
Member
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by kzinti »

From https://en.cppreference.com/w/cpp/freestanding:
Compiler vendors might not correctly support freestanding implementation, for either implementation issues, or just ignoring freestanding as a whole (LLVM libcxx and msvc stl).
So it doesn't look like these headers are available with clang... But perhaps I can implement it on top of <stdatomic.h>.

edit: using C's atomic_long and atomic_fetch_add() doesn't solve the issue. I want to conclude that something is wrong with my std::shared_ptr<> implementation. What is puzzling is that it works just fine without atomics.

https://github.com/kiznit/rainbow-os/bl ... tr.hpp#L61
https://github.com/kiznit/rainbow-os/bl ... tr.hpp#L81
kzinti
Member
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by kzinti »

I found the implementation of std::atomic<> for clang (libc++). It turns out that std::atomic<>::fetch_add() basically resolves to a __atomic_fetch_add(), which is what I was doing before switching to C's version of atomics.

Something else is going on here...

One thing I forgot to mention previously is that this code works just fine in my bootloader but not the kernel. This is was wondering if high-address space is a problem for the Cortex-A53 processor. It doesn't seem to make sense to me either, but I've come to expect the unexpected from hardware.
nullplan
Member
Member
Posts: 1643
Joined: Wed Aug 30, 2017 8:24 am

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by nullplan »

So what is actually the problem? Does the increment function not adequately increment the counter? I would try to write a test function that calls the increment function from multiple threads, then collects all the results and looks to see if the result is what you expect.
Carpe diem!
Octocontrabass
Member
Member
Posts: 5218
Joined: Mon Mar 25, 2013 7:01 pm

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by Octocontrabass »

kzinti wrote:
Compiler vendors might not correctly support freestanding implementation, for either implementation issues, or just ignoring freestanding as a whole (LLVM libcxx and msvc stl).
So it doesn't look like these headers are available with clang...
At least one LLVM developer disagrees with that statement. I'm not sure how you're supposed to get the headers, but they should work once you have them.
kzinti wrote:edit: using C's atomic_long and atomic_fetch_add() doesn't solve the issue.
Using atomic_fetch_add() shouldn't make any difference, operating on atomic variables is always atomic. (And if it does make a difference, your compiler is broken.)

I really can't think of what would cause atomics to fail like this, unless you've somehow messed up something fundamental like alignment or memory type.
kzinti wrote:One thing I forgot to mention previously is that this code works just fine in my bootloader but not the kernel.
...Maybe you should double-check the alignment and the memory type...
User avatar
qookie
Member
Member
Posts: 68
Joined: Sun Apr 30, 2017 12:16 pm
Freenode IRC: qookie
Location: Poland

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by qookie »

Have you enabled the data cache (SCTLR.C=1)? With it disabled, the stxr instruction will just continuously report failure (on Cortex-A72 at least, and I'm guessing A53 too), presumably because the exclusive monitor lock is per cache line on these cores.
Working on managarm.
kzinti
Member
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by kzinti »

clang doesn't provide the freestanding headers. That comment above basically says that the libc++ people try to make their library work in freestanding mode, but it doesn't appear that they provide a way to install it as such. I will see if I can get to build the whole thing with my cross-compiler or just copy the files I need for now. But I doubt it's a C++ problem, I think it might have to do with aarch64 memory configuration as suggested by both of you.

SCTLR_EL1 is configured as 0xc5183d in QEMU (and the Pi starts et EL2 where SCTLR_EL2 is also 0xc5183d). The breakdown is:

Code: Select all

SCTLR_EL1: 0000000000c5183d
    SPAN   : 1 - The value of PSTATE.PAN is left unchanged on taking an exception to EL1.
    EIS    : 1 - The taking of an exception to EL1 is a context synchronizing event.
    nTWE   : 1 - This control does not cause any instructions to be trapped.
    nTWI   : 1 - This control does not cause any instructions to be trapped.
    I      : 1 - Stage 1 instruction access Cacheability control, for accesses at EL0 and EL1.
    EOS    : 1 - An exception return from EL1 is a context synchronizing event.
    CP15BEN: 1 - EL0 using AArch32: EL0 execution of the CP15DMB, CP15DSB, and CP15ISB instructions is enabled.
    SA0    : 1 - SP Alignment check enable for EL0.
    SA     : 1 - SP Alignment check enable.
    C      : 1 - Stage 1 Cacheability control, for data accesses.
    M      : 1 - EL1&0 stage 1 address translation enabled.
In the case of the Pi, I programe SCTLR_EL1 like this:

Code: Select all

        // Enable MMU at EL1
        uint64_t sctlr_el1{};
        sctlr_el1 |= (1 << 0);  // M = 1        EL1&0 stage 1 address translation enabled.
        sctlr_el1 |= (1 << 2);  // C = 1        Enable Data Cache
        sctlr_el1 |= (1 << 3);  // SA = 1       SP Alignment check enable for EL1
        sctlr_el1 |= (1 << 4);  // SA0 = 1      SP Alignment check enable for EL0
        sctlr_el1 |= (1 << 12); // I = 1        Enable Instruction Cache
        mtl::Write_SCTLR_EL1(sctlr_el1);     // 0x101D
So I do enable both instruction (I) and data (C) cache... The rest doesn't seem related. I did try hardcoding 0xc5183d but it didn't make any difference.
linuxyne
Member
Member
Posts: 211
Joined: Sat Jul 02, 2016 7:02 am

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by linuxyne »

Was the qemu-aarch64 test done with kvm enabled?
If not, running your OS within a kvm-enabled VM on RPi3 Linux can probably help with the debug.

Since the problem occurs within the kernel, one can create a small stub containing the instructions that seem to trigger the problem, and call that stub at various eligible locations, starting at the kernel's entry point.
kzinti
Member
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by kzinti »

Trying this under qemu with kvm is an excellent idea. I was running qemu on my desktop x86_64, so no kvm. I probably won't have time to try this before next year, but I look forward to it.
kzinti
Member
Member
Posts: 898
Joined: Mon Feb 02, 2015 7:11 pm

Re: Having a hard time with __atomic_xxx intrinsics on aarch

Post by kzinti »

For posterity, I found the issue: I had a bug in how I was initializing MAIR_EL1. Basically my code is setup to use MAIR index 3 for writeback memory, but MAIR index 3 was initialized to 0x00 which is uncacheable memory. Atomic operations on aarch64 don't work on uncacheable memory.
Post Reply