OSDev.org

The Place to Start for Operating System Developers
It is currently Thu Mar 28, 2024 7:16 am

All times are UTC - 6 hours




Post new topic Reply to topic  [ 14 posts ] 
Author Message
 Post subject: Having a hard time with __atomic_xxx intrinsics on aarch64
PostPosted: Sat Dec 17, 2022 8:18 pm 
Offline
Member
Member

Joined: Mon Feb 02, 2015 7:11 pm
Posts: 898
I have this very simple implementation of std::shared_ptr<> for my kernel. It's been working just fine in single-thread mode, now I want to support SMP. The way to support this is to basically use atomic operations on the reference counts.

OK, so I replaced the "++refCount" and "--refCount" with "__atomic_add_fetch()" and "__atomic_sub_fetch()". It still works just fine on qemu (x86_64 and aarch64), but on the Raspberry Pi 3 (aarch64), it doesn't work anymore. I have tried using memory barriers (dmb and dsb), compiler barrier, etc. Nothing seems to help. Not that it should, I am still running with a single CPU. The generated code looks fine. If I try to use these two intrinsics and print the value of variables before and after, everything looks fine. Using __sync_add_fetch() doesn't work either. Code is running in EL1 at virtual address 0xFFFFFFFF8000xxxx, I do wonder if that can be an issue here. The problem exists with clang-12 and also clang-15 (I just upgraded to it hoping it would solve the issue, it didn't).

I've spend a few days on this and I am out of idea.

You can see the code here:
https://github.com/kiznit/rainbow-os/bl ... tr.hpp#L57

If I comment line 59 and use the intrinsic on line 60 instead, it doesn't work anymore. I have a hard time getting any info out of it because I am using shared_ptr<> everywhere in my logging code :(

_________________
https://github.com/kiznit/rainbow-os


Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Sat Dec 17, 2022 9:54 pm 
Offline
Member
Member

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5099
kzinti wrote:
The way to support this is to basically use atomic operations on the reference counts.

This is very easy to do: declare the reference count as an atomic type, e.g. using "std::atomic<long>" instead of "long". All operations on an atomic variable will be atomic unless you explicitly relax the memory order.

kzinti wrote:
OK, so I replaced the "++refCount" and "--refCount"

You don't need to change how you access an atomic variable unless you want to relax the memory order.

kzinti wrote:
with "__atomic_add_fetch()" and "__atomic_sub_fetch()".

Generally, you should use std::atomic instead of the __atomic builtins.

kzinti wrote:
Using __sync_add_fetch() doesn't work either.

The __sync builtins are deprecated.

kzinti wrote:
Code is running in EL1 at virtual address 0xFFFFFFFF8000xxxx, I do wonder if that can be an issue here.

As long as your memory types are correct, I don't see how this could be the problem.

kzinti wrote:
The problem exists with clang-12 and also clang-15

How did you convince Clang to not call GCC?

You may need to pass "-mno-outline-atomics" to ensure lock-free atomic operations are inline.


Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Sat Dec 17, 2022 10:55 pm 
Offline
Member
Member

Joined: Mon Feb 02, 2015 7:11 pm
Posts: 898
Octocontrabass wrote:
This is very easy to do: declare the reference count as an atomic type, e.g. using "std::atomic<long>" instead of "long". All operations on an atomic variable will be atomic unless you explicitly relax the memory order.

There is no such thing as std::atomic<> in bare metal. I have my own implementation which is what I am trying to debug here.

Octocontrabass wrote:
You don't need to change how you access an atomic variable unless you want to relax the memory order.

I know all about memory orders. I am just trying to narrow down what the problem is without using my std::atomic<> implementation (see https://github.com/kiznit/rainbow-os/bl ... %2B/atomic).

Octocontrabass wrote:
Generally, you should use std::atomic instead of the __atomic builtins.

That's what I am trying to do, debug my implementation of std::atomic<>.

Octocontrabass wrote:
The __sync builtins are deprecated.

Yes they are. Was this not worth a try? They still compile just fine.

Octocontrabass wrote:
How did you convince Clang to not call GCC?

That doesn't seem to make any sense. I am not using GCC. Why would clang call GCC? What am I missing here?

Octocontrabass wrote:
You may need to pass "-mno-outline-atomics" to ensure lock-free atomic operations are inline.

The generated code is inlining the atomics. I don't see how that flag would do anything here. Using "-moutline-atomics" does produce link errors as expected (again, bare metal). I suppose I could implement the missing functions here and see if it helps with anything...

I appreciate you are trying to help, maybe I should have been more clear that this is kernel code (i.e. bare metal) and that I don't have an implementation of std::atomic<> provided by the compiler.

_________________
https://github.com/kiznit/rainbow-os


Last edited by kzinti on Sat Dec 17, 2022 11:05 pm, edited 1 time in total.

Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Sat Dec 17, 2022 11:04 pm 
Offline
Member
Member

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5099
kzinti wrote:
There is no such thing as std::atomic<> in bare metal.

It's part of the freestanding <atomic> header. It's always available.


Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Sat Dec 17, 2022 11:13 pm 
Offline
Member
Member

Joined: Mon Feb 02, 2015 7:11 pm
Posts: 898
Octocontrabass wrote:
It's part of the freestanding <atomic> header. It's always available.

Very interesting... I didn't know that. I found confirmation here: https://en.cppreference.com/w/cpp/freestanding.

I do see <stdatomic.h> and all the other C freestanding headers, but I don't see any of the C++ freestanding header.

This is how I configure clang: https://github.com/kiznit/rainbow-os/bl ... xt#L43-L48. I must be missing something there to enable C++ headers.

Thanks a lot!

edit: I can't find any file named "atomic" in the LLVM archive. Mmm.

_________________
https://github.com/kiznit/rainbow-os


Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Sat Dec 17, 2022 11:22 pm 
Offline
Member
Member

Joined: Mon Feb 02, 2015 7:11 pm
Posts: 898
From https://en.cppreference.com/w/cpp/freestanding:

Quote:
Compiler vendors might not correctly support freestanding implementation, for either implementation issues, or just ignoring freestanding as a whole (LLVM libcxx and msvc stl).

So it doesn't look like these headers are available with clang... But perhaps I can implement it on top of <stdatomic.h>.

edit: using C's atomic_long and atomic_fetch_add() doesn't solve the issue. I want to conclude that something is wrong with my std::shared_ptr<> implementation. What is puzzling is that it works just fine without atomics.

https://github.com/kiznit/rainbow-os/bl ... tr.hpp#L61
https://github.com/kiznit/rainbow-os/bl ... tr.hpp#L81

_________________
https://github.com/kiznit/rainbow-os


Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Sun Dec 18, 2022 12:28 am 
Offline
Member
Member

Joined: Mon Feb 02, 2015 7:11 pm
Posts: 898
I found the implementation of std::atomic<> for clang (libc++). It turns out that std::atomic<>::fetch_add() basically resolves to a __atomic_fetch_add(), which is what I was doing before switching to C's version of atomics.

Something else is going on here...

One thing I forgot to mention previously is that this code works just fine in my bootloader but not the kernel. This is was wondering if high-address space is a problem for the Cortex-A53 processor. It doesn't seem to make sense to me either, but I've come to expect the unexpected from hardware.

_________________
https://github.com/kiznit/rainbow-os


Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Sun Dec 18, 2022 12:41 am 
Offline
Member
Member

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1593
So what is actually the problem? Does the increment function not adequately increment the counter? I would try to write a test function that calls the increment function from multiple threads, then collects all the results and looks to see if the result is what you expect.

_________________
Carpe diem!


Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Sun Dec 18, 2022 12:45 am 
Offline
Member
Member

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5099
kzinti wrote:
Quote:
Compiler vendors might not correctly support freestanding implementation, for either implementation issues, or just ignoring freestanding as a whole (LLVM libcxx and msvc stl).

So it doesn't look like these headers are available with clang...

At least one LLVM developer disagrees with that statement. I'm not sure how you're supposed to get the headers, but they should work once you have them.

kzinti wrote:
edit: using C's atomic_long and atomic_fetch_add() doesn't solve the issue.

Using atomic_fetch_add() shouldn't make any difference, operating on atomic variables is always atomic. (And if it does make a difference, your compiler is broken.)

I really can't think of what would cause atomics to fail like this, unless you've somehow messed up something fundamental like alignment or memory type.

kzinti wrote:
One thing I forgot to mention previously is that this code works just fine in my bootloader but not the kernel.

...Maybe you should double-check the alignment and the memory type...


Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Sun Dec 18, 2022 8:31 am 
Offline
Member
Member
User avatar

Joined: Sun Apr 30, 2017 12:16 pm
Posts: 68
Location: Poland
Have you enabled the data cache (SCTLR.C=1)? With it disabled, the stxr instruction will just continuously report failure (on Cortex-A72 at least, and I'm guessing A53 too), presumably because the exclusive monitor lock is per cache line on these cores.

_________________
Working on managarm.


Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Sun Dec 18, 2022 12:32 pm 
Offline
Member
Member

Joined: Mon Feb 02, 2015 7:11 pm
Posts: 898
clang doesn't provide the freestanding headers. That comment above basically says that the libc++ people try to make their library work in freestanding mode, but it doesn't appear that they provide a way to install it as such. I will see if I can get to build the whole thing with my cross-compiler or just copy the files I need for now. But I doubt it's a C++ problem, I think it might have to do with aarch64 memory configuration as suggested by both of you.

SCTLR_EL1 is configured as 0xc5183d in QEMU (and the Pi starts et EL2 where SCTLR_EL2 is also 0xc5183d). The breakdown is:
Code:
SCTLR_EL1: 0000000000c5183d
    SPAN   : 1 - The value of PSTATE.PAN is left unchanged on taking an exception to EL1.
    EIS    : 1 - The taking of an exception to EL1 is a context synchronizing event.
    nTWE   : 1 - This control does not cause any instructions to be trapped.
    nTWI   : 1 - This control does not cause any instructions to be trapped.
    I      : 1 - Stage 1 instruction access Cacheability control, for accesses at EL0 and EL1.
    EOS    : 1 - An exception return from EL1 is a context synchronizing event.
    CP15BEN: 1 - EL0 using AArch32: EL0 execution of the CP15DMB, CP15DSB, and CP15ISB instructions is enabled.
    SA0    : 1 - SP Alignment check enable for EL0.
    SA     : 1 - SP Alignment check enable.
    C      : 1 - Stage 1 Cacheability control, for data accesses.
    M      : 1 - EL1&0 stage 1 address translation enabled.


In the case of the Pi, I programe SCTLR_EL1 like this:
Code:
        // Enable MMU at EL1
        uint64_t sctlr_el1{};
        sctlr_el1 |= (1 << 0);  // M = 1        EL1&0 stage 1 address translation enabled.
        sctlr_el1 |= (1 << 2);  // C = 1        Enable Data Cache
        sctlr_el1 |= (1 << 3);  // SA = 1       SP Alignment check enable for EL1
        sctlr_el1 |= (1 << 4);  // SA0 = 1      SP Alignment check enable for EL0
        sctlr_el1 |= (1 << 12); // I = 1        Enable Instruction Cache
        mtl::Write_SCTLR_EL1(sctlr_el1);     // 0x101D


So I do enable both instruction (I) and data (C) cache... The rest doesn't seem related. I did try hardcoding 0xc5183d but it didn't make any difference.

_________________
https://github.com/kiznit/rainbow-os


Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Mon Dec 19, 2022 5:41 am 
Offline
Member
Member

Joined: Sat Jul 02, 2016 7:02 am
Posts: 207
Was the qemu-aarch64 test done with kvm enabled?
If not, running your OS within a kvm-enabled VM on RPi3 Linux can probably help with the debug.

Since the problem occurs within the kernel, one can create a small stub containing the instructions that seem to trigger the problem, and call that stub at various eligible locations, starting at the kernel's entry point.


Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Mon Dec 19, 2022 9:15 am 
Offline
Member
Member

Joined: Mon Feb 02, 2015 7:11 pm
Posts: 898
Trying this under qemu with kvm is an excellent idea. I was running qemu on my desktop x86_64, so no kvm. I probably won't have time to try this before next year, but I look forward to it.

_________________
https://github.com/kiznit/rainbow-os


Top
 Profile  
 
 Post subject: Re: Having a hard time with __atomic_xxx intrinsics on aarch
PostPosted: Tue May 09, 2023 2:57 pm 
Offline
Member
Member

Joined: Mon Feb 02, 2015 7:11 pm
Posts: 898
For posterity, I found the issue: I had a bug in how I was initializing MAIR_EL1. Basically my code is setup to use MAIR index 3 for writeback memory, but MAIR index 3 was initialized to 0x00 which is uncacheable memory. Atomic operations on aarch64 don't work on uncacheable memory.

_________________
https://github.com/kiznit/rainbow-os


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 14 posts ] 

All times are UTC - 6 hours


Who is online

Users browsing this forum: Majestic-12 [Bot], Sa41848, SemrushBot [Bot] and 63 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group