OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Alignment on x86_64
Posted: Mon Oct 10, 2022 10:51 am
Member

Joined: Fri Feb 11, 2022 4:55 am
Posts: 435
Location: behind the keyboard
The tests were done on a 3.60 GHz Intel Xeon E5-1620 0 (Sandy Bridge, the first generation with AVX) with 1600 MHz DDR3 memory, on a NUMA system.

What is the memory access granularity of an x86_64 CPU?

My tests have shown that the granularity depends on the word size of the load/store instruction, and that the page being accessed also has some effect.

I made a loop that tested aligned and unaligned write operations.
The two had almost the same performance, only a few milliseconds apart, with aligned writes winning:
about 450 ms for aligned versus 470 ms for unaligned.
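A loop like the one described might look roughly like the sketch below. It is not the original test code: the helper name `time_stores_ms`, the 8-byte store width and the use of POSIX `clock_gettime` are assumptions. With a 64-byte-aligned buffer, offset 0 gives an aligned store, while offset 1 gives an unaligned store that still stays inside a single cache line.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime on POSIX systems */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: time `iters` 8-byte stores to buf[offset] and
   return the elapsed wall-clock time in milliseconds. */
static double time_stores_ms(unsigned char *buf, size_t len,
                             size_t offset, long iters)
{
    struct timespec t0, t1;
    uint64_t v = 0;

    assert(offset + sizeof v <= len); /* stay inside the buffer */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        v += (uint64_t)i;
        /* memcpy expresses a possibly unaligned store without undefined
           behaviour; with optimisation on, GCC and Clang typically lower
           it to a single MOV on x86_64. */
        memcpy(buf + offset, &v, sizeof v);
        /* Compiler barrier (GCC/Clang extension) so every store is kept. */
        __asm__ volatile("" ::: "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

For example, allocate `aligned_alloc(64, 4096)` and compare `time_stores_ms(buf, 4096, 0, n)` with `time_stores_ms(buf, 4096, 1, n)`. To reproduce the page-crossing case, allocate two pages (say `aligned_alloc(4096, 8192)`) and use offset 4093, so the 8-byte store straddles the page boundary.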

But let's say you want to load or store a DWORD at address 0x1FFD, where 3 bytes are in one page and 1 byte is in the next page.

The result was a massive difference: 1900 ms for the page-crossing access versus 450 ms for a load/store inside a single page.
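The split in the 0x1FFD example can be checked with a little address arithmetic. This is a sketch assuming 4 KiB pages; `bytes_in_first_page` and `PAGE_BYTES` are illustrative names, not part of any existing API.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u /* assumption: 4 KiB pages */

/* Hypothetical helper: how many bytes of a `size`-byte access at `addr`
   fall inside the page that contains `addr`. A result smaller than
   `size` means the access spills into the next page, so both pages
   have to be translated. */
static uintptr_t bytes_in_first_page(uintptr_t addr, uintptr_t size)
{
    uintptr_t to_page_end = PAGE_BYTES - (addr & (PAGE_BYTES - 1));

    assert(size > 0 && size <= PAGE_BYTES);
    return size < to_page_end ? size : to_page_end;
}
```

With these assumptions, `bytes_in_first_page(0x1FFD, 4)` is 3 (the remaining byte lands at 0x2000), while `bytes_in_first_page(0x1FFC, 4)` is 4.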

Is this what you guys call page granularity?
Or is this an effect of paging?

Sometimes (at first) accessing address 0x1000 is very slow, but accessing 0x1010 is fast.


 Post subject: Re: Alignment on x86_64
Posted: Mon Oct 10, 2022 11:00 am
Member

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1604
devc1 wrote:
What is the memory access granularity of an x86_64 CPU ?
Typically 64 bytes (the L1 cache line size), at least with write-back caching enabled.
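On x86_64 the line size can also be queried at run time: CPUID leaf 1 reports the CLFLUSH line size in EBX bits 15:8, in 8-byte units, valid when EDX bit 19 (CLFSH) is set. The sketch below assumes GCC or Clang for `<cpuid.h>` and `__get_cpuid`; the function name is made up for illustration, and the CLFLUSH line size is assumed to match the cache line size, as it typically does on current parts.

```c
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Hypothetical helper: CLFLUSH line size in bytes from CPUID leaf 1,
   or 0 if unavailable (non-x86 build, or CLFSH not reported). */
static size_t cpuid_cache_line_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                       /* leaf 1 not supported */
    if (!(edx & (1u << 19)))
        return 0;                       /* EDX bit 19: CLFSH flag */
    /* EBX bits 15:8: CLFLUSH line size in 8-byte units. */
    return (size_t)((ebx >> 8) & 0xFFu) * 8;
#else
    return 0;
#endif
}
```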
devc1 wrote:
The result was a massive difference compared to load/store inside a single page with 450 ms against 1900 ms.
Yeah, the manuals warn about unaligned accesses that cross page boundaries. Inside a cache line, the effect is barely measurable; across cache lines, there is some effect; across page boundaries, latency spikes.
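The three cases above can be told apart with plain address arithmetic. This is a sketch assuming 64-byte cache lines and 4 KiB pages; `classify_access` and the enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 64u   /* assumption: 64-byte cache lines */
#define PAGE_BYTES 4096u /* assumption: 4 KiB pages */

enum split_kind { WITHIN_LINE, CROSSES_LINE, CROSSES_PAGE };

/* Hypothetical helper: classify a `size`-byte access at `addr`.
   A page boundary is also a line boundary, so the page test comes first. */
static enum split_kind classify_access(uintptr_t addr, uintptr_t size)
{
    uintptr_t last;

    assert(size > 0 && size <= LINE_BYTES);
    last = addr + size - 1;              /* address of the last byte touched */
    if (addr / PAGE_BYTES != last / PAGE_BYTES)
        return CROSSES_PAGE;             /* two pages: two translations */
    if (addr / LINE_BYTES != last / LINE_BYTES)
        return CROSSES_LINE;             /* two cache lines, same page */
    return WITHIN_LINE;
}
```

With these assumptions, `classify_access(0x1000, 4)` and `classify_access(0x1010, 4)` both return `WITHIN_LINE`, and both addresses fall in the same line (0x1000 / 64 == 0x1010 / 64), while `classify_access(0x1FFD, 4)` returns `CROSSES_PAGE`.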
devc1 wrote:
Sometimes (at first) accessing address 0x1000 is so slow but accessing 0x1010 is so fast.
They are in the same L1 cache line. The access to 0x1000 has already created the TLB entry and filled the L1 cache line, so the following access to 0x1010 hits that same cache line.

_________________
Carpe diem!


 Post subject: Re: Alignment on x86_64
Posted: Mon Oct 10, 2022 2:41 pm
Member

Joined: Fri Feb 11, 2022 4:55 am
Posts: 435
Location: behind the keyboard
I meant that right after I started the test program, accessing aligned pages was slow. Then, as if by magic, it became fast.

I will use what I discovered to optimize my OS.


Thanks anyway.


 Post subject: Re: Alignment on x86_64
Posted: Tue Oct 11, 2022 6:14 am
Member

Joined: Tue Feb 18, 2020 3:29 pm
Posts: 1071
That's because of the TLB and the cache. When you first access a page, its translation gets cached in the TLB, which avoids extra memory accesses to the page tables on later accesses to that page. Also, the first access has to go to memory to read the page's data, which is then kept in the cache for later accesses.
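The cold-versus-warm effect can be observed with a rough user-space sketch like the one below. It assumes POSIX `clock_gettime` and a 64-byte-aligned buffer (for example from `aligned_alloc(4096, 4096)`); `cold_vs_warm_ns` and `now_ns` are hypothetical helpers. The clock call itself can cost on the order of tens of nanoseconds, so the warm figure mostly reflects timer overhead; `RDTSC`/`RDTSCP` with proper serialization would give finer numbers.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime on POSIX systems */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical helper: time a first read of p[0] (cold), then a read of
   p[16], which lies in the same 64-byte line if p is 64-byte aligned. */
static void cold_vs_warm_ns(volatile unsigned char *p,
                            uint64_t *cold_ns, uint64_t *warm_ns)
{
    uint64_t t0, t1, t2;

    assert(p != NULL);
    t0 = now_ns();
    /* First touch: TLB miss (page walk) and cache-line fill from memory.
       In a user-space program this may also take a page fault if the OS
       maps memory lazily. */
    (void)p[0];
    t1 = now_ns();
    (void)p[16];    /* same page and line: normally a TLB hit and L1 hit */
    t2 = now_ns();
    *cold_ns = t1 - t0;
    *warm_ns = t2 - t1;
}
```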

_________________
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg

