Alignment on x86_64

devc1 · **Posted:** Mon Oct 10, 2022 10:51 am

The tests were done on a 3.60 GHz Intel Xeon E5-1620 0 (Sandy Bridge, the generation that introduced AVX) with 1600 MHz DDR3 memory, on a NUMA system.

What is the memory access granularity of an x86_64 CPU?

My tests have shown that the granularity depends on the operand size of the load/store instruction, and that the page being accessed has some effect.

I wrote a loop that tested aligned and unaligned write operations.
The two performed almost the same, with aligned ahead by a few milliseconds:
roughly 450 ms for aligned and 470 ms for unaligned.

But let's say you want to load or store a DWORD at address 0x1FFD, where 3 bytes (0x1FFD to 0x1FFF) are in one page and 1 byte (0x2000) is in the next page.

The result was a massive difference compared to a load/store inside a single page: 450 ms against 1900 ms.
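A test of this kind could look roughly like the sketch below. It is not the original test program: it assumes a hosted user-space environment with POSIX `clock_gettime`, C11 `aligned_alloc`, a GCC/Clang-style compiler barrier, and 4 KiB pages. The names `time_dword_stores` and `run_store_test` are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Time `iters` 4-byte stores to buf + offset; returns elapsed nanoseconds. */
static uint64_t time_dword_stores(unsigned char *buf, size_t offset, uint64_t iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint64_t i = 0; i < iters; i++) {
        uint32_t v = (uint32_t)i;
        /* memcpy keeps the unaligned store well-defined in C;
           compilers normally lower it to a single 32-bit store. */
        memcpy(buf + offset, &v, sizeof v);
        /* Compiler barrier (GCC/Clang syntax, an assumption) so the
           store is not optimized out of the loop. */
        __asm__ volatile("" ::: "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    int64_t ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
               + (int64_t)(t1.tv_nsec - t0.tv_nsec);
    return (uint64_t)ns;
}

/* Fills ns[0..2] with timings for aligned, unaligned and page-crossing
   stores. Returns 0 on success, -1 if the buffer cannot be allocated. */
static int run_store_test(uint64_t iters, uint64_t ns[3])
{
    unsigned char *buf = aligned_alloc(PAGE_SIZE, 2 * PAGE_SIZE);
    if (!buf)
        return -1;

    /* Touch both pages first so no page fault lands inside the timing. */
    memset(buf, 0, 2 * PAGE_SIZE);

    ns[0] = time_dword_stores(buf, 0, iters);             /* aligned */
    ns[1] = time_dword_stores(buf, 1, iters);             /* unaligned, same cache line */
    ns[2] = time_dword_stores(buf, PAGE_SIZE - 3, iters); /* 3 bytes in page 0, 1 byte in page 1 */

    free(buf);
    return 0;
}
```

The offset `PAGE_SIZE - 3` mirrors the 0x1FFD case: the last byte of the DWORD falls on the following page. Absolute numbers will vary with the CPU, compiler flags and iteration count.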

Is this what you guys call page granularity?
Or is this an effect of paging?

Sometimes (at first) accessing address 0x1000 is very slow, but accessing 0x1010 is very fast.

nullplan · **Joined:** Wed Aug 30, 2017 8:24 am **Posts:** 1604

devc1 wrote:

What is the memory access granularity of an x86_64 CPU?

Typically 64 bytes, at least with write-back caching enabled (L1 cache line size).

devc1 wrote:

The result was a massive difference compared to a load/store inside a single page: 450 ms against 1900 ms.

Yeah, the manuals warn about unaligned accesses crossing page boundaries. Inside a cache line, the effects are barely measurable; across cache lines, there is some effect; across page boundaries, latency spikes.
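To make the three cases concrete, here is a minimal sketch that classifies an access by which boundary it crosses. The 64-byte line size and 4 KiB page size are assumptions (typical, but not guaranteed), and `classify_access` is a hypothetical helper, not something from a manual or library.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u   /* assumption: typical x86_64 L1 line size */
#define PAGE_SIZE       4096u /* assumption: 4 KiB pages */

enum crossing {
    WITHIN_LINE,   /* whole access sits inside one cache line */
    CROSSES_LINE,  /* spans two cache lines in the same page */
    CROSSES_PAGE   /* spans two pages (and therefore two lines) */
};

/* Classify an access of `size` bytes (size >= 1) starting at `addr`. */
static enum crossing classify_access(uintptr_t addr, size_t size)
{
    uintptr_t last = addr + size - 1; /* address of the final byte */

    if (addr / PAGE_SIZE != last / PAGE_SIZE)
        return CROSSES_PAGE;
    if (addr / CACHE_LINE_SIZE != last / CACHE_LINE_SIZE)
        return CROSSES_LINE;
    return WITHIN_LINE;
}
```

With these assumptions, a DWORD at 0x1001 is unaligned but stays within one line, a DWORD at 0x103E spans two lines, and a DWORD at 0x1FFD spans two pages.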

devc1 wrote:

Sometimes (at first) accessing address 0x1000 is very slow, but accessing 0x1010 is very fast.

Same L1 cache line. Accessing 0x1000 has already created the TLB entry and filled the L1 cache line, so the access to 0x1010 hits the same cache line.

devc1 · **Posted:** Mon Oct 10, 2022 2:41 pm

I meant that when I started the testing program, accessing aligned pages was slow. Then it magically worked.

I will apply what I discovered to optimize my OS.

Anyway, thanks.

nexos · **Joined:** Tue Feb 18, 2020 3:29 pm **Posts:** 1071

That's because of the TLB and the cache. When you first access a page, its translation gets cached in the TLB, avoiding extra memory accesses to the page tables for successive accesses to that page. Also, on the first access, the CPU must go to memory to read the page's data; it then keeps that data in the cache for successive accesses.
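As a rough illustration of what a TLB miss has to do: assuming 4-level paging with 4 KiB pages (huge pages or 5-level paging change the split), a virtual address breaks down into four 9-bit table indices plus a 12-bit offset, so an uncached translation can need up to four page-table reads before the data access itself. The struct and function names below are made up for this sketch.

```c
#include <stdint.h>

/* Indices used at each level of a 4-level x86_64 page walk
   (assumption: 4 KiB pages, no huge pages, no 5-level paging). */
struct pt_indices {
    unsigned pml4;   /* bits 47:39 */
    unsigned pdpt;   /* bits 38:30 */
    unsigned pd;     /* bits 29:21 */
    unsigned pt;     /* bits 20:12 */
    unsigned offset; /* bits 11:0, byte offset inside the page */
};

/* Split a virtual address into its page-walk indices. */
static struct pt_indices split_vaddr(uint64_t va)
{
    struct pt_indices r;

    r.pml4   = (unsigned)((va >> 39) & 0x1FF);
    r.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    r.pd     = (unsigned)((va >> 21) & 0x1FF);
    r.pt     = (unsigned)((va >> 12) & 0x1FF);
    r.offset = (unsigned)(va & 0xFFF);
    return r;
}
```

Addresses 0x1000 and 0x1010 share all four indices and differ only in the offset, so they use the same TLB entry. Addresses 0x1FFD and 0x2000 differ in the last-level index, so an access spanning both needs two translations.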

OSDev.org
