OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: why is my rep movsb not faster
PostPosted: Tue Mar 26, 2024 11:49 pm 
Joined: Fri Jun 28, 2013 1:48 am
Posts: 65
Tested on a real machine (Intel Core i3-5005U, 64-bit mode). CPUID reports that enhanced rep movsb is supported.

I use rep movsb and rep movsq to copy from memory to the framebuffer, and rep movsq is about 8x faster than rep movsb. (EDIT: about 3~4 times, see the reply)

It seems rep movsb gets no speedup at all. The source and destination pointers are both 16-byte aligned. Is there a flag I have to enable?
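
For readers following along, here is a minimal sketch of the CPUID check mentioned above. Enhanced rep movsb (ERMS) is reported in CPUID leaf 7, subleaf 0, EBX bit 9. The function name is hypothetical and is not taken from the poster's code; it only decodes an EBX value that the caller has already read with CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=07H, ECX=0):EBX bit 9 is the ERMS flag
   (enhanced REP MOVSB/STOSB). */
#define CPUID7_EBX_ERMS (1u << 9)

/* ebx: the EBX value returned by CPUID leaf 7, subleaf 0.
   Hypothetical helper name, for illustration only. */
static inline bool cpuid7_has_erms(uint32_t ebx) {
    return (ebx & CPUID7_EBX_ERMS) != 0;
}
```

Note that the flag only says the CPU implements the optimization. Whether a given copy actually benefits still depends on things like the memory type of the source and destination.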

_________________
Reinventing the Wheel, code: https://github.com/songziming/wheel


Last edited by songziming on Thu Mar 28, 2024 5:55 am, edited 1 time in total.

 
 Post subject: Re: why is my rep movsb not faster
PostPosted: Wed Mar 27, 2024 5:12 pm 
Joined: Fri Sep 03, 2021 5:20 pm
Posts: 96
I may be shooting in the dark here, but I recall reading that all movs instructions are optimized the same way as movsb. If this is true (and hopefully someone with more knowledge will chime in), your results would make sense: movsq moves eight bytes per iteration instead of one for movsb. That would not mean you aren't getting the enhanced movsb. It would only mean the relative difference in performance stays more or less the same, because both are (supposedly!!) optimized in equivalent ways. Please be aware that I'm not confident in the reliability of the original source.

_________________
Writing a bootloader in under 15 minutes: https://www.youtube.com/watch?v=0E0FKjvTA0M


 
 Post subject: Re: why is my rep movsb not faster
PostPosted: Wed Mar 27, 2024 7:40 pm 
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5146
songziming wrote:
framebuffer

Intel wrote:
Fast-string operation is used only if the source and destination addresses both use either the WB or WC memory types.

What memory type is your framebuffer?


 
 Post subject: Re: why is my rep movsb not faster
PostPosted: Wed Mar 27, 2024 9:42 pm 
Joined: Fri Jun 28, 2013 1:48 am
Posts: 65
I also tested copying data from physical address zero to a dynamically allocated temporary buffer. The copy size is 8 pages (32 KiB), and time is measured with the timestamp counter.

In this test no framebuffer is involved. There's no other running task, so no preemption.

"mem copy" means copying with rep movsb; "fast copy" means rep movsq. The source and destination are always page-aligned.
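
The two variants being compared can be sketched roughly as below. This is not the poster's actual code. The function names are made up, GCC/Clang inline assembly is assumed, the direction flag is assumed to be clear (as the System V ABI requires), and a memcpy fallback is included only so the sketch compiles on other targets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-granular copy: REP MOVSB moves RCX bytes from [RSI] to [RDI]. */
static inline void copy_movsb(void *dst, const void *src, size_t bytes) {
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(bytes)
                     :
                     : "memory");
#else
    memcpy(dst, src, bytes); /* fallback for non-x86-64 builds */
#endif
}

/* Qword-granular copy: REP MOVSQ moves RCX quadwords.
   Only whole 8-byte units are copied; any tail bytes are ignored. */
static inline void copy_movsq(void *dst, const void *src, size_t bytes) {
    size_t qwords = bytes / 8;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(qwords)
                     :
                     : "memory");
#else
    memcpy(dst, src, qwords * 8); /* fallback for non-x86-64 builds */
#endif
}
```

With a 32 KiB page-aligned buffer, both calls copy the same bytes, so any timing difference comes from how the CPU executes the string instruction rather than from the amount of data.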


Attachment: str3.jpg (real machine string copy test)

 
 Post subject: Re: why is my rep movsb not faster
PostPosted: Wed Mar 27, 2024 10:53 pm 
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5146
Your results are very inconsistent. Does your test program repeat the tests several times, or does it only run them once?

What is your CPU's TSC frequency?

Bit 0 of IA32_MISC_ENABLE (MSR 0x1A0) enables fast string instructions. It should already be set by firmware, but you might as well check.
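
A minimal sketch of that check, assuming ring 0 and GCC/Clang inline assembly. The helper names are hypothetical. RDMSR faults outside ring 0, so only the decode function is safe to call from user-mode test code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_MISC_ENABLE     0x1A0u
#define MISC_ENABLE_FAST_STRINGS (1ull << 0) /* bit 0: fast-string enable */

#if defined(__x86_64__) && defined(__GNUC__)
/* Reads a model-specific register. Ring 0 only: faults with #GP elsewhere. */
static inline uint64_t rdmsr(uint32_t msr) {
    uint32_t lo, hi;
    __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* misc_enable: the raw 64-bit value read from MSR 0x1A0. */
static inline bool fast_strings_enabled(uint64_t misc_enable) {
    return (misc_enable & MISC_ENABLE_FAST_STRINGS) != 0;
}
```

In a kernel this would be used as fast_strings_enabled(rdmsr(MSR_IA32_MISC_ENABLE)).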


 
 Post subject: Re: why is my rep movsb not faster
PostPosted: Thu Mar 28, 2024 9:20 pm 
Joined: Fri Jun 28, 2013 1:48 am
Posts: 65
Bit 0 of IA32_MISC_ENABLE is set; I've checked.

What's the correct way to get the TSC frequency? Both cpuid.15h and cpuid.16h return zero.

The buffer is allocated on the first run, and I ran the test several times. Interrupts are also disabled before the test.


Attachments:
str_test_res.jpg (test result)
str_cpuid.jpg (frequency got from cpuid)

 
 Post subject: Re: why is my rep movsb not faster
PostPosted: Thu Mar 28, 2024 10:57 pm 
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5146
songziming wrote:
What's the correct way to get TSC frequency?

You're using a Broadwell CPU, so the TSC frequency is calculated by taking the value in bits 8-15 of MSR_PLATFORM_INFO (0xCE) and multiplying by 100MHz.
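
As a worked example of that calculation, here is a small sketch that decodes a raw MSR_PLATFORM_INFO value. The function name is hypothetical, and the 100 MHz multiplier is the one stated above for this CPU generation; other CPU families may need a different base clock.

```c
#include <stdint.h>

#define MSR_PLATFORM_INFO 0xCEu

/* platform_info: raw 64-bit value read from MSR 0xCE (ring 0 RDMSR).
   Bits 15:8 hold the ratio; the TSC is assumed to tick at
   ratio * 100 MHz, as described in the post above. */
static inline uint64_t tsc_hz_from_platform_info(uint64_t platform_info) {
    uint64_t ratio = (platform_info >> 8) & 0xFFu;
    return ratio * 100000000ull;
}
```

For instance, a ratio field of 20 gives 20 * 100 MHz = 2 GHz.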

songziming wrote:
The buffer is allocated on the first run, and I ran the test several times.

It doesn't count if you have to manually start each test run! The multiple test runs need to be completely automatic with nothing else between each run (so don't display results until all runs are complete).
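
One way to structure such a test is sketched below. All names here are hypothetical, and memcpy stands in for the rep movsb / rep movsq routines under test. The loop runs the body back to back, stores each elapsed time, and leaves all printing to the caller after the loop finishes. RDTSC is used on x86-64 with GCC/Clang; the clock() fallback exists only so the sketch builds elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Timestamp source: RDTSC on x86-64, clock() otherwise. */
static inline uint64_t bench_now(void) {
#if defined(__x86_64__) && defined(__GNUC__)
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)clock();
#endif
}

/* Example workload: one buffer copy (stand-in for the routine under test). */
struct copy_job {
    void *dst;
    const void *src;
    size_t bytes;
};

static inline void copy_body(void *ctx) {
    struct copy_job *job = ctx;
    memcpy(job->dst, job->src, job->bytes);
}

/* Runs body `runs` times with nothing in between, records every elapsed
   time in samples[], and returns the smallest one. */
static inline uint64_t bench_run(void (*body)(void *), void *ctx,
                                 uint64_t *samples, size_t runs) {
    uint64_t best = UINT64_MAX;
    for (size_t i = 0; i < runs; i++) {
        uint64_t t0 = bench_now();
        body(ctx);
        uint64_t t1 = bench_now();
        samples[i] = t1 - t0;
        if (samples[i] < best)
            best = samples[i];
    }
    return best;
}
```

Looking at all the samples, not just the minimum, also shows how much the first (cold cache) run differs from the rest.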


 
 Post subject: Re: why is my rep movsb not faster
PostPosted: Sat Mar 30, 2024 5:39 am 
Joined: Fri Jun 28, 2013 1:48 am
Posts: 65
Octocontrabass wrote:
The multiple test runs need to be completely automatic with nothing else between each run


I modified the test code, and the results now look valid. The times for movsb and movsq are close. I also tested copying data into the framebuffer, and there the results differ.

So I think it's OK to use movsb for normal memcpy, and to use a special aligned copy in the framebuffer driver.

By the way, the TSC frequency is 2 GHz.


Attachment: mov.jpg (time for copying to framebuffer and copying within ram)

 
 Post subject: Re: why is my rep movsb not faster
PostPosted: Sat Mar 30, 2024 11:12 am 
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5146
songziming wrote:
I also tested copying data into framebuffer, their result is different.

What memory type is your framebuffer? Fast-string instructions only work with WB and WC, but the firmware will usually set the framebuffer to UC.


 
 Post subject: Re: why is my rep movsb not faster
PostPosted: Sat Mar 30, 2024 7:21 pm 
Joined: Fri Jun 28, 2013 1:48 am
Posts: 65
Octocontrabass wrote:
What memory type is your framebuffer?


The MTRRs show the framebuffer range is uncacheable. I didn't set the PAT/PCD/PWT bits in the page table entry, and IA32_PAT[7:0] is 6 (write-back). So the memory type for the framebuffer is UC.
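
The page-table side of that reasoning can be sketched as follows, assuming a 4 KiB page table entry (PWT is bit 3, PCD is bit 4, PAT is bit 7). The function names are hypothetical. With all three bits clear the index is 0, so the page uses IA32_PAT[7:0]; a WB PAT entry combined with a UC MTRR range yields UC, which matches the conclusion above.

```c
#include <stdint.h>

/* Memory type encodings used in IA32_PAT entries. */
enum mem_type {
    MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6, MT_UC_MINUS = 7
};

/* PAT index selected by a 4 KiB PTE: (PAT << 2) | (PCD << 1) | PWT. */
static inline unsigned pte_pat_index(uint64_t pte) {
    unsigned pwt = (unsigned)((pte >> 3) & 1);
    unsigned pcd = (unsigned)((pte >> 4) & 1);
    unsigned pat = (unsigned)((pte >> 7) & 1);
    return (pat << 2) | (pcd << 1) | pwt;
}

/* Extracts entry `index` (0..7) from a raw IA32_PAT value.
   Each entry occupies the low 3 bits of one byte. */
static inline unsigned pat_entry(uint64_t ia32_pat, unsigned index) {
    return (unsigned)((ia32_pat >> (8 * index)) & 0x7u);
}
```

Large pages keep the PAT bit in a different position (bit 12), so this sketch only applies to 4 KiB mappings.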

 
 Post subject: Re: why is my rep movsb not faster
PostPosted: Sat Mar 30, 2024 10:00 pm 
Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5146
That's why it's slower. Try setting up your framebuffer for WC instead of UC.
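
One common way to do this is through the PAT: put the WC encoding (0x01) into an unused IA32_PAT entry and point the framebuffer's page table entries at it. The sketch below only computes the new values and assumes 4 KiB pages; the function names and the choice of entry 4 are illustrative assumptions, not something stated in the thread. Real code still has to write IA32_PAT with WRMSR on every CPU and follow the cache and TLB flushing procedure in the Intel SDM.

```c
#include <stdint.h>

#define PAT_TYPE_WC 0x01ull
#define PTE_PWT     (1ull << 3)
#define PTE_PCD     (1ull << 4)
#define PTE_PAT_4K  (1ull << 7) /* PAT bit position for 4 KiB PTEs only */

/* Returns ia32_pat with entry `index` (0..7) replaced by `type`. */
static inline uint64_t pat_set_entry(uint64_t ia32_pat, unsigned index,
                                     uint64_t type) {
    unsigned shift = 8 * index;
    return (ia32_pat & ~(0xFFull << shift)) | ((type & 0x7u) << shift);
}

/* Returns pte with its PWT/PCD/PAT bits rewritten to select PAT entry
   `index`; all other bits are left unchanged. */
static inline uint64_t pte_select_pat(uint64_t pte, unsigned index) {
    pte &= ~(PTE_PWT | PTE_PCD | PTE_PAT_4K);
    if (index & 1) pte |= PTE_PWT;
    if (index & 2) pte |= PTE_PCD;
    if (index & 4) pte |= PTE_PAT_4K;
    return pte;
}
```

Starting from the power-on default IA32_PAT value 0x0007040600070406, pat_set_entry(value, 4, PAT_TYPE_WC) gives 0x0007040100070406, and pte_select_pat(pte, 4) sets only the PAT bit. A WC PAT entry takes effect even when the MTRRs mark the range as UC.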

