OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: fast memcopy
Posted: Thu Oct 29, 2015 1:23 pm
Member

Joined: Mon Dec 16, 2013 6:50 pm
Posts: 27
Hi,

For a long time, I've been using my own version of AVXmemcpy (and now AVX2memcpy, with 256-bit transfers). "Homemade" memcpy implementations seem to be popular on the forum too. Copying 128 or 256 bits at a time is faster than 64-bit transfers, assuming your data is aligned.
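For readers who haven't written one, a 256-bit copy loop of the kind described above might look like the following. This is a minimal sketch, not the poster's code: it assumes GCC or Clang on x86-64, "memcpy_wide" and "copy_avx2" are hypothetical names, and it uses the unaligned "_mm256_loadu_si256"/"_mm256_storeu_si256" intrinsics so it stays correct for any alignment (the aligned "_mm256_load_si256"/"_mm256_store_si256" forms require 32-byte-aligned addresses). On other targets, or on CPUs without AVX2, it falls back to the library "memcpy".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>

/* Only this function is compiled for AVX2, so the rest of the program
   still runs on CPUs without it. Copies 32 bytes per iteration. */
__attribute__((target("avx2")))
static void copy_avx2(unsigned char *d, const unsigned char *s, size_t n) {
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(s + i));
        _mm256_storeu_si256((__m256i *)(d + i), v);
    }
    memcpy(d + i, s + i, n - i); /* tail shorter than 32 bytes */
}
#endif

/* Hypothetical wrapper: use the AVX2 loop when the CPU supports it. */
void *memcpy_wide(void *dest, const void *src, size_t n) {
    assert(n == 0 || (dest != NULL && src != NULL));
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx2")) {
        copy_avx2(dest, src, n);
        return dest;
    }
#endif
    return memcpy(dest, src, n); /* portable fallback */
}
```

Note that any thread calling this touches the YMM registers, which is exactly what makes its AVX state part of the context the kernel has to manage.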

But has anyone ever considered the cost of a context switch vs the benefits of the fast memcpy?

For example, in my OS, the AVX context is lazily saved, and only when a thread actually uses AVX instructions. So if no one is using AVX, context switches are fast. But if 2 threads started using AVXmemcpy, they would both trigger a Device Not Available (#NM) exception, and the whole AVX context would need to be saved and restored. In the case of AVX2, that's a total of 480 bytes of transfer times two, plus the overhead of the exception itself. So it seems to me that if those 2 threads were copying anything under 1K, it would actually be slower. Of course, if they did that 1000 times during their time slice, it would be OK.
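The lazy-save scheme described above can be modelled in ordinary C to see where the cost comes from. This is a userland simulation under stated assumptions, not kernel code: a flag stands in for CR0.TS, "device_not_available()" stands in for the #NM handler, a byte array stands in for the AVX registers, and the 480-byte state size is simply the figure from the post. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* 480 bytes is the figure quoted in the post, used only as a parameter;
   the real XSAVE area size depends on which state components are enabled. */
#define AVX_STATE_BYTES 480
#define MAX_THREADS 4

static unsigned char saved_state[MAX_THREADS][AVX_STATE_BYTES];
static unsigned char cpu_regs[AVX_STATE_BYTES]; /* stands in for the YMM registers */
static int avx_owner = -1;  /* thread whose state is live in cpu_regs */
static int running = -1;    /* currently scheduled thread */
static int ts_armed = 0;    /* models CR0.TS: next AVX use will trap */
static unsigned long nm_traps = 0, state_bytes_moved = 0;

/* Scheduler path: leave the AVX state alone, just arm the trap. */
void context_switch(int next) {
    assert(next >= 0 && next < MAX_THREADS);
    running = next;
    ts_armed = 1;
}

/* Models the #NM handler: swap state only if someone else owns it. */
static void device_not_available(void) {
    nm_traps++;
    if (avx_owner != running) {
        if (avx_owner >= 0) { /* save the previous owner's registers */
            memcpy(saved_state[avx_owner], cpu_regs, AVX_STATE_BYTES);
            state_bytes_moved += AVX_STATE_BYTES;
        }
        memcpy(cpu_regs, saved_state[running], AVX_STATE_BYTES);
        state_bytes_moved += AVX_STATE_BYTES;
        avx_owner = running;
    }
    ts_armed = 0; /* models CLTS */
}

/* Call before executing any AVX instruction, e.g. at the top of an AVX memcpy. */
void use_avx(void) {
    if (ts_armed)
        device_not_available();
}
```

Alternating two threads that both call "use_avx()" for 10 time slices gives 10 traps and 9,120 bytes of state traffic (480 for the first restore, then 960 per switch). With a single AVX user, the trap still fires on each of its slices but nothing is copied, because "avx_owner" already matches.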

Anyone has an opinion on this? Surely, I'm not the first one who thought about that.


 Post subject: Re: fast memcopy
Posted: Thu Oct 29, 2015 3:14 pm
Member

Joined: Sun Sep 19, 2010 10:05 pm
Posts: 1074
As with most "optimizations", you can't target every possible scenario at the same time. You have to choose the best approach for the situation you are targeting, and you have to measure the results to prove that the assumptions you made are, indeed, accurate.

If you want your code to be faster in a highly multi-threaded environment, then you should probably use fewer "shared" resources, like registers. If you want your code to run faster in a single-threaded environment, then you should use any resources that you can find that will make your code run faster.

The best advice that I can give is to implement all of these approaches as separate functions, and switch between them as needed. I would also expose all of them to the application layer, if at all possible, since the application will be in the best position to know which "optimization" works best for its usage pattern.

Or, better yet, let the user decide which method to use, preferably per-application... With a default setting for new applications...
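One way to switch between implementations as needed, with a per-application setting and a default for new applications, is a table of function pointers. This is a minimal sketch; "copy_fn", "select_copy", the "bytes"/"libc" names, and the choice of default are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dest, const void *src, size_t n);

/* Byte-at-a-time copy: lowest start-up cost, uses no vector registers. */
static void *copy_bytes(void *dest, const void *src, size_t n) {
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Defer to the C library's memcpy. */
static void *copy_libc(void *dest, const void *src, size_t n) {
    return memcpy(dest, src, n);
}

struct copy_impl {
    const char *name;
    copy_fn fn;
};

/* Hypothetical registry; an AVX variant would be added the same way. */
static const struct copy_impl impls[] = {
    { "bytes", copy_bytes },
    { "libc",  copy_libc  },
};

#define DEFAULT_IMPL 1 /* used when an application has no setting */

/* Return the routine named by a per-application setting, or the default. */
copy_fn select_copy(const char *setting) {
    assert(DEFAULT_IMPL < sizeof impls / sizeof impls[0]);
    if (setting != NULL)
        for (size_t i = 0; i < sizeof impls / sizeof impls[0]; i++)
            if (strcmp(impls[i].name, setting) == 0)
                return impls[i].fn;
    return impls[DEFAULT_IMPL].fn;
}
```

An application (or its loader) would call "select_copy()" once at start-up and keep the returned pointer, so the per-copy cost is a single indirect call.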

_________________
Project: OZone
Source: GitHub
Current Task: LIB/OBJ file support
"The more they overthink the plumbing, the easier it is to stop up the drain." - Montgomery Scott


 Post subject: Re: fast memcopy
Posted: Thu Oct 29, 2015 6:00 pm
Member

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

xmm15 wrote:
Anyone has an opinion on this?


A good generic "memcpy()" is impossible. You need a "memcpy_small()" designed for tiny copies (where the start-up overhead is more significant than the actual copy so you just do a "one byte at a time" loop); a "memcpy_medium()" designed for medium sized copies that just uses (e.g.) "rep movsb"; plus a "memcpy_large()" just in case.
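The small and medium routines described above could be sketched like this. The names "memcpy_small" and "memcpy_medium" come from the post; the bodies are assumptions. "memcpy_medium" uses GCC/Clang extended inline asm for "rep movsb" on x86 (it relies on the direction flag being clear, which the x86 System V ABI guarantees at function entry) and falls back to "memcpy" on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tiny copies: start-up cost dominates, so just a byte loop. */
void *memcpy_small(void *dest, const void *src, size_t n) {
    assert(n == 0 || (dest != NULL && src != NULL));
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Medium copies: one "rep movsb" on x86, library memcpy elsewhere. */
void *memcpy_medium(void *dest, const void *src, size_t n) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    void *d = dest;
    /* RDI = destination, RSI = source, RCX = count; all three are modified. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dest;
#else
    return memcpy(dest, src, n);
#endif
}
```

Callers pick the routine that matches the size they know they are copying; there is deliberately no generic dispatcher here.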

The best possible code for "memcpy_large()" is something like this:

Code:
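#include <stdio.h>   /* fprintf, stderr */
#include <stdlib.h>  /* exit, EXIT_FAILURE */
#include <stddef.h>  /* size_t */

void *memcpy_medium(void *dest, const void *src, size_t n); /* defined elsewhere */

/* Deliberately refuses anything over 64 KiB: large copies should be designed away. */
void *memcpy_large(void *dest, const void *src, size_t n) {
    if(n > 65536) {
        fprintf(stderr, "ERROR: Some noob failed to avoid a huge memory copy!\n");
        exit(EXIT_FAILURE);
    }
    return memcpy_medium(dest, src, n);
}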


Basically; if you think SSE or AVX is going to help then you're solving the wrong problem.


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


 Post subject: Re: fast memcopy
Posted: Thu Oct 29, 2015 6:02 pm
Member

Joined: Mon Feb 02, 2015 7:11 pm
Posts: 898
=)

_________________
https://github.com/kiznit/rainbow-os


 Post subject: Re: fast memcopy
Posted: Thu Oct 29, 2015 10:16 pm
Member

Joined: Sun Jun 16, 2013 4:09 am
Posts: 333
LOL, gotta love this response.

Brendan wrote:
[...]
The best possible code for "memcpy_large()" is something like this:
[...]
Basically; if you think SSE or AVX is going to help then you're solving the wrong problem.


 Post subject: Re: fast memcopy
Posted: Sat Oct 31, 2015 1:20 am
Member

Joined: Fri Oct 27, 2006 9:42 am
Posts: 1925
Location: Athens, GA, USA
Brendan wrote:
The best possible code for "memcpy_large()" is something like this:

Code:
void *memcpy_large(void *dest, const void *src, size_t n) {
    if(n > 65536) {
        fprintf(stderr, "ERROR: Some noob failed to avoid a huge memory copy!\n");
        exit(EXIT_FAILURE);
    }
    return memcpy_medium(dest, src, n);
}


Unfortunately, with the gargantuan sizes of things like image or audio data today, this may prove problematic. Still, I would say that anything larger than a page (whatever page size you are using, 4K in most cases) should be handed off to the memory mangler for lazy re-mapping. I'm thinking along the lines of mapping the two areas to a single set of read-only pages, then trapping attempted writes so that only the altered page gets copied. Even so, it's better to avoid the issue whenever possible.
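The copy-on-write idea above can be modelled in userland to show the bookkeeping. In a real kernel the shared pages would be mapped read-only in both address spaces and the copy would happen in the page-fault handler; here a reference count stands in for the read-only mapping and "cow_write()" stands in for the fault handler. "struct frame", "struct mapping" and the "cow_*" functions are hypothetical, the sketch assumes page-aligned buffers whose length is a whole number of pages, and frees are omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define COW_PAGE_SIZE 4096 /* 4K pages, as in the post */

struct frame {                 /* one physical page */
    int refcount;              /* >1 means shared, i.e. "read-only" */
    unsigned char data[COW_PAGE_SIZE];
};

struct mapping {               /* a virtual range: one slot per page */
    size_t npages;
    struct frame **pages;
};

static unsigned long pages_copied = 0;

/* Build a fresh, zero-filled range with private pages. */
struct mapping make_mapping(size_t npages) {
    struct mapping m = { npages, calloc(npages, sizeof(struct frame *)) };
    assert(m.pages != NULL);
    for (size_t i = 0; i < npages; i++) {
        m.pages[i] = calloc(1, sizeof(struct frame));
        assert(m.pages[i] != NULL);
        m.pages[i]->refcount = 1;
    }
    return m;
}

/* The "copy": share every page instead of moving any bytes. */
void cow_copy(struct mapping *dst, const struct mapping *src) {
    dst->npages = src->npages;
    dst->pages = malloc(src->npages * sizeof *dst->pages);
    assert(dst->pages != NULL);
    for (size_t i = 0; i < src->npages; i++) {
        dst->pages[i] = src->pages[i];
        dst->pages[i]->refcount++;
    }
}

/* Stands in for the write-fault handler: un-share only the touched page. */
void cow_write(struct mapping *m, size_t offset, unsigned char value) {
    size_t idx = offset / COW_PAGE_SIZE;
    assert(idx < m->npages);
    struct frame *f = m->pages[idx];
    if (f->refcount > 1) {
        struct frame *copy = malloc(sizeof *copy);
        assert(copy != NULL);
        memcpy(copy->data, f->data, COW_PAGE_SIZE);
        copy->refcount = 1;
        f->refcount--;
        m->pages[idx] = copy;
        pages_copied++;
    }
    m->pages[idx]->data[offset % COW_PAGE_SIZE] = value;
}

unsigned char cow_read(const struct mapping *m, size_t offset) {
    assert(offset / COW_PAGE_SIZE < m->npages);
    return m->pages[offset / COW_PAGE_SIZE]->data[offset % COW_PAGE_SIZE];
}
```

After "cow_copy()" no page data has been copied; the first write through either mapping copies only the one page it touches, and untouched pages stay shared.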

_________________
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.

