OSDev.org


All times are UTC - 6 hours




 Post subject: Re: Optimized memory functions?
PostPosted: Thu Mar 19, 2009 7:25 am 

Joined: Fri Jun 22, 2007 12:47 pm
Posts: 1598
Location: New Hampshire, USA
Interesting.
I'll hopefully have some time today to test this out using the setup you posted.

Brendan: Thanks for the info. Direct cache utilization has always been a bit of a mystery to me. The revision just previous to the one posted didn't have PREFETCHNTA or CLFLUSH at all. So would you suggest just using plain ol' MOVDQA instead of its non-temporal storing brethren? I figured the non-temporal storing would improve cache usage later as it wouldn't be filling cache lines with one-shot data.

_________________
Website: https://Joscor.com


 
 Post subject: Re: Optimized memory functions?
PostPosted: Thu Mar 19, 2009 8:56 am 

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

01000101 wrote:
So would you suggest just using plain ol' MOVDQA instead of its non-temporal storing brethren? I figured the non-temporal storing would improve cache usage later as it wouldn't be filling cache lines with one-shot data.


I'd suggest using "movdqa 0(%0), %%xmm0" then "movntdq %%xmm0, 0(%1)" then "clflush 0(%0)" (but not "clflush 0(%1)"). That is: a normal aligned load, a non-temporal store, and a flush of the source line only.
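
For readers who prefer intrinsics over raw mnemonics, the sequence above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name `copy_nt`, the 64-byte cache line size, and the requirement that both buffers are 64-byte aligned with a length that is a multiple of 64 are all assumptions.

```c
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_stream_si128, _mm_clflush */
#include <stddef.h>

/* Hypothetical helper: copies `bytes` bytes one cache line at a time.
 * Assumes 64-byte cache lines, 64-byte aligned src and dst, and
 * bytes being a multiple of 64. */
static void copy_nt(void *dst, const void *src, size_t bytes)
{
    const char *s = (const char *)src;
    char *d = (char *)dst;

    for (size_t off = 0; off < bytes; off += 64) {
        /* Ordinary aligned loads (movdqa) of one source cache line. */
        __m128i a = _mm_load_si128((const __m128i *)(s + off));
        __m128i b = _mm_load_si128((const __m128i *)(s + off + 16));
        __m128i c = _mm_load_si128((const __m128i *)(s + off + 32));
        __m128i e = _mm_load_si128((const __m128i *)(s + off + 48));

        /* Non-temporal stores (movntdq) so the destination does not
         * displace useful data in the cache. */
        _mm_stream_si128((__m128i *)(d + off), a);
        _mm_stream_si128((__m128i *)(d + off + 16), b);
        _mm_stream_si128((__m128i *)(d + off + 32), c);
        _mm_stream_si128((__m128i *)(d + off + 48), e);

        /* Flush the source line only; the destination is left alone. */
        _mm_clflush(s + off);
    }

    /* Make the non-temporal stores globally visible before returning. */
    _mm_sfence();
}
```

Whether this actually beats a plain copy depends heavily on the CPU and the buffer sizes, so it is worth measuring before adopting it.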


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


 
 Post subject: Re: Optimized memory functions?
PostPosted: Fri Mar 27, 2009 9:27 pm 

Joined: Fri Jun 22, 2007 12:47 pm
Posts: 1598
Location: New Hampshire, USA
I'm not sure if anyone here would be interested in this, but I created a wiki article about my optimized library with full source code and info. I would greatly appreciate it if those of you who are interested helped out and made the code better. It's probably very incomplete at the time you read this, but I will be working on it.

http://wiki.osdev.org/User:01000101/optlib/



 
 Post subject: Re: Optimized memory functions?
PostPosted: Fri Aug 09, 2013 4:32 am 

Joined: Sun Jun 16, 2013 4:09 am
Posts: 333
Hi guys, 2013 now, a few years since the last post.

This is mine:

Since I am on 64-bit AMD, I am going to convert the while loops to rep stosq/stosb.

Code:
il void SetMem(void* pMem, byte To, dword Size)
{
   if (Size < 128)
   {
      byte* p = (byte*)pMem;
      byte* pEnd = (byte*)pMem + Size;
         
      while (p < pEnd)
         *p++ = To;

      return;
   }

   byte* p = (byte*)pMem;
   byte* pEnd = (byte*)pMem + Size - sizeof(qword);

   qword Toqword = CharsToqword(To, To, To, To, To, To, To, To);

   // Align to qword boundary: write (8 - address % 8) % 8 leading bytes
   switch ((8 - (((qword)pMem) & 0x7)) & 0x7)
   {
      case 7:      *p++ = To;
      case 6:      *p++ = To;
      case 5:      *p++ = To;
      case 4:      *p++ = To;
      case 3:      *p++ = To;
      case 2:      *p++ = To;
      case 1:      *p++ = To;
   }

   while (p < pEnd)
   {
      *(qword*)p = Toqword;
      p += sizeof(qword);
   }

   pEnd += sizeof(qword);
   while (p < pEnd)
   {
      *(byte*)p = To;
      p += sizeof(byte);
   }
}
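
The same idea can be sketched in portable C. Everything here is an assumption made for illustration: `broadcast_byte` stands in for the `CharsToqword` helper (whose definition is not shown in the thread), `set_mem` is a made-up name, and the standard `uint8_t`/`uint64_t` types replace the `byte`/`qword` typedefs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicates one byte into all eight bytes of a 64-bit value. */
static uint64_t broadcast_byte(uint8_t value)
{
    return 0x0101010101010101ULL * value;
}

/* Hypothetical portable counterpart of SetMem. */
static void *set_mem(void *mem, uint8_t value, size_t size)
{
    uint8_t *p = (uint8_t *)mem;
    uint8_t *end = p + size;
    uint64_t pattern = broadcast_byte(value);

    /* Head: byte stores until p is 8-byte aligned (or the buffer ends). */
    while (p < end && ((uintptr_t)p & 7) != 0)
        *p++ = value;

    /* Body: 8 bytes per iteration at an aligned address. The fixed-size
     * memcpy sidesteps aliasing problems; optimizing compilers usually
     * turn it into a single store. */
    while ((size_t)(end - p) >= 8) {
        memcpy(p, &pattern, sizeof pattern);
        p += 8;
    }

    /* Tail: whatever is left after the last full 8-byte store. */
    while (p < end)
        *p++ = value;

    return mem;
}
```

Because the head loop also checks `p < end`, no separate small-size path is needed for correctness, although one may still be worthwhile for speed.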


Last edited by tsdnz on Fri Aug 09, 2013 5:12 am, edited 1 time in total.

 
 Post subject: Re: Optimized memory functions?
PostPosted: Fri Aug 09, 2013 4:51 am 

Joined: Wed Mar 21, 2012 3:01 pm
Posts: 930
You do realize that your function doesn't work correctly with non-zero To bytes with arrays below 128 bytes?


 
 Post subject: Re: Optimized memory functions?
PostPosted: Fri Aug 09, 2013 5:04 am 

Joined: Sun Jun 16, 2013 4:09 am
Posts: 333
Awesome, it will soon. I did not notice that.

About to convert it to rep stosq/stosb.

Is there any way to tell the compiler to preserve the registers used within a function, for example for interrupt handlers?

Currently I have a stub in my secondary bootloader that calls qword [n*8], n being the interrupt number.
This points to my kernel interrupt handler, which pushes all registers.
No biggie, I just like it tidy.

Thanks again


 
 Post subject: Re: Optimized memory functions?
PostPosted: Fri Aug 09, 2013 5:09 am 

Joined: Wed Mar 21, 2012 3:01 pm
Posts: 930
That question is off-topic in this thread - make a new one. Additionally, I fail to comprehend exactly what you are asking. You cannot write interrupt handlers in C, you need to write a short stub in assembly that saves all the registers, calls the interrupt handler, and then reloads all the old registers.


 
 Post subject: Re: Optimized memory functions?
PostPosted: Fri Aug 09, 2013 5:10 am 

Joined: Sun Jun 16, 2013 4:09 am
Posts: 333
Ok, thanks, just what I am doing.


 
 Post subject: Re: Optimized memory functions?
PostPosted: Sun Aug 11, 2013 2:58 am 

Joined: Sun Jun 16, 2013 4:09 am
Posts: 333
Hi, this is the memset using rep stos: the leftover bytes go through stosb and the rest through stosq.
I had to use my inline proc to make it inline.

Code:
il void* SetMem(void* pMem, byte To, dword Count)
{
   void* pDst = pMem;
   qword Bytes = Count & 0x7;    // leftover bytes, stored with stosb
   qword Qwords = Count >> 3;    // rest of the length in 8-byte units

   asm volatile(
      "cld\n\t"
      "jecxz 1f\n\t"
      "rep stosb\n\t"
      "1:\n\t"
      "movl %%edx, %%ecx\n\t"
      "test %%edx, %%edx\n\t"
      "jz 2f\n\t"
      "rep stosq\n\t"
      "2:\n\t"
      // rdi and rcx are modified, so they are read-write operands
      :"+D"(pDst),"+c"(Bytes)
      // multiplying by 0x0101... copies To into all eight bytes of rax
      :"d"(Qwords),"a"(0x0101010101010101ULL * To)
      :"cc","memory"
      );

   return pMem;
}


memset:
extern "C" void* memset(void * s, int c, size_t count)
{
return SetMem(s, c, count);
}


This is the assembly, I have put **** to show the memset code.
Code:
1000000:   41 57                   push   r15
1000002:   48 b8 41 41 41 41 41    movabs rax,0x4141414141414141     ****
1000009:   41 41 41     
100000c:   ba 00 03 00 00          mov    edx,0x300   ****
1000011:   31 c9                   xor    ecx,ecx    ****
1000013:   48 bf 00 f0 02 01 00    movabs rdi,0x102f000    ****
100001a:   00 00 00
100001d:   41 56                   push   r14
100001f:   41 55                   push   r13
1000021:   41 54                   push   r12
1000023:   55                      push   rbp
1000024:   53                      push   rbx
1000025:   48 81 ec 38 03 00 00    sub    rsp,0x338
100002c:   fc                      cld     ****     
100002d:   67 67 e3 02             addr64 jecxz 1000033 <_Z11StartKernelv+0x33>   ****
1000031:   f3 aa                   rep stos BYTE PTR es:[rdi],al    ****
1000033:   89 d1                   mov    ecx,edx   ****
1000035:   85 d2                   test   edx,edx    ****
1000037:   74 03                   je     100003c <_Z11StartKernelv+0x3c>   ****
1000039:   f3 48 ab                rep stos QWORD PTR es:[rdi],rax   ****
100003c:   50                      push   rax
100003d:   51                      push   rcx


 
 Post subject: Re: Optimized memory functions?
PostPosted: Sun Aug 11, 2013 5:17 am 

Joined: Thu Jul 05, 2012 5:12 am
Posts: 923
Location: Finland
Just a general note: I do not understand why one would write code like this:

Code:
il void* SetMem(void* pMem, byte To, dword Count)
{
   void* pDst = pMem;
   qword Bytes = Count & 0x7;    // leftover bytes, stored with stosb
   qword Qwords = Count >> 3;    // rest of the length in 8-byte units

   asm volatile(
      "cld\n\t"
      "jecxz 1f\n\t"
      "rep stosb\n\t"
      "1:\n\t"
      "movl %%edx, %%ecx\n\t"
      "test %%edx, %%edx\n\t"
      "jz 2f\n\t"
      "rep stosq\n\t"
      "2:\n\t"
      // rdi and rcx are modified, so they are read-write operands
      :"+D"(pDst),"+c"(Bytes)
      // multiplying by 0x0101... copies To into all eight bytes of rax
      :"d"(Qwords),"a"(0x0101010101010101ULL * To)
      :"cc","memory"
      );

   return pMem;
}


Why not have pure assembly routines for functions like these? As a matter of fact, I have not used inline assembly at all. I think it is much more elegant to have "high-level code" and "low-level code" clearly separated. However, that is just my opinion.

_________________
Undefined behavior since 2012


 
 Post subject: Re: Optimized memory functions?
PostPosted: Sun Aug 11, 2013 8:20 am 

Joined: Tue Nov 08, 2011 11:35 am
Posts: 453
Antti wrote:
Just a general note: I do not understand why one would write code like this:
Code:
...


Why not have pure assembly routines for functions like these? As a matter of fact, I have not used inline assembly at all. I think it is much more elegant to have "high-level code" and "low-level code" clearly separated. However, that is just my opinion.


1) Inline assembly helps the compiler inline the code (less overhead from saving registers, making the call, returning, and restoring registers).
2) The compiler may be able to better optimize the code around the inline asm.

My note would be the following: why would anyone bother with hand-crafted memcpy/memset now, when the compiler is able to emit better code (for example, replacing a memcpy loop with a few MOVs for short lengths, or using special instructions for long, aligned blocks)?
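
As a rough illustration of the short-length case: with optimization enabled, GCC and Clang usually expand a copy whose size is a compile-time constant into plain moves rather than a call. The names `load_u32`, `Header`, and `copy_header` below are made up for this sketch. It uses `__builtin_memcpy` (a GCC/Clang spelling) because `-ffreestanding` implies `-fno-builtin`, so the plain `memcpy` name is not treated as a builtin in such builds.

```c
#include <stdint.h>

/* Reads a 32-bit value from a possibly unaligned address. On x86-64 an
 * optimizing build typically compiles this to a single mov. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    __builtin_memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical 8-byte structure, used only for this example. */
struct Header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
};

/* Fixed-size copy; the size is known at compile time, so the compiler
 * is free to use one or two register moves instead of a loop. */
static void copy_header(struct Header *dst, const struct Header *src)
{
    __builtin_memcpy(dst, src, sizeof *dst);
}
```

What the compiler actually emits depends on the target and optimization level, so checking the disassembly is the only way to be sure.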


 
 Post subject: Re: Optimized memory functions?
PostPosted: Sun Aug 11, 2013 1:59 pm 

Joined: Sun Jun 16, 2013 4:09 am
Posts: 333
Hi guys, I have to hand craft it.

There is a bug in gcc 4.8.1 (my version) where memset calls memset.
Resulting in an infinite loop.

http://www.marshut.com/qnktz/infinite-recursion-due-to-builtin-pattern-detection.html
http://forum.osdev.org/viewtopic.php?f=1&t=27016

Just found out that "-fno-tree-loop-distribute-patterns" removes the infinite recursion.
I should have read the bug report fully.
And the code is much better, as below:

Code:
extern "C" void* memset(void * s, int c, size_t count)
{
   byte* b = (byte*)s;

   while (count-- > 0)
      *b++ = c;

   return s;
}


Which results in 16-byte moves:
Code:
1000000:   48 ba 30 78 02 01 00    movabs rdx,0x1027830
1000007:   00 00 00
100000a:   48 b8 00 f0 02 01 00    movabs rax,0x102f000
1000011:   00 00 00
1000014:   48 b9 00 20 03 01 00    movabs rcx,0x1032000
100001b:   00 00 00
100001e:   66 0f 6f 02             movdqa xmm0,XMMWORD PTR [rdx]
1000022:   66 0f 7f 00             movdqa XMMWORD PTR [rax],xmm0
1000026:   48 83 c0 10             add    rax,0x10
100002a:   48 39 c8                cmp    rax,rcx
100002d:   75 f3                   jne    1000022 <_Z11StartKernelv+0x22>


 
 Post subject: Re: Optimized memory functions?
PostPosted: Sun Aug 11, 2013 9:25 pm 

Joined: Sun Jan 14, 2007 9:15 pm
Posts: 2566
Location: Sydney, Australia (I come from a land down under!)
The pattern matching that turns a memset-like loop into a call to memset is hardly a bug. It is an optimisation that is quite sane for most cases, and in a freestanding environment there is an option you can pass when compiling your memset.c to disable it.
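
If changing the flags for the whole file is inconvenient, GCC also accepts the same option per function through its `optimize` attribute. The sketch below assumes GCC; the macro name `NO_LOOP_TO_LIBCALL` and the function name `k_memset` are made up here (the real routine would be called `memset`). The GCC manual describes this attribute as intended for debugging rather than production use, so the command-line flag is probably still the safer choice.

```c
#include <stddef.h>

/* On GCC, ask for -fno-tree-loop-distribute-patterns on one function
 * only; other compilers get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("no-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

/* Byte-at-a-time fill; with the pattern detection disabled the loop is
 * not rewritten into a call to memset. */
NO_LOOP_TO_LIBCALL
void *k_memset(void *s, int c, size_t count)
{
    unsigned char *b = (unsigned char *)s;

    while (count-- > 0)
        *b++ = (unsigned char)c;

    return s;
}
```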

Also, using xmmN registers is nice, but they should only ever be used in the kernel if you fully understand what you are required to do before using them (e.g., save floating point state, make sure everything is ready for the instructions, make sure the running CPU supports the instruction, etc.) and are prepared to accept the extra overhead that comes as a result. For this reason, it's usually wise to compile your kernel code with a set of flags that prevents the compiler from emitting any SSE/MMX/3DNow/etc. instructions, and to write code that uses them yourself when (... if) the situation warrants.

I haven't outright said "none of that in the kernel" because your system could be a single-tasking system, it could be devoid of a userspace, and so on. As always your mileage may vary and there's no "one size fits all" solution.
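
For the "make sure the running CPU supports the instruction" part, a minimal check could look like the sketch below. It relies on the `<cpuid.h>` helper header shipped with GCC and Clang; the function name `cpu_has_sse2` is made up. This covers CPU support only: the kernel still has to enable SSE itself (CR0.EM cleared, CR4.OSFXSR set) and deal with saving the state.

```c
#include <cpuid.h>  /* GCC/Clang helper header providing __get_cpuid */

/* Returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26), otherwise 0. */
static int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not available. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    return (int)((edx >> 26) & 1u);
}
```

On x86-64 this should always report 1, since SSE2 is part of the baseline architecture; the check matters mostly for 32-bit kernels.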

_________________
Pedigree | GitHub | Twitter | LinkedIn


 
 Post subject: Re: Optimized memory functions?
PostPosted: Sun Aug 11, 2013 10:32 pm 

Joined: Sun Jun 16, 2013 4:09 am
Posts: 333
Ok, good points.

My project is single-tasking, with no user space, and the CPU is set up for SSE/MMX.


 
 Post subject: Re: Optimized memory functions?
PostPosted: Mon Aug 12, 2013 2:18 am 

Joined: Thu Jul 12, 2012 7:29 am
Posts: 723
Location: Tallinn, Estonia
Why not let gcc generate you a memset? Why do you have to write it yourself?

_________________
Learn to read.

