OSDev.org

The Place to Start for Operating System Developers
 Post subject: Re: Optimization
PostPosted: Wed Oct 19, 2022 1:02 pm 

Joined: Fri Feb 11, 2022 4:55 am
Posts: 435
Location: behind the keyboard
I am using linked lists for heaps and threads.


 Post subject: Re: Optimization
PostPosted: Wed Oct 19, 2022 1:39 pm 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5099
Why do you need to search for allocated items in your heap?


 Post subject: Re: Optimization
PostPosted: Wed Oct 19, 2022 2:22 pm 

Joined: Fri Feb 11, 2022 4:55 am
Posts: 435
Location: behind the keyboard
You need to maintain a list of heap entries so that the free function frees only a particular area of memory, and the allocate function then searches these heap entries for free memory?


 Post subject: Re: Optimization
PostPosted: Wed Oct 19, 2022 2:36 pm 

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5099
Why does the free function need to search the bitmap?


 Post subject: Re: Optimization
PostPosted: Wed Oct 19, 2022 3:37 pm 

Joined: Tue Feb 18, 2020 3:29 pm
Posts: 1071
devc1 wrote:
I may create multiple bitmaps so that the list fits in a page, where I can benefit from the TLB cache, and I make sure that the list is page aligned.

What does the TLB have to do with this?

_________________
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg


 Post subject: Re: Optimization
PostPosted: Wed Oct 19, 2022 10:11 pm 

Joined: Wed Aug 30, 2017 8:24 am
Posts: 1593
devc1 wrote:
I guess the quicksort algorithm will be useful when creating linked lists
Let me stop you there. No. Quicksort is horrible for lists. Mergesort is better suited for lists. However...
devc1 wrote:
with an ID for each item, so you can use this algorithm to get the Item from the ID faster ?
What you have described there, in a roundabout way, is a map. You want a structure in which it is easy to look up an item by its ID. A sorted list will not help you here, as you still need to iterate over the preceding elements to find out where each successor is. A better solution would be a hash map, if you know the maximum size the map can ever reach, or at least don't need to resize it too often (resizing is the slowest operation on a hash map). Otherwise, a tree map can be used (i.e. a self-balancing binary search tree, like a red-black tree or an AA tree).

Another difference is that certain types of hash maps are very cache friendly, whereas all self-referential data structures risk a cache miss on every dereference. And cache misses these days are about on par with NVMe accesses.

These are all Computer Science fundamentals. I suggest you learn a lot more about these topics before moving on. Just search for "data structures", and you should be inundated with stuff to read. This is fairly important, because most work in a kernel depends on good data structures. Indeed choosing the right data structure for the job can be the most important optimization you can make. And that does mean you actually need to learn about complexity theory while you are studying data structures.

_________________
Carpe diem!


 Post subject: Re: Optimization
PostPosted: Thu Oct 20, 2022 5:45 am 

Joined: Fri Feb 11, 2022 4:55 am
Posts: 435
Location: behind the keyboard
Quote:
What does the TLB have to do with this?


If you read unaligned data that straddles a page boundary (for example, a quadword at 0x1FFF), the CPU needs to perform two TLB or page-table lookups. I tested that and found roughly a 4x slowdown when accessing data that straddles a page.

Quote:
Why does the free function need to search the bitmap?


To allocate a heap entry structure in the linked list, instead of iterating over each heap entry to check whether it is present (which accesses memory and makes the function slower), I can keep a bitmap that marks each present item and use "bsf" (BitScanForward) to find a free slot. That made my linked list implementation more than 20 times faster.

A list node will look something like this:
Code:
typedef struct list {
    UINT64 Bitmap;       /* one bit per slot: marks which entries are present */
    Entry entries[64];
    struct list *Next;   /* the self-reference needs the struct tag */
} list;


 Post subject: Re: Optimization
PostPosted: Thu Oct 20, 2022 8:01 am 

Joined: Tue Feb 18, 2020 3:29 pm
Posts: 1071
devc1 wrote:
If you read unaligned data that straddles a page boundary (for example, a quadword at 0x1FFF), the CPU needs to perform two TLB or page-table lookups. I tested that and found roughly a 4x slowdown when accessing data that straddles a page.

OK, but I still don't know what that has to do with anything. When allocating a page for the bitmaps, the pages have to be aligned. There's no other option.

As for general optimization, making good use of the CPU's caches is very important in micro-optimization.

And no, I'm not talking about the registers, I'm talking about a block of SRAM inside the CPU that is much faster to access than the system's DRAM. The CPU tries to keep the most useful information inside the cache. But programs / OSes can help the CPU a lot with this.

_________________
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg


 Post subject: Re: Optimization
PostPosted: Thu Oct 20, 2022 8:58 am 

Joined: Fri Feb 11, 2022 4:55 am
Posts: 435
Location: behind the keyboard
The L1 cache is almost as fast as two register accesses, and each cache line in the L1 and L2 is 64 bytes in size.
Unfortunately, accessing linked lists would not benefit from any cache, and I think I don't need to explain this.

Will the prefetch instruction help with this?


 Post subject: Re: Optimization
PostPosted: Thu Oct 20, 2022 10:10 am 

Joined: Tue Feb 18, 2020 3:29 pm
Posts: 1071
devc1 wrote:
Unfortunately, accessing linked lists would not benefit from any cache, and I think I don't need to explain this.

Exactly why you shouldn't be using a linked list.
devc1 wrote:
The L1 Cache is almost as fast as 2 register accesses, each cache line in the L1 and L2 is 64 bytes in size.

What are you trying to say by this?

_________________
"How did you do this?"
"It's very simple — you read the protocol and write the code." - Bill Joy
Projects: NexNix | libnex | nnpkg


 Post subject: Re: Optimization
PostPosted: Thu Oct 20, 2022 8:42 pm 

Joined: Fri Feb 11, 2022 4:55 am
Posts: 435
Location: behind the keyboard
So what should I use instead of a linked list?

A binary tree?


 Post subject: Re: Optimization
PostPosted: Fri Oct 21, 2022 3:55 am 

Joined: Thu Nov 16, 2006 12:01 pm
Posts: 7612
Location: Germany
Just as a note, AmigaOS Exec went the exact opposite way -- "everything" was a linked list. Kernel objects were basically all derived from "list node": tasks, devices, memory hunks, all handled by the identical list-handling code, which the kernel exposed to users as well so they could use it for their own data. As opposed to more efficient data structures, this kept the code handling those structures in the CPU cache, because it was the same code for "everything".

_________________
Every good solution is obvious once you've found it.


 Post subject: Re: Optimization
PostPosted: Fri Oct 21, 2022 3:01 pm 

Joined: Sat Mar 31, 2012 3:07 am
Posts: 4591
Location: Chichester, UK
I didn’t think the 68000 had any cache memory. Did the Amiga provide an external cache?


 Post subject: Re: Optimization
PostPosted: Fri Oct 21, 2022 3:48 pm 

Joined: Tue Apr 03, 2018 2:44 am
Posts: 401
iansjack wrote:
I didn’t think the 68000 had any cache memory. Did the Amiga provide an external cache?


The 68020 had an instruction cache. I think that was available with the Amiga 1200.


 Post subject: Re: Optimization
PostPosted: Fri Oct 21, 2022 11:31 pm 

Joined: Sat Mar 31, 2012 3:07 am
Posts: 4591
Location: Chichester, UK
The 68EC020 in the Amiga 1200 did have a 256 byte instruction cache that wasn’t present in the original 68000 used in earlier Amigas.

