OSDev.org

The Place to Start for Operating System Developers
It is currently Fri Mar 29, 2024 6:58 am

All times are UTC - 6 hours




Post new topic Reply to topic  [ 5 posts ] 
Author Message
 Post subject: Single address space system and tlb caching
PostPosted: Sun Sep 20, 2015 7:17 am 
Offline
Member
Member

Joined: Mon Jul 20, 2015 3:32 pm
Posts: 62
Hi,

I was reading with a lot of interest that thread : http://forum.osdev.org/viewtopic.php?f=15&t=26098&start=0&hilit=context+switch

I can hear that we cannot avoid TLB thrashing. But what if the OS caches the used addresses (by the process X) in the process X's structure and prefetches them when switching?

We could have a fast tlb recovery.

I am interested in 64bits systems because obviously, 32 bits with single address space is a bit.. pretentious (read: way not enough) for a modern general purpose OS.

Yes it would still require some loads but not the page walk. That in itself is a huge gain (less cache misses/cpu stalls).

Is that idea a no go for some reason?

Bye (and thanks)


Top
 Profile  
 
 Post subject: Re: Single address space system and tlb caching
PostPosted: Sun Sep 20, 2015 8:37 am 
Offline
Member
Member
User avatar

Joined: Wed Oct 18, 2006 3:45 am
Posts: 9301
Location: On the balcony, where I can actually keep 1½m distance
A TLB miss costs a few cycles, and the processor does a lot to figure out which pages they are before it actually has to wait on it. Chances are that the gains you have from prefetching is less than the amount of losses you get from figuring what to prefetch in the first place.

Quote:
I can hear that we cannot avoid TLB thrashing.
Trashing implies excessive misses, which only really happens when a dataset is too large to fit in the TLB. Actual trashing is usually an algorithmical issue and not quite relevant for task-switching.

You can however avoid TLB flushing: The regular task switching method clears out the TLB just once instead of repeatedly per process. Using small address space (segmentation+paging) or single address space (bytecode) techniques gives you the process isolation that don't actually require emptying the TLB to switch tasks.

_________________
"Certainly avoid yourself. He is a newbie and might not realize it. You'll hate his code deeply a few years down the road." - Sortie
[ My OS ] [ VDisk/SFS ]


Top
 Profile  
 
 Post subject: Re: Single address space system and tlb caching
PostPosted: Sun Sep 20, 2015 9:19 am 
Offline
Member
Member

Joined: Mon Jul 20, 2015 3:32 pm
Posts: 62
Combuster,

Ok, so the cpu is really good at predictions (maybe). So even if it does so, it still has to pagewalk because at the scheduler level, so many things can happen that the cpu cannot predict, or predict too late.

My solution is to give the final answer without the cost of page-walking. Even done at a early stage, and quite in parallel, it is still slow because of the number of memory read ports (eg: Haswell has 2 unlike many cpus. The others only have 1 -> Agner Fog's instruction tables).

So the prefetch would be interesting since the cpu will reorder anyway but at the time we start the process again, the most needed memory would already be in the process of being in the TLB (or already be there) while the rest is reloading incrementaly without waiting for a walk and interfering with the process.

The addresses array would fit in contiguous memory, so cache line friendly.


Top
 Profile  
 
 Post subject: Re: Single address space system and tlb caching
PostPosted: Sun Sep 20, 2015 9:48 am 
Offline
Member
Member
User avatar

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

AlexHully wrote:
I can hear that we cannot avoid TLB thrashing. But what if the OS caches the used addresses (by the process X) in the process X's structure and prefetches them when switching?


Prefetch what, and how?

If you try to prefetch everything you'll just slam into bus bandwidth limits and cripple performance. If you only prefetch a small number of things that you know will definitely be needed (specifically, the pages at user-space RIP and RSP) then it might be slightly beneficial sometimes, in theory.

Also note that (assuming 80x86); for software prefetching, if you ask to prefetch something but the TLB entry isn't present then the CPU ignores the prefetch (and doesn't load the data into TLB). The only way to avoid that (and force the TLB to be loaded) is something I call "pre-loading" (e.g. a dummy read, that isn't a prefetch, but where the data loaded isn't actually used and therefore doesn't cause a pipeline stall).

The problem is that there are other factors involved (e.g. serialising instructions, limits on the number of pending fetches, etc), and those "pre-loads" will cost something (even if it's just additional instruction fetch/decode costs); and there's no guarantee that the costs will be smaller than the benefits (for "worst case", "average case", or even for "best case").

The other problem is something Intel calls "prefetch scheduling distance". Essentially you need to prefetch early enough so that the data is present before it's needed (which means doing something else for maybe 1000 cycles after prefetching/preloading but before returning to user-space), but you also need to prefetch late enough to make sure that the data isn't fetched and then evicted again before it's used.

Note that recent Intel CPUs support a feature called "address space IDs"; which (if used properly) means that TLB entries for processes aren't flushed when a task switch occurs (and makes the entire idea of single address space pointless). Also; note that there are "fast TLB misses" (where the data the CPU needs is in caches) that might cost as little as 5 cycles (and be far cheaper than any prefetching/preloading scheme); and there are also "slow TLB misses" (where the CPU has to fetch the data for the TLB entry all the way from slow RAM chips) that might cost as much as 1000 cycles. This means that if you're task switching extremely frequently you'd probably want to minimise the number of cache lines you touch (to maximise the chance of "fast TLB misses"), which probably implies that you'd want to avoid touching more cache lines during some sort of TLB prefetching scheme.

The other thing that may be worth pointing out is that task switching extremely frequently is extremely stupid; and it's far smarter to avoid task switches instead (e.g. use asynchronous communication where task switches can be postponed/skipped until actually necessary; rather than synchronous communication where every single send/receive costs 2 task switches). In other words, in my opinion, avoiding task switches is far more important than doing task switches faster (e.g. 10 task switches per second at 500 cycles each, rather than 1000 task switches per second at 50 cycles each). ;)


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


Top
 Profile  
 
 Post subject: Re: Single address space system and tlb caching
PostPosted: Sun Sep 20, 2015 10:00 am 
Offline
Member
Member

Joined: Mon Jul 20, 2015 3:32 pm
Posts: 62
Hi Brendan,

I take your arguments (and Combuster's) and dismiss mine.

So, the only smart thing to do (wait, don't take my word too firmly on that) is to differentiate between computer tasks and user tasks.

One will need responsivity more than anything else (context_switches++), other pure speed (context_switches--).

Share them among the cores according to their respective ratios.
So at the end of the road, we reach a performant experience.

If there are computer tasks only (server..), we do our possible so that there are no more processes than the number of cpu cores that should be run. Again, we reach plenitude :)

if there are almost only user guided applications, we can have a huge number of them, since a human will treat one or two things concurrently. But as far as machine is involved, it doesn't matter, it is so much faster than us.

Is it much better that way?


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 5 posts ] 

All times are UTC - 6 hours


Who is online

Users browsing this forum: No registered users and 30 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group