Which page replacement algorithms are used in xv6 and Minix?

quadrant · **Joined:** Tue Apr 24, 2018 9:46 pm **Posts:** 74

Which page replacement aglorithms are used in xv6 and Minix?

I am trying to learn how page replacement is implemented. Both operating systems are written for educational purposes, so they would be great starting points for this.

Brendan · **Posted:** Sun Apr 29, 2018 5:45 pm

Hi,

quadrant wrote:

Which page replacement aglorithms are used in xv6 and Minix?

I am trying to learn how page replacement is implemented. Both operating systems are written for educational purposes, so they would be great starting points for this.

To be "perfect", an educational OS needs to be simple enough for beginners to learn from but also needs to be complex enough to cover all useful information. These requirements are mutually exclusive...

I wouldn't be too surprised if neither xv6 nor Minix supports swap space, and wouldn't be surprised if they both have no page replacement algorithm at all (especially if you mean "Minix 1.0").

In general; a "least recently used" algorithm is typically used as the basis for a page replacement algorithm. For this; you need some way to determine how recently used a page is. How this is done depends on the CPU. For 80x86 CPUs there are "accessed" and "dirty" flags in the page table entries that the CPU automatically sets for you; so you might scan the page table entries periodically and update the age of each accessed page. For other CPUs (e.g. those with "software TLB miss handling") you might update the age of each page that's accessed during page faults. Once you know (roughly) how recently each page was accessed you can determine which page/s were "least recently used", and you can send those to swap space (if you need to).

However; "pure least recently used" ignores things like the cost involved. For example, if a page was stored in swap space previously and hasn't been modified since, then you can free the page without storing it in swap space again (which would be faster). Also note that swap space isn't the only option - e.g. unmodified pages that are part of a memory mapped file can be freed and then fetched from the file if they're accessed again. Essentially, for each page you have how much time has passed since it was used last and you have a cost; and you can do "score = time * cost" and select the page/s with the best score, so that (in some cases) you improve performance by selecting pages that aren't the least recently used pages.

But...

"Least recently used" is just an estimate for "most likely to be used soon" that tends to get used for things like cache eviction and paging algorithms because it doesn't depend on other information (and it's hard to predict the future accurately anyway, so...). What if you had more information you could use? For example, if you knew that a task that is currently sleeping will wake up in 5 minutes, you could accurately predict that the task is going to use at least some of its pages when it wakes up. For another example; what if you kept track of past access patterns and knew that when the user presses there's a 50% probability that a web browser would access 20 specific pages and a 90% probability that a GUI would access 100 specific pages?

Let's make the model a little more flexible and say that for each page you have a prediction of how likely it is that it will be used soon (where the prediction may be based on "least recently used" if there's no other information, but could be from other information too), and each page has a cost; and you can do "score = prediction * cost" and select the page/s with the best score.

Now...

This is only looking at one side (selecting which pages to send to disk to free RAM). The other side is figuring out which pages to bring back into RAM. This can be split into 2 parts - prefetching data before it's accessed, and having to fetch immediately because you failed to prefetch. Figuring out what to prefetch can be very complicated (but very important for performance). The problem is that (if you're not careful) these systems can fight - you don't want the OS to prefetch pages and send them back to disk immediately, and you don't want to send pages to disk and prefetch them immediately.

To fix that you might want a more wholistic model. Specifically; ideally you'd want to know (estimate) what the best contents of RAM would be for the future, then find the difference between "estimated best contents for future" and the current contents of RAM, then do whatever you can (sending pages to disk and bringing them back from disk) to reduce the differences while taking cost into account. This kind of thinking leads beyond "cutting edge" into "research"; and (in terms of complexity) there's no restrictions.

Cheers,

Brendan

simeonz · **Joined:** Fri Aug 19, 2016 10:28 pm **Posts:** 360

I cannot find more authoritative sources, but according to this source:

Quote:

An important feature lacking in xv6 is the ability to swap out pages to a backing store. That is, at each moment in time all processes are held within the main (physical) memory.

And according to this source:

Quote:

Memory management in MINIX 3 is done by the Process Manager (PM)
• No swapping or paging: a process is created, gets its memory, and nothing changes to that until the process terminates.
• Simple first-fit memory allocation scheme, allocated memory never shrinks or grows

For other OSes, you might be interested to know that Windows NT 4.0 used some kind of local second-chance or clock on the working sets of its processes on uniprocessor systems, but random page replacement on multiprocessor ones, probably to avoid the cost of TLB shootdown when changing the access bits or present flag of PTEs. Source is here. I am not sure what newer versions use for SMP. In fact, I wonder if the TLB shootdown is really necessary, since having inconsistent TLB is only going to cause sporadic redundant evictions and soft-faults (edit: for the accessed flag that is).

Linux I think uses approximate global LRU, where an actual LRU becomes the victim cache behind a second-chance/clock page replacement. The clock policy evicts pages to the LRU (marking them not-present in the process) and the LRU later uses the arrival of pages into its list (which corresponds to the list order) for choosing pages to ultimately re-purpose to other processes, or to clean (i.e. flush to storage), etc. The combination results in lower-precision than true LRU, but with much better performance than actual LRU on real cpu hardware. Long-term recency is better accounted for than second-chance alone. If you are brave enough you might want to look at the Linux sources, although I personally find that this subsystem is a nightmare.

Although Brendan already commented on this, I would just summarize the two most common pathologies that you want to address - a large working set having no cache utilization (i.e. low hit percentage) due to self-eviction, and a small working set being starved by very intensive large working set process.

quadrant · **Joined:** Tue Apr 24, 2018 9:46 pm **Posts:** 74

Ah, thankyou! This doesn't look as straightforward as I was hoping it would be =/. It will take some time to chew on the information you've both shared.

The xv6 assignment link simeonz shared looks promising:

Quote:

Memory management is one of the most important features of any operating system. In this assignment we will examine how xv6 handles memory and attempt to extend it by implementing a paging infrastructure which will allow xv6 to store parts of the process' memory in a secondary storage.

zaval · **Posted:** Sat May 05, 2018 3:41 pm

Quote:

For other OSes, you might be interested to know that Windows NT 4.0 used some kind of local second-chance or clock on the working sets of its processes on uniprocessor systems, but random page replacement on multiprocessor ones, probably to avoid the cost of TLB shootdown when changing the access bits or present flag of PTEs. Source is here. I am not sure what newer versions use for SMP. In fact, I wonder if the TLB shootdown is really necessary, since having inconsistent TLB is only going to cause sporadic redundant evictions and soft-faults (edit: for the accessed flag that is).

From the "Windows Internals" by M. Russinovich and D. Solomon, 3d Edition, p. 464:

Quote:

On a single-processor Windows 2000 system and all systems running Windows XP and Windows
Server 2003, the working set manager tries to remove pages that haven’t been accessed
recently. It does this by checking the accessed bit in the hardware PTE to see whether the page
has been accessed. If the bit is clear, the page is aged, that is, a count is incremented indicating
that the page hasn’t been referenced since the last working set trim scan. Later, the age of
pages is used to locate candidate pages to remove from the working set.

Note On a Windows 2000 multiprocessor system, the working set manager mistakenly
didn’t check the access bit, resulting in pages being removed from the working set without
regard to the state of the accessed bit.

simeonz · **Joined:** Fri Aug 19, 2016 10:28 pm **Posts:** 360

zaval wrote:

From the "Windows Internals" by M. Russinovich and D. Solomon, 3d Edition, p. 464:

Quote:

On a single-processor Windows 2000 system and all systems running Windows XP and Windows
Server 2003, the working set manager tries to remove pages that haven’t been accessed
recently. It does this by checking the accessed bit in the hardware PTE to see whether the page
has been accessed. If the bit is clear, the page is aged, that is, a count is incremented indicating
that the page hasn’t been referenced since the last working set trim scan. Later, the age of
pages is used to locate candidate pages to remove from the working set.

Note On a Windows 2000 multiprocessor system, the working set manager mistakenly
didn’t check the access bit, resulting in pages being removed from the working set without
regard to the state of the accessed bit.

The book seems to suggest that Windows 2000 and later are probably (at least attempting) to use clock-style page replacement policy in SMP. Which brings me to another clarification that I wanted to make, but decided to abstain from making in order not to overextend the discussion. I mentioned earlier that Windows uses local clock page replacement, but (edit: what I failed to mention is that) the evicted pages are moved to global lists, which are then also processed LRU-style, which is also apparently indicated in the book. In that sense, Windows also works like linux, i.e. ultimately being quasi-LRU, with the difference being that the clock phase is applied to the processes' working sets individually and the LRU is global. This apparently enables per-process decisions and control, but can't recall ever reading how those operate precisely. It is however interesting to note that threads have "paging priority", so this could be involved.

OSDev.org

Which page replacement algorithms are used in xv6 and Minix?

Who is online