rdos wrote:
When I designed my current algorithm, the amount of linear memory was large and the amount of physical memory was small, so it made sense to create buffering based on linear memory. Today, it is more or less the opposite. A PC can have several GB of physical memory, and the kernel only has maybe 1/2 GB of linear memory for buffers. Basing the buffering algorithm on linear memory thus makes it impossible to use all of the spare physical memory for disc buffering.
Of course, with long mode this is a non-issue, but I have no plans to move to long mode right now. It's also a lot more challenging (and thus fun) to solve this for 32-bit protected mode.
So, if it is linear memory that is the scarce resource, and physical memory that is the abundant one, buffers should be based on physical pages rather than memory-mapped linear memory. This is also advantageous with most modern disc hardware, like AHCI, which uses physical addresses in its command tables rather than linear ones. The AHCI disc driver actually doesn't need to map buffers into linear memory at all; it works better with physical addresses only. So buffers handed to modern disc drivers will work best as physical memory buffers.
Another way to increase the linear memory available to filesystem drivers is to run the disc driver & filesystem driver as a separate process and then use the 3 GB of linear memory available to a user process for linear memory buffers. This means the linear buffers can be six times larger.
When my OS grows up, it wants to use a similar scheme. I want to have at least the filesystem drivers in user space, à la FUSE. Once the OS has determined, while resolving a page fault, that the required page is not in memory, you're already resigned to a slow device I/O path, so the added overhead of transitioning to user space is probably not significant, and the filesystem complexity can be contained safely within user space, keeping kernel mode simpler.
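As a rough sketch of that handoff (all names here are hypothetical, not rdos's API, and the "IPC" is just a function call; a real FUSE-style design would cross a process boundary at that point):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Request the kernel hands to the user-space filesystem daemon. */
struct fs_request {
    unsigned long inode;
    unsigned long offset;     /* byte offset into the file, page aligned */
    char data[PAGE_SIZE];     /* filled in by the daemon */
};

/* User-space side: all filesystem complexity lives here. A real
 * daemon would walk its caches and issue disc reads; this stub
 * just synthesizes recognizable contents. */
static int fs_daemon_handle(struct fs_request *req)
{
    sprintf(req->data, "inode %lu @ %lu", req->inode, req->offset);
    return 0;
}

/* Kernel side: the fault handler has already decided the page is not
 * in memory, so a slow path (including a transition into the daemon's
 * address space) is acceptable. */
static int handle_fs_fault(unsigned long inode, unsigned long offset,
                           char page[PAGE_SIZE])
{
    struct fs_request req = { .inode = inode, .offset = offset };
    int err = fs_daemon_handle(&req);  /* IPC + context switch in reality */

    if (err)
        return err;
    memcpy(page, req.data, PAGE_SIZE);
    return 0;
}
```

The point is the layering: the kernel only knows "get me this page of this inode", and everything filesystem-specific stays on the daemon's side of the request structure.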
rdos wrote:
A problem with mapping & unmapping linear pages is that the TLB needs to be invalidated, and also that IPIs must be sent to all cores so they also invalidate their TLBs. However, if the OS can make sure that all threads in the disc process runs on the same core, then only the TLB on that core will need to be invalidated when mappings change.
Thoughts? Is this a usable idea?
If you have your filesystem at least partly in user space, you'll have less buffer pressure in the kernel, so less buffer reuse and unmapping, and less need for TLB shootdown.
For kernel-space buffers, however, if each buffer has a dedicated linear address that is mapped on demand to the required physical page, then as long as a thread must hold the buffer's lock to access the buffer memory, you'll have no need to issue TLB shootdown IPIs at all. Buffer access would look something like:
Code:
struct buffer {
    mutex lock;
    page_t page;    /* physical page backing this buffer */
    size_t size;
    void *p;        /* this buffer's dedicated linear address */
};

struct buffer *get_buffer(page_t page, size_t size)
{
    /* Get an unused buffer from the pool - comes back locked */
    struct buffer *buf = get_free_buffer();

    buf->page = page;
    buf->size = size;
    /* Temporarily map the page(s) at the buffer's linear address */
    arch_map(buf->page, buf->size, buf->p);
    return buf;
}

void put_buffer(struct buffer *buf)
{
    /* Unmap the temporary mapping - local invalidation only, no IPIs */
    arch_unmap_no_tlb_shootdown(buf->p, buf->size);
    /* Put the still-locked buffer back into the pool */
    put_free_buffer(buf);
}
Thus, stale TLB entries from unlocked buffers are also guaranteed to be unused and won't be referenced. Once a buffer is handed out again, the first thing the locking thread does is remap the buffer's address, overwriting any stale TLB entry anyway. No TLB shootdown necessary.
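To see the scheme end to end, here's a rough user-space simulation of that pool (a sketch with invented names: `arch_map` is stood in by a memcpy from a fake physical-memory array, and the free list is a trylock scan over a fixed array):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define NBUF      2
#define PAGE_SIZE 4096

typedef unsigned long page_t;

struct buffer {
    pthread_mutex_t lock;
    page_t page;
    size_t size;
    void *p;                /* this buffer's dedicated "linear" window */
};

/* Fake physical memory: frame n lives at phys_mem[n]. */
static char phys_mem[4][PAGE_SIZE];
static char windows[NBUF][PAGE_SIZE];
static struct buffer pool[NBUF];
static int pool_ready;

/* Stand-in for arch_map(): a real kernel would rewrite the PTE for
 * the window and do a local invlpg; here we just copy the frame in. */
static void fake_arch_map(page_t page, size_t size, void *va)
{
    memcpy(va, phys_mem[page], size);
}

struct buffer *get_buffer(page_t page, size_t size)
{
    if (!pool_ready) {
        for (int i = 0; i < NBUF; i++) {
            pthread_mutex_init(&pool[i].lock, NULL);
            pool[i].p = windows[i];
        }
        pool_ready = 1;
    }
    /* "Free list" by trylock scan: a locked buffer is in use. */
    for (int i = 0; i < NBUF; i++) {
        if (pthread_mutex_trylock(&pool[i].lock) == 0) {
            pool[i].page = page;
            pool[i].size = size;
            /* Remapping here is what makes stale TLB entries harmless. */
            fake_arch_map(page, size, pool[i].p);
            return &pool[i];
        }
    }
    return NULL;            /* pool exhausted */
}

void put_buffer(struct buffer *buf)
{
    /* No shootdown IPI: the next get_buffer() remaps before any access.
     * (Simplification: the lock doubles as the in-use flag here, whereas
     * the version above hands the still-locked buffer back to the pool.) */
    pthread_mutex_unlock(&buf->lock);
}
```

The simulation obviously elides the actual paging, but it exercises the invariant that matters: the mapping is refreshed under the lock before any access, so nothing ever reads through a stale window.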