OSDev.org

The Place to Start for Operating System Developers
 Post subject: How to implement efficient IPC for a server-based filesystem
PostPosted: Thu Apr 22, 2021 12:37 pm 
rdos (Member; joined Wed Oct 01, 2008 1:55 pm; posts: 2592)
So, how would this be designed to maximize performance?

My first idea was to use my own IPC mechanism, but then I realized it relies too heavily on dynamic memory allocation and only delivers messages. Parsing text commands also seems really inefficient.

Next, I allocated control blocks for the messages and put them into queues. I also added a default structure and thought I would just extend it based on which command I wanted to run. This proved to be problematic too, particularly since many messages were anticipated to be paths of varying size. It also was too much work to create new structures for every message.

So, I thought, why not save the registers to fixed positions in the control block? Then I can "execute" it by loading the operation code into AX. On the other side, the registers are loaded again, and the operation code is used as an index into a table to reach the right function. When the server function has executed the command, the registers are saved in the control block again and control is passed back to the "client" thread, which reloads the registers. This has the big advantage that the server-side function appears to be executed locally and the results are returned in registers just as if it had been called locally.

An addition to this mechanism is to allocate extra space in the control block for passing data to and from the server.

Another issue is that the client will execute in kernel space, but the server should execute in user space in its own address space. This can be handled by passing a buffer from the server and then copying the control block into this buffer. The registers are then loaded in an assembly stub in the server (in user space) and saved again when the function completes. They then need to be copied back, in the server's kernel space, to the kernel-mode control block. An obvious drawback is too much copying and dynamic memory allocation.

Another, probably better, idea is to use a set of fixed buffers so they don't need to be allocated. If the buffer size is a multiple of 4k, then the buffers can be allocated in 4k chunks. By saving the physical address (or addresses) of each buffer, the registers & message context can be mapped into both kernel space and the server process, so no copying is necessary. A fixed number of message buffers can be statically allocated, saving the physical address of each buffer in a circular buffer.


 Post subject: Re: How to implement efficient IPC for a server-based filesystem
PostPosted: Fri Apr 23, 2021 3:04 am 
rdos (Member; joined Wed Oct 01, 2008 1:55 pm; posts: 2592)
I came up with a design that is completely static (no need to allocate memory) and zero-copy. I define a 16-byte IPC structure that contains an eight-byte physical address, a four-byte kernel-level linear address, and a four-byte server process linear address. The kernel-mode linear address will always be mapped to the physical address, while the server (user mode) linear address will only be mapped when the server is processing a message.
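
As a sketch, the 16-byte entry could be expressed in C like this (the field names are mine, not from the actual code):

Code:
/* Sketch of the 16-byte IPC entry described above; field names are illustrative. */
#include <stdint.h>

struct ipc_entry {
    uint64_t phys_addr;      /* physical address of the 4k message block        */
    uint32_t kernel_linear;  /* kernel-mode linear address, always mapped       */
    uint32_t server_linear;  /* server (user-mode) linear address, only mapped  */
                             /* while the server is processing the message      */
};
/* sizeof(struct ipc_entry) == 16 */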

I then reserve space for 32 of these 16-byte IPC structures per partition server. To quickly allocate them, I add a 32-bit mask with unused entries and a 32-bit mask with allocated but free entries. Using bsf, btr and btc I can then implement a fast "allocator". Even better, it's possible to implement the allocator without blocking locks (using the lock prefix for btr & btc will do).
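
A minimal C sketch of the idea follows. The real code would be x86 assembly using bsf and lock btr/bts; here GCC builtins stand in, the two masks are collapsed into a single free mask for brevity, and all names are mine:

Code:
/* Minimal sketch of the lock-free bitmask allocator described above. */
#include <stdint.h>

static volatile uint32_t free_mask = 0xFFFFFFFFu;  /* 1 = entry available */

/* Returns an index 0..31, or -1 if all 32 entries are currently in use
   (the caller would then block until one is released). */
int alloc_ipc_entry(void)
{
    for (;;) {
        uint32_t mask = free_mask;
        if (mask == 0)
            return -1;
        int idx = __builtin_ctz(mask);            /* bsf: find lowest set bit */
        uint32_t bit = 1u << idx;
        /* "lock btr": clear the bit, but only claim the entry if it was set. */
        if (__atomic_fetch_and(&free_mask, ~bit, __ATOMIC_ACQ_REL) & bit)
            return idx;
    }
}

void free_ipc_entry(int idx)
{
    /* "lock bts": mark the entry as available again. */
    __atomic_fetch_or(&free_mask, 1u << (unsigned)idx, __ATOMIC_RELEASE);
}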

To keep track of pending requests I add a circular queue with 32 entries (it can use byte indexes into the IPC structure array).

If there are more than 32 requests pending, then request threads will be blocked until an IPC structure becomes available.
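
A sketch of that queue in C (names and exact layout are my guesses; this simple version assumes external synchronization and sacrifices one slot to tell "full" from "empty"):

Code:
/* Sketch of the 32-entry pending queue; byte indexes refer to slots in the
   ipc_entry array. */
#include <stdint.h>

struct pending_queue {
    uint8_t slots[32];   /* indexes into the ipc_entry array */
    uint8_t head;        /* next slot to dequeue             */
    uint8_t tail;        /* next slot to enqueue             */
};

/* Returns 0 on success, -1 if the queue is full (requesting thread blocks). */
int pending_enqueue(struct pending_queue *q, uint8_t entry_idx)
{
    uint8_t next = (uint8_t)((q->tail + 1) & 31);
    if (next == q->head)
        return -1;
    q->slots[q->tail] = entry_idx;
    q->tail = next;
    return 0;
}

/* Returns an entry index, or -1 if no requests are pending. */
int pending_dequeue(struct pending_queue *q)
{
    if (q->head == q->tail)
        return -1;
    int idx = q->slots[q->head];
    q->head = (uint8_t)((q->head + 1) & 31);
    return idx;
}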

The flow will be that the client in kernel space will allocate an IPC structure entry, fill in the registers & data. The entry is put into the pending queue and the server thread is signalled. The server thread (in kernel space) will take the entry from the pending queue, map the physical address to the server process linear address (and possibly allocate it if it's the first usage), and return back to userspace passing the buffer as a parameter. The server process will load the registers, perform the function and save the registers back to the IPC structure. It might also read or write data in the remaining space of the 4k block. After this is done, it returns to kernel space and signals back to the client thread that the operation is completed. The client thread reloads the registers from the IPC structure and returns to the caller.

Edit: A reference to the client thread needs to be saved somewhere, and it should not be in data that is accessible to user mode, so it must be placed in the IPC structure. Also, it's better to access the kernel-mode data through a selector rather than a linear address, which saves two bytes and allows the two-byte thread selector to be kept within the 16 bytes as well.
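
With that change, the 16-byte layout could be sketched as follows (field names and order are my guesses):

Code:
/* Revised 16-byte entry: a 2-byte kernel data selector replaces the 4-byte
   kernel linear address, freeing room for the 2-byte client thread selector. */
#include <stdint.h>

struct ipc_entry {
    uint64_t phys_addr;      /* physical address of the 4k message block  */
    uint16_t kernel_sel;     /* selector used to access kernel-mode data  */
    uint16_t client_thread;  /* selector of the waiting client thread     */
    uint32_t server_linear;  /* server (user-mode) linear address         */
};
/* sizeof(struct ipc_entry) is still 16 */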


 Post subject: Re: How to implement efficient IPC for a server-based filesystem
PostPosted: Fri Apr 23, 2021 6:31 am 
thewrongchristian (Member; joined Tue Apr 03, 2018 2:44 am; posts: 203)
rdos wrote:
The flow will be that the client in kernel space will allocate an IPC structure entry, fill in the registers & data. The entry is put into the pending queue and the server thread is signalled. The server thread (in kernel space) will take the entry from the pending queue, map the physical address to the server process linear address (and possibly allocate it if it's the first usage), and return back to userspace passing the buffer as a parameter. The server process will load the registers, perform the function and save the registers back to the IPC structure. It might also read or write data in the remaining space of the 4k block. After this is done, it returns to kernel space and signals back to the client thread that the operation is completed. The client thread reloads the registers from the IPC structure and returns to the caller.


So, from an FS server POV, a request will be self-contained in a page-sized buffer? What will that buffer contain? Stuff like the file operation and details, file offset to read/write, size, etc.?

How is the data buffer aligned in that case? A common use case would be paging in data, in which case you want the data buffer to be page aligned and sized, and the request data to be disjoint and pointing at the buffer to use. Your request buffer needs to be separate from your data buffer.

Fitting in with this, it might be better for the FS server to manage the lifecycle of the data buffers.

Consider a request to read in a page-sized block of data (to fulfill a demand page request in the VMM), with a rough C sketch after the list:

- VFS is called with a getpage() request to return a page filled in with data from the given vnode/offset. In my VFS, getpage() supplies and returns the page filled in, it is not pre-allocated, as the vnode might be a device vnode to a memory mapped device, in which case it can just return the physical page from the MMIO region for the device.
- VFS sees this vnode is attached to a user space FS, and so formulates an IPC request to the FS server, queues the request, and waits for the response. (notice, no memory is allocated to be filled in.)
- FS server now runs, and picks up the request to read from vnode/offset/size. At this point, there is no data buffer, so the FS server uses a page-aligned pre-allocated virtual buffer, and populates the buffer with the desired data.
- FS server responds to the request, with a list of page-sized references to buffers in the FS server address space. These buffers could have been filled in on the FS server's behalf by an underlying disk device driver, or they could be filled in on the fly by the FS server itself if it is providing some sort of virtual filesystem (such as a transparent compression wrapper.)
- VFS gets the response, and immediately uses the buffers returned in the response to return the underlying physical page(s) from getpage(). It should steal these pages, unmapping them from the FS server at the same time, so the pages can't be subsequently referenced by the FS server after they're returned.
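
Here is that rough sketch in C. All types and helpers (ipc_call, steal_page, the request/response layout) are invented for illustration, not taken from an existing VFS:

Code:
#include <stdint.h>

struct fs_read_request  { uint64_t vnode_id; uint64_t offset; uint32_t npages; };
struct fs_read_response { uint32_t npages; uint64_t server_buf[8]; };

/* Hypothetical helpers provided elsewhere in the kernel. */
extern void     ipc_call(int server, const struct fs_read_request *rq,
                         struct fs_read_response *rs);       /* queue + block */
extern uint64_t steal_page(int server, uint64_t server_buf); /* unmap from the
                                                                FS server and
                                                                return the
                                                                physical page */

/* getpage() path for a vnode backed by a user-space FS server. */
uint64_t vfs_getpage(int server, uint64_t vnode_id, uint64_t offset)
{
    struct fs_read_request  rq = { vnode_id, offset, 1 };
    struct fs_read_response rs;

    ipc_call(server, &rq, &rs);     /* FS server fills a buffer it owns      */
    return steal_page(server, rs.server_buf[0]);  /* server can no longer
                                                     reference the page      */
}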

You'd need a similar mechanism for page writes. The FS server would need some way of getting the data to write mapped; perhaps the IPC mechanism can have some way of reserving page-aligned buffers in the FS server address space, and the write request will just reference the existing buffer addresses in the FS server address space where the written data can be found. The FS server can then either use those buffers as-is and pass them off to the underlying device, or read the buffers and process them in whatever way it sees fit (compressing the data, for example, in the case of a transparent compression wrapper.)

Basically, you need to separate out the management of buffers from the management of IPC requests.

rdos wrote:
Edit: A reference to the client thread needs to be saved somewhere, and it should not be in data that is accessible to user mode, so it must be placed in the IPC structure. Also, it's better to access the kernel-mode data through a selector rather than a linear address, which saves two bytes and allows the two-byte thread selector to be kept within the 16 bytes as well.


I'd keep the data simple, and avoid x86-isms like selectors. Just use pointers, and if your IPC structure isn't 16 bytes, so be it. You're not exporting the IPC structure to user space, so it doesn't matter if it changes.


 Post subject: Re: How to implement efficient IPC for a server-based filesystem
PostPosted: Fri Apr 23, 2021 7:54 am 
rdos (Member; joined Wed Oct 01, 2008 1:55 pm; posts: 2592)
thewrongchristian wrote:
So, from an FS server POV, a request will be self-contained in a page-sized buffer? What will that buffer contain? Stuff like the file operation and details, file offset to read/write, size, etc.?


The operation type has its own field, and most things will be passed & returned in x86 registers (eax, ebx, ecx, edx, esi and edi). The client side, as well as the server kernel side, is written in x86 assembly, so a register-based interface seems to be the best alternative. Even the user-mode part that decodes the requests is in x86 assembly, and it interfaces with C/C++ using OpenWatcom register pragmas.
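
For example, a register-based entry point on the C/C++ side could be declared roughly like this with an OpenWatcom pragma (the function name and register assignment are made up for illustration):

Code:
/* Hypothetical handler called from the assembly dispatcher. The pragma tells
   OpenWatcom to pass the argument in EBX, return the result in EAX, and treat
   ECX/EDX as clobbered. */
unsigned long fs_get_free_space_low(unsigned long *free_high);
#pragma aux fs_get_free_space_low parm [ebx] value [eax] modify [ecx edx];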

thewrongchristian wrote:
How is the data buffer aligned in that case? A common use case would be paging in data, in which case you want the data buffer to be page aligned and sized, and the request data to be disjoint and pointing at the buffer to use. Your request buffer needs to be separate from your data buffer.


Data will be passed as physical addresses to the objects in question. This will allow up to 510 physical pages to be passed from server to client. I will add a syscall that can extract physical addresses from server space and place them in the reply. For the client side, which runs in kernel mode, these functions are already accessible. This needs to be handled carefully, though, so the server process cannot gain access to the kernel. Perhaps the mapping should always be part of a call that returns the buffer back to the client, so the server process cannot access the physical addresses. I think passing physical addresses from client to server should not be possible, and it is probably not needed either.

thewrongchristian wrote:
Consider a request to read in a page-sized block of data (to fulfill a demand page request in the VMM):

- VFS is called with a getpage() request to return a page filled in with data from the given vnode/offset. In my VFS, getpage() supplies and returns the page filled in, it is not pre-allocated, as the vnode might be a device vnode to a memory mapped device, in which case it can just return the physical page from the MMIO region for the device.
- VFS sees this vnode is attached to a user space FS, and so formulates an IPC request to the FS server, queues the request, and waits for the response. (notice, no memory is allocated to be filled in.)
- FS server now runs, and picks up the request to read from vnode/offset/size. At this point, there is no data buffer, so the FS server uses a page-aligned pre-allocated virtual buffer, and populates the buffer with the desired data.
- FS server responds to the request, with a list of page-sized references to buffers in the FS server address space. These buffers could have been filled in on the FS server's behalf by an underlying disk device driver, or they could be filled in on the fly by the FS server itself if it is providing some sort of virtual filesystem (such as a transparent compression wrapper.)
- VFS gets the response, and immediately uses the buffers returned in the response to return the underlying physical page(s) from getpage(). It should steal these pages, unmapping them from the FS server at the same time, so the pages can't be subsequently referenced by the FS server after they're returned.


I've not yet completely decided how to implement this, but my guess is that the client will issue a file lock request, putting the file position in EDX:EAX and the number of pages in ECX. The server will then return the physical addresses of the file data (which will typically reside in the disc cache). When the client no longer needs access to the file data, it must unlock that part of the file with a new request.

Perhaps more interesting is that normal file I/O can be implemented by mapping file data into user space and then letting user space read directly from these buffers. In this case, the lock requests will be sent as needed by the kernel when the application tries to read non-mapped data.

thewrongchristian wrote:
You'd need a similar mechanism for page writes. The FS server would need some way of getting the data to write mapped; perhaps the IPC mechanism can have some way of reserving page-aligned buffers in the FS server address space, and the write request will just reference the existing buffer addresses in the FS server address space where the written data can be found. The FS server can then either use those buffers as-is and pass them off to the underlying device, or read the buffers and process them in whatever way it sees fit (compressing the data, for example, in the case of a transparent compression wrapper.)


Writes are a bit more complicated, but they will typically start with a read lock and then be followed by write requests on the modified pages. A further complication is file data that is not page-aligned in the FS. There is also a need to "collect" writes so that discs are not worn out by too many small writes when these happen byte-by-byte.

thewrongchristian wrote:
Basically, you need to separate out the management of buffers from the management of IPC requests.


They basically are. The IPC structure is kernel-only, while the 4k data block, which contains registers & data, is accessible to the user-mode FS server process.

I already implemented the first (and simplest) request. It's a request to return free space. The free clusters value is collected when the partition starts up, and so reading this out is as simple as returning it in EDX:EAX. It seems to work.

The next step is to cache directories and to implement opendir, readdir and closedir. In the current solution, I create a snapshot of the directory in a memory object, and then readdir returns a particular entry. In the new design, I think I will use version numbers for the directory. Since directories are meta-data that differ by FS, they will need to be saved in the server process's user address space, and this caching should be page-aligned.

My idea is that I will put a path in the data area of the request and issue a lock directory request. The server will then return physical pages in the data area: ECX will be the number of pages, and EBX might be the version of the directory. The server will keep a lock count per version of the directory data and will delete versions once they are not current and no longer used. The client will map the directory entries to a kernel area and then service readdir by reading particular entries. It's also possible that the entries should be mapped read-only in user mode, with readdir then implemented completely in user mode instead.


 Post subject: Re: How to implement efficient IPC for a server-based filesystem
PostPosted: Fri Apr 23, 2021 8:44 am 
rdos (Member; joined Wed Oct 01, 2008 1:55 pm; posts: 2592)
I think I have a solution to the security issues of letting the userspace FS server pass physical addresses of objects.

I can add a buffer type field to the message, and then let the kernel syscall used to return the buffer to the client fill in this field:

- If the reply is constructed by the server only, the field is coded as "server only", and the reply must not contain physical addresses.
- If the server uses a syscall to return a user-space object, the field is coded as "server object", and the data part is filled with the number of physical addresses plus the actual physical addresses.
- A third method the server might use is a syscall that references sector numbers from the disc cache. The same coding is used, but the actual addresses are obtained by locking these sectors in the disc cache and returning their physical addresses in the data part. The server codes which sectors to return by writing their numbers to the data part of the reply, and the kernel side then locks the sectors and replaces each sector number with the physical address of the sector.
- A similar method can be used for memory objects: the server writes the size and base of the memory object to the reply, and the kernel side transforms it into an array of physical addresses.
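
A sketch of what that buffer type field could look like (the enum names are mine, not from the actual code):

Code:
/* Sketch of the buffer-type field described above. */
enum msg_buf_type {
    MSG_BUF_SERVER_ONLY,    /* reply built by the server alone; no physical
                               addresses may appear in the data area          */
    MSG_BUF_SERVER_OBJECT,  /* kernel syscall filled in a count plus the
                               physical addresses of a user-space object      */
    MSG_BUF_CACHE_SECTORS,  /* server listed sector numbers; the kernel locked
                               them in the disc cache and replaced each with
                               the sector's physical address                  */
    MSG_BUF_MEMORY_OBJECT   /* server gave base + size; the kernel expanded it
                               into an array of physical addresses            */
};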

