thewrongchristian wrote:
So, from a FS server POV, a request will be self-contained in a page-sized buffer? What will that buffer contain? Stuff like the file operation and details, file offset to read/write, size, etc.?
The operation type has its own field, and most things will be passed & returned in x86 registers (eax, ebx, ecx, edx, esi and edi). The client side, as well as the kernel side of the server, is written in x86 assembly, so a register-based interface seems to be the best alternative. Even the user-mode part that decodes the requests is in x86 assembly, and it interfaces with C/C++ using OpenWatcom register pragmas.
thewrongchristian wrote:
How is the data buffer aligned in that case? A common use case would be paging in data, in which case you want the data buffer to be page aligned and sized, and the request data to be disjoint and pointing at the buffer to use. Your request buffer needs to be separate from your data buffer.
Data will be passed as physical addresses to the objects in question. This allows up to 510 physical pages to be passed from server to client. I will add a syscall that extracts physical addresses from server space and places them in the reply. For the client side, which runs in kernel mode, these functions are already accessible. This needs to be handled carefully, though, so the server process cannot gain access to the kernel. Perhaps the mapping should always be part of a call that returns the buffer to the client, so the server process never sees the physical addresses. I think passing physical addresses from client to server should not be possible, and it is probably not needed either.
thewrongchristian wrote:
Consider a request to read in a page sized block of data (to fulfill a demand page request in the VMM):
- VFS is called with a getpage() request to return a page filled in with data from the given vnode/offset. In my VFS, getpage() supplies and returns the page filled in; it is not pre-allocated, as the vnode might be a device vnode to a memory mapped device, in which case it can just return the physical page from the MMIO region for the device.
- VFS sees this vnode is attached to a user space FS, and so formulates an IPC request to the FS server, queues the request, and waits for the response. (notice, no memory is allocated to be filled in.)
- FS server now runs, and retrieves the request to read from vnode/offset/size. At this point, there is no data buffer, so the FS server uses a page-aligned pre-allocated virtual buffer, and populates the buffer with the desired data.
- FS server responds to the request, with a list of page sized references to buffers in the FS server address space. These buffers could have been filled in on the FS server's behalf by an underlying disk device driver, or they could be filled in on the fly by the FS server itself if it is providing some sort of virtual filesystem (such as a transparent compression wrapper.)
- VFS gets the response, and immediately uses the buffers returned in the response to return the underlying physical page(s) from getpage(). It should steal these pages, unmapping them from the FS server at the same time, so the pages can't be subsequently referenced by the FS server after they're returned.
I've not yet completely decided how to implement this, but my guess is that the client will issue a file lock request, putting the file position in EDX:EAX and the number of pages in ECX. The server will then return the physical addresses of the file data (which will typically reside in the disc cache). When the client no longer needs access to the file data, it must unlock that part of the file with a new request.
Perhaps more interesting is that normal file I/O can be implemented by mapping file data into user space and letting user space read directly from these buffers. In this case, the kernel will send the lock requests as needed when the application tries to read non-mapped data.
thewrongchristian wrote:
You'd need a similar mechanism for page writes. The FS server would need some way of getting the data to write mapped, perhaps the IPC mechanism can have some way of reserving page aligned buffers in the FS server address space, and the write request will just reference the existing buffer addresses in the FS server address space of where the written data can be found. The FS server can then either use those buffers as is and pass them off to the underlying device, or read the buffers and process them in whatever way it sees fit (compressing the data, for example, in the case of a transparent compression wrapper.)
Writes are a bit more complicated, but they will typically start with a read lock and then be followed by write requests on the modified pages. A complication is that if file data is not page-aligned in the FS, things get more involved. There is also a need to "collect" writes so that discs are not worn out by too many small writes when applications write byte-by-byte.
thewrongchristian wrote:
Basically, you need to separate out the management of buffers from the management of IPC requests.
They basically are. The IPC structure is kernel-only, while the 4k data block that contains registers & data is accessible to the user-mode FS server process.
I have already implemented the first (and simplest) request: returning free space. The free-cluster count is collected when the partition starts up, so reading it out is as simple as returning it in EDX:EAX. It seems to work.
The next step is to cache directories and to implement opendir, readdir and closedir. In the current solution, I create a snapshot of the directory in a memory object, and readdir then returns a particular entry. In the new design, I think I will use version numbers for directories. Since directory contents are meta-data that differ by FS, they will need to be kept in the server process's user address space, and this cache should be page-aligned.

My idea is that the client will put a path in the data area of the request and issue a "lock directory" request. The server will then return physical pages in the data area: ECX will hold the number of pages, and EBX might hold the version of the directory. The server will keep a lock count per version of the directory data and delete versions once they are no longer current and no longer used. The client will map the directory entries into a kernel area and service readdir by reading particular entries. It's also possible that the entries should be mapped read-only into user mode, with readdir implemented completely in user mode instead.