OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




Post new topic Reply to topic  [ 39 posts ]  Go to page Previous  1, 2, 3
Author Message
 Post subject: Re: Zero-copy file-IO in userspace
Posted: Mon Apr 10, 2023 2:59 pm
Member

Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3192
bellezzasolo wrote:
There's a reason that all the major OSes offer some form of Scatter-Gather API. MSDN offers the example of database applications: https://learn.microsoft.com/en-us/windows/win32/fileio/reading-from-or-writing-to-files-using-a-scatter-gather-scheme

GNU libc is less specific: https://www.gnu.org/software/libc/manual/html_node/Scatter_002dGather.html


Of course there is. The reason is that they implement the legacy read/write API for IO. This means that applications pass buffers that might span pages, and when these are passed to devices that use physical addresses, there is a need for scatter-gather.
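
To make the page-spanning point concrete, here is a minimal sketch of how a buffer passed to read/write might be cut into page-bounded pieces before being handed to a device. The names `split_by_page`, `sg_segment` and `SG_PAGE_SIZE` are made up for illustration, a 4k page size is assumed, and a real kernel would additionally translate each piece to a physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SG_PAGE_SIZE 4096u  /* assumed page size */

struct sg_segment {
    uintptr_t addr;  /* start of the piece (virtual here; a kernel would translate it) */
    size_t    len;   /* bytes in the piece, never crossing a page boundary */
};

/* Split [addr, addr + len) into pieces that each stay inside one page.
 * Returns the number of segments written, or 0 if max_segs is too small
 * (or len is 0). */
size_t split_by_page(uintptr_t addr, size_t len,
                     struct sg_segment *segs, size_t max_segs)
{
    size_t n = 0;

    assert(segs != NULL);

    while (len > 0) {
        size_t room  = SG_PAGE_SIZE - (addr % SG_PAGE_SIZE); /* bytes left in this page */
        size_t chunk = len < room ? len : room;

        if (n == max_segs)
            return 0;
        segs[n].addr = addr;
        segs[n].len  = chunk;
        n++;

        addr += chunk;
        len  -= chunk;
    }
    return n;
}
```

A 0x300-byte buffer starting at 0x1F00 comes out as two segments (0x100 bytes, then 0x200 bytes), which is exactly the case where the device needs a scatter-gather list.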

Besides, reading the description of the Windows API, this is pretty similar to how my API works, except that in my implementation the application does not provide the buffer, and alignment is handled by the OS and not by the calling application. I wouldn't call this "scatter-gather"; rather, it's a way to speed up file-IO by putting a lot of demands on the caller. So, yes, I more or less provide this support through the file map syscall. However, the file class is smart enough to always use this API for all file IO, without burdening the application with a lot of strange constraints.

bellezzasolo wrote:
My OS is 64 bit, so virtual memory is no issue. Given that this is a capability offered by a plethora of modern hardware, I'd certainly consider it desirable to offer the API. The advantage is that the kernel virtual-physical translation layer can be common and is pretty much free. Then your file cache is almost just a special device driver that works in system memory, moving around pages.

Applications doing silly things with small writes can be addressed with userspace buffering IMO, not hard to add to a libc.


I think you will discover that this will not give you decent filesystem performance, and writing a 64-bit OS won't change this.

My original filesystem implementation, which is stable and runs on a lot of systems, is based on passing buffers. However, it also has caches for both disc sectors and file content. That was necessary to provide performance comparable to that of other OSes, and I'm pretty sure that both Windows & Linux have these caches.

Regardless of how you implement an API by passing buffers, you will be unable to handle small requests without copying. Buffering in userspace means you need to guess at optimal read sizes, and copying is necessary for your buffers.
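
A rough sketch of what such userspace buffering looks like, using plain POSIX `read()`. The `ubuf` struct, `ubuf_read` and the 4096-byte `UBUF_SIZE` are hypothetical names and values chosen for illustration; the point is only that the fixed buffer size is a guess and that every small request ends in a `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define UBUF_SIZE 4096  /* the "guess" at a good read size */

struct ubuf {
    int    fd;
    size_t pos;              /* next unread byte in data[] */
    size_t fill;             /* number of valid bytes in data[] */
    char   data[UBUF_SIZE];
};

/* Serve a small read from the userspace buffer, refilling it with one
 * larger read() when it runs dry. Returns bytes copied, 0 at end of file,
 * or -1 on error. May return fewer bytes than requested. */
ssize_t ubuf_read(struct ubuf *b, void *dst, size_t len)
{
    assert(b != NULL && dst != NULL);

    if (b->pos == b->fill) {
        ssize_t got = read(b->fd, b->data, sizeof b->data);
        if (got <= 0)
            return got;
        b->pos  = 0;
        b->fill = (size_t)got;
    }

    size_t avail = b->fill - b->pos;
    size_t n     = len < avail ? len : avail;

    memcpy(dst, b->data + b->pos, n);  /* the copy that buffering cannot avoid */
    b->pos += n;
    return (ssize_t)n;
}
```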


 Post subject: Re: Zero-copy file-IO in userspace
Posted: Mon Apr 10, 2023 3:51 pm
Member

Joined: Sun Feb 20, 2011 2:01 pm
Posts: 110
rdos wrote:
Regardless of how you implement an API by passing buffers, you will be unable to handle small requests without copying. Buffering in userspace means you need to guess at optimal read sizes, and copying is necessary for your buffers.

Surely the optimal read size issue could be addressed by passing the requisite information as part of the result from fopen()?

What the file cache does internally isn't really my area of expertise, as I haven't got round to writing one. It's just that I don't see the harm in exposing a scatter-gather interface to the application. It's not really zero copy if the database application has to copy around data...

_________________
Whoever said you can't do OS development on Windows?
https://github.com/ChaiSoft/ChaiOS


 Post subject: Re: Zero-copy file-IO in userspace
Posted: Tue Apr 11, 2023 2:20 am
Member

Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3192
bellezzasolo wrote:
Surely the optimal read size issue could be addressed by passing the requisite information as part of the result from fopen()?


You probably could, but then your fopen would be incompatible with fopen in POSIX. Besides, I don't think the application developer should need to consider the internals of your filesystem implementation, which is exactly what the MS scatter-gather API requires.

I do this by using the current file position and the size passed to read. If a read is not mapped, the requested position & size are passed to the filesystem. The filesystem then adjusts the size and position according to the actual file system used and its alignment. For instance, it will always set the size to at least one 4k page. The only requirement on the filesystem is that data at the requested file position is read, but data before and after can also be read and mapped to user space.

So, with my API, an application wanting more control of the mapping process can itself issue map requests with the wanted start position and size. It can then use the mapped file info to figure out where different parts of the file are mapped. Using this method, it can achieve true zero-copy. An application that wants to do random accesses without bothering about mapping details can use the file class, pass buffers, and let the file class send appropriate mapping requests to the filesystem.
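
As an illustration of that adjustment step only, here is a sketch that widens a requested position and size to whole 4k pages. `widen_to_pages`, `map_request` and `MAP_PAGE` are invented names, and the sketch assumes plain page alignment; a real filesystem would also take its own cluster or sector layout into account.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_PAGE 4096u  /* assumed mapping granularity */

struct map_request {
    uint64_t pos;   /* file offset where the mapping starts */
    uint64_t size;  /* bytes mapped, always a multiple of MAP_PAGE */
};

/* Widen a (pos, size) read request to whole pages: round the start down,
 * round the end up, and never map less than one page. */
struct map_request widen_to_pages(uint64_t pos, uint64_t size)
{
    struct map_request r;
    uint64_t mask = ~(uint64_t)(MAP_PAGE - 1);
    uint64_t end  = pos + (size ? size : 1);   /* cover at least one byte */

    r.pos  = pos & mask;
    end    = (end + MAP_PAGE - 1) & mask;
    r.size = end - r.pos;

    assert(r.size >= MAP_PAGE);
    return r;
}
```

A 100-byte read at position 5000 becomes a one-page mapping at 4096, while a 200-byte read at position 4000 straddles a page boundary and becomes a two-page mapping at 0.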


 Post subject: Re: Zero-copy file-IO in userspace
Posted: Mon Apr 17, 2023 5:07 am
Member

Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3192
I think I will add a new server function that allows better status & debugging functionality. The client would send a command to the server, the server would parse it just like the command line tool, and then return an answer in clear text. This could be run as a special tool (like ordinary ftp or telnet functions), or as part of the command shell.

I now also have a partition server, and a client could send commands like remove partition, create partition, init disc for MBR/GPT. It could also mount & unmount filesystems or show details about current partitions. This way I don't need to implement this as userlevel classes that are linked into partitioning tools, and I can hide this in the MBR and GPT partition servers. Actually, I could disallow direct disc access to vital filesystem data.

For the filesystem servers, I could ask for open files, which blocks they have mapped, how much memory the cache uses and so on.

I then don't need most of this functionality as syscalls, and it's easy to add new commands to the servers. I can also provide help commands that show the syntax.
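
A table-driven handler is one way such a text command interface could look. Everything in this sketch is hypothetical: the command names `cache` and `files`, the reply strings, and the `handle_command` signature are placeholders, not the actual server protocol.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int (*cmd_fn)(char *reply, size_t size);

struct command {
    const char *name;
    const char *help;
    cmd_fn      run;
};

/* Placeholder handlers; a real server would report its own counters. */
static int cmd_cache(char *reply, size_t size)
{
    return snprintf(reply, size, "cache pages: %u", 0u);
}

static int cmd_files(char *reply, size_t size)
{
    return snprintf(reply, size, "open files: %u", 0u);
}

/* Adding a command means adding one row here. */
static const struct command commands[] = {
    { "cache", "cache - show cache memory usage", cmd_cache },
    { "files", "files - list open files",         cmd_files },
};

static const struct command *find_command(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(commands[i].name, name) == 0)
            return &commands[i];
    return NULL;
}

/* Parse one text command and write a clear-text answer into reply. */
int handle_command(const char *line, char *reply, size_t size)
{
    const struct command *c;

    assert(line != NULL && reply != NULL);

    if (strncmp(line, "help ", 5) == 0) {
        c = find_command(line + 5);
        return c ? snprintf(reply, size, "%s", c->help)
                 : snprintf(reply, size, "no such command");
    }

    c = find_command(line);
    return c ? c->run(reply, size)
             : snprintf(reply, size, "unknown command: %s", line);
}
```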

Particularly for creating bootable discs and EFI system partitions with support files, I could provide these in resource files to the partition manager and create the correct structure on the partitions.

Unlike how FUSE is constructed, I would definitely want a format function in the filesystem server (this should be required), and optionally a "check disc" tool that can fix crosslinking issues and other filesystem problems. These should not be separate application tools; rather, they should be integrated with the filesystem server. Both of these can be run using the command line interface.


 Post subject: Re: Zero-copy file-IO in userspace
Posted: Wed Oct 25, 2023 12:57 pm
Member

Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3192
The work has progressed a bit. I now have a working file-write API too. The file class will send a grow request to the file IO server, and the server will add clusters (FAT case) to the file, and then send a request to the disc server for the added sectors. The application will wait for completion of the request. The application will normally just write the data in userspace, and then the kernel side will check for modified pages and send write requests for these to the file IO server.
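
The "check for modified pages" step could be sketched roughly as below. This assumes the per-page modified flags have already been collected (for example from the page tables) into a plain array; `collect_dirty` and `write_range` are illustrative names. Neighbouring modified pages are merged so that each run can become a single write request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct write_range {
    size_t first_page;  /* index of the first modified page */
    size_t page_count;  /* number of consecutive modified pages */
};

/* Scan per-page "modified" flags and merge neighbouring dirty pages into
 * ranges, so each range can become one write request. Returns the number
 * of ranges stored (at most max_ranges). */
size_t collect_dirty(const bool *dirty, size_t pages,
                     struct write_range *out, size_t max_ranges)
{
    size_t n = 0, i = 0;

    assert(dirty != NULL && out != NULL);

    while (i < pages && n < max_ranges) {
        if (!dirty[i]) {
            i++;
            continue;
        }

        size_t start = i;
        while (i < pages && dirty[i])
            i++;

        out[n].first_page = start;
        out[n].page_count = i - start;
        n++;
    }
    return n;
}
```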

I also decided that unaligned file data could not be mapped for user space, and so requests without alignment will be mapped as kernel only (but still in user space). The file class will notice the request is not aligned, and use the kernel read/write handle syscalls instead of accessing data in user space. I've also implemented the kernel methods, more or less by duplicating the file class in kernel space. One difference is that it will directly notify areas written, and so these requests don't need to be scanned for modified pages.

For unaligned filesystems, file write is a bit of a challenge. Normally, all write requests to unaligned files would need to be preceded by reads, but some smart coding has eliminated much of this on files that are not fragmented. However, all these writes must be done in kernel space since the files are not mapped with 4k alignment.
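
Why the reads are needed can be shown with the general read-modify-write rule: any block that a write covers only partly has to be read first so the untouched bytes survive. The sketch below counts those blocks for one write. It is not the optimisation mentioned above, just the baseline; `blocks_needing_read` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* For a write of len bytes at byte offset off on a device with the given
 * block size, return how many blocks (0, 1 or 2) are only partly covered
 * and therefore must be read before they can be written back. */
unsigned blocks_needing_read(uint64_t off, uint64_t len, uint32_t block)
{
    assert(block != 0);

    if (len == 0)
        return 0;

    uint64_t end          = off + len;           /* one past the last byte */
    bool     head_partial = (off % block) != 0;
    bool     tail_partial = (end % block) != 0;
    uint64_t first        = off / block;
    uint64_t last         = (end - 1) / block;

    if (first == last)                            /* write sits inside one block */
        return (head_partial || tail_partial) ? 1 : 0;

    return (head_partial ? 1u : 0u) + (tail_partial ? 1u : 0u);
}
```

A fully aligned 512-byte write needs no read, while a 1000-byte write at offset 100 touches three 512-byte blocks and needs the first and last read in beforehand.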

I've also linked the new filesystem API to the POSIX handle concept, which has some advantages.


 Post subject: Re: Zero-copy file-IO in userspace
Posted: Tue Jan 09, 2024 8:28 pm
Member

Joined: Mon Jun 05, 2006 11:00 pm
Posts: 2293
Location: USA (and Australia)
I did implement this.

Assuming all stars align:
  • The kernel can find you some physical memory in a low enough address space that DMA can write directly to it.
  • The address you want to read is sector aligned.
  • The length you want to read is a multiple of the sector size.
  • The physical area you want to write to is contiguous.

Typing this out, this sounds very rare, but is actually easy to accomplish with memory mapped IO.

Here is my loader where I memory map an ELF binary and parse it without copying (except to copy the sections into the final binary memory):
https://github.com/AndrewAPrice/Percept ... _loader.cc
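
For readers without that source at hand, the same idea can be sketched with POSIX `mmap()` as a stand-in (this is not the Perception API, and `is_elf_file` is a made-up helper): the file is mapped read-only and the ELF magic bytes are inspected in place, without reading them into a separate buffer.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only and test for the ELF magic bytes in place. */
bool is_elf_file(const char *path)
{
    bool result = false;

    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;

    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size >= 4) {
        size_t size = (size_t)st.st_size;
        const unsigned char *p = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);

        if (p != MAP_FAILED) {
            /* The header is examined directly in the mapping. */
            result = p[0] == 0x7f && p[1] == 'E' && p[2] == 'L' && p[3] == 'F';
            munmap((void *)p, size);
        }
    }

    close(fd);
    return result;
}
```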

_________________
My OS is Perception.


 Post subject: Re: Zero-copy file-IO in userspace
Posted: Wed Jan 10, 2024 2:15 am
Member

Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3192
I think this function is mostly useful for high speed disc hardware, and those are typically built on top of PCI using bus-mastering techniques (AHCI, NVMe). The disc hardware will typically support the physical memory available in the machine, and so there is no need to support DMA with a restricted address space. If such hardware is found, it's probably old and with poor performance, and then the disc driver can either use PIO or copy request data to proper buffers. A special case is USB drives. It would be possible to implement an interface for zero-copy in the USB stack if the USB hardware supports all physical addresses. However, at this point I decided to copy data and so when running on USB hardware, I don't currently support zero-copy (but could do in the future). The only hardware I support zero-copy on right now is NVMe.


 Post subject: Re: Zero-copy file-IO in userspace
Posted: Sun Jan 14, 2024 10:28 am
Member

Joined: Sun Apr 21, 2019 7:39 am
Posts: 76
Octocontrabass wrote:
Almost all modern disks use 4096 bytes per physical sector.

Source?

_________________
Hey! I'm developing two operating systems:

NanoShell --- A 32-bit operating system whose GUI takes inspiration from Windows 9x and early UNIX desktop managers.
Boron --- A portable SMP operating system taking inspiration from the design of the Windows NT kernel.


 Post subject: Re: Zero-copy file-IO in userspace
Posted: Sun Jan 14, 2024 4:37 pm
Member

Joined: Mon Mar 25, 2013 7:01 pm
Posts: 5100
iProgramInCpp wrote:
Source?

Mostly datasheets, but also documents like this one.

