Hi,
smeezekitty wrote:
In short: When calling a blocking system call, how does one avoid blocking the kernel?
Assume you have some kind of device where the driver issues a request/command to the device (e.g. to read or write some sectors) and then a little later the device causes an IRQ to inform the device driver that its request has completed.
Now assume that your driver maintains some kind of list of pending requests; and when the device's IRQ occurs the driver:
- Gathers status from the "just completed" request, and informs something else (e.g. file system code)
- Finds the next "pending request" and asks the device to start doing that request
At any time you could add more requests to the driver's list of pending requests (and when this happens you check if there's no "currently in progress" request and tell the disk controller to start the request immediately).
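To make the driver side concrete, here is a minimal sketch of that pending list in C. All of the names (struct request, driver_submit, driver_irq, hw_start and so on) are made up for illustration, the "hardware" is just a stub that records what it was told to do, and the locking a real kernel would need around the list (e.g. disabling IRQs or taking a spinlock) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request descriptor; every name here is illustrative. */
struct request {
    unsigned long lba;                  /* first sector */
    unsigned count;                     /* number of sectors */
    int status;                         /* filled in when the request completes */
    void (*done)(struct request *req);  /* who to inform on completion */
    struct request *next;               /* link in the pending FIFO */
};

struct driver {
    struct request *in_progress;   /* request the device is working on, or NULL */
    struct request *pending_head;  /* requests that have not been started yet */
    struct request *pending_tail;
};

/* Stand-ins for real hardware access (assumptions, not a real controller). */
static struct request *hw_current;
static void hw_start(struct request *req) { hw_current = req; }
static int hw_read_status(void) { return 0; }

/* Example upper-layer callback: just counts completions. */
static int completed;
void note_done(struct request *req) { (void)req; completed++; }

/* Take the next pending request (if any) and hand it to the device. */
static void start_next(struct driver *d)
{
    struct request *req = d->pending_head;
    if (req == NULL)
        return;
    d->pending_head = req->next;
    if (d->pending_head == NULL)
        d->pending_tail = NULL;
    req->next = NULL;
    d->in_progress = req;
    hw_start(req);
}

/* Add a request; start it immediately if the device is idle. */
void driver_submit(struct driver *d, struct request *req)
{
    req->next = NULL;
    if (d->pending_tail != NULL)
        d->pending_tail->next = req;
    else
        d->pending_head = req;
    d->pending_tail = req;
    if (d->in_progress == NULL)
        start_next(d);
}

/* IRQ handler: finish the current request, inform its owner, start the next. */
void driver_irq(struct driver *d)
{
    struct request *req = d->in_progress;
    assert(req != NULL);            /* an IRQ with nothing in progress is a bug */
    req->status = hw_read_status();
    d->in_progress = NULL;
    req->done(req);
    /* The callback may have submitted (and started) a new request already. */
    if (d->in_progress == NULL)
        start_next(d);
}
```

Note that nothing in here ever waits: driver_submit() returns straight away, and the only place where work "continues" is the IRQ handler.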
Now assume that individual file systems (e.g. FAT, ext2, ISO9660, whatever) and the virtual file system layer all use similar techniques - e.g. they all work on "requests" and "replies", and they all keep track of "pending requests" and "in-progress requests". For example, from the VFS's perspective:
- something sends a request to VFS asking it to fetch 20 KiB from a file, VFS remembers the details
- VFS sends a request to FAT file system code asking it to fetch 20 KiB from a file
- Sooner or later VFS receives reply from FAT file system, figures out what it's about (the details it remembered earlier) and sends reply back to whatever sent the initial request
And from the FAT file system's perspective:
- VFS sends a request to FAT file system asking it to fetch 20 KiB from a file, FAT file system remembers the details
- FAT file system sends a request to storage device driver asking it to fetch 20 KiB from a hard disk
- Sooner or later FAT file system receives reply from storage device driver, figures out what it's about (the details it remembered earlier) and sends reply back to VFS
And from the storage device driver's perspective:
- FAT file system sends a request to storage device driver to fetch 20 KiB from a hard disk, storage device driver remembers the details
- Sooner or later storage device driver sends a command to the disk controller asking it to fetch 20 KiB from a hard disk
- Sooner or later storage device driver receives an IRQ from disk controller, figures out what it's about (the details it remembered earlier) and sends reply back to FAT file system
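The "remembers the details" and "figures out what it's about" steps are the same at every layer, so one way to picture them is a small table of pending requests keyed by a request ID that gets sent down with the request and echoed back in the reply. The sketch below is only an illustration under that assumption; struct layer, layer_remember and layer_match_reply are hypothetical names, and the fixed-size table and linear search are there to keep it short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 32

/* What a layer remembers about one request it has passed downwards. */
struct pending {
    int in_use;
    unsigned id;           /* ID sent with the request, echoed in the reply */
    void *requester;       /* who to send our own reply to */
    unsigned long length;  /* e.g. 20 KiB */
};

struct layer {
    struct pending table[MAX_PENDING];
    unsigned next_id;
};

/*
 * Remember the details of a request before forwarding it.
 * Returns the ID to attach to the forwarded request, or 0 if the table is full.
 * (ID reuse after wrap-around is ignored in this sketch.)
 */
unsigned layer_remember(struct layer *l, void *requester, unsigned long length)
{
    assert(requester != NULL);  /* NULL is reserved for "unknown reply" below */
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!l->table[i].in_use) {
            if (++l->next_id == 0)
                l->next_id = 1;  /* never hand out 0 */
            l->table[i].in_use = 1;
            l->table[i].id = l->next_id;
            l->table[i].requester = requester;
            l->table[i].length = length;
            return l->next_id;
        }
    }
    return 0;
}

/*
 * A reply arrived from the lower layer: find what it was about, forget it,
 * and return who should get our reply. Returns NULL for an unknown ID.
 */
void *layer_match_reply(struct layer *l, unsigned id)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (l->table[i].in_use && l->table[i].id == id) {
            l->table[i].in_use = 0;
            return l->table[i].requester;
        }
    }
    return NULL;
}
```

Replies can come back in any order, which is the whole point: the layer never has to sit and wait for one particular request.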
Finally; assume that the kernel API has a "read()" function, which actually does "send request to VFS and block this task" and when the VFS sends a reply back the kernel unblocks the task again (so that the task doesn't get any CPU time while it's waiting for its "read()" to complete). Of course the kernel can also have a "read_async()" function which sends a request to the VFS but doesn't block the task.
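As a rough sketch of the difference between the two calls (with a fake "inbox" standing in for the VFS, no real scheduler, and every name being hypothetical), the blocking version is just the async version plus "mark this task as blocked and remember who to wake up":

```c
#include <assert.h>
#include <stddef.h>

enum task_state { TASK_RUNNABLE, TASK_BLOCKED };

struct task {
    enum task_state state;
};

struct io_request {
    struct task *waiter;  /* task to unblock on reply, or NULL for async */
    int completed;
    long result;
};

/* Stand-in for sending a request to the VFS: it only records the request. */
#define VFS_INBOX_MAX 16
static struct io_request *vfs_inbox[VFS_INBOX_MAX];
static size_t vfs_inbox_len;

static void vfs_send_request(struct io_request *req)
{
    assert(vfs_inbox_len < VFS_INBOX_MAX);
    vfs_inbox[vfs_inbox_len++] = req;
}

/* Non-blocking: send the request and return; the task keeps running. */
void sys_read_async(struct task *t, struct io_request *req)
{
    (void)t;
    req->waiter = NULL;
    req->completed = 0;
    vfs_send_request(req);
}

/* Blocking: same request, but the task gets no CPU time until the reply. */
void sys_read(struct task *t, struct io_request *req)
{
    req->waiter = t;
    req->completed = 0;
    /* Block first, so a reply that arrives immediately can't be missed. */
    t->state = TASK_BLOCKED;
    vfs_send_request(req);
    /* A real kernel would now switch to some other runnable task. */
}

/* Called when the VFS sends its reply back. */
void vfs_reply(struct io_request *req, long result)
{
    req->result = result;
    req->completed = 1;
    if (req->waiter != NULL)
        req->waiter->state = TASK_RUNNABLE;
}
```

Only the task is blocked; the kernel itself returns to the scheduler and carries on running everything else.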
Now imagine that there are 54321 tasks that are all calling "read()" at various times (and blocking and unblocking); but the VFS is able to have many requests in various states at the same time, and the FAT file system is able to have many requests in various states at the same time, and storage device drivers are able to have many requests in various states at the same time.
Next; imagine that there are multiple file systems (FAT, ext2, ISO9660, etc) and multiple disk controller drivers and/or storage devices. At any point in time VFS might have 1234 requests that it's keeping track of; where FAT might have 234 of the requests, ext2 might have 600 of the requests and ISO9660 might have 400 of the requests; and where SATA disk controller might have 834 of the requests and USB CD-ROM driver might have 400 of the requests.
Also imagine that the VFS layer does file data caching. Whenever kernel/task sends a request to VFS asking to read some data from a file, VFS checks its cache and if the data is in the cache it sends a reply back immediately; and if the data isn't in the cache it sends a request to a file system, and when the file system replies the VFS grabs the data from the FS's reply and puts it in the cache (before VFS sends its own reply back to kernel/task that made the original request). Whenever kernel/task sends a request to VFS asking to write some data to a file, VFS stores the new data in its cache then sends a request to a file system.
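A very small sketch of that cache logic might look like the following. It assumes a fixed block size and a tiny direct-mapped cache, and it only counts the requests that would be forwarded to a file system instead of actually sending them; the function names are invented for the example.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE  512
#define CACHE_SLOTS 64

struct cache_slot {
    int valid;
    unsigned file_id;
    unsigned block;
    unsigned char data[BLOCK_SIZE];
};

static struct cache_slot cache[CACHE_SLOTS];
static unsigned fs_requests_sent;  /* requests forwarded to a file system */

/* Direct-mapped: each (file, block) pair has exactly one possible slot. */
static struct cache_slot *slot_for(unsigned file_id, unsigned block)
{
    return &cache[(file_id * 31u + block) % CACHE_SLOTS];
}

static void cache_store(unsigned file_id, unsigned block,
                        const unsigned char *data)
{
    struct cache_slot *s = slot_for(file_id, block);
    s->valid = 1;
    s->file_id = file_id;
    s->block = block;
    memcpy(s->data, data, BLOCK_SIZE);
}

/*
 * Read request from kernel/task.
 * Returns 1 if answered from the cache (buf is filled in immediately),
 * or 0 if a request had to be forwarded to the file system.
 */
int vfs_read_block(unsigned file_id, unsigned block, unsigned char *buf)
{
    struct cache_slot *s = slot_for(file_id, block);
    assert(buf != NULL);
    if (s->valid && s->file_id == file_id && s->block == block) {
        memcpy(buf, s->data, BLOCK_SIZE);
        return 1;
    }
    fs_requests_sent++;  /* stand-in for "send request to file system" */
    return 0;
}

/* Reply from the file system: fill the cache, then reply to the requester. */
void vfs_fs_read_reply(unsigned file_id, unsigned block,
                       const unsigned char *data, unsigned char *buf)
{
    cache_store(file_id, block, data);
    memcpy(buf, data, BLOCK_SIZE);
}

/* Write request: update the cache, then forward the write to the file system. */
void vfs_write_block(unsigned file_id, unsigned block,
                     const unsigned char *data)
{
    cache_store(file_id, block, data);
    fs_requests_sent++;
}
```

A cache hit never touches the lower layers at all, so a task calling "read()" on cached data may not even need to block.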
Also imagine that all of these requests have an "IO priority"; and that each layer can take this into account when figuring out which request to do next; so that important things happen sooner/faster (and less important things happen later/slower). Then imagine that (for swap space) the kernel may send requests directly to the storage device drivers and that these requests are "very high priority" (so that swap space reads/writes happen very quickly even when there's 1234 lower/normal priority requests waiting to be done).
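For the "which request to do next" decision, the simplest thing I can think of is one FIFO per priority level, always taking from the highest non-empty level. The sketch below assumes four made-up priority levels (with swap at the top) and uses hypothetical names throughout; a real layer would probably also consider things like head position or fairness.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical priority levels; higher value means "do it sooner". */
enum io_priority {
    IO_PRIO_LOW,
    IO_PRIO_NORMAL,
    IO_PRIO_HIGH,
    IO_PRIO_SWAP,     /* kernel swap space requests */
    IO_PRIO_LEVELS    /* number of levels, not a real priority */
};

struct io_request {
    enum io_priority prio;
    struct io_request *next;
};

/* One FIFO per priority level. */
struct prio_queue {
    struct io_request *head[IO_PRIO_LEVELS];
    struct io_request *tail[IO_PRIO_LEVELS];
};

/* Append a request to the FIFO for its priority level. */
void pq_push(struct prio_queue *q, struct io_request *r)
{
    int p = (int)r->prio;
    assert(p >= 0 && p < IO_PRIO_LEVELS);
    r->next = NULL;
    if (q->tail[p] != NULL)
        q->tail[p]->next = r;
    else
        q->head[p] = r;
    q->tail[p] = r;
}

/* Remove and return the oldest request of the highest non-empty level. */
struct io_request *pq_pop(struct prio_queue *q)
{
    for (int p = IO_PRIO_LEVELS - 1; p >= 0; p--) {
        struct io_request *r = q->head[p];
        if (r != NULL) {
            q->head[p] = r->next;
            if (q->head[p] == NULL)
                q->tail[p] = NULL;
            r->next = NULL;
            return r;
        }
    }
    return NULL;  /* nothing pending */
}
```

With strict priorities like this a steady stream of high priority requests can starve the low priority ones, so you might want some kind of aging on top of it.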
Note that for a micro-kernel (e.g. where all of these pieces are separate processes, and sending requests and replies is done using IPC/message passing) it should be relatively easy to see how all this works. For a monolithic kernel it's basically the same; except that instead of using IPC you use direct function calls and function pointers. For example, to send a request to a file system the VFS might do "fs_control_data->handle_read_request_function( info );" and to send a reply back to VFS the file system might do "VFS_handle_read_reply ( info );". When direct function calls are being used like this you can cheat a little by passing the same "request info structure" from VFS to FS to storage driver (as requests are made at each level) and then from storage driver to FS to VFS (as replies come back from each level), so VFS and file systems don't necessarily need to keep track of the requests while they're "in progress at a lower layer".
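Here's a rough sketch of the monolithic variant, where one "request info structure" travels down and back up through direct calls. Only "handle_read_request_function" and "VFS_handle_read_reply" come from the description above; everything else (struct request_info, VFS_read, the FAT and storage functions) is invented for the example, and the storage driver replies immediately instead of waiting for an IRQ so that the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

struct request_info;

/* Per-file-system control data holding the FS's entry points. */
struct fs_control_data {
    void (*handle_read_request_function)(struct request_info *info);
};

/* The one structure that is passed down and back up through every layer. */
struct request_info {
    struct fs_control_data *fs;  /* which file system handles this request */
    unsigned long offset;
    unsigned long length;
    long result;                 /* bytes read, filled in by the lowest layer */
    int replied;                 /* set once the VFS has seen the reply */
};

/* VFS receives the reply; a real VFS would fill its cache and unblock the task. */
void VFS_handle_read_reply(struct request_info *info)
{
    info->replied = 1;
}

/* FAT receives the reply from the storage driver and passes it up. */
static void FAT_handle_read_reply(struct request_info *info)
{
    VFS_handle_read_reply(info);
}

/*
 * Storage driver stand-in. A real driver would queue the request and send
 * the reply from its IRQ handler; here it completes straight away.
 */
static void storage_handle_read_request(struct request_info *info)
{
    info->result = (long)info->length;
    FAT_handle_read_reply(info);
}

/* FAT receives the request from the VFS and passes it down. */
static void FAT_handle_read_request(struct request_info *info)
{
    storage_handle_read_request(info);
}

struct fs_control_data fat_fs = { FAT_handle_read_request };

/* VFS sends the request to whichever file system owns the file. */
void VFS_read(struct request_info *info)
{
    struct fs_control_data *fs_control_data = info->fs;
    assert(fs_control_data != NULL);
    assert(fs_control_data->handle_read_request_function != NULL);
    fs_control_data->handle_read_request_function(info);
}
```

Because the same structure goes all the way down and comes all the way back, each layer can find everything it needs inside "info" when the reply arrives.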
Cheers,
Brendan