OSDev.org

The Place to Start for Operating System Developers

All times are UTC - 6 hours




 Post subject: Questions about userspace FS interaction
PostPosted: Sun Mar 21, 2021 5:27 am 
Joined: Sun Apr 05, 2020 1:01 pm
Posts: 182
Hi, I recently finished implementing an AHCI driver and am now starting to work on the VFS, with FAT32 as the initial filesystem. I have a few questions:

1. I already made it so that if someone tries to kill a thread while it is blocked on a disk read/write request, the kill is deferred until that thread gets unblocked. I now realize that this might not be enough: if a thread tries to, e.g., write to a file, I might have to fetch other, unrelated parts of the filesystem, like the file allocation table for FAT32. So one userspace request potentially gets broken down into multiple read/write requests, and during those I would expect the thread to be invulnerable so that it can complete the requested transaction atomically. My question, then, is how other kernels handle the case where someone tries to kill a thread that is currently executing an important syscall, like writing a file, which can also yield or get blocked multiple times during the request. Since all of that is asynchronous, the kernel/scheduler must somehow recognize that although the thread is technically dead, it is still inside a kind of "critical section" and must keep being scheduled until it leaves that section.

2. I've also been thinking about how to implement the disk cache. So far I'm leaning towards implementing it inside the filesystem, but I'm not completely sure. Maybe data should be cached on multiple layers, both per disk and per filesystem? What do you think is the best way to go about this?

Would really appreciate any information I could get on this one, thanks :wink:


 Post subject: Re: Questions about userspace FS interaction
PostPosted: Sun Mar 21, 2021 5:53 am 
Joined: Thu May 17, 2007 1:27 pm
Posts: 999
You simply do not immediately kill a thread while it is in kernel space. That can wreak all kinds of havoc, not only file system inconsistencies. For example, the thread might hold a lock that will never get released.

Instead, you can wake up the thread, and make the blocking operation return the kernel-equivalent of EINTR. This error can then be propagated up the call stack until you reach the syscall handler again, where you kill the thread.
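
A minimal sketch of this pattern in C, not taken from any particular kernel. The names `struct thread`, `block_on_disk`, `fs_write`, `sys_write` and the error code `K_EINTR` are all hypothetical, and the blocking wait is simulated by a flag check instead of a real sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define K_EINTR 4   /* hypothetical kernel-internal "interrupted" error code */

struct thread {
    bool kill_pending;   /* set by whoever wants the thread gone */
    bool exited;         /* set only on the syscall exit path */
    int  locks_held;     /* number of locks the thread currently holds */
};

/* Stand-in for a blocking disk wait. A real kernel would sleep here; this
 * sketch only models the wake-up: a pending kill makes the wait fail. */
int block_on_disk(struct thread *t)
{
    return t->kill_pending ? -K_EINTR : 0;
}

/* Filesystem code: takes a lock, blocks once per block, unwinds on error. */
int fs_write(struct thread *t, size_t nblocks)
{
    int err = 0;

    t->locks_held++;                    /* e.g. a per-file lock */
    for (size_t i = 0; i < nblocks; i++) {
        err = block_on_disk(t);
        if (err)
            break;                      /* propagate the error, do not exit here */
    }
    t->locks_held--;                    /* always released on the way out */
    return err;
}

/* Syscall entry point: the only place where the thread is actually killed. */
int sys_write(struct thread *t, size_t nblocks)
{
    int ret = fs_write(t, nblocks);

    assert(t->locks_held == 0);         /* nothing leaks past this point */
    if (t->kill_pending)
        t->exited = true;
    return ret;
}
```

Because the error travels back through the normal return path, every layer gets the chance to release its locks before the thread goes away. Code that must not stop halfway through an update would simply not treat the wait as interruptible.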

_________________
managarm: Microkernel-based OS capable of running a Wayland desktop (Discord: https://discord.gg/7WB6Ur3). My OS-dev projects: [mlibc: Portable C library for managarm, qword, Linux, Sigma, ...] [LAI: AML interpreter] [xbstrap: Build system for OS distributions].


 Post subject: Re: Questions about userspace FS interaction
PostPosted: Sun Mar 21, 2021 5:59 am 
Joined: Sun Apr 05, 2020 1:01 pm
Posts: 182
Korona wrote:
You simply do not immediately kill a thread while it is in kernel space. That can wreak all kinds of havoc, not only file system inconsistencies. For example, the thread might hold a lock that will never get released.

Instead, you can wake up the thread, and make the blocking operation return the kernel-equivalent of EINTR. This error can then be propagated up the call stack until you reach the syscall handler again, where you kill the thread.


I do understand that. I only kill a thread when it yields execution with its state set to dead, and holding a kernel lock also disables interrupts, so a thread holding a lock would never yield.
But while reading from a disk it would yield multiple times while the request is being processed, so you have to keep track of that somehow, right?


 Post subject: Re: Questions about userspace FS interaction
PostPosted: Sun Mar 21, 2021 6:06 am 
Joined: Thu May 17, 2007 1:27 pm
Posts: 999
Then you can simply add another way to disable (and re-enable) killing the thread.

I'd suggest doing it the other way around though: have killing disabled entirely in the kernel, and only kill at specific points (e.g., via a possible_thread_kill() function or whatever).
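
One way to read "kill at specific points" is sketched below in C. Only the name `possible_thread_kill()` comes from the post; `struct thread`, `request_kill`, `update_fat_chain`, and the `kill_at` parameter (a test hook that simulates a kill request arriving in the middle of the operation) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct thread {
    bool kill_requested;   /* set asynchronously by another thread */
    bool dead;             /* set only at an explicit kill point */
    bool in_transaction;   /* true while a multi-step update is running */
    int  steps_done;       /* progress of the multi-step update */
};

/* Called by the killer: records the request and nothing else. */
void request_kill(struct thread *t)
{
    t->kill_requested = true;
}

/* Explicit kill point. Returns true if the thread died here. */
bool possible_thread_kill(struct thread *t)
{
    assert(!t->in_transaction);   /* kill points are only legal between updates */
    if (!t->kill_requested)
        return false;
    t->dead = true;
    return true;
}

/* Multi-step update with no kill points inside, so it always completes.
 * Each step stands for a disk request that may block and be rescheduled. */
void update_fat_chain(struct thread *t, int steps, int kill_at)
{
    t->in_transaction = true;
    for (int i = 0; i < steps; i++) {
        if (i == kill_at)
            request_kill(t);      /* simulated kill arriving mid-operation */
        t->steps_done++;
    }
    t->in_transaction = false;
}

int sys_write(struct thread *t, int steps, int kill_at)
{
    if (possible_thread_kill(t))  /* safe: nothing has been started yet */
        return -1;
    update_fat_chain(t, steps, kill_at);
    possible_thread_kill(t);      /* safe: the update is complete */
    return t->steps_done;
}
```

The scheduler never needs to know about critical sections in this scheme. A thread with a pending kill request keeps being scheduled like any other until it reaches a kill point by itself.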



 Post subject: Re: Questions about userspace FS interaction
PostPosted: Sun Mar 21, 2021 6:15 am 
Joined: Sun Apr 05, 2020 1:01 pm
Posts: 182
Korona wrote:
Then you can simply add another way to disable (and re-enable) killing the thread.

I'd suggest doing it the other way around though: have killing disabled entirely in the kernel, and only kill at specific points (e.g., via a possible_thread_kill() function or whatever).


Could you elaborate a bit more on what you mean by killing at specific points?


 Post subject: Re: Questions about userspace FS interaction
PostPosted: Sun Mar 21, 2021 6:16 am 
Joined: Tue Apr 03, 2018 2:44 am
Posts: 401
8infy wrote:
Hi, I recently finished implementing an AHCI driver and am now starting to work on the VFS, with FAT32 as the initial filesystem. I have a few questions:

1. I already made it so that if someone tries to kill a thread while it is blocked on a disk read/write request, the kill is deferred until that thread gets unblocked. I now realize that this might not be enough: if a thread tries to, e.g., write to a file, I might have to fetch other, unrelated parts of the filesystem, like the file allocation table for FAT32. So one userspace request potentially gets broken down into multiple read/write requests, and during those I would expect the thread to be invulnerable so that it can complete the requested transaction atomically. My question, then, is how other kernels handle the case where someone tries to kill a thread that is currently executing an important syscall, like writing a file, which can also yield or get blocked multiple times during the request. Since all of that is asynchronous, the kernel/scheduler must somehow recognize that although the thread is technically dead, it is still inside a kind of "critical section" and must keep being scheduled until it leaves that section.


By "kill", do you mean the equivalent of sending a SIGINT or SIGKILL under UNIX?

The standard method of handling that in a UNIX-like system is to check for pending signals just before returning to user mode, along with having multiple sleep states for sleeping processes.

When sleeping waiting for something, a process is considered either interruptible, or uninterruptible.

A process in an interruptible sleep can be woken by a signal; the signal is then detected, and whatever operation the process was doing is short-circuited. An example might be a process reading from a network socket. This is considered a 'slow' operation, as we never know when the data will be available, so the sleep waiting for data is an interruptible sleep. Handling this interruption can become complex, as you might be partway through some operation, so you have to ensure that either you can undo partial operations, or you have acquired all the resources you need, without further sleeps, before starting to change state. This is a similar problem to exception safety in languages like C++.

An uninterruptible sleeping process cannot be woken from its sleep by a signal. It will finish waiting for the resource it is sleeping on. An example of an uninterruptible sleep might be waiting for a disk read I/O to complete (such as in your example above), which is considered a 'fast' operation as we know the I/O device will complete or error within a bounded time. So in your case, the operation will do whatever it does in the filesystem code to completion, then once done, it will check for pending signals before returning from the write system call, and act appropriately.
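
The two sleep states and the check on the way back to user mode could be modelled roughly as follows. This is a simplified sketch in C rather than code from any real UNIX kernel: the state names, `post_signal`, `wake_on_event`, `return_to_user`, and the signal number `SIG_KILL` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum thread_state {
    THREAD_RUNNABLE,
    THREAD_SLEEP_INTERRUPTIBLE,     /* e.g. waiting for socket data */
    THREAD_SLEEP_UNINTERRUPTIBLE,   /* e.g. waiting for a disk block */
    THREAD_DEAD
};

struct thread {
    enum thread_state state;
    unsigned pending_signals;       /* bit n set means signal n is pending */
    bool sleep_interrupted;         /* tells the sleeper why it woke up */
};

#define SIG_KILL 9   /* hypothetical signal number, used as a bit index */

/* Post a signal: it is always recorded, but only an interruptible
 * sleeper is woken up because of it. */
void post_signal(struct thread *t, int sig)
{
    t->pending_signals |= 1u << sig;
    if (t->state == THREAD_SLEEP_INTERRUPTIBLE) {
        t->sleep_interrupted = true;
        t->state = THREAD_RUNNABLE;
    }
}

/* Called when the awaited event (such as I/O completion) happens. */
void wake_on_event(struct thread *t)
{
    if (t->state == THREAD_SLEEP_INTERRUPTIBLE ||
        t->state == THREAD_SLEEP_UNINTERRUPTIBLE)
        t->state = THREAD_RUNNABLE;
}

/* Last step of every syscall: act on whatever arrived in the meantime. */
void return_to_user(struct thread *t)
{
    assert(t->state == THREAD_RUNNABLE);
    if (t->pending_signals & (1u << SIG_KILL))
        t->state = THREAD_DEAD;
}
```

A thread sleeping uninterruptibly on a disk read keeps its state when `post_signal` runs. The signal only takes effect once the I/O has completed and the syscall reaches `return_to_user`.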

8infy wrote:
2. I've also been thinking about how to implement the disk cache. So far I'm leaning towards implementing it inside the filesystem, but I'm not completely sure. Maybe data should be cached on multiple layers, both per disk and per filesystem? What do you think is the best way to go about this?

Would really appreciate any information I could get on this one, thanks :wink:


In order to provide coherence with demand-paged mmap data, almost any modern OS will cache file data at the page level. So in a UNIX-like system, with files managed using vnodes in a VFS, pages are cached using the vnode/offset pair as the key. Both the read/write system calls and the virtual memory subsystem then reference data using the vnode/offset key and see the same data, which ensures coherence between data mapped into address spaces and data read/written using file handles.

This wasn't always the case. Caching in early UNIX was done at the device block level, and early mmap-based systems essentially duplicated the data cached at the device block level in user page mappings. The result was that data written using write system calls would end up in the device block buffer cache, but not in paged memory mappings of the same file.

So, in answer to your question, data caching is best handled in the VFS layer, where it can be managed on a per-vnode basis, using the vnode/offset as a key.
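
As a rough illustration of a cache keyed by vnode/offset, here is a small C sketch. It assumes a fixed-size chained hash table, 4 KiB pages, and zero-filled pages in place of real disk reads; `struct vnode`, `page_cache_get`, `file_write`, and `mmap_fault` are hypothetical names. There is no locking or eviction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE   4096u
#define CACHE_SLOTS 64u

struct vnode { int id; };           /* hypothetical stand-in for a VFS vnode */

struct cached_page {
    struct vnode *vn;               /* key part 1: which file */
    uint64_t index;                 /* key part 2: file offset / PAGE_SIZE */
    uint8_t *data;                  /* PAGE_SIZE bytes of file content */
    struct cached_page *next;       /* hash chain */
};

static struct cached_page *slots[CACHE_SLOTS];

static size_t slot_for(const struct vnode *vn, uint64_t index)
{
    return (size_t)((((uintptr_t)vn >> 4) ^ index) % CACHE_SLOTS);
}

/* Return the cached page covering `offset` in `vn`, creating it if missing.
 * A real kernel would fill a new page from the filesystem; this zero-fills. */
struct cached_page *page_cache_get(struct vnode *vn, uint64_t offset)
{
    uint64_t index = offset / PAGE_SIZE;
    size_t s = slot_for(vn, index);
    struct cached_page *p;

    for (p = slots[s]; p; p = p->next)
        if (p->vn == vn && p->index == index)
            return p;

    p = malloc(sizeof *p);
    if (!p)
        return NULL;
    p->data = calloc(1, PAGE_SIZE);
    if (!p->data) {
        free(p);
        return NULL;
    }
    p->vn = vn;
    p->index = index;
    p->next = slots[s];
    slots[s] = p;
    return p;
}

/* write() path: copy into the cached page. For brevity the write must not
 * cross a page boundary. */
int file_write(struct vnode *vn, uint64_t offset, const void *buf, size_t len)
{
    size_t in_page = (size_t)(offset % PAGE_SIZE);
    struct cached_page *p;

    assert(in_page + len <= PAGE_SIZE);
    p = page_cache_get(vn, offset);
    if (!p)
        return -1;
    memcpy(p->data + in_page, buf, len);
    return 0;
}

/* mmap fault path: resolves to the very same page the write path used. */
const uint8_t *mmap_fault(struct vnode *vn, uint64_t offset)
{
    struct cached_page *p = page_cache_get(vn, offset);
    return p ? p->data : NULL;
}
```

Since both paths go through `page_cache_get`, a write through a file handle is immediately visible through a mapping of the same file, with no second copy to keep in sync.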

For filesystem metadata, you also need a buffer mechanism that operates under this file cache layer. It can, if you prefer, use the same vnode/offset cache as the data cache, which might be useful to avoid duplicating code, but you have to be careful about double-mapping file data, so that it doesn't get cached at both the file vnode level and the device vnode level. Block devices also don't always use page-sized blocks of data, so for that reason as well it might be worth having a device block buffer interface distinct from the page cache interface.


 Post subject: Re: Questions about userspace FS interaction
PostPosted: Sun Mar 21, 2021 6:27 am 
Joined: Sun Apr 05, 2020 1:01 pm
Posts: 182
I think I understand what you mean: if it's an interruptible sleep, the scheduler would interrupt it on a signal, and if it's not, the thread would check whether it got, e.g., a SIGKILL while it was in an uninterruptible state and kill itself on return?


 Post subject: Re: Questions about userspace FS interaction
PostPosted: Fri Mar 26, 2021 6:55 am 
Joined: Tue Apr 03, 2018 2:44 am
Posts: 401
8infy wrote:
I think I understand what you mean: if it's an interruptible sleep, the scheduler would interrupt it on a signal, and if it's not, the thread would check whether it got, e.g., a SIGKILL while it was in an uninterruptible state and kill itself on return?


Exactly.

Interruptible system calls, like read(), will return EINTR if they're interrupted using the above mechanism (provided the signal doesn't kill the process).

On some systems, such as BSD, such system calls are restarted by default, so the calling code may never see the EINTR, and the system call would be called again after the signal has been delivered and handled in the calling process.

SysV, on the other hand, doesn't restart system calls by default, and returns EINTR.

In both, the behavior can be chosen per signal by installing the handler with POSIX sigaction() and setting or clearing the SA_RESTART flag.

POSIX also defines how partial operations are handled. Since POSIX.1-2001, if an interrupted read() has already read some data into the user-provided buffer, a short read is returned instead of EINTR. Prior to POSIX.1-2001, this behavior was left unspecified; SysV would return EINTR and BSD would return a short read.
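
From the user-space side, the two choices look roughly like this in C. `sigaction()`, `SA_RESTART`, `read()`, and `EINTR` are standard POSIX; the helper names `install_handler` and `read_retry` are made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void on_signal(int sig)
{
    (void)sig;   /* nothing to do: the handler only exists to interrupt calls */
}

/* Install a handler for `sig`. With `restart` non-zero, SA_RESTART asks the
 * kernel to restart interrupted calls; otherwise they fail with EINTR. */
int install_handler(int sig, int restart)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = restart ? SA_RESTART : 0;
    return sigaction(sig, &sa, NULL);
}

/* read() wrapper for callers that do not rely on SA_RESTART: retry when the
 * call was interrupted before transferring anything, and hand short reads
 * back to the caller unchanged. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    assert(n < 0 || (size_t)n <= len);   /* read() never returns more than asked */
    return n;
}
```

A caller that installs its handler with `install_handler(SIGUSR1, 0)` has to be prepared for EINTR, which is what the loop in `read_retry` handles. A short read is not an error and is returned as-is, so the caller still needs to loop if it wants the full `len` bytes.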


 Post subject: Re: Questions about userspace FS interaction
PostPosted: Sat Mar 27, 2021 2:15 pm 
Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3191
thewrongchristian wrote:
For filesystem metadata, you also need a buffer mechanism that operates under this file cache layer. It can, if you prefer, use the same vnode/offset cache as the data cache, which might be useful to avoid duplicating code, but you have to be careful about double-mapping file data, so that it doesn't get cached at both the file vnode level and the device vnode level. Block devices also don't always use page-sized blocks of data, so for that reason as well it might be worth having a device block buffer interface distinct from the page cache interface.


This is a real problem. If you map device data for a file in user mode, you are potentially exposing other information from the file system that doesn't belong to the file. The typical sector size is 512 bytes, while the page size is 4k, meaning that eight sectors fit into a page. If you map the disc device on 4k boundaries in the device cache, then there is no guarantee that file contents will start at offset 0 in a page. Also, with smaller cluster sizes, file data can be scattered across different clusters that are at arbitrary positions in 4k pages. This means you cannot guarantee that file data can be mapped as a contiguous data area without copying it. So if you want to create a zero-copy implementation, you will need to handle the case where file fragments start and end at arbitrary positions within 4k pages, and you get potential security problems by exposing non-file data to user mode. One possible measure to reduce this problem is to map file data as read-only (or copy-on-write), which at least stops user mode from writing to the non-file parts of pages.
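
The alignment arithmetic behind this can be written down in a few lines of C. The sketch assumes 512-byte sectors, 4 KiB pages, and a device cache whose pages start at LBAs that are multiples of eight; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE      512u
#define PAGE_SIZE        4096u
#define SECTORS_PER_PAGE (PAGE_SIZE / SECTOR_SIZE)   /* 8 */

static_assert(PAGE_SIZE % SECTOR_SIZE == 0, "a page must hold whole sectors");

/* Byte offset, inside its device-cache page, at which sector `lba` starts. */
uint32_t offset_in_page(uint64_t lba)
{
    return (uint32_t)(lba % SECTORS_PER_PAGE) * SECTOR_SIZE;
}

/* True if a run of `count` sectors starting at `lba` covers whole pages
 * only, so those pages contain nothing but data from this run. */
bool run_is_page_exact(uint64_t lba, uint64_t count)
{
    return count > 0 &&
           lba % SECTORS_PER_PAGE == 0 &&
           count % SECTORS_PER_PAGE == 0;
}
```

For example, a file fragment starting at LBA 2051 begins 1536 bytes into its page, so mapping that page directly would also expose the three sectors in front of it. Only runs for which `run_is_page_exact` holds can be mapped without either copying or sharing a page with unrelated data.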


 Post subject: Re: Questions about userspace FS interaction
PostPosted: Sat Mar 27, 2021 2:23 pm 
Joined: Wed Oct 01, 2008 1:55 pm
Posts: 3191
8infy wrote:
Hi, I recently finished implementing an AHCI driver and am now starting to work on the VFS, with FAT32 as the initial filesystem. I have a few questions:


This is a bit backwards. You need to define how your disc driver interface (and disc buffering) works before you start to write drivers. AHCI is particularly interesting in this regard since it operates on physical memory addresses, not linear ones. I'm reimplementing my disc interface and drivers so that they use a physical memory interface rather than a linear one.

8infy wrote:
1. I already made it so that if someone tries to kill a thread while it is blocked on a disk read/write request, the kill is deferred until that thread gets unblocked. I now realize that this might not be enough: if a thread tries to, e.g., write to a file, I might have to fetch other, unrelated parts of the filesystem, like the file allocation table for FAT32. So one userspace request potentially gets broken down into multiple read/write requests, and during those I would expect the thread to be invulnerable so that it can complete the requested transaction atomically. My question, then, is how other kernels handle the case where someone tries to kill a thread that is currently executing an important syscall, like writing a file, which can also yield or get blocked multiple times during the request. Since all of that is asynchronous, the kernel/scheduler must somehow recognize that although the thread is technically dead, it is still inside a kind of "critical section" and must keep being scheduled until it leaves that section.


You cannot kill threads that are in the kernel. Actually, I don't support killing threads at all; rather, the thread itself must determine that it should terminate.

8infy wrote:
2. I've also been thinking about how to implement the disk cache. So far I'm leaning towards implementing it inside the filesystem, but I'm not completely sure. Maybe data should be cached on multiple layers, both per disk and per filesystem? What do you think is the best way to go about this?


I implement the disk cache as a separate module.

