rod wrote:
When we have processes with multiple threads, one thread might alter the memory mapping (sbrk, etc.), mapping or unmapping some pages, and the kernel and other threads running concurrently (using SMP) would need to know about the change. The change could originate from userspace and affect other userspace threads and kernel threads, or originate from the kernel and likewise affect both spaces.
It sounds like you're implying that userspace threads have direct access to their page tables, but generally they shouldn't, for security reasons. For example, a malicious process could map each physical page on the system into its address space in turn and search for sensitive data.
rod wrote:
E.g.: one thread is running the write() system call and the kernel has already validated the memory range, and starts to read data from userspace. Then another thread of the same process calls the sbrk() or similar system call and unmaps some pages that happen to be some of the ones holding the data of write().
Pages aren't just unmapped when they're freed with something like VirtualFree/sbrk. The memory manager will be constantly unmapping pages that haven't been accessed recently to reduce the size of the system's working set - how much RAM is in use. Once they're unmapped the memory manager can move them into the swap file/partition, and then the physical pages can be zeroed for reuse. When the process, or the kernel, tries to access the pages again a page fault will be generated and the data can be read back into RAM, and the pages will be mapped.
rod wrote:
Then the kernel, that was still copying the data, might get an exception or might be reading from physical memory that is already mapped to other processes, etc.
The kernel won't end up reading from physical memory mapped to another process, since it'll be reading from the calling process's address space. And page faults should be expected - mapping is not the same as allocating. As noted above, as the kernel accesses the buffer, memory might be read from secondary storage, demand zero pages may be mapped, etc.
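To make the "mapping is not the same as allocating" point concrete, here's a rough user-space simulation of how a fault handler might classify a faulting page. The state names and handle_fault() are made up for illustration, and a real handler would of course allocate frames and do disk I/O instead of just flipping a state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page states; the names are illustrative only. */
enum page_state {
    PAGE_UNALLOCATED,  /* no region covers this address */
    PAGE_DEMAND_ZERO,  /* allocated but never touched: no frame yet */
    PAGE_SWAPPED,      /* allocated, contents live in the swap file */
    PAGE_RESIDENT      /* allocated and mapped to a physical frame */
};

enum fault_result { FAULT_RESOLVED, FAULT_UNRESOLVABLE };

/* Simulated fault handler: on success the page ends up resident. */
static enum fault_result handle_fault(enum page_state *page)
{
    assert(page != NULL);
    switch (*page) {
    case PAGE_DEMAND_ZERO:  /* a real kernel would map a zeroed frame */
    case PAGE_SWAPPED:      /* a real kernel would read the page back in */
        *page = PAGE_RESIDENT;
        return FAULT_RESOLVED;
    case PAGE_RESIDENT:     /* nothing to repair, e.g. a stale TLB entry */
        return FAULT_RESOLVED;
    case PAGE_UNALLOCATED:
    default:
        return FAULT_UNRESOLVABLE;
    }
}
```

Only the PAGE_UNALLOCATED case is a real error; the others are ordinary faults the kernel should expect while touching a user buffer.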
In terms of allocating and freeing memory, there are only a few things that can happen:
- The original memory region is still allocated - intended buffer is read/modified by the kernel.
- The region is freed and nothing has been allocated in its place - the page faults cannot be resolved.
- The original region is freed and a new region overlaps it - the wrong buffer is read/modified by the kernel.
The only problematic case is the second, and there are 2 solutions I can see.
- Prevent any memory regions in use by system calls from being freed.
- When the kernel has an unresolvable page fault in a user's address space, it can longjmp out of the system call.
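The longjmp option can be sketched in user space with setjmp/longjmp. This is only a simulation under my own assumptions: sim_copy_from_user(), sim_unresolvable_fault() and the mapped_len parameter are hypothetical, and the "fault" is raised by hand where a real kernel would get there from its page fault handler.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf fault_recovery;  /* where to resume after a bad fault */
static int recovery_armed;      /* nonzero while a guarded copy is running */

/* Stand-in for the page fault handler hitting an unresolvable fault. */
static void sim_unresolvable_fault(void)
{
    assert(recovery_armed);     /* faults outside a guarded copy are bugs */
    longjmp(fault_recovery, 1);
}

/* Copies len bytes; bytes at offset >= mapped_len are treated as unmapped.
 * Returns 0 on success, -1 if the copy was aborted by a fault. */
static int sim_copy_from_user(char *dst, const char *src,
                              size_t len, size_t mapped_len)
{
    if (setjmp(fault_recovery) != 0) {
        /* We arrive here via longjmp: abandon the system call. */
        recovery_armed = 0;
        return -1;
    }
    recovery_armed = 1;
    for (size_t i = 0; i < len; i++) {
        if (i >= mapped_len)
            sim_unresolvable_fault();
        dst[i] = src[i];
    }
    recovery_armed = 0;
    return 0;
}
```

Note that anything the system call acquired before the fault (locks, references) still has to be released on the error path, which is part of why this approach is harder to get right.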
The first solution is likely a lot easier to implement, and it's what I do in my kernel. It also prevents the third case from above happening. Every time a memory region is referenced by a system call, a counter is incremented, and decremented when the call ends. If a userspace process attempts to free a region with a nonzero counter, it crashes (why would you want to free memory you've passed to a system call, anyway?). It also helps with asynchronous file IO, where you increment the counter when the request is placed, and decrement the counter once the request is finished and everything's copied into the user's address space.
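A minimal sketch of that counter, not my actual kernel code: struct mem_region and the region_* functions are names invented for this example, and it assumes the caller already holds whatever lock protects the address space, so the check and the free can't race with a pin.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical region descriptor; a real one would also hold base/size. */
struct mem_region {
    unsigned pin_count;  /* system calls / async requests using the region */
    bool allocated;
};

/* Called when a system call starts referencing the region. */
static void region_pin(struct mem_region *r)
{
    r->pin_count++;
}

/* Called when the system call (or async request) has finished with it. */
static void region_unpin(struct mem_region *r)
{
    assert(r->pin_count > 0);
    r->pin_count--;
}

/* Returns true if the region was freed. A false return means the region
 * is still pinned, and the caller decides what to do with the offender. */
static bool region_try_free(struct mem_region *r)
{
    if (r->pin_count != 0)
        return false;
    r->allocated = false;
    return true;
}
```

For asynchronous IO, region_pin() would be called when the request is queued and region_unpin() from the completion path.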
rod wrote:
I think there can be variations about who initiates the change and whom it affects like: kernel-kernel, kernel-userspace, userspace-kernel, and userspace-userspace.
Userspace processes shouldn't be allowed to read from or write to anything in the kernel's address space, again for security reasons (except perhaps for a few special cases, such as getting the time).
rod wrote:
Because I think that by writing directly to the page tables (especially when unmapping), other threads might not know about the change and might be using the old mapping.
This is a problem, but not for the reason you're thinking of. When you modify page tables you need to invalidate the TLB entry for the mapping on each processor. This is called a TLB shootdown. You should send an IPI to all processors that might have a TLB entry for the modified page mapping (be careful if you're using PCID) and get them to INVLPG on the changed virtual address. You should try to avoid sending unnecessary IPIs, such as when a page transitions from invalid to valid, since you can just invalidate the page in the page fault handler and return (some processors automatically invalidate the TLB entry for an invalid page and you'll never even get the page fault).
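Here's a toy user-space model of the shootdown logic, just to show which CPUs get an IPI. Everything in it is an assumption for illustration: the per-CPU TLB is a small array, sim_invlpg() stands in for INVLPG, the "IPI" is a direct function call plus a counter, and cpu_mask is a hypothetical bitmask of CPUs that may have cached the mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCPUS     4
#define TLB_SLOTS 8

/* Simulated per-CPU TLB: each slot caches one virtual address, 0 = empty. */
struct sim_cpu { uintptr_t tlb[TLB_SLOTS]; };
static struct sim_cpu cpus[NCPUS];
static unsigned ipis_sent;  /* counts simulated IPIs */

static void sim_tlb_fill(unsigned cpu, unsigned slot, uintptr_t vaddr)
{
    assert(cpu < NCPUS && slot < TLB_SLOTS);
    cpus[cpu].tlb[slot] = vaddr;
}

static bool sim_tlb_has(unsigned cpu, uintptr_t vaddr)
{
    assert(cpu < NCPUS);
    for (unsigned i = 0; i < TLB_SLOTS; i++)
        if (cpus[cpu].tlb[i] == vaddr)
            return true;
    return false;
}

/* Stand-in for INVLPG: drop any cached entry for vaddr on one CPU. */
static void sim_invlpg(unsigned cpu, uintptr_t vaddr)
{
    assert(cpu < NCPUS);
    for (unsigned i = 0; i < TLB_SLOTS; i++)
        if (cpus[cpu].tlb[i] == vaddr)
            cpus[cpu].tlb[i] = 0;
}

/* Invalidate vaddr locally, then "IPI" the other CPUs in cpu_mask.
 * If the old mapping was invalid, no remote CPU needs to be interrupted. */
static void tlb_shootdown(unsigned self, uint32_t cpu_mask,
                          uintptr_t vaddr, bool was_valid)
{
    sim_invlpg(self, vaddr);
    if (!was_valid)
        return;  /* invalid -> valid: skip the IPIs entirely */
    for (unsigned cpu = 0; cpu < NCPUS; cpu++) {
        if (cpu == self || !((cpu_mask >> cpu) & 1u))
            continue;
        ipis_sent++;            /* a real kernel sends an IPI and waits */
        sim_invlpg(cpu, vaddr);
    }
}
```

In a real kernel the initiating CPU also has to wait for every target to acknowledge before it reuses the physical page, which this model leaves out.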