AndrewAPrice wrote:
This is super specific to my OS. I noticed switching from PIO to DMA didn't noticeably improve performance. I've built a microkernel, so I added simple profiling at the syscall level. I discovered my bottleneck is mapping and unmapping shared memory into the driver.
My disc driver has the opposite problem. When I use PIO, I need to map the physical address of the cache page into the disc driver's linear address space, while DMA and bus-mastering don't need any mapping. Thus, the best performance will be achieved with modern devices that use bus-mastering over PCI.
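For anyone wanting to reproduce the syscall-level profiling mentioned above, a minimal sketch might look like the following. It assumes a dispatcher that calls handlers through a function pointer; the syscall numbers, `profiled_dispatch`, and the stub handler are all hypothetical. It uses POSIX `clock_gettime` so it runs in user space; inside a kernel you would read a cycle counter such as RDTSC instead.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical syscall numbers, for illustration only. */
enum { SYS_MAP_SHARED = 0, SYS_UNMAP_SHARED = 1, SYS_COUNT = 2 };

struct syscall_stats {
    uint64_t calls;    /* number of times the syscall ran */
    uint64_t total_ns; /* accumulated time spent inside it */
};

static struct syscall_stats stats[SYS_COUNT];

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

typedef long (*syscall_handler)(long arg);

/* Run a handler and charge its elapsed time to its syscall number. */
static long profiled_dispatch(int nr, syscall_handler handler, long arg) {
    assert(nr >= 0 && nr < SYS_COUNT);
    uint64_t start = now_ns();
    long result = handler(arg);
    stats[nr].calls++;
    stats[nr].total_ns += now_ns() - start;
    return result;
}

/* Stand-in handler; a real one would edit page tables. */
static long fake_map_shared(long pages) { return pages; }
```

Dumping `stats` periodically (calls and average time per call) is usually enough to spot a hot path like map/unmap.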
AndrewAPrice wrote:
Times when I can use pure DMA (writing straight into the destination buffer without a temp buffer in between), I can skip mapping entirely. I probably still need to define an API to increment the reference count of the shared memory even when it isn't mapped into the driver. Otherwise, an application could free the shared memory mid-read, the physical memory could be recycled, and another program could crash because its memory gets filled with junk.
I keep reference counts along with the physical addresses in the disc cache.
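A minimal sketch of that idea, assuming each cached or shared page keeps a reference count next to its physical address: the driver takes an extra reference before starting DMA and drops it on completion, so an application freeing the buffer mid-read cannot cause the frame to be recycled. The struct and function names are hypothetical, and a real kernel would need atomic operations or a lock around the count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page descriptor: a physical frame plus its reference count. */
struct phys_page {
    uint64_t phys_addr;
    uint32_t refcount; /* owners plus in-flight I/O holding the frame */
};

/* Take an extra reference before handing the frame to a DMA engine. */
static void page_pin(struct phys_page *p) {
    assert(p->refcount > 0); /* must still be owned by someone */
    p->refcount++;
}

/* Drop a reference; returns true when the frame may be recycled. */
static bool page_unpin(struct phys_page *p) {
    assert(p->refcount > 0);
    return --p->refcount == 0;
}
```

With this scheme, the application's free and the DMA completion each call `page_unpin`, and only whichever happens last returns the frame to the allocator.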
AndrewAPrice wrote:
Cases where I've found I have to use a temp buffer:
- When the destination buffer isn't addressable within the 32-bit physical address space.
That might need copying. Another case is the USB disc driver: unless it has a physical interface to the USB schedule, it too will need buffering. Currently, my USB driver needs to copy the data. This is complicated by different USB hardware types having different support for 64-bit addresses.
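A sketch of the 32-bit check, assuming the buffer is described as a list of physical page addresses and each controller advertises an upper DMA address limit (for instance 4 GiB for a controller that only accepts 32-bit addresses, such as a bus-master IDE PRD table). The function names and the `dma_limit` parameter are illustrative, not taken from any particular driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* True if [phys, phys + len) lies entirely below dma_limit
 * (e.g. 1ULL << 32 for a controller restricted to 32-bit addresses). */
static bool dma_reachable(uint64_t phys, size_t len, uint64_t dma_limit) {
    assert(len > 0);
    /* Written this way to avoid overflow in phys + len. */
    return phys < dma_limit && len <= dma_limit - phys;
}

/* A buffer described page by page needs a bounce buffer if any page is unreachable. */
static bool needs_bounce_buffer(const uint64_t *pages, size_t npages, uint64_t dma_limit) {
    for (size_t i = 0; i < npages; i++)
        if (!dma_reachable(pages[i], PAGE_SIZE, dma_limit))
            return true;
    return false;
}
```

A USB stack could pass a different `dma_limit` per host controller, depending on whether that controller reports 64-bit addressing support.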
AndrewAPrice wrote:
- When we're not reading an entire sector-aligned chunk from the disk. (E.g., if I only want to read 1KB of a 2KB sector, I can't DMA into the user program's memory without overwriting 1KB of data elsewhere.)
- When a sector would span a page boundary. (My kernel doesn't guarantee user memory is contiguous in physical memory.)
These are both non-issues in my design since the application cannot control where or how much data is read from disk.
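The two conditions in the quoted list can be expressed as a single check, sketched here under the assumptions that the page size is 4 KiB and the sector size divides it evenly (512, 2048, or 4096 bytes). `can_dma_direct` is a hypothetical helper; when it returns false, the read would go through a temp buffer instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* True if a read of `len` bytes at disk byte `offset` into user address `dest`
 * can be DMA'd straight into the destination pages without a temp buffer. */
static bool can_dma_direct(uint64_t offset, uint64_t len, uintptr_t dest, uint32_t sector_size) {
    assert(sector_size > 0 && PAGE_SIZE % sector_size == 0);
    /* Partial sectors would clobber neighbouring bytes in user memory. */
    if (offset % sector_size != 0 || len % sector_size != 0)
        return false;
    /* With a sector-aligned destination, no sector can straddle a page boundary,
     * so each sector lands in a single physical page. */
    return dest % sector_size == 0;
}
```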
AndrewAPrice wrote:
Google Skia/fontconfig/libjpeg all use memory-mapped files (implementing this was a big performance improvement; read about my journey here), and thankfully, since page faults cause page-aligned file reads, these are good candidates for DMA without buffering.
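To illustrate why faults produce page-aligned reads: a fault handler rounds the faulting address down to its page, so the file offset it requests is always a multiple of the page size. A minimal sketch, assuming a hypothetical `file_mapping` record whose start address and file offset are both page-aligned:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical description of one memory-mapped file region. */
struct file_mapping {
    uintptr_t vaddr_start; /* page-aligned start of the mapping */
    uint64_t file_offset;  /* page-aligned file offset it maps from */
};

/* Round the faulting address down to its page and find the matching file offset. */
static uint64_t fault_file_offset(const struct file_mapping *m, uintptr_t fault_addr) {
    assert(fault_addr >= m->vaddr_start);
    uintptr_t page = fault_addr & ~(uintptr_t)(PAGE_SIZE - 1);
    return m->file_offset + (uint64_t)(page - m->vaddr_start);
}
```

The resulting read is one whole, page-aligned page, which is exactly the shape that satisfies the sector-alignment and page-boundary conditions without a bounce buffer.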
I have memory-mapping of files and executables in my "pipeline", although this will only work effectively if the file systems are 4K-aligned.
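One way to read the alignment requirement, assuming the disc cache stores 4 KiB pages aligned on 4 KiB disk boundaries: a file page can only be mapped directly from the cache if it coincides with exactly one cache page. That requires the file system's data area to start on a 4 KiB boundary and its clusters to be multiples of 4 KiB. The geometry struct and its fields are hypothetical; as an example, a partition starting at the old DOS-style LBA 63 fails the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define SECTOR_SIZE 512u

/* Hypothetical volume geometry; field names are illustrative. */
struct volume_geometry {
    uint64_t partition_lba;    /* first sector of the partition */
    uint64_t data_area_offset; /* bytes from partition start to the first cluster */
    uint32_t cluster_size;     /* allocation unit in bytes */
};

/* True if every cluster starts on a 4 KiB disk boundary and spans whole pages,
 * so each file page coincides with exactly one 4 KiB cache page. */
static bool volume_page_aligned(const struct volume_geometry *g) {
    assert(g->cluster_size > 0);
    uint64_t data_start = g->partition_lba * SECTOR_SIZE + g->data_area_offset;
    return data_start % PAGE_SIZE == 0 && g->cluster_size % PAGE_SIZE == 0;
}
```

A mount-time check like this could decide per volume whether mapped files use the cache pages directly or fall back to copying.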