valou3433 wrote:
Hi !
I'm trying to 'merge' my i386 kernel and my ARM Raspberry Pi kernel, and supporting multiple architectures/platforms has been really fun so far.
On both archs, I have paging enabled and the kernel mapped in the higher half.
I was wondering: on i386, you're allowed to have 4 MiB pages directly in the page directory (if PSE is enabled), and ARM has pretty much the same feature: you're allowed to put 1 MiB "sections" directly into your first-level table.
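To make it concrete, here is a minimal sketch of the two descriptor formats I mean (the helper and macro names are mine, not from any particular kernel):

```c
#include <stdint.h>

/* i386 page directory entry mapping a 4 MiB page (needs CR4.PSE set).
 * Bit 7 (PS) selects a 4 MiB page; the physical base lives in bits
 * [31:22], so it must be 4 MiB aligned. */
#define PDE_PRESENT (1u << 0)
#define PDE_WRITE   (1u << 1)
#define PDE_PS      (1u << 7)   /* page size: 4 MiB */

static uint32_t make_pde_4mib(uint32_t phys, uint32_t flags)
{
    return (phys & 0xFFC00000u) | PDE_PS | PDE_PRESENT | flags;
}

/* ARMv6/v7 short-descriptor first-level "section" entry mapping 1 MiB.
 * Bits [1:0] = 0b10 mark a section; the physical base lives in
 * bits [31:20]. AP[1:0] sits in bits [11:10]. */
#define L1_SECTION  (2u)
#define L1_AP_RW    (0x3u << 10)  /* AP[1:0] = 0b11: full access */

static uint32_t make_section_1mib(uint32_t phys, uint32_t flags)
{
    return (phys & 0xFFF00000u) | L1_SECTION | flags;
}
```

In both cases one first-level entry covers the whole big page, so no second-level table is ever allocated or walked.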
From a performance point of view, is it worth using only those instead of page tables?
Mapping is faster, and resolving addresses should be too, because there's one (or more) fewer level of indirection.
(Unless caching makes all of that useless? I don't really know; I think caching one big entry should still be faster, and on a context switch we invalidate anyway...)
I can see why for processes you might want to allocate less memory (a daemon or tiny process using 4 MiB of RAM is big, because there might be an enormous number of those),
but at least for the kernel: is it worth mapping it using those "big" sections? Memory is cheap these days, and wasting 3 MiB seems worth it to me...
If someone has a kernel with multiple ports, I would love to hear from you if you've done memory allocation performance benchmarks.
Anyway, I'm waiting for your comments on this matter. (As my kernel is not too "evolved", mapping with "big" sections is completely OK for now, but maybe with multiple processes I'll get in trouble wasting too much RAM or something...)
Thanks for reading!
It will be very much worth it, especially for static mappings such as the kernel text and data. The other places that benefit might be MMIO, such as PCI BARs or framebuffer mappings.
I would also say that it is probably not worth it for user paging, though I certainly haven't benchmarked it. The thinking is: while memory is cheap, you might be wasting multiple MiB per process (each mapping that isn't a multiple of 4 MiB will waste on average 2 MiB). A bash shell might have several mappings with different modes, such as the following from a shell on my machine:
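For a higher-half i386 kernel, the whole static mapping is just a handful of directory entries — one per 4 MiB, no page tables at all. A sketch (the names and the 0xC0000000 base are my assumptions, not a fixed convention):

```c
#include <stdint.h>

#define PDE_PRESENT (1u << 0)
#define PDE_WRITE   (1u << 1)
#define PDE_PS      (1u << 7)       /* 4 MiB page (PSE) */

#define KERNEL_VBASE 0xC0000000u    /* assumed higher-half base */
#define PDE_INDEX(v) ((v) >> 22)    /* 4 MiB per directory slot */

static uint32_t page_directory[1024];

/* Map `size` bytes of physical memory starting at `phys` into the
 * higher half with 4 MiB pages: one PDE per 4 MiB, no second level. */
static void map_kernel_4mib(uint32_t phys, uint32_t size)
{
    for (uint32_t off = 0; off < size; off += 0x400000u) {
        uint32_t idx = PDE_INDEX(KERNEL_VBASE + off);
        page_directory[idx] = (phys + off) | PDE_PS | PDE_WRITE | PDE_PRESENT;
    }
}
```

An 8 MiB kernel image is two entries, and a page walk for any kernel address stops at the directory level.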
Code:
$ cat /proc/$$/maps
559da3882000-559da38b1000 r--p 00000000 00:1b 378746 /usr/bin/bash
559da38b1000-559da3990000 r-xp 0002f000 00:1b 378746 /usr/bin/bash
559da3990000-559da39ca000 r--p 0010e000 00:1b 378746 /usr/bin/bash
559da39ca000-559da39ce000 r--p 00147000 00:1b 378746 /usr/bin/bash
559da39ce000-559da39d7000 rw-p 0014b000 00:1b 378746 /usr/bin/bash
559da39d7000-559da39e2000 rw-p 00000000 00:00 0
559da4118000-559da42c0000 rw-p 00000000 00:00 0 [heap]
7f0fb2627000-7f0fb2b98000 r--p 00000000 00:1b 371248 /usr/lib/locale/locale-archive
7f0fb2b98000-7f0fb2b9b000 rw-p 00000000 00:00 0
7f0fb2b9b000-7f0fb2bc7000 r--p 00000000 00:1b 367052 /usr/lib/x86_64-linux-gnu/libc.so.6
7f0fb2bc7000-7f0fb2d5b000 r-xp 0002c000 00:1b 367052 /usr/lib/x86_64-linux-gnu/libc.so.6
7f0fb2d5b000-7f0fb2daf000 r--p 001c0000 00:1b 367052 /usr/lib/x86_64-linux-gnu/libc.so.6
7f0fb2daf000-7f0fb2db0000 ---p 00214000 00:1b 367052 /usr/lib/x86_64-linux-gnu/libc.so.6
7f0fb2db0000-7f0fb2db3000 r--p 00214000 00:1b 367052 /usr/lib/x86_64-linux-gnu/libc.so.6
7f0fb2db3000-7f0fb2db6000 rw-p 00217000 00:1b 367052 /usr/lib/x86_64-linux-gnu/libc.so.6
7f0fb2db6000-7f0fb2dc3000 rw-p 00000000 00:00 0
7f0fb2dc3000-7f0fb2dd1000 r--p 00000000 00:1b 59814 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.2
7f0fb2dd1000-7f0fb2ddf000 r-xp 0000e000 00:1b 59814 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.2
7f0fb2ddf000-7f0fb2ded000 r--p 0001c000 00:1b 59814 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.2
7f0fb2ded000-7f0fb2df1000 r--p 00029000 00:1b 59814 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.2
7f0fb2df1000-7f0fb2df2000 rw-p 0002d000 00:1b 59814 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.2
7f0fb2df2000-7f0fb2df4000 rw-p 00000000 00:00 0
7f0fb2e03000-7f0fb2e0a000 r--s 00000000 00:1b 373594 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
7f0fb2e0a000-7f0fb2e0b000 r--p 00000000 00:1b 367038 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7f0fb2e0b000-7f0fb2e33000 r-xp 00001000 00:1b 367038 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7f0fb2e33000-7f0fb2e3d000 r--p 00029000 00:1b 367038 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7f0fb2e3d000-7f0fb2e3f000 r--p 00032000 00:1b 367038 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7f0fb2e3f000-7f0fb2e41000 rw-p 00034000 00:1b 367038 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7ffd0df57000-7ffd0df78000 rw-p 00000000 00:00 0 [stack]
7ffd0df78000-7ffd0df7c000 r--p 00000000 00:00 0 [vvar]
7ffd0df7c000-7ffd0df7e000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
Each writeable mapping will potentially require its own 4 MiB copy. 4 MiB vs 4 KiB granularity might well save you some TLB entries, but it results in more memory pressure, causing pages to spill to secondary storage. So while the CPU can look up the page mapping much quicker with such big pages, the increased I/O might obviate all that advantage and then some.
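To put a number on the waste: each mapping gets rounded up to the next 4 MiB boundary, so the per-mapping overhead is the gap to that boundary. A quick sketch of the arithmetic (the example size 0x2F000 is the first bash mapping above):

```c
#include <stdint.h>

#define BIG_PAGE 0x400000u /* 4 MiB */

/* Bytes wasted if a mapping of `size` bytes must be backed by whole
 * 4 MiB pages. Sizes are uniformly spread within a 4 MiB period, so
 * the expected waste per mapping is about 2 MiB. */
static uint32_t waste_4mib(uint32_t size)
{
    uint32_t rem = size % BIG_PAGE;
    return rem ? BIG_PAGE - rem : 0;
}
```

The 0x2F000-byte (188 KiB) bash text mapping alone would waste about 3.8 MiB; multiply by a couple dozen mappings and a couple hundred processes and the "memory is cheap" argument stops holding.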