Single address space in Long mode

quanganht · Post by **quanganht** » Wed Jan 06, 2010 6:46 am

Since long mode requires paging to be enabled, I guess I will have to identity map the whole system memory. What do you think of it? Good or bad idea?

thepowersgang · Post by **thepowersgang** » Wed Jan 06, 2010 6:57 am

In my experience, unless you are doing an OS like a console kernel (single process only), doing complete identity mapping is a bad idea. Heck, even if there will only be one "address space", you would still want to have it be contiguous, without having to worry about the copious amount of holes in the x86 physical memory space.

So, in short, BAD IDEA!

AJ · Post by **AJ** » Wed Jan 06, 2010 7:02 am

/agree

Unless you have a really specific reason for doing so, I wouldn't identity map the whole of the physical memory.

Cheers,
Adam

quanganht · Post by **quanganht** » Wed Jan 06, 2010 7:25 am

Thank you all.
So is there any alternative way to implement a flat memory model in long mode?

thepowersgang · Post by **thepowersgang** » Wed Jan 06, 2010 10:03 am

Well, long mode is inherently "flat" (meaning there are no segments).

I think what you mean is contiguous (so there is memory from 0 to as far as you can go).
That should be relatively easy: just make a memory manager and then call MM_Allocate(0), MM_Allocate(0x1000), ... to fill the address space.

AndrewAPrice · Post by **AndrewAPrice** » Wed Jan 06, 2010 10:37 pm

All you have to do is identity map the amount of memory you have in the system (to avoid memory mapping the entire range and wasting space with unnecessary entries).
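To put a number on the "unnecessary entries", here is a rough sketch that counts how many 4 KiB paging-structure pages are needed to map a given amount of memory starting at address 0. It assumes 4-level paging with 4 KiB pages and 512 entries per table; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up integer division. */
static uint64_t div_round_up(uint64_t n, uint64_t d) {
    assert(d != 0);
    return (n + d - 1) / d;
}

/*
 * Number of 4 KiB paging-structure pages needed to map `bytes` of memory
 * starting at address 0, using 4 KiB pages and 4-level paging
 * (512 entries per table). Hypothetical helper, not a real kernel API.
 */
static uint64_t paging_pages_needed(uint64_t bytes) {
    uint64_t pages = div_round_up(bytes, 4096); /* page table entries */
    uint64_t pts   = div_round_up(pages, 512);  /* page tables */
    uint64_t pds   = div_round_up(pts, 512);    /* page directories */
    uint64_t pdpts = div_round_up(pds, 512);    /* PDPTs */
    return pts + pds + pdpts + 1;               /* +1 for the PML4 */
}
```

For 4 GiB this gives 2054 pages of paging structures (about 8 MiB), so mapping only the RAM that is actually installed keeps the overhead proportional to the amount of RAM.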

quanganht · Post by **quanganht** » Wed Jan 06, 2010 11:53 pm

MessiahAndrw wrote:All you have to do is identity map the amount of memory you have in the system (to avoid memory mapping the entire range and wasting space with unnecessary entries).

That's what I mean! LOL. In the first post, I said "the whole *system* memory", not the whole range of virtual memory.

So, any pointer made by any process will be globally correct, right?

quanganht · Post by **quanganht** » Thu Jan 07, 2010 8:25 am

I recently found an interesting point in AMD64 Manual - Vol.2:

As a result, SYSCALL and SYSRET can take fewer than
one-fourth the number of internal clock cycles to complete than the legacy CALL and RET
instructions. SYSCALL and SYSRET are particularly well-suited for use in 64-bit mode, which
requires implementation of a paged, flat-memory model.

Are they really that fast? The manual also mentions a "paged, flat-memory model" in 64-bit mode.

Owen · Post by **Owen** » Thu Jan 07, 2010 11:16 am

I presume it means ones through call gates?

Brendan · Post by **Brendan** » Thu Jan 07, 2010 6:49 pm

Hi,

quanganht wrote:
MessiahAndrw wrote:All you have to do is identity map the amount of memory you have in the system (to avoid memory mapping the entire range and wasting space with unnecessary entries).
That's what I mean! LOL. In the first post, I said "the whole *system* memory", not the whole range of virtual memory.

I'd assume MessiahAndrw meant "contiguously mapped" (e.g. RAM from 0x000000 to the EBDA mapped to virtual addresses 0x000000 to "x", RAM from 0x00100000 to the first hole mapped to virtual addresses "x" to "x + y", etc).
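A minimal sketch of that "contiguously mapped" layout, assuming the usable RAM regions are already known, sorted and page aligned (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One usable RAM region from the firmware memory map (assumed page aligned). */
struct ram_region { uint64_t phys_base; uint64_t length; };

/* Where that region ends up in the virtual address space. */
struct mapping { uint64_t virt_base; uint64_t phys_base; uint64_t length; };

/*
 * Lay the usable regions end to end in virtual space, starting at virtual 0,
 * so that the holes in the physical address space disappear.
 * Returns the total number of virtual bytes used.
 */
static uint64_t map_contiguously(const struct ram_region *ram, size_t count,
                                 struct mapping *out) {
    assert(count == 0 || (ram != NULL && out != NULL));
    uint64_t next_virt = 0;
    for (size_t i = 0; i < count; i++) {
        out[i].virt_base = next_virt;
        out[i].phys_base = ram[i].phys_base;
        out[i].length    = ram[i].length;
        next_virt += ram[i].length;
    }
    return next_virt;
}
```

With regions {0x0, 0x9F000} and {0x100000, 0x7F00000}, the second region lands at virtual 0x9F000, which is the "x" above.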

quanganht wrote:So, any pointer made by any process will be globally correct, right?

Yes. The problem is getting multiple processes to share an address space (without segmentation), which means using position independent code. If all RAM is contiguous in the virtual address space then there's also virtual address space fragmentation issues and problems implementing certain optimisations (swap space, memory mapped files, "copy on write", etc); and no easy way to implement protection/isolation (without something complex like software isolation; any process can trash any other process, and there's no way for a process to prevent other processes from accessing sensitive data like the user's passwords, etc).

quanganht wrote:I recently found an interesting point in AMD64 Manual - Vol.2:

As a result, SYSCALL and SYSRET can take fewer than
one-fourth the number of internal clock cycles to complete than the legacy CALL and RET
instructions. SYSCALL and SYSRET are particularly well-suited for use in 64-bit mode, which
requires implementation of a paged, flat-memory model.

Owen wrote:I presume it means ones through call gates?

I'd assume it means call gates..

quanganht wrote:Are they really that fast? The manual also mentions a "paged, flat-memory model" in 64-bit mode.

Yes, and no.

The SYSCALL/SYSRET instructions themselves (for at least some CPUs in some situations) probably are 4 times faster than call gates (which are typically a little faster than software interrupts). This is because a call gate (or a software interrupt) involves fetching several entries from the GDT (and IDT for software interrupts) which adds extra overhead (and potential cache misses).

However, the number of cycles taken by the SYSCALL/SYSRET instructions alone isn't the entire story. Extra code (and additional overhead) may be needed to make SYSCALL/SYSRET as secure as a call gate or software interrupt. For a specific example (and a big warning), as far as I can tell it's possible for user level code to set RSP to "kernel space" before doing SYSCALL and trick the kernel into trashing its own data. For example, try doing "mov rsp,0xFEDCBA9876543210" or "mov rsp,8" before SYSCALL...

To avoid potential problems the kernel's SYSCALL handler would need to either check RSP before it uses the caller's stack, or switch to a "known good" stack at the start of the SYSCALL handler. For an example (with numbers I made up), if a call gate costs 40 cycles, then SYSCALL might cost 10 cycles (4 times faster), and doing "mov rbp, rsp; mov rsp,[gs:safe_stack]" on entry then "mov rsp,rbp" before returning might add another 10 cycles (to make SYSCALL secure); so the end result is 20 cycles (only twice as fast as a call gate).
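The first option (checking RSP) could look roughly like the sketch below. It assumes 48-bit virtual addresses with user space in the lower canonical half; `USER_SPACE_END` and `user_stack_ok` are made-up names. It only checks the address range, not that the pages are actually mapped, so a page fault is still possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First address above user space (assumption: lower canonical half, 48-bit). */
#define USER_SPACE_END 0x0000800000000000ULL

/*
 * Returns true if the `needed` bytes just below `rsp` lie entirely inside
 * user space. The stack grows downwards, so the kernel would write to
 * [rsp - needed, rsp).
 */
static bool user_stack_ok(uint64_t rsp, uint64_t needed) {
    assert(needed <= USER_SPACE_END);
    if (rsp > USER_SPACE_END) {
        return false;   /* points into (or beyond) kernel space */
    }
    if (rsp < needed) {
        return false;   /* rsp - needed would wrap below address 0 */
    }
    return true;
}
```

Both values from the warning above fail this check: 0xFEDCBA9876543210 is above user space, and 8 leaves no room below it.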

Cheers,

Brendan

quanganht · Post by **quanganht** » Fri Jan 08, 2010 8:35 am

Hi,

Brendan wrote:If all RAM is contiguous in the virtual address space then there's also virtual address space fragmentation issues and problems implementing certain optimisations (swap space, memory mapped files, "copy on write", etc); and no easy way to implement protection/isolation

Well, I agree about the position independent code thing, but for now it shouldn't be a big problem. Modern CPUs are said to support it, and to run it just as fast as position dependent code.

Fragmentation can be solved by using some kind of block (slab?). So, upon an application's request, the system allocator will give it a chunk of memory, say 1MB, or even 1GB. Then the application is free to do anything with it. This is similar to the exokernel idea, where each application has its own memory allocator (internal allocator). It helps eliminate memory fragmentation, and since the application knows how to use memory in the most efficient way, global performance is increased.

Another optimization that only SAS can have is shared memory. IPC, RPC, shared data is *very* efficient in SAS because data is placed in one place only, then the owner can pass its pointer to anyone.

quanganht wrote:any pointer made by any process will be globally correct

Similar to this http://forum.osdev.org/viewtopic.php?f= ... 93#p170267

Plus, SYSCALL/SYSRET requires a flat, paged memory model. That is double the speed of a call gate/SW interrupt. Well, 20 cycles times some million calls is something really different.

And about this

Brendan wrote:and no easy way to implement protection/isolation

I doubt it. IMO we only have to watch over the page tables, and mark pages with permissions corresponding to applications.

Any ideas?

PS: Thanks Brendan for a very enthusiastic reply

Brendan · Post by **Brendan** » Sat Jan 09, 2010 4:16 am

Hi,

quanganht wrote:Fragmentation can be solved by using some kind of block (slab?). So, upon an application's request, the system allocator will give it a chunk of memory, say 1MB, or even 1GB. Then the application is free to do anything with it.

That works perfectly fine for a normal OS, because a normal OS allocates space (and not RAM). For "contiguously mapped RAM" you can't give every process a large slab because you end up wasting heaps of RAM. For example, the computer I'm using now isn't doing too much, but there's about 50 processes running. If you give each process 1 GiB then you'll need to use significant amounts of swap space (but you can't implement swap space for "contiguously mapped RAM" either).

Probably the easiest way around the problem is to have (the equivalent of) a global heap; but then the RAM allocated by a single process will be scattered all over the place (not so good for cache/TLB locality). Of course that's assuming you can prevent heap corruption.

A better idea is to not use "contiguously mapped RAM" to begin with - that solves all the problems.

quanganht wrote:Another optimization that only SAS can have is shared memory. IPC, RPC, shared data is *very* efficient in SAS because data is placed in one place only, then the owner can pass its pointer to anyone.

Boring old fashioned OSs (e.g. Unix clones) have been using shared memory (without SAS) for several decades.

The optimisation that you're thinking of means that you can pass a pointer to something in shared memory rather than passing an offset from the start of shared memory. For example (for a more conventional OS) the first process can do "offset = pointer - address_of_shared_memory" and send the offset, and the second process (the receiver) can do "pointer = offset + address_of_shared_memory"; and instead of this a SAS OS can just send the pointer. It saves you one subtraction and one addition, or about 2 cycles on a modern CPU (until the receiver has to check if the pointer it received is sane, where you need to do 2 comparisons instead of one).
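The offset arithmetic and the sanity checks described above can be sketched like this (all function names are invented for illustration; a real OS would also have to worry about object size and alignment):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Conventional OS, sender: turn a pointer into an offset from the
 * sender's mapping of the shared region. */
static size_t ptr_to_offset(const void *base, const void *ptr) {
    assert((const char *)ptr >= (const char *)base);
    return (size_t)((const char *)ptr - (const char *)base);
}

/* Conventional OS, receiver: one comparison to validate the offset,
 * then rebuild the pointer relative to the receiver's own mapping. */
static void *offset_to_ptr(void *base, size_t region_size, size_t offset) {
    if (offset >= region_size) {
        return NULL;
    }
    return (char *)base + offset;
}

/* SAS OS, receiver: the raw pointer needs both a lower and an upper
 * bound check (two comparisons). */
static bool sas_ptr_ok(const void *region, size_t region_size,
                       const void *ptr) {
    uintptr_t lo = (uintptr_t)region;
    uintptr_t p  = (uintptr_t)ptr;
    return p >= lo && p - lo < region_size;
}
```

The two mappings may sit at different virtual addresses in the sender and the receiver; only the offset is meaningful to both.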

The main advantage (the only significant advantage) that SAS has is eliminating the cost of switching virtual address spaces (e.g. TLB misses). Unfortunately this advantage is often over-estimated.

The TLBs in a modern CPU aren't large enough to cover the entire virtual address space (even with "large pages" they aren't enough to cover 4 GiB of the virtual address space). The TLBs only cover the most recently used areas of the address space. This means a SAS OS runs one process (and the TLB fills with entries for that process), then the SAS OS switches to a different process (and almost all of the TLB entries for the old process aren't used and get replaced by TLB entries for the new process, so you get the same number of TLB misses). Of course that's a "worst case".

If the OS is constantly switching between "n" processes, and if those processes use less data than the TLBs cover, then a SAS OS does avoid lots of TLB misses. That's the "best case".
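For a feel of the numbers, a small sketch of the coverage arithmetic. The helper names are hypothetical, and any entry count you plug in is an assumption about a particular CPU, not something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of address space that `entries` TLB entries can cover
 * when every entry maps one page of `page_size` bytes. */
static uint64_t tlb_reach(uint64_t entries, uint64_t page_size) {
    return entries * page_size;
}

/* TLB entries needed to cover `span` bytes with pages of `page_size` bytes. */
static uint64_t entries_to_cover(uint64_t span, uint64_t page_size) {
    assert(page_size != 0);
    return (span + page_size - 1) / page_size;
}
```

Covering 4 GiB takes 2048 entries with 2 MiB pages and 1048576 entries with 4 KiB pages; a hypothetical 64-entry TLB with 4 KiB pages only reaches 256 KiB.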

quanganht wrote:
quanganht wrote:any pointer made by any process will be globally correct
Similar to this http://forum.osdev.org/viewtopic.php?f= ... 93#p170267

Plus, SYSCALL/SYSRET requires a flat, paged memory model. That is double the speed of a call gate/SW interrupt. Well, 20 cycles times some million calls is something really different.

SYSCALL/SYSRET doesn't require a flat paged memory model - for a 32-bit OS it'll work without paging and you can still use a limited amount of segmentation (e.g. for data segments). Long mode requires a flat paged memory model (therefore a flat paged memory model is required for SYSCALL/SYSRET in long mode).

Of course "requires a paged flat memory model" only means you can't rely on segmentation and have to use paging (but it doesn't matter *how* you use paging - it could be a monolithic OS with a virtual address space for each process, or a SAS OS, or any other alternative).

quanganht wrote:And about this
Brendan wrote:and no easy way to implement protection/isolation
I doubt it. IMO we only have to watch over the page tables, and mark pages with permissions corresponding to applications.

"Mark pages with permissions corresponding to applications."???

If you've got a single address space for all processes, then you could mark all "user-level pages" that the currently running process shouldn't be able to access as "not present" or "supervisor" (e.g. during the task switch); but if you do that you'll need to flush/invalidate all of the TLB entries that you modify (and if you need to flush all of the TLB entries then it's easier and faster to use multiple address spaces and just change CR3 instead).
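To illustrate why this gets expensive, here is a user-space simulation (not kernel code) that flips the user/supervisor bit on a task switch and counts how many entries change. Each changed entry would need its own TLB invalidation (e.g. INVLPG) on real hardware. The `sim_page` struct and `switch_owner` function are invented for this sketch; the bit positions follow the x86 page table entry layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0)   /* x86 PTE bit 0: present */
#define PTE_USER    (1ULL << 2)   /* x86 PTE bit 2: user accessible */

/* Simulated page: a PTE value plus the process that owns the page. */
struct sim_page { uint64_t pte; int owner; };

/*
 * Make only the pages owned by `next` user accessible; everything else
 * becomes supervisor only. Returns the number of entries modified, which
 * is the number of TLB invalidations a real kernel would have to do.
 */
static size_t switch_owner(struct sim_page *pages, size_t count, int next) {
    assert(count == 0 || pages != NULL);
    size_t changed = 0;
    for (size_t i = 0; i < count; i++) {
        uint64_t want = pages[i].pte & ~PTE_USER;
        if (pages[i].owner == next) {
            want |= PTE_USER;
        }
        if (want != pages[i].pte) {
            pages[i].pte = want;
            changed++;
        }
    }
    return changed;
}
```

Switching between two processes touches every page that either of them owns, so the cost grows with the amount of memory in use, whereas reloading CR3 is a single operation.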

For a SAS OS, the only sane options are:

- No security/protection at all - e.g. a games machine or something where no data matters much, or an embedded system where you can guarantee that all the code that can be run is "safe" (where all the code is extremely well tested and stored in ROM or something).
- Software isolation. This could mean processes run inside a virtual machine - e.g. byte-code that is interpreted, or compiled while it runs (dynamic translation/JIT), or compiled when it's installed.

For these options, software isolation is probably the most interesting, but it's a massive amount of work for a small (potential) performance improvement (unless you can recycle someone else's work).

Cheers,

Brendan

jal · Post by **jal** » Tue Jan 12, 2010 7:36 am

Brendan wrote:For a specific example (and a big warning), as far as I can tell it's possible for user level code to set RSP to "kernel space" before doing SYSCALL and trick the kernel into trashing its own data. For example, try doing "mov rsp,0xFEDCBA9876543210" or "mov rsp,8" before SYSCALL...

Of course, no sane OS will put kernel data on the user stack, and a switch to kernel stack will always be the first thing the SYSCALL handler will do. SYSENTER has an explicit ESP stored in an MSR, but that means the scheduler must update the MSR when switching tasks.

JAL

Combuster · Post by **Combuster** » Tue Jan 12, 2010 7:50 am

jal wrote:but that means the scheduler must update the MSR when switching tasks.

Not necessarily. You can just have it contain a pointer to a valid temporary stack, then update ESP manually. (instead of finding out the value for ESP, then bothering the slow WRMSR with it)

quanganht · Post by **quanganht** » Tue Jan 12, 2010 8:13 am

Do you use SAS, Combuster?
