How do I keep track of which pages are free/used?

rdos · **Joined:** Wed Oct 01, 2008 1:55 pm **Posts:** 3192

ngx wrote:

Never thought of this, it is a pretty interesting idea that does not use any external data structures. But how do you find a free page, do you keep a pointer of some free page and increase it every time, as searching through the page tables could take a long time?

I have a counter for used pages, and I start the search process at that point (start address + used pages).

rdos · **Joined:** Wed Oct 01, 2008 1:55 pm **Posts:** 3192

nullplan wrote:

So if a program asks your OS for more memory, you iterate over the page tables to find a free spot in VM space to put the new memory? Seems inefficient, especially with address space as vast as it is on x86_64.

Not generally. It's typically the runtime library that allocates memory for a program, and it typically grabs some heap area from the OS and uses linked blocks for it's own purposes. Thus, malloc will not directly ask the OS for memory. This would also be wasteful if you want only a few bytes as page table allocation can only handle 4k granularity.

Thus, 4k page table allocation is a bit extraordinary for a normal application and so really doesn't affect performance to any large degree.

nullplan wrote:

I don't anticipate supporting fork() myself, I think it is a bad way to start a new process. All of its use cases can be handled by either pthread_create() or posix_spawn(). Well, almost all. But I am surprised to hear you of all people proclaim your support for fork().

I wanted to show that an OS can support both the CreateProcess interface and fork natively. I also wanted a way to exec() something within the same console and to share handles, and fork seemed useful for this. I added some optimizations so the combination of fork and exec will look like a chained CreateProcess. The exec also removes all the shared pages & reference counting stuff so I only need that between fork and exec.

nullplan wrote:

But since I do intend to allow shared memory, I shall have to implement reference counting for mapped files at least.

I once supported memory mapping files, but I'm moving away from this. It's impossible to support with a zero-copy VFS given that typical sector sizes are 512 bytes and the page size is 4k. Thus, if a file is not on continous sectors, it's possible it cannot be mapped in continous linear memory unless you copy the contents. Thus, I will memory map files in chunks based on where they reside on the physical disc.

nullplan · **Joined:** Wed Aug 30, 2017 8:24 am **Posts:** 1604

ngx wrote:

Thanks for your help. But can please point me to some resource or explain how storing the pages in a tree should work, as from your short explanation and my internet searches I haven't really understood how it could be used for managing pages?

It's just a normal binary tree, keyed by the starting address, only there is also a length. In a normal binary tree, you would search like this:

Code:

struct node *search(struct node *root, uintptr_t key) {
  while (root) {
    if (key < root->key)
      root = root->left;
    else if (key > root->key)
      root = root->right;
    else
      return root;
  }
  return 0;
}

Now in a range tree, you do it like this instead:

Code:

struct node *search(struct node *root, uintptr_t key) {
  while (root) {
    if (key < root->key)
      root = root->left;
    else if (key > root->key + root->length)
      root = root->right;
    else
      return root;
  }
  return 0;
}

This minor change affects all the other algorithms (insertion/deletion) as well. Not much more to it.

ngx wrote:

I haven't really understood what problems it could cause

Several orders of magnitude more work to do?

ngx wrote:

And also in the previous post you mentioned that you like the bitmap approach for the PMM, but how would I map the initial bitmap if I do not have a VMM yet(currently I use the linked list approach?

The VMM is only necessary when there is something to manage. Static allocations, or allocation made during early boot which remain unchanged for the entire run time, do not require the VMM. In this case, for a 64-bit kernel, it is easiest to just map all physical RAM linearly in early boot. VMM never even sees that area of VM space; it is entirely off limits to it. There's nothing to manage there. So, once you've found a physical RAM area large enough to hold the bitmap, you only add the base of the linear mapping to it, and there you go, that is your base address for the bitmap. Unless I am severely mistaken, a large enough physical memory area ought to exist unless physical memory is broken into more than 32768 pieces.

Even if you don't want to map all physical RAM (though I struggle to see why), you can just declare some virtual address to be the base of your PMM bitmap, and make it off-limits to the VMM. In 64-bit mode you just have so unbelievably much address space that you can just do that. If your design has a limit of 2^46 bytes (mine does, because the linear mapping can only take up half of kernel space at most), then the PMM bitmap would take up 2^31 bytes (get to reduce it by 12 to just use pages instead of bytes, and another three to use bits instead of bytes for recording the allocation status). So if you map that to 0xffff'c000'0000'0000, at most it would extend to 0xffff'c000'4000'0000, so you could just design your VMM to never consider addresses below 0xffff'c000'4000'0000 for allocations in kernel space.

AndrewAPrice · **Posted:** Thu Apr 01, 2021 8:23 pm

ngx wrote:

should I just search through the whole page table tree to find one that has the present bit off and then set the bit to on and map it to a physical address, but if a user needs several pages one after another, should I search for place where enough pages are free - considering that I am in long mode with PML4, it could take tons of time?

That's actually what I do right now! :lol:

When it becomes a noticable bottleneck I'm planning to migrate to a tree of free chunks, but for my first iteration I decided to brute force it and keep it simple and it's yet to be a problem. We'll see if I ever get Firefox ported over.

rpio · **Joined:** Sat Feb 20, 2021 3:11 pm **Posts:** 93

nullplan wrote:

It's just a normal binary tree, keyed by the starting address, only there is also a length. In a normal binary tree, you would search like this:

Code:

struct node *search(struct node *root, uintptr_t key) {
  while (root) {
    if (key < root->key)
      root = root->left;
    else if (key > root->key)
      root = root->right;
    else
      return root;
  }
  return 0;
}

Now in a range tree, you do it like this instead:

Code:

struct node *search(struct node *root, uintptr_t key) {
  while (root) {
    if (key < root->key)
      root = root->left;
    else if (key > root->key + root->length)
      root = root->right;
    else
      return root;
  }
  return 0;
}

This minor change affects all the other algorithms (insertion/deletion) as well. Not much more to it.

So if I understood correctly it is just a normal binary tree which contains a start and length field in each node, so each node could correspond to multiple pages. And at first the tree is empty, but with every allocation we add at which address and how many pages were allocated. So to find a new page which could be given out to the user, we just go from the root and then I can't understand - should we follow the decrease route or increase route(e.g. go to the branch which contains addresses that are lower or higher then current one each time)? And also should I use(return to the user) the page right after the last page in the route I am following ?

And also I wanted to ask - where should I store the initial VMM pages, because the pages I require at the start should map the kernel and I store kernel in the last 2GB, but mapping last 2 GB may be impossible or would take 2 MB in page tables. So is it a good idea to store the page tables in the data section like 512 x 8 sized arrays and also how could I not map the useless space if my kernel is less then 2GB (as I do not know the kernel size at runtime)?

nullplan · **Joined:** Wed Aug 30, 2017 8:24 am **Posts:** 1604

ngx wrote:

And at first the tree is empty, but with every allocation we add at which address and how many pages were allocated. So to find a new page which could be given out to the user, we just go from the root and then I can't understand - should we follow the decrease route or increase route(e.g. go to the branch which contains addresses that are lower or higher then current one each time)? And also should I use(return to the user) the page right after the last page in the route I am following ?

Binary trees have the property that an in-order walk of the tree returns all nodes in sorted order. Therefore you can do an in-order walk, recording the predecessor of each node. That way, you know exactly the size of the hole between the two allocations. which means it becomes easy to test if that hole fits the requirements of the mapping.

OK, so what do we need? First, the in-order walk function:

Code:

int walk_tree(struct node* root, int (*callback)(const struct node*, void*), void* ctx) {
  if (root->left)
    if (walk_tree(root->left, callback, ctx))
      return 1;
  if (callback(root, ctx))
    return 1;
  if (root->right)
    if (walk_tree(root->right, callback, ctx))
      return 1;
  return 0;
}

So far, so standard. The return value is only there to facilitate early return.

Now, what does the visitor have to do? If there is no predecessor, it has to test if the allocation fits between the minimum address and the start of the current VMA. Else it has to test if the allocation fits between the current and the previous node. So basically the same, but different address for "end of predecessor". If it does fit, it has to return the start address it found to the top of the caller and abort the walk. Else it has to continue the walk. If the current node has a start address past the maximum for this allocation, it can also abort the walk, since nothing fitting will be found anymore.

Also, we need to return the maximum allocated address to the caller, so the caller can test if maybe there is a suitable address in the space between the last allocation and the maximum address.

Putting it all together:

Code:

struct alloc_ctx {
  uintptr_t size, align, min, max;
  struct node *predecessor;
  uintptr_t ret_address;
};
static int alloc_visitor(const struct node* vma, void* _ctx)
{
  struct alloc_ctx *ctx = _ctx;
  uintptr_t end_of_pred = ctx->min;
  if (ctx->predecessor)
    end_of_pred = ctx->predecessor->start + ctx->predecessor->length;
  end_of_pred = (end_of_pred + ctx->align - 1) & -ctx->align;
  if (end_of_pred >= ctx->min && end_of_pred + ctx->size < vma->start && end_of_pred + ctx->size < ctx->max)
  {
    ctx->ret_address = end_of_pred;
    return 1;
  }
  ctx->predecessor = vma;
  if (vma->start + vma->length >= ctx->max)
    return 1;
  return 0;
}

Awesome. Now for the actual allocation function, we only need to initialize this algorithm, do the final check for memory at the end, and bind it all together:

Code:

static struct node *vma_root;
static uintptr_t find_free_vma(size_t size, size_t align, uintptr_t min, uintptr_t max)
{
  assert(align >= PAGE_SIZE); /* align is non-zero and at least PAGE_SIZE */
  assert(!(align & (align - 1))); /* align is also a power of two */
  size = (size + align - 1) & -align;
  struct alloc_ctx alloc_ctx = {size, align, min, max};
  walk_tree(vma_root, alloc_visitor, &alloc_ctx);
  if (alloc_ctx.ret_address)
    return alloc_ctx.ret_address;
  uintptr_t end_of_pred = min;
  if (alloc_ctx.predecessor)
    end_of_pred = alloc_ctx.predecessor->start + alloc_ctx.predecessor->length;
  end_of_pred = (end_of_pred + align - 1) & -align;
  if (end_of_pred >= min && end_of_pred + size < max)
    return end_of_pred;
  return 0;
}

OK, now the only thing left to do is to allocate a new node, fill it with information, and insert it into the tree. Which I shall leave to you.

ngx wrote:

And also I wanted to ask - where should I store the initial VMM pages, because the pages I require at the start should map the kernel and I store kernel in the last 2GB, but mapping last 2 GB may be impossible or would take 2 MB in page tables. So is it a good idea to store the page tables in the data section like 512 x 8 sized arrays and also how could I not map the useless space if my kernel is less then 2GB (as I do not know the kernel size at runtime)?

Not quite sure what you mean. Do you mean your initial page tables? Anyway, if static allocation makes you waste memory, the answer is dynamic allocation. I also allocate my page tables at run time. But that is more of a job for the PMM at that point. As for how to not map useless space: In my case, the kernel itself is an ELF binary, and gets its segments mapped like a userspace program by the stage 1 kernels, which are themselves loaded by the booting method of choice (i.e. GRUB or UEFI). I don't know how your memory layout works, you will have to figure that out yourself.

rpio · **Joined:** Sat Feb 20, 2021 3:11 pm **Posts:** 93

nullplan wrote:

Not quite sure what you mean. Do you mean your initial page tables? Anyway, if static allocation makes you waste memory, the answer is dynamic allocation. I also allocate my page tables at run time. But that is more of a job for the PMM at that point. As for how to not map useless space: In my case, the kernel itself is an ELF binary, and gets its segments mapped like a userspace program by the stage 1 kernels, which are themselves loaded by the booting method of choice (i.e. GRUB or UEFI). I don't know how your memory layout works, you will have to figure that out yourself.

Thanks for the algorithm, but I also have several questions about the page tables:

Every time I allocate a table, I call the palloc and it gives out a free frame, but that frame is not mapped into the virtual memory, so how should I change the page table(which should be stored in the frame) and add entries to it if it is unmapped which means that it couldn’t be accessed and so even the VMM could not be initialized to map it. Should I think of some certain location where they will be mapped and hard code it? - it seems like a bad approach to me, but that is the only thing I thought of.

Also I just though about it and understood that the addresses in the page tables entries that point to page tables of the lower level(e.g. PDT = PDPT[i]) are physical, which means I can’t just access them(as i have no idea where they are mapped to). So when I for example want to remap a page, how do I follow the tables, if I only have the virtual address of root PML4 table recorded then when I get to the next table I stop as it is the physical address can’t obtain values from physical address to go to the next/lower table. So how should I convert page table physical addresses into virtual to access them?

nullplan · **Joined:** Wed Aug 30, 2017 8:24 am **Posts:** 1604

ngx wrote:

Every time I allocate a table, I call the palloc and it gives out a free frame, but that frame is not mapped into the virtual memory, so how should I change the page table(which should be stored in the frame) and add entries to it if it is unmapped which means that it couldn’t be accessed and so even the VMM could not be initialized to map it. Should I think of some certain location where they will be mapped and hard code it? - it seems like a bad approach to me, but that is the only thing I thought of.

There really are only two solutions for this: Either you just map all physical RAM linearly from the start of kernel space (which I do), or you reserve some virtual address for use as temporary mapping space. The latter is less versatile: you always have to map the page, do something with it, unmap the page again. Never let it stay for too long, as you don't want the TLB to propagate to other processors. Because then you'd have to do a TLB shootdown, and those are expensive.

ngx wrote:

Also I just though about it and understood that the addresses in the page tables entries that point to page tables of the lower level(e.g. PDT = PDPT[i]) are physical, which means I can’t just access them(as i have no idea where they are mapped to). So when I for example want to remap a page, how do I follow the tables, if I only have the virtual address of root PML4 table recorded then when I get to the next table I stop as it is the physical address can’t obtain values from physical address to go to the next/lower table. So how should I convert page table physical addresses into virtual to access them?

Another problem solved with just a linear mapping of all RAM. Then translating between physical and virtual address is just an addition.

rpio · **Joined:** Sat Feb 20, 2021 3:11 pm **Posts:** 93

nullplan wrote:

There really are only two solutions for this: Either you just map all physical RAM linearly from the start of kernel space (which I do), or you reserve some virtual address for use as temporary mapping space. The latter is less versatile: you always have to map the page, do something with it, unmap the page again. Never let it stay for too long, as you don't want the TLB to propagate to other processors. Because then you'd have to do a TLB shootdown, and those are expensive.

I have also thought about it and probably it is the thing I will do - just have a 4096 bytes after the kernel for mapping page tables

nullplan wrote:

Another problem solved with just a linear mapping of all RAM. Then translating between physical and virtual address is just an addition.

I wouldn’t identity map all of the memory forever, but it would be a good idea for VMM initialization. The problem is that is that still I have no way to store initial page tables, except having them as array of addresses right in the kernel c code(in the other case they would just be unmapped). And if I have the initial page tables in the kernel, it would be much easier to do the mapping of kernel at higher half then identity mapping(or they will take equal amount of work). Or is there another solution and I forgot something?

nullplan · **Joined:** Wed Aug 30, 2017 8:24 am **Posts:** 1604

ngx wrote:

I wouldn’t identity map all of the memory forever, but it would be a good idea for VMM initialization. The problem is that is that still I have no way to store initial page tables, except having them as array of addresses right in the kernel c code(in the other case they would just be unmapped). And if I have the initial page tables in the kernel, it would be much easier to do the mapping of kernel at higher half then identity mapping(or they will take equal amount of work). Or is there another solution and I forgot something?

You can just allocate that memory dynamically. Your boot environment is likely either 32-bit non-paged or 64-bit identity mapped. In both cases, once you determine a physical address to be free, you can just start using it as a virtual address. You can allocate and initialize whatever page tables that way. That's how I initialize my page tables.

What I don't get is your aversion to adding a linear memory map. It solves so many problems, even with different hardware than just the CPU. I'm not asking you to keep the identity map around; you likely need that part of VM for something else, but a linear map really helps.

rpio · **Joined:** Sat Feb 20, 2021 3:11 pm **Posts:** 93

nullplan wrote:

Binary trees have the property that an in-order walk of the tree returns all nodes in sorted order. Therefore you can do an in-order walk, recording the predecessor of each node. That way, you know exactly the size of the hole between the two allocations. which means it becomes easy to test if that hole fits the requirements of the mapping.

Thanks for your help with algorithm, am I right that I should record the used pages in the tree(so every allocation I should add the pages there)? I should record the start of first allocated page and the end of last?

rdos · **Joined:** Wed Oct 01, 2008 1:55 pm **Posts:** 3192

nullplan wrote:

What I don't get is your aversion to adding a linear memory map. It solves so many problems, even with different hardware than just the CPU. I'm not asking you to keep the identity map around; you likely need that part of VM for something else, but a linear map really helps.

Are you suggesting that the kernel use identity mapped pages for MMIO devices so it can easily convert between physical and linear addresses? I would find that a bit of a nightmare. Now your kernel can thrash basically everything, even things that are isolated in other address spaces since you have direct access to all physical memory. And if you allocate pages for kernel, then you cannot use simple addition to convert to/from physical addresses anymore, rather will need to read page tables (linear to physical) or know which pages to scan (physical to linear).

nullplan · **Joined:** Wed Aug 30, 2017 8:24 am **Posts:** 1604

rdos wrote:

Are you suggesting that the kernel use identity mapped pages for MMIO devices so it can easily convert between physical and linear addresses?

No I am not. I am saying to use a linear mapping. Why do people always mix that up? I want to build a higher-half OS, so of course I'm going to need the lower half of VM space to be free.

rdos wrote:

I would find that a bit of a nightmare. Now your kernel can thrash basically everything, even things that are isolated in other address spaces since you have direct access to all physical memory.

It's the kernel, it has access to all physical memory, anyway. It also can thrash the page tables, and that is going to be fun to debug. Well, nobody ever said open-heart surgery was simple.

rdos wrote:

And if you allocate pages for kernel, then you cannot use simple addition to convert to/from physical addresses anymore, rather will need to read page tables (linear to physical) or know which pages to scan (physical to linear).

What? Physical to virtual is always going to be an addition. Always. Because all memory is mapped in the linear mapping. Virtual to physical might be more complicated, yes, but that direction is required less often.

bzt · **Joined:** Thu Oct 13, 2016 4:55 pm **Posts:** 1584

nullplan wrote:

It's the kernel, it has access to all physical memory, anyway.

That's right.

nullplan wrote:

Physical to virtual is always going to be an addition. Always.

Yes, that's correct too.

nullplan wrote:

Virtual to physical might be more complicated, yes, but that direction is required less often.

Not necessarily complicated. Some architectures (like the ARM for example) provide instructions to convert virtual to physical address for you in an absolutely non-complicated way. And yes, you probably only need that when you are freeing the page tables on process exit, because most of the time you can let the MMU do the conversion for you as you don't need to know the actual physical address (only the pointed data).

Cheers,
bzt

rdos · **Joined:** Wed Oct 01, 2008 1:55 pm **Posts:** 3192

nullplan wrote:

rdos wrote:

Are you suggesting that the kernel use identity mapped pages for MMIO devices so it can easily convert between physical and linear addresses?

No I am not. I am saying to use a linear mapping. Why do people always mix that up? I want to build a higher-half OS, so of course I'm going to need the lower half of VM space to be free.

Yes, you map all physical ram to linear addresses so you can go between the two, which means you expose all of physical memory to tampering.

nullplan wrote:

rdos wrote:

And if you allocate pages for kernel, then you cannot use simple addition to convert to/from physical addresses anymore, rather will need to read page tables (linear to physical) or know which pages to scan (physical to linear).

What? Physical to virtual is always going to be an addition. Always. Because all memory is mapped in the linear mapping. Virtual to physical might be more complicated, yes, but that direction is required less often.

Physical to linear will get you to the identity mapping area, not the linear address you are using providing your device is not using the identity mapping directly. So, it might solve the problem of reading physical memory, but it doesn't solve the problem of what virtual data structure it belongs to.

I have a special allocator for allocating MMIO addresses that records the physical & linear base and therefore quickly can convert both from physical to linear and linear to physical. It doesn't expose all of physical RAM so it can be read like in non-paging environments, and it's not even accessed using flat selectors.

OSDev.org

How do I keep track of which pages are free/used?

Who is online