ngx wrote:
Ok thanks. I thought of an algorithm and would be grateful if you could point out some things that I could change in it:
So I would have a pointer to the start of the heap (4096-aligned), right after the end of the executable. When a user requests a page, I would check whether the pointer points to a free page; if yes, mark it as used and return it, otherwise move forward until a free page is found and do the same thing. If multiple pages are required, a place will be found where all of them fit one after another. When a page is freed, its present bit will be cleared and it will become free and be added to the linked list. The linked list of free pages will be used by the allocator when the user requests only one page; otherwise the pages will be allocated starting from the heap pointer.
I still don't get it: you're mixing free pages with allocation, and you shouldn't. A "free page" only exists in the context of the physical memory allocator, not at the other levels. At the VMM level there are no free pages, just mapped ones. At the libc malloc level, you have allocated and free areas of arbitrary size (not page-sized). If the application frees memory, it is up to the allocator whether the mapped pages are returned to the PMM or not. With dlmalloc you cannot return the pages, because the free-memory pointers are stored in them. On the other hand, if the application allocates again, there's no need to call the VMM and the PMM (which can be slow), because the pages are already mapped into the address space; and the allocator does not need extra memory to keep track of memory, because the pointers are simply stored in the free memory areas.
Here's how it's supposed to go:
1. at start, heap points to the end of the executable
2. the application requests (let's say) 256 bytes of memory
3. your allocator tries to allocate that on the heap, but since nothing is mapped after the executable yet, writing the end-of-free-memory pointer causes a page fault
4. the page fault handler calls the PMM for a free physical page, and maps it to the address space after the executable
5. the execution is returned to the allocator, which knows nothing about the memory mapping
6. the allocator sets the free pointer to heap + 256 and returns the previous pointer (heap)
(Alternatively, after step 2 the allocator could have checked that the end of the address space is at the end of the executable, and called sbrk/mmap to map more memory instead of relying on the page fault handler.)
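The bump-allocation part of the steps above can be sketched in C. All names here are illustrative: a static arena stands in for the demand-mapped heap, since in a real kernel the heap pointer would start at the end of the executable and the first touch of unmapped memory would go through the page-fault handler and the PMM.

```c
#include <stddef.h>

/* Sketch of steps 1-6: a bump allocator that knows nothing about paging.
 * A static arena stands in for the demand-mapped address space; in a real
 * kernel, touching unmapped heap memory here would page fault, and the
 * handler would ask the PMM for a physical page and map it. */

static char arena[4096];        /* stand-in for memory after the executable */
static char *heap = arena;      /* current free memory pointer */

static void *bump_alloc(size_t size)
{
    char *ret = heap;           /* the previous pointer is returned... */
    heap += size;               /* ...and the free pointer moves forward */
    return ret;
}

/* bump_alloc(256) returns heap; bump_alloc(512) then returns heap + 256,
 * with no mapping involved because the page is already mapped. */
```

The allocator itself never calls the VMM or PMM; demand paging (or the sbrk/mmap alternative) supplies the backing memory transparently.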
Let's say the application allocates another 512 bytes:
1. the allocator adds 512 to the current free pointer (which is heap + 256), and returns the previous pointer (heap + 256)
2. there's no mapping involved, because the page is already mapped after the executable
Now let's say the application frees the first 256 bytes:
1. the allocator writes the current free pointer (heap + 256 + 512) into the area being freed
2. then points the current free memory pointer at that area
3. the page isn't unmapped; it's still mapped in the application's address space
Now memory looks like this: after the executable there's a 256-byte free area, containing a pointer to heap + 256 + 512. The next 512 bytes (not included in the free memory list) are allocated. The list's tail points to heap + 256 + 512, meaning there are no more free memory holes, just the free memory at the end of the heap.
If the application tries to allocate memory again, the allocator would return the pointer to the first area (if that area is big enough), and would set the current free pointer to the pointer stored in that area. If the allocation is bigger than the free area, the allocator would allocate at the end, just like it did with the 512-byte allocation.
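The free-pointer-in-free-memory scheme above can be sketched like this. To keep it short, every block is fixed at 256 bytes (a simplification of mine; a real allocator would also store each area's size), but the key point survives: the next-pointer lives inside the free area itself, so the allocator needs no extra bookkeeping memory, and freeing never unmaps a page.

```c
#include <stddef.h>

/* Sketch of the scheme above with fixed 256-byte blocks (a simplification;
 * a real allocator would store each area's size too). The next-pointer is
 * stored inside the free area itself, so no extra bookkeeping memory is
 * needed, and freed pages stay mapped. */

#define BLOCK 256
static char arena[16 * BLOCK];  /* stand-in for the mapped heap */
static char *heap = arena;      /* end-of-heap bump pointer (the tail) */
static void *free_head = NULL;  /* head of the free list */

static void *alloc_block(void)
{
    if (free_head) {                        /* reuse a hole: pop the head */
        void *ret = free_head;
        free_head = *(void **)free_head;    /* next pointer is in the hole */
        return ret;
    }
    char *ret = heap;                       /* no hole: allocate at the end */
    heap += BLOCK;
    return ret;
}

static void free_block(void *p)
{
    *(void **)p = free_head;    /* store the current head in the freed area */
    free_head = p;              /* the freed area becomes the new head */
    /* the page stays mapped; nothing is returned to the PMM */
}
```

Allocating two blocks, freeing the first, then allocating again hands back the same address, with no VMM/PMM call involved.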
The simple solution (walk the free list from the chain's head to the tail until you find a big enough area, or add the allocation at the tail if none is found) is O(n): the more areas you have, the slower it becomes. However, you could keep track of the sizes in multiple lists instead; then you'd only have to pick a list with big-enough areas, or use the tail. This can be O(1) (you always use the first entry in the list, no walking through the chain). So instead of
Code:
head -> block at 0 size 256 -> block at 768 end of chain
You could have
Code:
head[size 8] -> end
head[size 16] -> end
head[size 32] -> end
...
head[size 256] -> block at 0 -> end
head[size 512] -> end
...
overall tail -> block at 768 end of chain
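Here's a minimal sketch of those segregated lists, assuming power-of-two size classes from 8 to 512 bytes (an arbitrary choice of mine): freed blocks are pushed onto the list matching their class, and allocation pops the head of the first fitting class in O(1), falling back to the overall tail when that list is empty.

```c
#include <stddef.h>

/* Segregated free lists: one list head per power-of-two size class
 * (8..512 bytes here, an arbitrary choice). Alloc pops the head of the
 * matching class, O(1), or bumps the overall tail if the list is empty. */

#define NCLASSES 7                  /* classes 8,16,32,64,128,256,512 */
static void *heads[NCLASSES];       /* head[size 8] ... head[size 512] */
static char arena[8192];            /* stand-in for the mapped heap */
static char *tail = arena;          /* the overall tail: end of the heap */

static int class_of(size_t size)    /* smallest class that fits size */
{
    int c = 0;
    size_t s = 8;
    while (s < size) { s <<= 1; c++; }
    return c;
}

static void *seg_alloc(size_t size)
{
    int c = class_of(size);
    if (heads[c]) {                 /* O(1): pop the class's list head */
        void *ret = heads[c];
        heads[c] = *(void **)ret;
        return ret;
    }
    void *ret = tail;               /* list empty: allocate at the tail */
    tail += (size_t)8 << c;         /* rounded up to the class size */
    return ret;
}

static void seg_free(void *p, size_t size)
{
    int c = class_of(size);
    *(void **)p = heads[c];         /* push onto the matching class list */
    heads[c] = p;
}
```

Freeing a 256-byte block and then allocating 200 bytes reuses the same block, because 200 rounds up to the same 256-byte class.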
What data structure and algorithm you use is totally up to you. There's no single good solution here, only compromises: whether you prefer fast alloc and slower free, or not-so-fast alloc and not-so-slow free; more memory but faster alloc, or less memory but slower alloc, etc. It all depends on your priorities and design decisions, and on what's best for your design.
Cheers,
bzt