Hi,
tabz wrote:
From what you've all said, I'm assuming I should change my heap allocator to ensure that the addresses I return are a multiple of 16 bytes (or does the size of the memory allocated have to be a multiple of 16 bytes)? If so, then an issue I see is how you find the related header when freeing, as you don't know by how many bytes you had to push the address forward when allocating.
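You should always be able to do "headerAddress = dataAddress - headerSize;" when freeing. The trick is to put any padding needed for alignment before the header rather than between the header and the data, so that the header always sits immediately in front of the address you returned.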
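A minimal sketch of that layout, assuming a simple bump allocator over a static arena (the names `arena`, `block_header`, `aligned_alloc_sketch` and `header_from_data` are hypothetical, not from any real library). The data address is rounded up first and the header is written directly below it, so freeing only needs a fixed subtraction:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block header stored directly in front of the data. */
typedef struct {
    size_t size;        /* usable size requested by the caller */
} block_header;

static unsigned char arena[64 * 1024];   /* hypothetical backing memory */
static size_t arena_used = 0;

/* Round x up to a multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t x, uintptr_t align) {
    return (x + align - 1) & ~(align - 1);
}

/* Bump-allocate 'size' bytes whose address is a multiple of 'align'.
   Padding goes before the header, never between header and data. */
void *aligned_alloc_sketch(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Never align less strictly than the header itself needs. */
    if (align < _Alignof(block_header)) align = _Alignof(block_header);

    uintptr_t start = (uintptr_t)arena + arena_used;
    /* Leave room for the header, then round the data address up. */
    uintptr_t data = align_up(start + sizeof(block_header), align);
    uintptr_t end = data + size;
    if (end > (uintptr_t)arena + sizeof arena) return NULL;

    block_header *hdr = (block_header *)(data - sizeof(block_header));
    hdr->size = size;
    arena_used = end - (uintptr_t)arena;
    return (void *)data;
}

/* When freeing: headerAddress = dataAddress - headerSize. */
block_header *header_from_data(void *data) {
    return (block_header *)((uintptr_t)data - sizeof(block_header));
}
```

The cost of this approach is up to "align - 1" wasted bytes per block, all of them in front of the header, where nothing ever needs to find them again.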
Note that "malloc()" was completely broken for alignment, because it gives the caller no way to ask for a specific alignment. For example, if you happen to be using SSE then you want 16-byte alignment; but if you're using AVX2 you want 32-byte alignment and if you're using AVX512 you want 64-byte alignment; and sometimes (e.g. arrays of larger data structures) you want "cache line size alignment" (likely to also be 64 bytes); and sometimes (e.g. for a large number of short strings, where padding could double the amount of RAM used) you might not want alignment at all.
To fix this there were multiple non-standard messes, then POSIX defined a standard "int posix_memalign(void **memptr, size_t alignment, size_t size);" function. If you support any flavour of "aligned malloc" then the old "malloc()" can just be a wrapper, like:
Code:
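#include <stddef.h>

int posix_memalign(void **memptr, size_t alignment, size_t size);

/* Plain malloc() becomes a thin wrapper around the aligned allocator,
   using 16 bytes as the default alignment (enough for SSE). */
void *malloc(size_t size) {
    void *address;
    if (posix_memalign(&address, 16, size) == 0) return address;
    return NULL;    /* posix_memalign() reported an error, e.g. ENOMEM */
}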
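As an illustration of the alignments mentioned earlier (16 for SSE, 32 for AVX2, 64 for AVX512 or a cache line), a caller on a POSIX system can request each one explicitly. This is a sketch; "alloc_for_simd" is a hypothetical helper name, while "posix_memalign()" and "free()" are the standard functions:

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 'size' bytes aligned to 'alignment', or NULL on failure.
   posix_memalign() requires a power of two that is also a multiple of
   sizeof(void *); it returns 0 on success or an error number (it does
   not set errno). The result is released with plain free(). */
void *alloc_for_simd(size_t size, size_t alignment) {
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0) return NULL;
    return p;
}
```

For example, alloc_for_simd(4096, 32) gives a buffer suitable for aligned AVX2 loads, while an alignment such as 24 is rejected because it is not a power of two.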
Also; for large allocations modern implementations typically bypass the heap's pool and use the virtual memory manager's interface directly (e.g. when the size is larger than maybe 256 KiB, "malloc()" might just use "mmap()" without looking for a free block in its pool at all). This allows you to do things like large page optimisations (e.g. if the size is many MiB, "mmap()" might allocate 2 MiB pages).
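A rough sketch of that split, assuming a POSIX system with anonymous "mmap()". The threshold, the header layout and the names "sketch_malloc"/"sketch_free" are all hypothetical, and the system "malloc()" merely stands in for the allocator's own pool. The header (again placed directly before the data) records whether the block came from "mmap()", so freeing knows whether to call "munmap()":

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical cut-off; real implementations choose and tune their own. */
#define LARGE_THRESHOLD ((size_t)256 * 1024)

/* Two size_t fields: 16 bytes on common 64-bit targets. */
typedef struct {
    size_t mapped_len;   /* length passed to mmap(), or 0 if from the pool */
    size_t unused;       /* padding to keep the header a round size */
} alloc_header;

void *sketch_malloc(size_t size) {
    size_t total = size + sizeof(alloc_header);
    if (total < size) return NULL;                 /* size overflowed */

    alloc_header *h;
    if (size >= LARGE_THRESHOLD) {
        /* Large request: go straight to the virtual memory manager. */
        void *m = mmap(NULL, total, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (m == MAP_FAILED) return NULL;
        h = m;
        h->mapped_len = total;
    } else {
        /* Small request: the system malloc() stands in for the pool. */
        h = malloc(total);
        if (h == NULL) return NULL;
        h->mapped_len = 0;
    }
    return h + 1;                     /* data starts right after the header */
}

void sketch_free(void *p) {
    if (p == NULL) return;
    alloc_header *h = (alloc_header *)p - 1;
    if (h->mapped_len != 0) munmap(h, h->mapped_len);
    else free(h);
}
```

This sketch does not attempt the large page optimisation; on Linux that would involve platform-specific extras such as "madvise()" with MADV_HUGEPAGE or the MAP_HUGETLB flag.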
Finally; if you're not hampered by C library compatibility there are multiple other problems with "malloc()" that you don't have to suffer from. For example, you can have multiple pools to improve cache locality; you can give each pool hints/attributes to control whether its physical pages should be encrypted or not, or tied to a NUMA domain, etc; you can have some control over which allocation strategy (first fit, best fit, ...) is used; and you can add information for debugging (e.g. give each allocated block an optional "pointer to name string" so you can track down the cause of memory leaks easily).
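The "pointer to name string" idea could look roughly like this. It is only a sketch with hypothetical names ("debug_alloc", "debug_free", "debug_report_leaks"), using the system "malloc()" as the backing store: each block's header holds the caller's name and links the block into a list of live allocations, so anything still on the list at shutdown can be reported by name:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical debug header stored directly in front of each block. */
typedef struct debug_header {
    struct debug_header *prev, *next;   /* doubly linked list of live blocks */
    const char *name;                   /* optional "pointer to name string" */
    size_t size;
} debug_header;

static debug_header *live_blocks = NULL;

void *debug_alloc(size_t size, const char *name) {
    if (size > SIZE_MAX - sizeof(debug_header)) return NULL;
    debug_header *h = malloc(sizeof *h + size);
    if (h == NULL) return NULL;
    h->name = name;
    h->size = size;
    h->prev = NULL;                     /* push onto the front of the list */
    h->next = live_blocks;
    if (live_blocks) live_blocks->prev = h;
    live_blocks = h;
    return h + 1;
}

void debug_free(void *p) {
    if (p == NULL) return;
    debug_header *h = (debug_header *)p - 1;
    if (h->prev) h->prev->next = h->next; else live_blocks = h->next;
    if (h->next) h->next->prev = h->prev;
    free(h);
}

/* Print every block that is still allocated; returns how many there were. */
size_t debug_report_leaks(FILE *out) {
    size_t count = 0;
    for (debug_header *h = live_blocks; h; h = h->next, count++)
        fprintf(out, "leak: %zu bytes from \"%s\"\n", h->size,
                h->name ? h->name : "(unnamed)");
    return count;
}
```

Calling debug_report_leaks(stderr) just before exit then lists every outstanding block together with the name it was allocated under.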
Cheers,
Brendan