GhelloWorld wrote:
SpyderTL, you are completely right, and with an alignment of 4 the driver finally works as it should! And Ben, are you suggesting I should remove the regular malloc function entirely and use only the aligned version? And if so, for all memory, or only memory that deals with hardware?
Thanks to both of you guys for the help
No, I have both: a regular malloc(sz) and an amalloc(sz, a). The first one allocates memory any way it can, while the second allocates memory with an alignment of at least 'a'. What I am saying is that your regular malloc(sz) should align all memory on a minimum of 16 bytes, period. This will cause speed increases on many levels.
For example, your malloc(sz) must have some sort of table/linked list/whatever to indicate if a block of memory is used or not, correct? Just make sure that each available block starts on a 16-byte boundary.
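To illustrate the 16-byte rule, here is a minimal sketch (not anyone's actual allocator; the names arena, bump, and my_malloc are made up for the example): round every request up to a multiple of 16 and carve blocks out of a 16-byte-aligned region, so every pointer handed back lands on a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGN 16u

/* Round n up to the next multiple of ALIGN (ALIGN must be a power of two). */
static size_t round_up(size_t n)
{
    return (n + (ALIGN - 1)) & ~(size_t)(ALIGN - 1);
}

static _Alignas(16) unsigned char arena[4096]; /* 16-byte-aligned backing store */
static size_t bump;                            /* next free offset in the arena */

static void *my_malloc(size_t sz)
{
    size_t need = round_up(sz);
    if (bump + need > sizeof arena)
        return NULL;             /* out of memory */
    void *p = arena + bump;
    bump += need;
    return p;                    /* always 16-byte aligned */
}
```

Because the arena itself starts on a 16-byte boundary and every block size is rounded to a multiple of 16, each returned pointer is 16-byte aligned with no extra work at allocation time.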
My heap allocator starts with a count of X-sized blocks aligned on an X-sized boundary, with a minimum count of "(X*2) / X" blocks. The very next row uses a block size greater than X but still evenly divisible by two, following the same count rule, and so on. This guarantees that every block is aligned on its own size. For example:
High Address:
[128k][128k][128k][128k][128k][128k][128k][128k][128k][128k][128k][128k]
...
[64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k]
...
[32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k]
...
[4096][4096][4096][4096][4096][4096][4096][4096][4096][4096][4096][4096]
...
[512][512][512][512][512][512][512][512][512][512][512][512][512][512]
[256][256][256][256][256][256][256][256][256][256][256][256][256][256]
[128][128][128][128][128][128][128][128][128][128][128][128][128][128]
[64][64][64][64][64][64][64][64][64][64][64][64][64][64][64][64][64][64]
[32][32][32][32][32][32][32][32][32][32][32][32][32][32][32][32][32][32]
Low Address:
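The layout above rests on a simple invariant, sketched here with a hypothetical helper: if a row of X-sized blocks starts on an X-aligned boundary, then block i sits at base + i*X, so its address is also a multiple of X, and therefore of any smaller power of two as well.

```c
#include <assert.h>
#include <stdint.h>

/* Address of block i in a row of x-sized blocks starting at base.
 * Precondition: the row base is aligned to the block size. */
static uintptr_t block_addr(uintptr_t base, uintptr_t x, unsigned i)
{
    assert(base % x == 0);
    return base + (uintptr_t)i * x;
}
```

So a [4096] row whose base is 4096-aligned hands out only page-aligned blocks, and those same blocks automatically satisfy any smaller alignment request (16, 32, ...) for free.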
Each block is either used or free, and a bitmap tracks which. The blocks are aligned so that if I need an alignment of Y, any block of size Y or greater is guaranteed to start on a Y boundary. It is up to your kernel to determine how many [32]'s, [64]'s, [128]'s, etc. to build for each task; obviously, a lot more small blocks are needed than large ones. The task's info file (registry, .INF file, whatever) can store this information so that the next time the task is loaded, the kernel can allocate the counts it actually used. (i.e.: If more [64]'s were needed than were allocated at load time, next time allocate more [64]'s and fewer of something else. Keep a running total. Most of the time a task will use about the same amount of memory each time it is loaded.)
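One way to do that used/free bitmap is sketched below (my own illustration, not the book's code; the row struct and function names are invented): one bit per block in a row, scan for the first clear bit, set it, and return the matching block address.

```c
#include <assert.h>
#include <stdint.h>

#define NBLOCKS 64  /* one 64-bit word covers 64 blocks per row */

struct row {
    uint64_t  used;      /* bit i set => block i is allocated          */
    uintptr_t base;      /* start of the row (aligned to blk_size)     */
    uintptr_t blk_size;  /* size of every block in this row            */
};

/* Returns the address of a free block, or 0 if the row is full. */
static uintptr_t row_alloc(struct row *r)
{
    for (unsigned i = 0; i < NBLOCKS; i++) {
        uint64_t bit = (uint64_t)1 << i;
        if (!(r->used & bit)) {
            r->used |= bit;
            return r->base + (uintptr_t)i * r->blk_size;
        }
    }
    return 0;
}

static void row_free(struct row *r, uintptr_t addr)
{
    unsigned i = (unsigned)((addr - r->base) / r->blk_size);
    r->used &= ~((uint64_t)1 << i);
}
```

A real kernel would use a bit-scan instruction (e.g. BSF on x86) instead of the loop, but the bookkeeping is the same: one bit per block, no per-block header, and the block address falls straight out of the bit index.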
My allocator also keeps track of which block in each sized row was most recently freed, so that I can allocate that one next. That memory just might still be in the cache, speeding up the next task to use that block.
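The "hand back the last-freed block first" idea can be sketched like this (again my own illustration; the last_freed field and function names are hypothetical): remember the index freed most recently and try it before falling back to a bitmap scan.

```c
#include <assert.h>
#include <stdint.h>

#define NBLK 64

struct hot_row {
    uint64_t used;       /* bit i set => block i is allocated            */
    int      last_freed; /* index of the most recently freed block, or -1 */
};

/* Returns a free block index, or -1 if the row is full. */
static int hot_alloc(struct hot_row *r)
{
    /* Try the most recently freed block first: it may still be warm
     * in the cache. */
    int h = r->last_freed;
    if (h >= 0 && !(r->used & ((uint64_t)1 << h))) {
        r->used |= (uint64_t)1 << h;
        r->last_freed = -1;
        return h;
    }
    /* Fall back to a linear scan of the bitmap. */
    for (int i = 0; i < NBLK; i++) {
        if (!(r->used & ((uint64_t)1 << i))) {
            r->used |= (uint64_t)1 << i;
            return i;
        }
    }
    return -1;
}

static void hot_free(struct hot_row *r, int i)
{
    r->used &= ~((uint64_t)1 << i);
    r->last_freed = i;  /* remember it for the next allocation */
}
```

The hint costs one integer per row and turns the common free-then-alloc pattern into an O(1) hit on cache-warm memory.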
This is all explained in one of my books (http://www.fysnet.net/the_system_core.htm, Chapter 13).
When you are writing your heap allocator, remember that it takes a considerable amount of time to allocate memory if you don't do it correctly, and alignment has a lot to do with it.
Thanks,
Ben