GhelloWorld wrote:
SpyderTL, you are completely right, and with an alignment of 4 the driver finally works as it should! And Ben, are you suggesting I should remove the regular malloc function entirely and use only the aligned version? And if so, for all memory, or only memory that deals with hardware?
Thanks to both of you guys for the help
No, I have both: a regular malloc(sz) and an amalloc(sz, a). The first one allocates memory any way it can, while the second allocates memory with an alignment of at least 'a'. What I am saying is that your regular malloc(sz) should align all memory on a minimum of 16 bytes, period. This will cause speed increases on many levels.
For example, your malloc(sz) must have some sort of table/linked list/whatever to indicate if a block of memory is used or not, correct? Just make sure that each available block starts on a 16-byte boundary.
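To illustrate the 16-byte rule, here is a minimal sketch (not anyone's actual allocator; the names arena, bump, and my_malloc are made up for the example): round every request up to a multiple of 16 and carve blocks out of a 16-byte-aligned region, so every pointer handed back lands on a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGN 16u

/* Round n up to the next multiple of ALIGN (ALIGN must be a power of two). */
static size_t round_up(size_t n)
{
    return (n + (ALIGN - 1)) & ~(size_t)(ALIGN - 1);
}

static _Alignas(16) unsigned char arena[4096]; /* 16-byte-aligned backing store */
static size_t bump;                            /* next free offset in the arena */

static void *my_malloc(size_t sz)
{
    size_t need = round_up(sz);
    if (bump + need > sizeof arena)
        return NULL;             /* out of memory */
    void *p = arena + bump;
    bump += need;
    return p;                    /* always 16-byte aligned */
}
```

Because the arena itself starts on a 16-byte boundary and every block size is rounded to a multiple of 16, each returned pointer is 16-byte aligned with no extra work at allocation time.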
My heap allocator starts with a count of X-sized blocks aligned on an X-sized boundary, with a minimum count of "(X*2) / X" blocks. The very next row uses a block size greater than X but still evenly divisible by two, following the same count rule, and so on. This guarantees that every block is aligned on its own size. For example:
High Address:
[128k][128k][128k][128k][128k][128k][128k][128k][128k][128k][128k][128k]
...
[64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k][64k]
...
[32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k][32k]
...
[4096][4096][4096][4096][4096][4096][4096][4096][4096][4096][4096][4096]
...
[512][512][512][512][512][512][512][512][512][512][512][512][512][512]
[256][256][256][256][256][256][256][256][256][256][256][256][256][256]
[128][128][128][128][128][128][128][128][128][128][128][128][128][128]
[64][64][64][64][64][64][64][64][64][64][64][64][64][64][64][64][64][64]
[32][32][32][32][32][32][32][32][32][32][32][32][32][32][32][32][32][32]
Low Address:
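The layout above rests on a simple invariant, sketched here with a hypothetical helper: if a row of X-sized blocks starts on an X-aligned boundary, then block i sits at base + i*X, so its address is also a multiple of X, and therefore of any smaller power of two as well.

```c
#include <assert.h>
#include <stdint.h>

/* Address of block i in a row of x-sized blocks starting at base.
 * Precondition: the row base is aligned to the block size. */
static uintptr_t block_addr(uintptr_t base, uintptr_t x, unsigned i)
{
    assert(base % x == 0);
    return base + (uintptr_t)i * x;
}
```

So a [4096] row whose base is 4096-aligned hands out only page-aligned blocks, and those same blocks automatically satisfy any smaller alignment request (16, 32, ...) for free.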
Each block is either used or free, and a bitmap tracks which. The blocks are aligned so that if I need an alignment of Y, any block of size Y or greater is guaranteed to start on a Y boundary. It is up to your kernel to determine how many [32]'s, [64]'s, [128]'s, etc. to build for each task; obviously, a lot more small blocks are needed than large ones. The task's info file (registry, .INF file, whatever) can store this information so that the next time the task is loaded, the kernel can allocate the counts it actually used. (i.e.: If more [64]'s were needed than were allocated at load time, next time allocate more [64]'s and fewer of something else. Keep a running total. Most of the time a task will use about the same amount of memory each time it is loaded.)
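One way to do that used/free bitmap is sketched below (my own illustration, not the book's code; the row struct and function names are invented): one bit per block in a row, scan for the first clear bit, set it, and return the matching block address.

```c
#include <assert.h>
#include <stdint.h>

#define NBLOCKS 64  /* one 64-bit word covers 64 blocks per row */

struct row {
    uint64_t  used;      /* bit i set => block i is allocated          */
    uintptr_t base;      /* start of the row (aligned to blk_size)     */
    uintptr_t blk_size;  /* size of every block in this row            */
};

/* Returns the address of a free block, or 0 if the row is full. */
static uintptr_t row_alloc(struct row *r)
{
    for (unsigned i = 0; i < NBLOCKS; i++) {
        uint64_t bit = (uint64_t)1 << i;
        if (!(r->used & bit)) {
            r->used |= bit;
            return r->base + (uintptr_t)i * r->blk_size;
        }
    }
    return 0;
}

static void row_free(struct row *r, uintptr_t addr)
{
    unsigned i = (unsigned)((addr - r->base) / r->blk_size);
    r->used &= ~((uint64_t)1 << i);
}
```

A real kernel would use a bit-scan instruction (e.g. BSF on x86) instead of the loop, but the bookkeeping is the same: one bit per block, no per-block header, and the block address falls straight out of the bit index.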
My allocator also keeps track of which block in each sized row was most recently freed, so that I can allocate that one next. That memory just might still be in the cache, speeding up the next task to use that block.
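The "hand back the last-freed block first" idea can be sketched like this (again my own illustration; the last_freed field and function names are hypothetical): remember the index freed most recently and try it before falling back to a bitmap scan.

```c
#include <assert.h>
#include <stdint.h>

#define NBLK 64

struct hot_row {
    uint64_t used;       /* bit i set => block i is allocated            */
    int      last_freed; /* index of the most recently freed block, or -1 */
};

/* Returns a free block index, or -1 if the row is full. */
static int hot_alloc(struct hot_row *r)
{
    /* Try the most recently freed block first: it may still be warm
     * in the cache. */
    int h = r->last_freed;
    if (h >= 0 && !(r->used & ((uint64_t)1 << h))) {
        r->used |= (uint64_t)1 << h;
        r->last_freed = -1;
        return h;
    }
    /* Fall back to a linear scan of the bitmap. */
    for (int i = 0; i < NBLK; i++) {
        if (!(r->used & ((uint64_t)1 << i))) {
            r->used |= (uint64_t)1 << i;
            return i;
        }
    }
    return -1;
}

static void hot_free(struct hot_row *r, int i)
{
    r->used &= ~((uint64_t)1 << i);
    r->last_freed = i;  /* remember it for the next allocation */
}
```

The hint costs one integer per row and turns the common free-then-alloc pattern into an O(1) hit on cache-warm memory.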
This is all explained in one of my books (http://www.fysnet.net/the_system_core.htm, Chapter 13).
When you are writing your heap allocator, remember that it takes a considerable amount of time to allocate memory if you don't do it correctly, and alignment has a lot to do with it.
Thanks,
Ben