Hi,
NickJohnson wrote:
Modern processors definitely don't load memory on a byte-by-byte basis; the reality is much more complex. At the very least, the smallest unit of data transfer between main memory and the last level of cache (L2 or L3) is the size of a last level cache line, which is (e.g. on Haswell) 64 *bytes* per line, or 512 bits. This ignores prefetching, unaligned accesses, and other latency-hiding mechanisms that might increase how much data is transferred.
Yes - modern 80x86 typically loads cache lines.
NickJohnson wrote:
So, in reality, the processor is not capable of doing single-byte transfers or even 128-bit transfers to/from memory. The real transfers are much larger.
For "uncached" areas (e.g. memory mapped IO) the CPU will read/write individual bytes when software tells it to; including areas of RAM that are configured as "uncached" (e.g. the firmware's SMM area). In addition, it's possible to force the CPU to write a byte to RAM even for "write-back" cached areas (e.g. by using a MASKMOVDQU instruction with all bytes masked except one, followed by an SFENCE or MFENCE).
Columbus wrote:
Why is the smallest addressable data always 1 Byte or 8 Bits wide?
Why hasn't someone extended it to 16 Bits?
There are CPUs that aren't byte-addressable at all, that are only capable of accessing (e.g.) 16 bits or 32 bits of data from RAM at a time. The problem is that you end up emulating byte accesses in software (e.g. doing a "load, shift, mask" sequence instead of a 1-byte load, and a "load, mask, shift, or, store" sequence instead of a 1-byte store) and it ends up being significantly slower. The alternative is for software to never use anything smaller than the CPU's minimum access size (e.g. "CHAR_BIT == 16" in C), which can waste a lot of memory and make caches less efficient, so it also ends up being significantly slower.
Columbus wrote:
Wouldn't that reduce the size and/or complexity of some mechanisms?
Maybe one could introduce a 32-bit wide "Byte" (smallest addressable unit).
It would reduce the complexity of the CPU a little (and make the CPU slower in practice). However, CPU manufacturers are trying to do the opposite - they've got a budget of "many millions of transistors" and are trying to find ways of using those transistors to improve performance. For example, Intel's Haswell CPUs are using around 1.4 billion transistors, and Apple's A8 chip (which contains an ARMv8 core) is using around 2 billion transistors. They can afford to use a few extra transistors to improve the performance of byte accesses.
Cheers,
Brendan