2021-04-17 - DataTypes: Bits

DataTypes

Binary digits, or bits, are the simplest data type used by computers. A bit can have a value of either 0 or 1, and all digital data is built from them. How a bit is physically stored depends on what you are storing it on. Inside a computer, bits are stored and transmitted using voltage levels. The actual voltages, and which state represents which value, are system and situation dependent, but in all cases there are two states: one state is a 0 while the other is a 1. Hard drives, tape drives and floppy disks use magnetic polarity to encode bits. Optical media like CDs and DVDs use pits and the absence of pits to encode bits. As long as you have something that can be in one of two states, it can be used to store or transmit a bit.

A bit on its own isn’t that useful, as it only has two values, so in most cases you work with a series of bits. The combination of the states of these bits is used to encode data using a variety of formats. The number of possible states is 2 to the power of the number of bits you have. If you have 1 bit, that’s 2 to the power of 1, or 2 states (0, 1). If you have 4 bits, that’s 2 to the power of 4, or 16 states (0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111). What meaning you give to these states depends on what you are using them to represent. We’ll get more into that in later parts. For now I want to talk about terms for groupings of bits.
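As a quick check of the 2-to-the-power-of-n rule, here is a minimal Python sketch that lists every state a group of bits can take. The function name bit_states is made up for this illustration.

```python
from itertools import product

def bit_states(n_bits):
    """Return every combination of n_bits bits as a list of strings."""
    # product("01", repeat=n) yields each possible tuple of '0'/'1' characters
    return ["".join(bits) for bits in product("01", repeat=n_bits)]

states = bit_states(4)
print(len(states))   # 16, which is 2 ** 4
print(states[:4])    # ['0000', '0001', '0010', '0011']
```

Each extra bit doubles the length of the list, which is why the count grows as a power of 2.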

Bytes

The meaning of a byte is determined by the system you are using, but typically it’s the number of bits required to store a single character on the system and/or the smallest addressable group of bits. On modern computers a byte is almost always 8 bits, but other systems have used different values. For example, many early mainframe computers had 6-bit characters and so used 6-bit bytes. The 8-bit byte was popularized by the IBM System/360 and by the 8-bit CPUs used in early microcomputers. It also comfortably holds a 7-bit ASCII character with one bit to spare.

The unambiguous term for 8 bits is an octet.

Words

Again, the meaning of a word is determined by the system, but it is typically the native size of the registers (the single-value memory locations inside the CPU). Usually, but not always, this is also the width of the data bus (the circuit paths coming from the CPU used to send and receive data) and of the address bus (the circuit paths coming from the CPU used to specify which memory location is being written to or read from). For example, modern 64-bit CPUs have 64-bit general-purpose registers (excluding large multi-value registers) and work with 64-bit addresses, although many implement fewer physical address lines than that. The match between register and bus widths isn’t universal: the Intel 8088 used in the original IBM PC has 16-bit registers but an 8-bit data bus and a 20-bit address bus.
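To see what those 8088 bus widths mean in practice, here is a small Python sketch. It assumes each address selects one byte, and both function names are invented for this example.

```python
def addressable_bytes(address_bits):
    """Number of distinct locations an address bus of this width can select."""
    return 2 ** address_bits

def bus_transfers(value_bits, data_bus_bits):
    """Transfers needed to move a value over the data bus (ceiling division)."""
    return -(-value_bits // data_bus_bits)

# Intel 8088: 20-bit address bus, 16-bit registers, 8-bit data bus
print(addressable_bytes(20))  # 1048576 bytes
print(bus_transfers(16, 8))   # 2 transfers to move one 16-bit word
```

So the 8088 can address 1,048,576 bytes, and moving one full register's worth of data takes two trips over its narrower data bus.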

The meaning of a word can also be determined by the software environment you are running. For example, in Windows development a WORD is always 16 bits, even on 64-bit versions of the operating system. This is because Windows started as a 16-bit OS, and to maintain backwards compatibility the meaning hasn’t been updated.
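The sketch below shows fixed-size word types using Python's ctypes module. The aliases WORD and DWORD are defined here by hand to mirror the Windows naming (DWORD, or double word, is the 32-bit counterpart). They are illustrative and are not imported from any Windows header.

```python
import ctypes

# Hand-written aliases mirroring the Windows names (assumption for illustration)
WORD = ctypes.c_uint16    # 16 bits regardless of the CPU's native word size
DWORD = ctypes.c_uint32   # "double word", 32 bits

for name, ctype in [("WORD", WORD), ("DWORD", DWORD)]:
    # sizeof reports bytes; multiplying by 8 assumes 8-bit bytes
    print(name, ctypes.sizeof(ctype) * 8, "bits")

# The native pointer size, by contrast, follows the platform (64 on a 64-bit build)
print("pointer", ctypes.sizeof(ctypes.c_void_p) * 8, "bits")
```

The first two sizes stay the same on every platform, while the pointer size changes with the machine the code runs on.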

Larger

Larger collections of bits are usually specified using prefixes although this can be confusing as historically two prefix schemes have been used.

The SI unit system uses a set of prefixes corresponding to powers of 10: k or kilo means 10^3 or 1,000, M or mega means 10^6 or 1,000,000, G or giga means 10^9 or 1,000,000,000, etc. These prefixes with their standard meanings have been used for collections of bits and bytes, but often a binary version is used instead. In that version k = 2^10 or 1,024, M = 2^20 or 1,048,576, G = 2^30 or 1,073,741,824, etc. Note that these values are close to, but not the same as, their decimal counterparts. This can lead to confusion. For example, hard drive manufacturers often report sizes using decimal prefixes while Windows reports them using binary prefixes. This is how a 250 GB hard drive can turn into a 232 GB drive.
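The hard drive arithmetic can be checked with a few lines of Python. This is only a sketch of the conversion described above, and the function name is made up for the example.

```python
def decimal_gb_to_binary_gb(size_gb):
    """Convert a size in decimal gigabytes (10^9 bytes) to binary gigabytes (2^30 bytes)."""
    size_bytes = size_gb * 10**9
    return size_bytes / 2**30

print(round(decimal_gb_to_binary_gb(250), 2))  # 232.83
```

The drive holds the same number of bytes either way. Only the size of the unit used to count them changes.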

To deal with this confusion an alternative prefix system has been developed that is exclusively binary: Ki or kibi means 2^10, Mi or mebi means 2^20, Gi or gibi means 2^30, etc. This system is slowly catching on as it removes the ambiguity, but it’s nowhere near universal.

When using abbreviated units, a lowercase b means bits and an uppercase B means bytes, so a MiB is a mebibyte while a Mib is a mebibit. You can multiply or divide by the size of a byte on your current system to convert between them.
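As a minimal sketch of that conversion in Python, assuming an 8-bit byte (adjust BITS_PER_BYTE for other systems). The names here are invented for the example.

```python
BITS_PER_BYTE = 8  # assumption: the common modern byte size

def mebibytes_to_mebibits(mebibytes):
    """MiB -> Mib: multiply by the number of bits in a byte."""
    return mebibytes * BITS_PER_BYTE

def mebibits_to_mebibytes(mebibits):
    """Mib -> MiB: divide by the number of bits in a byte."""
    return mebibits / BITS_PER_BYTE

print(mebibytes_to_mebibits(1))    # 8
print(mebibits_to_mebibytes(100))  # 12.5
```

The same multiply-or-divide step works for any prefix, since the prefix scales bits and bytes equally.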
