Skip to content

Latest commit

 

History

History
375 lines (337 loc) · 17 KB

Integers.md

File metadata and controls

375 lines (337 loc) · 17 KB

Integers

Table of Contents

  1. Integer Types
  2. Binary Addition
  3. Endianness
  4. Two's Complement
  5. Binary Subtraction

Python exercises:   PY1   PY2   PY3   PY4

Integer Types

The following portable integer types are defined by the stdint.h header file standardized by POSIX.1-2017 which is equivalent to IEEE Std 1003.1-2017 and The Open Group Base Specifications Issue 7, 2018 edition.

Std Type C Type Bits Range (MIN .. MAX)
uint8_t unsigned char 8 0 .. 255
int8_t char 8 -128 .. 127
uint16_t unsigned short/int 16 0 .. 65'535
int16_t short/int 16 -32'768 .. 32'767
uint32_t unsigned int/long 32 0 .. 4'294'967'295
int32_t int/long 32 -2'147'483'648 .. 2'147'483'647
uint64_t unsigned long long 64 0 .. 18'446'744'073'709'551'615
int64_t long long 64 -9'223'372'036'854'775'808 .. 9'223'372'036'854'775'807

The size of the C types unsigned int, int, unsigned long and long depends on the processor architecture (16, 32 or 64 bit). Therefore if exact size is an issue, the use of the standard types is strongly recommended.

Binary Addition

The ASCII graphics below shows the binary addition of the two uint8_t numbers x = 70 and y = 50 resulting in z = 120.

    7   6   5   4   3   2   1   0    binary digit index
  +---+---+---+---+---+---+---+---+
0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |  carry bits
  +---+---+---+---+---+---+---+---+
  | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |  x = 70
  +---+---+---+---+---+---+---+---+    +
  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |  y = 50
  +---+---+---+---+---+---+---+---+
  | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |  z = x + y = 120
  +---+---+---+---+---+---+---+---+

Next we add the two uint8_t numbers x = 255 and y = 1 causing an overflow.

    7   6   5   4   3   2   1   0    binary digit index
  +---+---+---+---+---+---+---+---+
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |  carry bits
  +---+---+---+---+---+---+---+---+
  | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |  x = 255
  +---+---+---+---+---+---+---+---+    +
  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |  y = 1
  +---+---+---+---+---+---+---+---+
  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |  z = x + y mod 256 = 0
  +---+---+---+---+---+---+---+---+

Since the carry bit generated by the addition of the most-significant bits with index 7 that indicates an overflow is discarded, a modulo 2^N operation with N = 8 bits takes place so that z = 256 mod 256 = 0 results.

Python 1: In the following examples we explore the cyclic overflow behaviour of the standard unsigned integer types by using the Python ctypes library.

We add the two uint8_t numbers x = 70 and y = 50 resulting in z = 120

>>> from ctypes import *
>>> x = c_uint8(70)
>>> y = c_uint8(50)
>>> z = c_uint8()
>>> format(x.value, '08b')
'01000110'
>>> format(y.value, '08b')
'00110010'
>>> z.value = x.value + y.value
>>> format(z.value, '08b')
'01111000'
>>> print(z)
c_ubyte(120)

We cause an overflow by adding 1 to an uint8_t set to UINT8_MAX

>>> from ctypes import *
>>> x = c_uint8(255)       # uint8_t set to UINT8_MAX
>>> y = c_uint8(1)
>>> z = c_uint8()
>>> format(x.value, '08b')
'11111111'
>>> format(y.value, '08b')
'00000001'
>>> z.value = x.value + y.value
>>> format(z.value, '08b')
'00000000'
>>> print(z)
c_ubyte(0)

We cause an underflow by subtracting 1 from an uint16_t set to 0

>>> from ctypes import *
>>> x = c_uint16()          # uint16_t set to 0
>>> print(x)
c_ushort(0)                 # uint16_t maps to an unsigned short
>>> x.value -= 1
>>> print(x)
c_ushort(65535)
>>> format(x.value, '#06x')
'0xffff'

We cause an overflow by adding 10 to an uint32_t set to UINT32_MAX

>>> from ctypes import *
>>> x = c_uint32(0xffffffff)  # uint32_t set to UINT32_MAX
>>> print(x)
c_uint(4294967295)            # uint32_t maps to an unsigned int
>>> x.value += 10
>>> print(x)
c_uint(9)

We cause an overflow by squaring an uint64_t set to 2^32

>>> from ctypes import *
>>> x = c_uint64(0x100000000) # uint64_t
>>> print(x)
c_ulong(4294967296)           # uint64_t maps to an unsigned long
>>> x.value *= x.value
>>> print(x)
c_ulong(0)

Endianness

A 32 bit word used to store an unsigned or signed integer (uint32_t or int32_t, respectively) comprises four ordered bytes of memory:

word32 = byte3 * 2^24  +  byte2 * 2^16  +  byte1 * 2^8  +  byte0

byte0 is the Least Significant Byte (LSB) and byte3 is the Most Significant Byte (MSB). On computer platforms there are two fundamentally different ways how these bytes are addressed in memory:

Big-Endian

addr      addr+1    addr+2    addr+3
+---------+---------+---------+---------+
|  byte3  |  byte2  |  byte1  |  byte0  |
+---------+---------+---------+---------+
  MSB                            LSB

Big-endian order is equivalent to network order where the MSB byte is transmitted first over a network connection and the LSB last. Thus if a big-endian 32 bit word starts at address addr in memory then the MSB can be retrieved at the word address addr and the LSB with an offset of 3 bytes at addr+3.

Little-Endian

addr      addr+1    addr+2    addr+3
+---------+---------+---------+---------+
|  byte0  |  byte1  |  byte2  |  byte3  |
+---------+---------+---------+---------+
  LSB                            MSB

If a little-endian 32 bit word is stored at address addr in memory then the LSB can be retrieved at the word address addr and the MSB with an offset of 3 bytes at addr+3. If a little-endian multi-byte word has to be sent over a network connection in binary format then usually a conversion from host order to network order has to take place on the transmitting side and probably a conversion back to host order on the receiving side. The standard C library offers the following conversion functions:

uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);

uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);

All Intel, AMD and RISC-V processors are little-endian whereas most ARM and Power PC processors are bi-endian, i.e. they can be configured by a software switch to work either in big-endian or little-endian mode.

Python 2: By using ctypes address pointers and the ctypes cast mechanism, we try to get at the individual bytes of word structures

             memory address  value  variable   cast 1     cast 2
             ---------------------+----------+----------+---------+
pointer -->  0x7f513d85ac48 | bb  |    x     | x16_p[0] | x8_p[0] |
                            |     |          |          + --------+
             0x7f513d85ac49 | aa  |          |          | x8_p[1] |
             ---------------------+----------+----------+---------+

We access the individual bytes of a uint16_t word and see that the storage order is little-endian. By exchanging LSB and MSB the uint16_t word is converted from host order to network order.

>>> from ctypes import *
>>> uint16_p = POINTER(c_uint16)    # define uint16_t pointer type
>>> uint8_p = POINTER(c_uint8)      # define uint8_t  pointer type
>>> x = c_uint16(0xaabb)            # set uint16_t word to 0xaabb
>>> x_addr = addressof(x)           # get memory address of x
>>> hex(x_addr)
'0x7f513d85ac48'                    # 48 bit memory address of x
>>> x16_p = cast(x_addr, uint16_p)  # cast address to uint16_t pointer
>>> hex(x16_p[0])                   # get first and only uint16_t array element
'0xaabb'
>>> x8_p = cast(x_addr, uint8_p)    # cast address to uint8_t pointer
>>> hex(x8_p[0])                    # get first uint8_t array element
'0xbb'                              # LSB is output
>>> hex(x8_p[1])                    # get second uint8_t array element
'0xaa'                              # MSB is output
>>> tmp = x8_p[0]                   # put LSB into temporary storage
>>> x8_p[0] = x8_p[1]               # MSB and LSB exchange positions
>>> x8_p[1] = tmp                   # conversion to network order completed
>>> hex(x16_p[0])
'0xbbaa'                            # LSB and MSB are not in host order anymore

We access the individual bytes of a uint32_t word and again Little-Endianness is observed.

             memory address  value  variable   cast
             ---------------------+----------+---------+
pointer -->  0x7f0a21408c48 | dd  |    x     | x8_p[0] |
                            |     |          + --------+
             0x7f0a21408c49 | cc  |          | x8_p[1] |
                            |     |          + --------+
             0x7f0a21408c4a | bb  |          | x8_p[2] |
                            |     |          + --------+
             0x7f0a21408c4b | aa  |          | x8_p[3] |
             ---------------------+----------+---------+
>>> from ctypes import *
>>> uint8_p = POINTER(c_uint8)      # define uint8_t  pointer type
>>> x = c_uint32(0xaabbccdd)        # set uint32_t word to 0xaabbccdd
>>> x_addr = addressof(x)           # get memory address of x
>>> hex(x_addr)
'0x7f0a21408c48'                    # 48 bit memory address of x
>>> x8_p = cast(x_addr, uint8_p)    # cast address to uint8_t pointer
>>> hex(x8_p[0])                    # get first uint8_t array element
'0xdd'                              # LSB is output
>>> hex(x8_p[1])                    # get second uint8_t array element
'0xcc'
>>> hex(x8_p[2])                    # get third uint8_t array element
'0xbb'
>>> hex(x8_p[3])                    # get fourth uint8_t array element
'0xaa'                              # MSB is output

The same happens when accessing the individual bytes of an uint64_t

             memory address  value  variable   cast
             ---------------------+----------+---------+
pointer -->  0x7f1740f59c48 | dd  |    x     | x8_p[0] |
                            |     |          + --------+
             0x7f1740f59c49 | cc  |          | x8_p[1] |
                            |     |          + --------+
             0x7f1740f59c4a | bb  |          | x8_p[2] |
                            |     |          + --------+
             0x7f1740f59c4b | aa  |          | x8_p[3] |
                            |     |          + --------+
             0x7f1740f59c4c | 44  |          | x8_p[4] |
                            |     |          + --------+
             0x7f1740f59c4d | 33  |          | x8_p[5] |
                            |     |          + --------+
             0x7f1740f59c4e | 22  |          | x8_p[6] |
                            |     |          + --------+
             0x7f1740f59c4f | 11  |          | x8_p[7] |
             ---------------------+----------+---------+
from ctypes import *
>>> uint8_p = POINTER(c_uint8)
>>> x = c_uint64(0x11223344aabbccdd)
>>> x_addr = addressof(x)
>>> >>> hex(x_addr)
'0x7f1740f59c48'
>>> x8_p = cast(x_addr, uint8_p)
>>> [hex(x8_p[i]) for i in range(0, 8)]
['0xdd', '0xcc', '0xbb', '0xaa', '0x44', '0x33', '0x22', '0x11']

Two's Complement

The representation of negative or signed numbers is a non-trivial task. Instead of keeping a separate sign flag to differentiate between positive and negative numbers, electronic computers use the two's complement to encode negative numbers.

The two's complement of an N-bit number is defined as its complement with respect to 2^N. For instance with N = 4, the two's complement of the four-bit number 0010 (+2) is 1110 (-2), because 0010 + 1110 = 10000 (16). Since the carry bit for the most-significant bit of an N-bit number is discarded after an addition, which is equivalent to a modulo 2^N operation, 10000 mod 2^N equals 0000.

Cyclic Addition

Two's Complement Rule: -x = ~x + 1 The negative value -x of a number x in binary two's complement representation is computed by bitwise inversion of x and then adding 1 to the inverted value.

Thus the bit inversion of 0010 (+2) results in 1101 (-3) and by adding 1 gives the final value of 1110 (-2). A special case is zero with its representation of 0000 that gets inverted to 1111 (-1) and by adding 1 becomes 0000 again.

Graphically the negation of a number can be interpreted as mirroring the circle position of x on the horizontal axis (thick blue line in the picture above) and then rotating the flipped value ~x by one position on the circle counter-clockwise.

Generally addition by one is equivalent to a counter-clockwise rotation of the number position on the circle whereas subtraction by one translates into a clockwise rotation.

When the largest representable positive value is exceeded during addition, an overflow occurs and the result takes on the largest representable negative value. E.g. adding 1 to 0111 (+7) results in 1000 (-8).

It is also evident from the figure above that the most-significant bit of a negative number in two's complement representation is always set to 1.

Python 3: In the following examples we explore the cyclic overflow behaviour of the standard signed integer types by using the Python ctypes library.

We cause an overflow by adding 1 to an int8_t set to INT8_MAX

>>> from ctypes import *
>>> x = c_int8(0b01111111)       # int8_t set to INT8_MAX
>>> print(x)
c_byte(127)
>>> x.value += 1
>>> print(x)
c_byte(-128)

We cause an underflow by subtracting 1 from an int16_t set to INT16_MIN

>>> from ctypes import *
>>> x = c_int16(0x8000)          # int16_t set to INT16_MIN
>>> print(x)
c_short(-32768)                  # int16_t maps to a short
>>> x.value -= 1
>>> print(x)
c_short(32767)

We add anint32_t set to INT32_MAX and an int32_t set to INT32_MIN to each other

>>> from ctypes import *
>>> x_max = c_int32(0x7fffffff)  # int32_t set to INT32_MAX
>>> x_min = c_int32(0x80000000)  # int32_t set to INT32_MIN
>>> print(x_max)
c_int(2147483647)                # int32_t maps to an int
>>> print(x_min)
c_int(-2147483648)
>>> y = c_int32()
>>> y.value = x_max.value + x_min.value
>>> print(y)
c_int(-1)

We add INT64_MIN = -2^63 to an int64_tvalue set to 2^62

>>> from ctypes import *
>>> x = c_int64(1 << 62)         # int64_t
>>> print(x)
c_long(4611686018427387904)      # int64_t maps to a long
>>> x.value += (1 << 63)
>>> print(x)
c_long(-4611686018427387904)

Binary Subtraction

On an electronic computer the binary subtraction z = x - y is implemented by adding the two's complement -y of the minuend y to the subtrahend x resulting in the difference z.

    7   6   5   4   3   2   1   0    binary digit index
  +---+---+---+---+---+---+---+---+
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |  carry bits
  +---+---+---+---+---+---+---+---+
  | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |  x = 120
  +---+---+---+---+---+---+---+---+    -
  | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |  y =  50
  +---+---+---+---+---+---+---+---+
  | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | -y = -50
  +---+---+---+---+---+---+---+---+
  | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |  z = x + (-y) = 70
  +---+---+---+---+---+---+---+---+

Python 4: We show that binary substraction is equivalent to binary addition of the minuend's two's complement.

>>> from ctypes import *
>>> x = c_uint8(120)
>>> y = c_uint8(50)
>>> ym = c_uint8(-50)             # set two's complement -y
>>> z = c_uint8()
>>> format(x.value, '08b')
'01111000'
>>> format(y.value, '08b')
'00110010'
>>> format(ym.value, '08b')
'11001110'                        # two's complement of y
>>> z.value = x.value - y.value   # subtraction of y
>>> print(z)
c_ubyte(70)
>>> z.value = x.value + ym.value  # addition of -y
>>> print(z)
c_ubyte(70)

Author: Andreas Steffen CC BY 4.0