Hardware Language

To optimize and generate code for a specific system, e.g. a subset of nodes from a cluster or a workstation, knowledge about the system's architecture is necessary. A source-to-source compiler does not always run on the machine where the generated code executes. Furthermore, information about the network is not necessarily available to the operating system. Thus, information about the machines and the network (the system) must be provided by other means.

The hardware language (HL) is a JSON file containing a set of parameters that describe the properties of the system. An example of this file can be found in the benchmark folder under cluster.

Network

The network is the outermost part of the HL. The parameters for the network are the following:

Topology

The topology of the network is a string briefly describing the network topology of the cluster.

Connectivity-bandwidth

The connectivity-bandwidth is a matrix M containing the bandwidth between all nodes within the network. Let n_0,...,n_k be the nodes, where the indices reflect the order in which the nodes are defined in the network. The matrix elements M_j_i (from j to i) and M_i_j (from i to j) are the bandwidths in ./s between the nodes n_i and n_j with 0 <= i,j <= k. An example of the connectivity-bandwidth for a system with 4 nodes can be seen below. The rows describe the sending nodes and the columns the receiving nodes. The values on the diagonal are ignored.

"connectivity-bandwidth": [
   ["0", "100", "200", "300"],
   ["0", "100", "200", "300"],
   ["0", "100", "200", "300"],
   ["0", "100", "200", "300"]
 ]
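
The row/column convention can be illustrated with a minimal Python sketch. It assumes the entries are numeric strings, as in the example above; the helper name `link_value` is hypothetical and not part of any HL tooling.

```python
import json

# The example fragment from above, wrapped in braces so that it parses as JSON.
HL_FRAGMENT = """
{
  "connectivity-bandwidth": [
    ["0", "100", "200", "300"],
    ["0", "100", "200", "300"],
    ["0", "100", "200", "300"],
    ["0", "100", "200", "300"]
  ]
}
"""


def link_value(matrix, sender, receiver):
    """Return the entry for the link sender -> receiver.

    Rows index the sending node, columns the receiving node.
    Diagonal entries carry no meaning, so asking for them is an error here
    (an assumption of this sketch).
    """
    if sender == receiver:
        raise ValueError("diagonal entries are ignored")
    return float(matrix[sender][receiver])


matrix = json.loads(HL_FRAGMENT)["connectivity-bandwidth"]
bandwidth_0_to_2 = link_value(matrix, 0, 2)  # row 0 (sender), column 2 (receiver)
```

The same lookup applies to the connectivity-latency matrix, since both share the sender-row, receiver-column layout.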

Connectivity-latency

The connectivity-latency is a matrix M containing the latency between all nodes within the network. Let n_0,...,n_k be the nodes, where the indices reflect the order in which the nodes are defined in the network. The matrix elements M_j_i (from j to i) and M_i_j (from i to j) are the latencies in µs between the nodes n_i and n_j with 0 <= i,j <= k. An example of the connectivity-latency for a system with 4 nodes can be seen below. The rows describe the sending nodes and the columns the receiving nodes. The values on the diagonal are ignored.

"connectivity-latency": [
   ["0", "100", "200", "300"],
   ["0", "100", "200", "300"],
   ["0", "100", "200", "300"],
   ["0", "100", "200", "300"]
 ]
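
Because the matrix covers every pair of nodes, it should have one row and one column per node. A small consistency check might look like the following sketch; the function name `check_connectivity` is hypothetical, and treating a non-square or non-numeric matrix as an error is an assumption, not documented behavior.

```python
def check_connectivity(matrix, node_count):
    """Raise ValueError unless matrix is node_count x node_count with numeric entries."""
    if len(matrix) != node_count:
        raise ValueError(f"expected {node_count} rows, found {len(matrix)}")
    for row_index, row in enumerate(matrix):
        if len(row) != node_count:
            raise ValueError(
                f"row {row_index}: expected {node_count} columns, found {len(row)}"
            )
        for value in row:
            float(value)  # raises ValueError for entries such as "fast"


latency = [
    ["0", "100", "200", "300"],
    ["0", "100", "200", "300"],
    ["0", "100", "200", "300"],
    ["0", "100", "200", "300"],
]
check_connectivity(latency, 4)  # passes silently for the 4-node example
```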

Nodes

The network contains a list of nodes defined by the keyword "nodes". The order of the nodes in this file defines the ranking of the nodes, e.g. the first node defined gets MPI rank 0, the second rank 1, etc.
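
As a minimal Python sketch of this ordering rule, the rank of a node is simply its position in the "nodes" list. The identifiers "Node2" and "Node3" are hypothetical and only serve the illustration.

```python
# A reduced network object; real entries carry more parameters.
network = {
    "nodes": [
        {"identifier": "Node1"},
        {"identifier": "Node2"},  # hypothetical
        {"identifier": "Node3"},  # hypothetical
    ]
}

# The position in the list is the MPI rank.
rank_of = {node["identifier"]: rank for rank, node in enumerate(network["nodes"])}
```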

Node

The node describes a single machine within the system. The parameters are defined as follows:

Identifier

The identifier, keyword "identifier", is a unique string for each node. The identifier is used to name the node.

"identifier": "Node1"

Address

The address, keyword "address", is a unique string for each node. The address defines where the specified node is located. The address must be in a form that an MPI machinefile accepts.

"address": "192.165.0.1.125"

or

"address": "login18-1.hpc.itc.rwth-aachen.de"

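Since the addresses must be usable in an MPI machinefile, one way to derive such a file is to emit one address per line in rank order. The sketch below assumes the simplest machinefile form (one host per line, no slot counts); the helper name `machinefile_text` and the second host are hypothetical.

```python
def machinefile_text(nodes):
    """Return machinefile contents: one address per line, in rank order."""
    return "\n".join(node["address"] for node in nodes) + "\n"


nodes = [
    {"identifier": "Node1", "address": "login18-1.hpc.itc.rwth-aachen.de"},
    {"identifier": "Node2", "address": "node2.example.org"},  # hypothetical host
]
text = machinefile_text(nodes)
```
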
Templates

The first two parameters, identifier and address, must always be defined for each node individually. For all other parameters, a template, keyword "template", can be defined. The template is a path to a predefined instantiation of the node. The path to the template file is relative to the file referencing the template. If a template file is provided within a folder "SampleFolder" within the same directory, you only need to specify the "SampleFolder" as your path. This is useful when defining a hardware language file with multiple instances of the same machine. If a template is defined, all following parameters must be specified within the template.

"template": "./PathToMyTemplate/SampleNode.json"

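The relative-path rule can be sketched in Python as follows. This is an illustration under assumptions, not the actual loader: the function name `resolve_node` is hypothetical, the template is assumed to be a JSON object holding the remaining node parameters, and the folder shorthand mentioned above is not handled.

```python
import json
from pathlib import Path


def resolve_node(node, referencing_file):
    """Return the node parameters with the template (if any) merged in.

    The template path is interpreted relative to the directory of the
    file that references it.
    """
    template = node.get("template")
    if template is None:
        return dict(node)
    template_path = Path(referencing_file).parent / template
    with open(template_path, encoding="utf-8") as handle:
        params = json.load(handle)
    # Per-node keys are applied last, so identifier and address stay individual.
    params.update({key: value for key, value in node.items() if key != "template"})
    return params
```
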
Type

The type, keyword "type", of a machine defines some general properties of the machine, e.g. whether it is a NUMA node.

"type": "numa"

Connectivity-bandwidth

The connectivity-bandwidth is a matrix M containing the bandwidth between all devices within the node. Let n_0,...,n_k be the devices, where the indices reflect the order in which the devices are defined in the node. The matrix elements M_j_i (from j to i) and M_i_j (from i to j) are the bandwidths in ./s between the devices n_i and n_j with 0 <= i,j <= k. An example of the connectivity-bandwidth for a node with 4 devices can be seen below. The rows describe the sending devices and the columns the receiving devices. The values on the diagonal are ignored.

"connectivity-bandwidth": [
   ["0", "100", "200", "300"],
   ["0", "100", "200", "300"],
   ["0", "100", "200", "300"],
   ["0", "100", "200", "300"]
 ]

Connectivity-latency

The connectivity-latency is a matrix M containing the latency between all devices within the node. Let n_0,...,n_k be the devices, where the indices reflect the order in which the devices are defined in the node. The matrix elements M_j_i (from j to i) and M_i_j (from i to j) are the latencies in µs between the devices n_i and n_j with 0 <= i,j <= k. An example of the connectivity-latency for a node with 4 devices can be seen below. The rows describe the sending devices and the columns the receiving devices. The values on the diagonal are ignored.

"connectivity-latency": [
   ["0", "100", "200", "300"],
   ["0", "100", "200", "300"],
   ["0", "100", "200", "300"],
   ["0", "100", "200", "300"]
 ]

Devices

The devices, keyword "devices", specify the devices within a node. They are defined as a JSON array that contains the definitions of the individual devices. The devices, especially GPUs, must be listed in the same order in which the CUDA runtime enumerates them; otherwise, the execution could target the wrong device.
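
One possible reading of this ordering requirement is sketched below in Python: the CUDA device ordinal of a GPU is taken to be its position among the GPU entries of the list. This is an assumption for illustration only; the function name `cuda_ordinals`, the type string "GPU", the identifiers "GPU1" and "GPU2", and the choice to skip non-GPU entries when counting are all hypothetical.

```python
def cuda_ordinals(devices):
    """Map each GPU identifier to its position among the GPU entries.

    Assumption of this sketch: non-GPU devices do not consume a CUDA ordinal.
    """
    gpus = [device for device in devices if device["type"] == "GPU"]
    return {device["identifier"]: index for index, device in enumerate(gpus)}


devices = [
    {"identifier": "CPU1", "type": "CPU"},
    {"identifier": "GPU1", "type": "GPU"},  # hypothetical
    {"identifier": "GPU2", "type": "GPU"},  # hypothetical
]
ordinals = cuda_ordinals(devices)
```

If the list order differs from the order reported by the CUDA runtime on the target machine, such a mapping would point at the wrong GPU, which is the failure the paragraph above warns about.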

Device

The device describes a single compute device, e.g. a CPU or a GPU. The parameters are defined as follows:

Identifier

The identifier, keyword "identifier", is a string for each device. It is unique within each node. The identifier is used to name the device.

"identifier": "CPU1"

Templates

The first parameter, identifier, must always be defined for each device individually. For all other parameters, a template, keyword "template", can be defined. The template is a path to a predefined instantiation of a device. The path to the template file is relative to the file referencing the template. If a template file is provided within a folder "SampleFolder" within the same directory, you only need to specify the "SampleFolder" as your path. This is useful when defining a hardware language file with multiple instances of the same device. If a template is defined, all following parameters must be specified within the template.

"template": "./PathToMyTemplate/SampleCPU.json"

Type

The type, keyword "type", of a device defines what kind of device is specified, e.g. a CPU, a GPU, or an FPGA.

"type": "CPU"

Latency

The latency, keyword "latency", specifies the latency to the main memory connected to the device in µs.

"latency": "500"

Bandwidth

The bandwidth, keyword "bandwidth", specifies the bandwidth to the main memory connected to the device in ./s.

"bandwidth": "200000"

Maximum Bandwidth

The maximum bandwidth, keyword "max-bandwidth", specifies the highest possible bandwidth to the main memory connected to the device in ./s. This bandwidth takes the utilization of multiple memory controllers into account.

"max-bandwidth": "200000"

Memory Size

The memory size, keyword "size", specifies the size of the main memory connected to the device in bytes.

"size": "128000"

Cache Groups

The cache groups, keyword "cache-group", are defined by a JSON array containing cache groups. The array is defined in ascending order: the first group is mapped to the first set of cores on the device.
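
The mapping of groups to cores can be sketched in Python as follows. The sketch assumes that cores are numbered contiguously from 0 and that each group occupies as many cores as its "cores" parameter states; the function name `core_ranges` and the second group are hypothetical.

```python
def core_ranges(cache_groups):
    """Assign each cache group a contiguous range of core indices, in list order."""
    ranges = {}
    first_core = 0
    for group in cache_groups:
        count = int(group["cores"])  # values are stored as strings in the HL file
        ranges[group["identifier"]] = range(first_core, first_core + count)
        first_core += count
    return ranges


cache_groups = [
    {"identifier": "1", "cores": "24"},
    {"identifier": "2", "cores": "24"},  # hypothetical second group
]
ranges = core_ranges(cache_groups)
```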

Cache Group

The cache group describes a single team of threads sharing the highest-level cache. The parameters are defined as follows:

Identifier

The identifier, keyword "identifier", is a string for each cache group. It is unique within each device. The identifier is used to name the cache group.

"identifier": "1"

Templates

The first parameter, identifier, must always be defined for each cache group individually. For all other parameters, a template, keyword "template", can be defined. The template is a path to a predefined instantiation of a cache group. The path to the template file is relative to the file referencing the template. If a template file is provided within a folder "SampleFolder" within the same directory, you only need to specify the "SampleFolder" as your path. This is useful when defining a hardware language file with multiple instances of the same cache group. If a template is defined, all following parameters must be specified within the template.

"template": "./PathToMyTemplate/SampleCG.json"

Cores

The cores, keyword "cores", specify the number of cores that are available in the cache group.

"cores": "24"

Frequency

The frequency, keyword "frequency", specifies the clock frequency of the cache group in MHz.

"frequency": "2400"

Arithmetic-Units

The arithmetic-units, keyword "arithmetic-units", specifies the number of arithmetic units per core.

"arithmetic-units": "4"

Warp-Size

The warp-size, keyword "warp-size", specifies the number of threads per warp for GPUs.

"warp-size": "32"

Vectorization

The vectorization, keyword "vectorization", specifies how the cache group can vectorize the execution (the size of the SIMD register), e.g., avx512.

"vectorization": "avx512"

Hyper-Threading

The hyper-threads, keyword "hyper-threads", specify the number of possible hyperthreads per core.

"hyper-threads": "2"

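The cache group parameters are stored as strings, so a consumer has to convert them before doing arithmetic. The Python sketch below combines the example values from this section and derives the number of hardware threads in the group. It assumes that "hyper-threads" counts all hardware threads per core (so "2" means two threads per core); the function name `hardware_threads` is hypothetical.

```python
def hardware_threads(cache_group):
    """Return cores * hyper-threads for one cache group."""
    return int(cache_group["cores"]) * int(cache_group["hyper-threads"])


# Example values taken from the parameter descriptions above.
cache_group = {
    "identifier": "1",
    "cores": "24",
    "frequency": "2400",
    "arithmetic-units": "4",
    "vectorization": "avx512",
    "hyper-threads": "2",
}
threads = hardware_threads(cache_group)
```
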
Caches

The caches, keyword "caches", are defined by a JSON array containing caches. The array is defined in ascending order: the closest cache is defined first.

Cache

The cache specifies an individual cache level on the device.