
Network Description YAML Files: A Quick Start Guide

November 13, 2023

This document is intended to help users create network description YAML files corresponding to their network models.

Purpose of Network Description YAML files

A network description file describes the network in YAML, a simple markup language. Typically, the outcome of training a model is a PyTorch checkpoint. Checkpoint files include only the weights; they contain neither the network operators and layer sequence nor any scheduling of hardware resources. However, all of these are needed by the synthesis tool in order to program the MAX78000/MAX78002. The purpose of the YAML file is to describe the model in a hardware-centric fashion. The YAML file can also be viewed as a "sidecar" file to the checkpoint file.

Information needed to create YAML files

  1. The model that is used for training.
  2. Some basic knowledge about MAX78000/MAX78002 hardware.

Following the instructions below, you will be able to create a layer-by-layer description of your model in the YAML format. Once completed, copy the YAML file to ai8x-synthesis/networks and reference it in the synthesis script for your trained model.

YAML File Structure

The YAML file includes global configuration in the header, followed by a layer-by-layer description that corresponds to the layers in the model.

1. Global Configuration Section

Using the “arch“ and “dataset“ keywords, the global section must include the name of the model and dataset as used for training. These keywords are used to make sure a YAML file matches a trained model, and to pull in a sample data file for the known-answer test (KAT) that is auto-generated by the synthesis tool.

Example: MNIST

./train.py --epochs 200 --deterministic --compress schedule.yaml --model ai85net5 --dataset MNIST --confusion --param-hist --pr-curves --embedding --device MAX78000

# ------------   Global Configuration   ------------
arch: ai85net5  # Mandatory: use the name of the model as in training
dataset: MNIST  # Mandatory: use the name of the dataset as in training 

2. Layer Description Section

The layer description section defines each layer as it appears in the model. It starts with the “layers“ keyword, followed by the configuration of each layer.

Use “-” as the delimiter to start each layer. The order of keywords in the description of a layer is arbitrary.

# ------------   Layers   ------------
- pad: 0
  activate: ReLU
  out_offset: 0x2000
  processors: 0xffffffffffffffff
  data_format: HWC
  operation: Conv1d
  kernel_size: 1

The following figure shows how a layer in the model is mapped to YAML:

[Figure: YAML mapping]
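As a sketch of this mapping (based on layer 1 of Example 1 below), a fused module such as ai8x.FusedMaxPoolConv2dReLU(60, 60, 3, pool_size=2, pool_stride=2, padding=2) translates into YAML keys as follows:

# ai8x.FusedMaxPoolConv2dReLU(60, 60, 3, pool_size=2, pool_stride=2, padding=2)
- op: conv2d            # the convolution part of the fused module
  kernel_size: 3x3      # the kernel size argument (3)
  max_pool: 2           # pool_size=2
  pool_stride: 2        # pool_stride=2
  pad: 2                # padding=2
  activate: ReLU        # the fused ReLU activation
  processors: 0xfffffffffffffff0   # 60 input channels
  out_offset: 0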

The table below describes frequently used keywords. For a complete list of keywords and more details, see the main documentation, README.md.

| Keyword | Description | Available options |
|---|---|---|
| data_format | ONLY in the first layer; specifies the organization of the data in memory. The optimum choice depends on the data source (interleaved channels/HWC, or channels in sequence/CHW). When there is no particular preference, HWC is recommended for input data smaller than 90×91 per channel. | CHW or HWC |
| op / operation | The layer operation as in the model. Use None or Passthrough for a pooling-only layer. | Conv1d, Conv2d, ConvTranspose2d, None (or Passthrough), Linear (or FC or MLP), Add, Sub, Xor, Or |
| pad (optional) | The padding of the layer, as in the model. | For Conv2d: 0, 1 (default), or 2.<br>For Conv1d: 0, 1, 2, or 3 (default).<br>For Passthrough: must be 0 (default). |
| activate (optional) | The layer activation. | ReLU, Abs, or None (default) |
| max_pool (optional) | The pool size if the layer includes MaxPool. | 1 to 16 |
| avg_pool (optional) | The pool size if the layer includes AvgPool. | 1 to 16 |
| pool_stride (optional) | The pool stride if the layer includes MaxPool or AvgPool. | 1 to 16 |
| kernel_size (optional) | The kernel size. | For Conv2d: 1 (1×1) or 3 (3×3, the default).<br>For Conv1d: 1 (default) to 9 |
| processors | Each bit of this 64-bit processor map enables one of the 64 CNN processors. The number of enabled processors should match the input channel count. If there are more than 64 channels, the number of processors is the per-pass channel count (the channel count divided by the smallest number of passes that brings it to 64 or fewer), rounded up to the next multiple of 4; for example, 60 processors (0x0fffffffffffffff) are specified for 120 channels, and 52 processors for 100 channels.<br>Note 1: If possible, use processors with non-overlapping memory instances (4 processors share one memory instance) in consecutive layers (e.g., processors: 0x0000000000000001 in layer 1 and 0x00000000000ffff0 in layer 2); this makes it easier to allocate large portions of memory.<br>Note 2: If the CHW data_format is used, the processors must be attached to different memory instances (e.g., for 3 input channels: 0x0000000000000111 in CHW vs. 0x0000000000000007 in HWC).<br>Note 3: For optimum efficiency, choose the number of channels as a multiple of 4 in each layer.<br>Note 4: In linear layers, the number of processors is the number of channels before flattening. | 0x0000000000000001 to 0xffffffffffffffff |
| out_offset (optional) | The relative offset inside the data memory to write the output data to.<br>Note 5: The input of each layer is taken from the output offset of the previous layer. To avoid overwriting an input that has not yet been consumed, "ping-pong" between out_offset 0 and half of the data memory (0x4000) or less in consecutive layers (see the sketch following this table). | 0x0000 to 0x8000 |
| in_offset (optional) | The offset into the data memory instances where the input data is loaded from. When not specified, it defaults to the previous layer's out_offset, or 0 for the first layer. | 0x0000 to 0x8000 (default: out_offset of the previous layer) |
| flatten (optional) | Used in linear layers to specify that 2D input data should be flattened to 1D. | True, False (default) |
| output_width (optional) | Specifies the number of output bits. It is used only in the last layer, when there is no activation, to produce 32-bit unclipped output in Q17.14 format.<br>Note 6: To use output_width: 32, the last layer of the model must be trained with wide=True. | 8 (default), 32 |
| in_dim (optional) | Specifies the dimensions of the input data. Automatically computed in most cases, but must be specified when changing from 1D to 2D data or vice versa. | [x, y] |
| streaming (optional) | Specifies whether the layer uses streaming (FIFO enabled). This is necessary when the input data dimensions exceed the available data memory (data larger than 90×91). When streaming is enabled for a layer, all prior layers must enable streaming as well. Streaming is limited to 8 consecutive layers or fewer, and to four FIFOs (up to 4 input channels in CHW format, or up to 16 channels in HWC format). See Example 3.<br>Note 7: The final streaming layer must use padding. | True, False (default) |
| name (optional) | Specifies a name for the layer so it can be referred to by name instead of only by number. | string |
| in_sequences (optional) | Specifies the layer(s) that the input of the current layer comes from. It can point to the output of one or more arbitrary previous layers, for example when processing the same data with two different kernel sizes, or when combining the outputs of several prior layers. in_sequences can be specified as an integer or string (for a single input) or as a list (for multiple inputs). As a special case, -1 or input refers to the input data. The in_offset and out_offset must be set to match the specified sequence. See Example 5. | [i, j] (default: previous layer)<br>-1 or input for the input data |
| write_gap (optional) | Specifies the number of channels to skip during write operations (this value is multiplied by the output multi-pass, i.e., write every nth word where n = write_gap × output_multipass). This creates an interleaved output that can be used as the input of subsequent layers that perform an element-wise operation, or to concatenate multiple inputs to form data with more than 64 channels.<br>Set write_gap to 1 to produce output for a subsequent two-input element-wise operation. See Example 5. | integer |
| eltwise (optional) | Element-wise operations can also be added "in-flight" to Conv2d; in this case, the element-wise operation is specified with the eltwise key. See Example 5.<br>Note 8: On MAX78000, this is only supported for 64 channels, or up to 128 channels when only two operands are used. When more operands or channels are needed, use a separate layer for the element-wise operation instead of combining it with a convolution. | none |
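To make Note 1 and Note 5 concrete, the following minimal sketch (hypothetical processor maps and offsets, not taken from a specific model) shows two consecutive layers that "ping-pong" between out_offset 0x4000 and 0x0000 so that a layer never overwrites input it is still consuming, and whose processor maps sit in different memory instances:

# Layer n: 4 input channels on processors 0-3 (memory instance 0), output written to 0x4000
- op: conv2d
  kernel_size: 3x3
  pad: 1
  activate: ReLU
  processors: 0x000000000000000f
  out_offset: 0x4000

# Layer n+1: 8 input channels on processors 4-11 (memory instances 1-2, not overlapping
# with layer n); input is read from 0x4000 (the previous out_offset), output goes back to 0x0000
- op: conv2d
  kernel_size: 3x3
  pad: 1
  activate: ReLU
  processors: 0x0000000000000ff0
  out_offset: 0x0000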

Examples

Example 1: MNIST, CHW data format, 28x28x1

The comments show the model equivalent of each layer:

arch: ai85net5
dataset: MNIST

# Define layer parameters in order of the layer sequence
layers:
# +++++++++++++++++++++ layer 0:   input 28x28x1:  ai8x.FusedConv2dReLU(1, 60, 3, padding=1)
- pad: 1
  activate: ReLU
  out_offset: 0x4000
  processors: 0x0000000000000001   # input channels: 1, start from first processor
  data_format: CHW
  op: conv2d
  kernel_size: 3x3
  
# +++++++++++++++++++++ layer 1: ai8x.FusedMaxPoolConv2dReLU(60, 60, 3, pool_size=2, pool_stride=2, padding=2)
- max_pool: 2
  pool_stride: 2
  pad: 2
  activate: ReLU
  out_offset: 0
  processors: 0xfffffffffffffff0  # input channels: 60, start from 5th processor in 2nd memory instance
  op: conv2d
  kernel_size: 3x3

# +++++++++++++++++++++ layer 2: ai8x.FusedMaxPoolConv2dReLU(60, 54, 3, pool_size=2, pool_stride=2, padding=1)
- max_pool: 2
  pool_stride: 2
  pad: 1
  activate: ReLU
  out_offset: 0x4000
  processors: 0xfffffffffffffff0 # input channels: 60
  op: conv2d
  kernel_size: 3x3

# +++++++++++++++++++++ layer 3: ai8x.FusedAvgPoolConv2dReLU(54, 12, 3, pool_size=2, pool_stride=2, padding=1)
- avg_pool: 2
  pool_stride: 2
  pad: 1
  activate: ReLU
  out_offset: 0
  processors: 0x0ffffffffffffff0 # input channels: 54
  op: conv2d
  kernel_size: 3x3
  
# +++++++++++++++++++++ layer 4:   ai8x.Linear(12*28*28, 10, bias=True, wide=True)
- op: mlp
  flatten: true
  out_offset: 0x1000
  output_width: 32  # the model is trained with wide=True, so a 32-bit output is available
  processors: 0x0000000000000fff # input channels before flattening: 12

Example 2: KWS20_v1, HWC data format, 128x128x1

arch: ai85kws20net
dataset: KWS_20

# Define layer parameters in order of the layer sequence
layers:
# +++++++++++++++++++++ layer 0:   input 128x128:  ai8x.FusedConv1dReLU(128, 100, 1, stride=1, padding=0)
- pad: 0
  activate: ReLU
  out_offset: 0x2000
  processors: 0xffffffffffffffff # input channels: 128, use all processors
  data_format: HWC
  operation: Conv1d
  kernel_size: 1
  
# +++++++++++++++++++++ layer 1:  ai8x.FusedConv1dReLU(100, 100, 1, stride=1, padding=0)
- pad: 0
  activate: ReLU
  out_offset: 0x0000
  processors: 0x000fffffffffffff
  operation: Conv1d
  kernel_size: 1
  
# +++++++++++++++++++++ layer 2:   ai8x.FusedConv1dReLU(100, 50, 1, stride=1, padding=0) 
- pad: 0
  activate: ReLU
  out_offset: 0x2000
  processors: 0x000fffffffffffff
  operation: Conv1d
  kernel_size: 1
  
# +++++++++++++++++++++ layer 3:   ai8x.FusedConv1dReLU(50, 16, 1, stride=1, padding=0) 
- pad: 0
  activate: ReLU
  out_offset: 0x0000
  processors: 0x0003ffffffffffff
  operation: Conv1d
  kernel_size: 1
  
# +++++++++++++++++++++ layer 4:   ai8x.FusedConv2dReLU(16, 32, 3, stride=1, padding=1) 
# Conv 2D - 5 layers
- pad: 1
  in_dim: [16, 8]      # needed as moved from 1D to 2D
  activate: ReLU
  out_offset: 0x2000
  processors: 0x000000000000ffff
  operation: Conv2d
  kernel_size: 3x3    # 3x3 is the default, so this key is optional
  
# +++++++++++++++++++++ layer 5:   ai8x.FusedConv2dReLU(32, 64, 3, stride=1, padding=1)
- pad: 1
  activate: ReLU
  out_offset: 0x0000
  processors: 0x0000ffffffff0000
  operation: Conv2d

# +++++++++++++++++++++ layer 6:   ai8x.FusedConv2dReLU(64, 64, 3, stride=1, padding=1)
- pad: 1
  activate: ReLU
  out_offset: 0x2000
  processors: 0xffffffffffffffff
  operation: Conv2d

# +++++++++++++++++++++ layer 7:   ai8x.FusedConv2dReLU(64, 30, 3, stride=1, padding=1)
- pad: 1
  activate: ReLU
  out_offset: 0x0000
  processors: 0xffffffffffffffff
  operation: Conv2d

# +++++++++++++++++++++ layer 8:   ai8x.FusedConv2dReLU(30, 7, 3, stride=1, padding=1)
- pad: 1
  activate: ReLU
  out_offset: 0x0000
  processors: 0xfffffffc00000000
  operation: Conv2d
  
# +++++++++++++++++++++ layer 9:   ai8x.Linear(7 * 128, 21, wide=True)
- flatten: true
  out_offset: 0x2000
  processors: 0x000000000000007f # input channels before flattening: 7
  operation: MLP
  output_width: 32

Example 3: FaceID, HWC data format 160x120x3, Streaming mode

arch: ai85faceidnet
dataset: FaceID

layers:
# +++++++++++++++++++++ layer 0: input 160x120x3: ai8x.FusedConv2dReLU(3, 16, 3, padding=1)
- out_offset: 0x1000
  processors: 0x0000000000000007
  operation: conv2d
  kernel_size: 3x3
  pad: 1
  activate: ReLU
  data_format: HWC
  streaming: true  # FIFO is used

# +++++++++++++++++++++ layer 1: ai8x.FusedMaxPoolConv2dReLU(16, 32, 3, pool_size=2, pool_stride=2,padding=1)
- max_pool: 2
  pool_stride: 2
  pad: 1
  operation: conv2d
  kernel_size: 3x3
  activate: ReLU
  out_offset: 0x2000
  processors: 0x00000000000ffff0
  streaming: true  # FIFO is used; this is the last streaming layer, so padding must be non-zero

# +++++++++++++++++++++ layer 2: ai8x.FusedMaxPoolConv2dReLU(32, 32, 3, pool_size=2, pool_stride=2,padding=1)
- max_pool: 2
  pool_stride: 2
  pad: 1
  operation: conv2d
  kernel_size: 3x3
  activate: ReLU
  out_offset: 0x0000
  processors: 0x00000000ffffffff
  
# +++++++++++++++++++++ layer 3: FusedMaxPoolConv2dReLU(32, 64, 3, pool_size=2, pool_stride=2,padding=1)
- max_pool: 2
  pool_stride: 2
  pad: 1
  operation: conv2d
  kernel_size: 3x3
  activate: ReLU
  out_offset: 0x2000
  processors: 0xffffffff00000000
  
# +++++++++++++++++++++ layer 4: ai8x.FusedMaxPoolConv2dReLU(64, 64, 3, pool_size=2, pool_stride=2, padding=1)
- max_pool: 2
  pool_stride: 2
  pad: 1
  operation: conv2d
  kernel_size: 3x3
  activate: ReLU
  out_offset: 0x0000
  processors: 0xffffffffffffffff
  
# +++++++++++++++++++++ layer 5: ai8x.FusedConv2dReLU(64, 64, 3, padding=1)
- pad: 1
  operation: conv2d
  kernel_size: 3x3
  activate: ReLU
  out_offset: 0x2000
  processors: 0xffffffffffffffff

# +++++++++++++++++++++ layer 6: ai8x.FusedConv2dReLU(64, 64, 3, padding=1)
- pad: 1
  operation: conv2d
  kernel_size: 3x3
  activate: ReLU
  out_offset: 0x0000
  processors: 0xffffffffffffffff

# +++++++++++++++++++++ layer 7: ai8x.FusedMaxPoolConv2d(64, 512, 1, pool_size=2, pool_stride=2, padding=0)
- max_pool: 2
  pool_stride: 2
  pad: 0
  operation: conv2d
  kernel_size: 1x1
  out_offset: 0x2000
  processors: 0xffffffffffffffff

# +++++++++++++++++++++ layer 8: ai8x.AvgPool2d((5, 3))
- avg_pool: [5, 3]
  pool_stride: 1
  operation: None  # a Pooling-only layer
  out_offset: 0x0000
  processors: 0xffffffffffffffff

Example 4: Cats and Dogs, CHW data format 64x64x3

arch: ai85cdnet
dataset: cats_vs_dogs

layers:
# +++++++++++++++++++++ layer 0: Input 64x64x3  ai8x.FusedConv2dReLU(3, 15, 3, padding=1)
- pad: 1
  activate: ReLU
  out_offset: 0x0000
  processors: 0x0000000100010001 # one processor per group for 3 inputs in CHW mode (only first layer)
  data_format: CHW
  operation: Conv2d

# +++++++++++++++++++++ layer 1:  ai8x.FusedMaxPoolConv2dReLU(15, 30, 3, pool_size=2, pool_stride=2,padding=1)
- max_pool: 2
  pool_stride: 2
  pad: 1
  activate: ReLU
  out_offset: 0x2000
  processors: 0x0007fff000000000
  operation: Conv2d

# ... similar for layers 2-4 ...

# +++++++++++++++++++++ layer 5: ai8x.FusedConv2dReLU(30, 30, 3, padding=1)
- pad: 1
  activate: ReLU
  out_offset: 0x2000
  #output_width: 32
  processors: 0xfffffffc00000000
  operation: Conv2d
# +++++++++++++++++++++ layer 6: self.fc = ai8x.Linear(32*8*8, 2, bias=True)
- op: mlp
  flatten: true
  out_offset: 0x1000
  output_width: 32
  processors: 0x000000003fffffff

Example 5: Residual connection, element-wise, HWC data format, 64x64x3

# Model: 
#    def forward(self, x):
#        x = self.conv1(x) 
#        x_res = self.conv2(x)      
#        x = self.conv3(x_res)     
#        x = self.add1(x, x_res)
#        x = self.conv4(x)
#        ...

# Layer 0:  self.conv1 = ai8x.FusedConv2dReLU(num_channels, 16, 3, stride=1, padding=1, bias=bias, **kwargs)
- out_offset: 0x2000
  processors: 0x7000000000000000
  operation: conv2d
  kernel_size: 3x3
  pad: 1
  activate: ReLU
  data_format: HWC
  
# Layer 1: self.conv2 = ai8x.FusedConv2dReLU(16, 20, 3, stride=1, padding=1, bias=bias, **kwargs)
- out_offset: 0x0000
  processors: 0x0ffff00000000000
  operation: conv2d
  kernel_size: 3x3
  pad: 1
  activate: ReLU

# Layer 2 - re-form layer 1 data with gap
- out_offset: 0x2000
  processors: 0x00000000000fffff
  output_processors: 0x00000000000fffff
  operation: passthrough
  write_gap: 1  # output is interleaved with 1 word gaps, i.e. 0x2000, 0x2008, ...
  name: res2

# Layer 3: self.conv3 = ai8x.FusedConv2dReLU(20, 20, 3, stride=1, padding=1, bias=bias, **kwargs)
- in_offset: 0x0000   # output of conv2, layer 1
  out_offset: 0x2004  # start output from 0x2004
  processors: 0x00000000000fffff
  operation: conv2d
  kernel_size: 3x3
  pad: 1
  activate: ReLU
  write_gap: 1 # output is interleaved with 1 word gap, i.e. 0x2004, 0x200C, ...
  name: res3

# Layer 4: self.add1 = ai8x.Add()
#          self.conv4 = ai8x.FusedConv2dReLU(20, 20, 3, stride=1, padding=1, bias=bias, **kwargs)
- in_sequences: [res2, res3] # get input from layers 2 and 3
  in_offset: 0x2000  # layers 2 and 3 outputs are interleaved starting at 0x2000
  out_offset: 0x0000
  processors: 0x00000000000fffff
  eltwise: add   # element-wise add from output of layer 2 and 3 executed in the same layer as conv4
  operation: conv2d 
  kernel_size: 3x3
  pad: 1
  activate: ReLU
  

Common Pitfalls and Errors

The following table summarizes some common problems that cause synthesis errors.

| # | Synthesis error | Example | Resolution |
|---|---|---|---|
| 1 | ERROR: Layer 0 uses CHW input format, but multiple channels share the same memory instance. Modify the processor map for layer 0. | - pad: 1<br>activate: ReLU<br>out_offset: 0x0000<br>processors: 0x0000000000000007<br>data_format: CHW<br>operation: Conv2d | Change the processor map to 0x0000000000000111, or use the HWC data_format (see the corrected sketch after this table). |
| 2 | ERROR: Layer 2 has 50 outputs with output expansion 1, threshold 50, but processor output map 0x0007ffffffffffff has 51 bits instead of the expected number of 50. | - pad: 0<br>activate: ReLU<br>out_offset: 0x0000<br>processors: 0x0007ffffffffffff<br>operation: Conv1d<br>kernel_size: 1 | Change the processor map to 0x0003ffffffffffff (50 bits). |
| 3 | ERROR: Processor 0: Layer 9 output for CHW=0,0,0 is overwriting input at offset 0x00400000 that was created by layer 8, CHW=0,0,0. | - pad: 1<br>activate: ReLU<br>out_offset: 0x0000<br>processors: 0xfffffffc00000000<br>operation: Conv2d<br>- flatten: true<br>out_offset: 0x0000<br>processors: 0x000000000000007f<br>operation: MLP<br>output_width: 32 | The out_offsets of back-to-back layers should "ping-pong"; change the last out_offset to 0x4000. |
| 4 | ERROR: Input dimensions do not match in layer 0. Expected: 7x6, got 9x6. | Sample data shape was 9×7. | Make sure the C, W, H of the sample data are correct (e.g., 7×9). |
| 5 | assert operands == data.shape[0] // input_size[0] (AssertionError) | Sample data shape was 1×9×7. | Make sure the C, W, H of the sample data are correct (e.g., 7×9). |
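As a sketch of the first resolution for error 1 (keeping the CHW data_format), the error goes away when each of the three input channels is assigned to a processor in a different memory instance:

# Corrected layer 0 for error 1: processors 0, 4, and 8 each belong to a different memory instance
- pad: 1
  activate: ReLU
  out_offset: 0x0000
  processors: 0x0000000000000111
  data_format: CHW
  operation: Conv2d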