This repository has been archived by the owner on Oct 21, 2023. It is now read-only.

Merge pull request #13 from FWDNXT/2021.2
2021.2
Andrechang committed Oct 13, 2021
2 parents 19d302a + 7d9eb55 commit f980919
Showing 39 changed files with 486 additions and 216 deletions.
138 changes: 93 additions & 45 deletions README.md
@@ -35,7 +35,7 @@ This SDK folder contains:
* [System requirements](#system-requirements)
* [Pico computing](#pico-computing)
* [Docker Image](#docker-image)
* [Python package Install](#python-package-install)
- [2. Getting started with Deep Learning](#2-getting-started-with-deep-learning) : general information about deep learning
* [Introduction](#introduction)
* [PyTorch: Deep Learning framework](#pytorch-deep-learning-framework)
@@ -49,6 +49,10 @@ This SDK folder contains:
* [Multiple FPGAs with different models <a name="two"></a>](#multiple-fpgas-with-different-models)
* [Multiple Clusters with input batching <a name="three"></a>](#multiple-clusters-with-input-batching)
* [Multiple Clusters without input batching <a name="four"></a>](#multiple-clusters-without-input-batching)
* [Multiple Clusters with different models <a name="five"></a>](#multiple-clusters-with-different-models)
* [All Clusters with different models in sequence <a name="six"></a>](#all-clusters-with-different-models-in-sequence)
* [Multiple Clusters with even bigger batches <a name="seven"></a>](#multiple-clusters-with-even-bigger-batches)
* [Batching using MVs <a name="eight"></a>](#batching-using-mvs)
- [6. Tutorial - PutInput and GetResult](#6-tutorial---putinput-and-getresult) : tutorial for using PutInput and GetResult
- [7. Tutorial - Writing tests](#7-tutorial---writing-tests) : Tutorial on running tests
- [8. Tutorial - Debugging](#8-tutorial---debugging) : Tutorial on debugging and printing
@@ -114,8 +118,12 @@ lspci | grep -i pico
lsmod | grep -i pico
pico 3493888 12
```

## Docker Image

This step is optional and is only needed if you want to run the SDK as a Docker image.

If you want to use MDLA with Docker, you need to install [pico-computing](#pico-computing) and [Docker](https://docs.docker.com/get-docker/).

@@ -176,6 +184,17 @@ root@d80174ce2995:/home/mdla#

Run the example code provided. Check sections [3](#3-getting-started-inference-on-micron-dla-hardware) and [4](#4-getting-started-inference-on-micron-dla-hardware-with-c).

## Python package Install (optional)

You can also install the SDK as a Python package:

`git clone https://github.com/FWDNXT/SDK`

Then, inside the SDK folder, run:

`python3 setup.py install --user`
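
To check the install, here is a minimal sanity test; it assumes the package is importable as `microndla`, as in the examples below:

```python
# Quick install check: import the package and construct an engine object.
import microndla

ie = microndla.MDLA()
print('microndla package imported and MDLA object created')
```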


# 2. Getting started with Deep Learning

## Introduction
@@ -355,20 +374,16 @@ numclus = 1
# Create Micron DLA API
sf = microndla.MDLA()
# Generate instructions
sf.SetFlag({'nfpgas': str(numfpga), 'nclusters': str(numclus)})
sf.Compile('resnet18.onnx')
in1 = np.random.rand(2, 3, 224, 224).astype(np.float32)
input_img = np.ascontiguousarray(in1)
# Run inference and collect the output
output = sf.Run(input_img)
```

`sf.Compile` will parse the model from resnet18.onnx and save the generated Micron DLA instructions. Here numfpga=2, so instructions for two FPGAs are created.
`nresults` is the output size of the model for one input image (no batching).
The expected output size of `sf.Run` is twice `nresults`, because numfpga=2 and two input images are processed. `input_img` is two images concatenated.
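
For example, here is a minimal sketch of recovering the per-image results, assuming the two outputs are stacked along the first axis of `output`:

```python
import numpy as np

# output contains the results for both input images concatenated;
# split it back into one result per image (assumed layout: first axis).
out_img1, out_img2 = np.split(output, 2)
```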
The diagram below shows this type of execution:

@@ -387,13 +402,9 @@ sf1 = microndla.MDLA()
# Create second Micron DLA API
sf2 = microndla.MDLA()
# Generate instructions for model1
sf1.Compile('resnet50.onnx')
# Generate instructions for model2
sf2.Compile('resnet18.onnx')
in1 = np.random.rand(3, 224, 224).astype(np.float32)
in2 = np.random.rand(3, 224, 224).astype(np.float32)
input_img1 = np.ascontiguousarray(in1)
@@ -423,9 +434,7 @@ numclus = 2
sf = microndla.MDLA()
# Generate instructions
sf.SetFlag('nclusters', str(numclus))
sf.Compile('resnet18.onnx')
in1 = np.random.rand(2, 3, 224, 224).astype(np.float32)
input_img = np.ascontiguousarray(in1)
output = sf.Run(input_img)
@@ -447,12 +456,9 @@ numfpga = 1
numclus = 2
# Create Micron DLA API
sf = microndla.MDLA()
sf.SetFlag({'nclusters': str(numclus), 'clustersbatchmode': '1'})
# Generate instructions
sf.Compile('resnet18.onnx')
in1 = np.random.rand(3, 224, 224).astype(np.float32)
input_img = np.ascontiguousarray(in1)
output = sf.Run(input_img)
@@ -465,6 +471,60 @@ The diagram below shows this type of execution:

<img src="docs/pics/2clus1img.png" width="600" height="550"/>

## Multiple Clusters with different models
The following example shows how to run different models using different clusters in parallel.
Currently, each model can be assigned its own set of clusters, but every model must use the same number of clusters. For example, 3 clusters for one model and 1 cluster for another is not allowed.
The example code is [here](./examples/python_api/twonetdemo.py).

```python
import microndla
import numpy as np
nclus = 2
img0 = np.random.rand(3, 224, 224).astype(np.float32)
img1 = np.random.rand(3, 224, 224).astype(np.float32)
ie = microndla.MDLA()
ie2 = microndla.MDLA()
ie.SetFlag({'nclusters': nclus, 'clustersbatchmode': 1})
ie2.SetFlag({'nclusters': nclus, 'firstcluster': nclus, 'clustersbatchmode': 1})
ie.Compile('resnet18.onnx')
ie2.Compile('alexnet.onnx', MDLA=ie)
ie.PutInput(img0, None)
ie2.PutInput(img1, None)
result0, _ = ie.GetResult()
result1, _ = ie2.GetResult()
```
In the code, you create one MDLA object for each model and compile them. The first model uses 2 clusters together.
The second model is assigned the remaining 2 clusters. Use the `firstcluster` flag to tell `Compile` which cluster is the first one that model will use.
In this example, the first model uses clusters 0 and 1 and the second model uses clusters 2 and 3.
In `Compile`, pass the previous MDLA object to link the two together so that they are loaded into memory in one go.
In this case, you must use the `PutInput` and `GetResult` paradigm (see this [section](#6-tutorial---putinput-and-getresult)); you cannot use `Run`.
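
As a minimal pipelining sketch continuing from the example above (hedged: this assumes `GetResult` blocks until the corresponding output is ready and returns the user tag passed to `PutInput`):

```python
# Stream a few frames through both models, tagging each input with its index.
frames = [np.random.rand(3, 224, 224).astype(np.float32) for _ in range(4)]
for i, frame in enumerate(frames):
    ie.PutInput(frame, i)             # queue input for model 1
    ie2.PutInput(frame, i)            # queue input for model 2
    result0, tag0 = ie.GetResult()    # blocks until model 1's output is ready
    result1, tag1 = ie2.GetResult()
```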

<img src="docs/pics/2clus2model.png" width="600" height="550"/>

## All Clusters with different models in sequence

This example shows how to load multiple models and run them in sequence using all clusters. It is similar to the previous example; the only difference is that all clusters are used for each model. It uses the same principle of creating a separate MDLA object for each model and linking the MDLA objects in `Compile`.

```python
import microndla
import numpy as np
nclus = 2
img0 = np.random.rand(3, 224, 224).astype(np.float32)
img1 = np.random.rand(3, 224, 224).astype(np.float32)
ie = microndla.MDLA()
ie2 = microndla.MDLA()
ie.SetFlag({'nclusters': nclus, 'clustersbatchmode': 1})
ie2.SetFlag({'nclusters': nclus, 'clustersbatchmode': 1})
ie.Compile('resnet18.onnx')
ie2.Compile('alexnet.onnx', MDLA=ie)
result0 = ie.Run(img0)
result1 = ie2.Run(img1)
```

<img src="docs/pics/2clus2seqmodel.png" width="600" height="550"/>


## Multiple Clusters with even bigger batches

It is possible to run batches larger than the number of clusters or FPGAs. Each cluster will process multiple images.
@@ -477,12 +537,9 @@ numfpga = 1
numclus = 2
# Create Micron DLA API
sf = microndla.MDLA()
sf.SetFlag({'nclusters': str(numclus), 'imgs_per_cluster': '16'})
# Generate instructions
sf.Compile('resnet18.onnx')
in1 = np.random.rand(32, 3, 224, 224).astype(np.float32)
input_img = np.ascontiguousarray(in1)
output = sf.Run(input_img) # Run
@@ -502,13 +559,9 @@ numfpga = 1
numclus = 2
# Create Micron DLA API
sf = microndla.MDLA()
sf.SetFlag({'nclusters': str(numclus), 'imgs_per_cluster': '16', 'mvbatch': '1'})
# Generate instructions
sf.Compile('resnet18.onnx')
in1 = np.random.rand(32, 3, 224, 224).astype(np.float32)
input_img = np.ascontiguousarray(in1)
output = sf.Run(input_img)
@@ -594,8 +647,7 @@ result_pyt = result_pyt.detach().numpy()

Now we need to run this model using the accelerator with the SDK.
```python
sf.Compile('net_conv.onnx')
in_1 = np.ascontiguousarray(inV)
result = sf.Run(in_1)
```
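
To sanity-check the accelerator against the PyTorch reference computed above, here is a hedged comparison sketch (the tolerance check is illustrative, not a documented bound):

```python
import numpy as np

# Compare hardware output with the PyTorch result; quantization makes
# exact equality unlikely, so check the maximum absolute error instead.
err = np.abs(np.asarray(result).flatten() - result_pyt.flatten()).max()
print('max abs error:', err)
```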
@@ -630,9 +682,7 @@ A debug option won't affect the compiler; it will only print more information.

You can use `SetFlag('debug', 'b')` to print the basic information. The debug code `'b'` stands for basic. Debug codes and option codes are letters (case-sensitive). For a complete list of letters refer to [here](docs/Codes.md).

Always put the `SetFlag()` call right after creating the Micron DLA object. It will print information about the run. First, it will list all the layers that it is going to compile from `net_conv.onnx`.

Then `Run` will rearrange the input tensor and load it into the external memory. It will print the time this took and other properties of the run, such as the number of FPGAs and clusters used.

@@ -706,7 +756,7 @@ In this case, you can set 'V' in the options using the `SetFlag` function before `Compile`.
ie = microndla.MDLA()
ie.SetFlag('varfp', '1')
#Compile to a file
swnresults = ie.Compile('resnet18.onnx')
```

**Option 2**: Variable fix-point can be determined for input and output of each layer if one or more sample inputs are provided.
@@ -726,7 +776,7 @@ for fn in os.listdir(args.imagesdir):
#Create and initialize the Inference Engine object
ie = microndla.MDLA()
#Compile to a file
swnresults = ie.Compile('resnet18.onnx', samples=imgs)
```

After that, `Init` and `Run` run as usual, using the saved variable fix-point configuration.
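
For instance, a minimal continuation of the snippet above (hedged: `imgs[0]` stands in for any input of the right shape):

```python
# Run inference as usual; the fix-point configuration chosen during
# Compile(..., samples=imgs) is applied automatically.
result = ie.Run(np.ascontiguousarray(imgs[0]))
```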
@@ -784,11 +834,9 @@ mnist = tf.keras.datasets.mnist
x_train, x_test = x_train / 255.0, x_test / 255.0
ie = microndla.MDLA()
ie.Compile('mnist.onnx')
for i in range(0, 10):
    result = ie.Run(x_test[i].astype(np.float32))
    print(y_test[i], np.argmax(result))
```
8 changes: 4 additions & 4 deletions api.h
@@ -11,7 +11,7 @@
#ifndef _IE_API_H_INCLUDED_
#define _IE_API_H_INCLUDED_

static const char *microndla_version = "2021.1.0";
static const char *microndla_version = "2021.2.0";
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
@@ -46,7 +46,7 @@ int IECOMPILER_API set_external_wait(void *cmemo, bool (*wait_ext) (int));
/*!
Allow to pass externally created thnets net into node list
*/
void IECOMPILER_API ext_thnets2lst(void *cmemo, void* nett, char* image, int batch);

/*!
Create an Inference Engine object
@@ -82,7 +82,7 @@ Run static quantization of inputs, weights and outputs over a calibration dataset
*/
void IECOMPILER_API *ie_compile_vfp(void *cmemo, const char *modelpath, const char* outbin, const char *inshapes,
unsigned *noutputs, unsigned **noutdims, uint64_t ***outshapes,
const float * const *inputs, const uint64_t *input_elements, unsigned ninputs, void *cmemp);

/*!
Compile a network and produce a .bin file with everything that is needed to execute in hardware.
Expand All @@ -97,7 +97,7 @@ In this case, ie_compile is necessary, ie_init with a previously generated bin f
@param outshapes returns a pointer to noutputs pointers to the shapes of each output
@return context object
*/
void IECOMPILER_API *ie_compile(void *cmemo, const char *modelpath, const char *outbin, const char *inshapes, unsigned *noutputs, unsigned **noutdims, uint64_t ***outshapes, void *cmemp);
/*!
Load a .bin file into the hardware and initialize it
@param cmemo pointer to an Inference Engine object, may be null
26 changes: 9 additions & 17 deletions docs/C_API.md
@@ -37,15 +37,18 @@ Frees the network
******
## void *ie_compile

Parse an ONNX/NNEF model and generate Inference Engine instructions

***Parameters:***

void IECOMPILER_API *ie_compile(void *cmemo, const char *modelpath, const char *outbin, const char *inshapes, unsigned *noutputs, unsigned **noutdims, uint64_t ***outshapes, void *cmemp);


`void *cmemo`: pointer to an Inference Engine object, may be 0

`const char *modelpath`: path to a model file in ONNX format

`const char* outbin`: path to a file where a model in the Inference Engine ready format will be saved. If this parameter is used, then an `Init` call is needed afterwards

`const char *inshapes`: shape of the inputs in the form size0xsize1xsize2...; more inputs are separated by semi-colon; this parameter is optional as the shapes of the inputs can be obtained from the model file

Expand All @@ -55,6 +58,8 @@ Parse an ONNX model and generate Inference Engine instructions

`uint64_t ***outshapes`: returns a pointer to noutputs pointers to the shapes of each output

`void *cmemp`: MDLA object to link with, so that multiple models can be loaded into memory together

***Return value:*** pointer to the Inference Engine object or 0 in case of error

******
@@ -85,22 +90,9 @@ choosing the proper quantization for variable-fixed point, available with the VF

`unsigned ninputs`: number of inputs, must be a multiple of the inputs expected by the network

`void *cmemp`: MDLA object to link with, so that multiple models can be loaded into memory together

***Return value:*** pointer to the Inference Engine object or 0 in case of error

******
## void *ie_init
1 change: 1 addition & 0 deletions docs/Codes.md
@@ -147,6 +147,7 @@ following characters:

**no_rearrange**: Skip output rearrangement

**heterogeneous**: Run DLA-unsupported layers on the CPU, including layers in the middle of the network
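
A hedged usage sketch (assuming the flag takes `'1'` to enable, following the on/off convention of the other options):

```python
import microndla

ie = microndla.MDLA()
ie.SetFlag('heterogeneous', '1')  # assumed on/off value; run unsupported layers on the CPU
```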

*****
## GetInfo
