This repository was archived by the owner on Jul 17, 2024. It is now read-only.

Darknet- and converted ELL-model give different inference results #138

Open
sdudeck opened this issue Apr 5, 2018 · 10 comments

Comments

@sdudeck

sdudeck commented Apr 5, 2018

Hello,
I am trying to convert a small Darknet-based CNN (originating from https://github.com/ashitani/darknet_mnist, working on the MNIST dataset) to ELL.
I trained the Darknet model and afterwards followed the tutorial on this page for converting Darknet models to ELL. After training and before converting, I removed the cost layer and the dropout layer from the original Darknet model, as they are used only for training as far as I have understood. (I did this because the Darknet cost layer initially gave me a warning message during conversion - something like "sse not known" - and the dropout layer also did not seem to be converted into the ELL model.)

After figuring out that I need to feed the MNIST images with color channel values not in the range [0..1] (as in the Darknet framework) but in [0..255] (since the ELL model automatically includes a scaling layer), I ran the model on the same MNIST images in both the Darknet and the ELL framework. I checked that both models receive the same array/vector of values (2352 floats, 28x28x3) in the same order (apart from the scaling mentioned above).
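The rescaling step described above can be sketched as follows. This is a minimal illustration, not code from either framework; the function names are hypothetical, and the assumption (stated in the thread) is that the imported ELL model's scaling layer expects raw [0..255] pixel values while Darknet works on [0..1]:

```python
def to_ell_input(pixels_01):
    """Rescale Darknet-style [0..1] pixel values to the [0..255] range
    that the imported ELL model (with its built-in scaling layer) expects."""
    return [p * 255.0 for p in pixels_01]

def to_darknet_input(pixels_255):
    """Inverse: rescale [0..255] pixel values back to Darknet's [0..1]."""
    return [p / 255.0 for p in pixels_255]

# A 28x28x3 image flattened in row-major order gives 2352 floats,
# matching the input vector size mentioned above.
```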

The problem is that I get very different prediction results from the two models. E.g., on one image the Darknet model gives 93% for the most probable class (which is the correct one), whereas the converted ELL model gives only 17% for that class - it is still the most probable, but I would expect prediction results much closer to each other, since the model structure and weights should be (nearly) the same.

result of darknet-model:
data/mnist/images/v_01862_c4.png: Predicted in 0.054000 seconds.
c4: 0.933212
c9: 0.046099
c8: 0.008588
c5: 0.003347
c7: 0.002880

result of model converted to ELL:
D:\Crest\Libs\darknet_mnist\data\mnist\images\v_01862_c4.png
(17%) 4 (16%) 1 (12%) 9 (11%) 5 (11%) 3
Mean prediction time: 7ms/frame

Right now I have no idea what can cause this huge difference and where to look further.

I have attached the original Darknet cfg file (mnist_lenet.cfg) as well as the one used for converting and running inference in Darknet (mnist_lenet.nodropout_nocost.cfg), and the converted ELL file (I removed the weights from that file, otherwise it would be 40 MB).

mnist_lenet.cfg.txt
mnist_lenet.nodropout_nocost.cfg.txt
mnist_lenet_woweights.ell.txt

Thank you very much,
Sven

@byronChanguion
Contributor

I noticed that in your ELL file, the first FullyConnectedLayer is correctly followed by a Bias, but is missing the ReLUActivationLayer. I imported both your configs, and the resulting ELL files both have an activation layer between the last two FullyConnectedLayers, i.e. the end of the network should look like:

    {
      "_type": "FullyConnectedLayer<float>",
      "_version": "0",
      "inputPaddingScheme": 0,
      "inputPaddingSize": 0,
      "outputShape": [1, 1, 1024],
      "outputPaddingScheme": 0,
      "outputPaddingSize": 0,
      "weights_rows": 1024,
      "weights_columns": 3136,
      "weights_values": [#deleted#]
    }, 
    {
      "_type": "BiasLayer<float>",
      "_version": "0",
      "inputPaddingScheme": 0,
      "inputPaddingSize": 0,
      "outputShape": [1, 1, 1024],
      "outputPaddingScheme": 0,
      "outputPaddingSize": 0,
      "bias": [#deleted#]
    }, 
    {
      "_type": "ActivationLayer<float,ReLUActivation>",
      "_version": "0",
      "inputPaddingScheme": 0,
      "inputPaddingSize": 0,
      "outputShape": [1, 1, 1024],
      "outputPaddingScheme": 0,
      "outputPaddingSize": 0
    }, 
    {
      "_type": "FullyConnectedLayer<float>",
      "_version": "0",
      "inputPaddingScheme": 0,
      "inputPaddingSize": 0,
      "outputShape": [1, 1, 10],
      "outputPaddingScheme": 0,
      "outputPaddingSize": 0,
      "weights_rows": 10,
      "weights_columns": 1024,
      "weights_values": [#deleted#]
    }, 
    {
      "_type": "BiasLayer<float>",
      "_version": "0",
      "inputPaddingScheme": 0,
      "inputPaddingSize": 0,
      "outputShape": [1, 1, 10],
      "outputPaddingScheme": 0,
      "outputPaddingSize": 0,
      "bias": [#deleted#]
    }, 
    {
      "_type": "SoftmaxLayer<float>",
      "_version": "0",
      "inputPaddingScheme": 0,
      "inputPaddingSize": 0,
      "outputShape": [1, 1, 10],
      "outputPaddingScheme": 0,
      "outputPaddingSize": 0
    }],
    "output": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  }

Can you confirm whether importing using the bits in master now produces a model which includes the correct activation layer?

@jesuspicazo

The same thing happens to me. After importing my Darknet-trained network into ELL, it gives very bad results compared to the tests I have performed using just Darknet. What could be the reason?

@byronChanguion
Contributor

Can you share your Darknet config and weights files so we can try to import and reproduce the problem?

@sdudeck
Author

sdudeck commented Apr 20, 2018

Hello,
please find attached the two files with the MNIST Darknet model.

mnist_lenet.cfg.txt
mnist_lenet.weights.txt

I tried two things yesterday:

  1. I inserted the missing activation layer into the ELL file as in the snippet above and recompiled it. The newly compiled model gave slightly different results, but the quality didn't really improve.
  2. I updated my local files from the GitHub repository and recompiled ELL. Now the Darknet-to-ELL import no longer works (ImportError: cannot import name 'MapCompilerOptions'; tried on two different Darknet models).
    Full message:
(py36-ell-env) D:\Crest\DarknetModels>python ./../libs/ell/tools/importers/darknet/darknet_import.py mnist_lenet.cfg mnist_lenet.weights
Traceback (most recent call last):
  File "./../libs/ell/tools/importers/darknet/darknet_import.py", line 22, in <module>
    import darknet_to_ell
  File "D:\Crest\libs\ell\tools\importers\darknet\darknet_to_ell.py", line 22, in <module>
    import ell
  File "D:\Crest\libs\ell\build\interfaces\python\package\ell\__init__.py", line 22, in <module>
    from . import model
  File "D:\Crest\libs\ell\build\interfaces\python\package\ell\model\__init__.py", line 9, in <module>
    from ..ell_py import \
ImportError: cannot import name 'MapCompilerOptions'

Thanks for helping,
Sven

@jesuspicazo

I have trained a CNN using Darknet to distinguish between 3 classes of robots. I need ELL to deploy this network on a Raspberry Pi that is on board another robot. The thing is, when I test the network as it comes out of Darknet, it reaches about 90-95% accuracy. I import the network as indicated in the tutorial and everything seems to be fine, but when I try it, the percentages I obtain are almost always the same, they are wrong, and they are not at all similar to the results obtained when testing with Darknet.
I'm attaching the cfg and weights files as required.

robotsGardenCressDoubleFC.cfg.zip

robotsGardenCressDoubleFC.weights.zip

Thank you so much for this amazing tool and your dedication.

@byronChanguion
Contributor

Thanks for the model .cfg and .weights files! I was able to reproduce the problem and found what was causing the errors:

  1. ReLU activation was being skipped by the importer in [connected] layers.
  2. The weights of the [connected] layer need to be transposed by the importer.

After fixing those, I get the same results as Darknet. I'll run a few more tests and push a fix within the next couple of days. As a temporary workaround, try replacing the ELL/tools/importers/Darknet/darknet_to_ell.py file with darknet_to_ell.zip
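The transposition in fix (2) above can be illustrated with a minimal pure-Python sketch. This is not the importer's actual code; the tiny weight matrix is hypothetical, and the assumption is simply that one framework stores a fully-connected layer's weights as (outputs x inputs) while the other expects (inputs x outputs), so the importer must swap rows and columns:

```python
def transpose(matrix):
    """Swap rows and columns of a nested-list weight matrix."""
    return [list(row) for row in zip(*matrix)]

# Hypothetical 2x3 weight matrix (2 outputs, 3 inputs):
w = [[1, 2, 3],
     [4, 5, 6]]

# Applying the layer with the wrong orientation silently multiplies the
# input by the wrong weights, which is why the model still runs but
# produces bad scores instead of crashing.
w_t = transpose(w)  # 3x2: [[1, 4], [2, 5], [3, 6]]
```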

@jesuspicazo

Thank you very much for your response. I've tried replacing darknet_to_ell.py, but it isn't working. In fact, in this case the predictions are always the same, no matter how different the test images are.
I'll be waiting for your fix update.
Thank you again.

@sdudeck
Author

sdudeck commented Apr 25, 2018

Thanks a lot.

With the workaround py file, the ReLU layer is now inserted into the ELL file and the inference output values are different, but they still do not match the Darknet values. So I will wait for the complete fix as well.

P.S.: I got rid of the 'MapCompilerOptions' error mentioned above by pulling a clean version of the current ELL repository and compiling it again.

@jesuspicazo

Hi,
I'm relaunching this issue because I'm still not able to properly import a Darknet-trained CNN using ELL's import tool.
To make this problem easily reproducible, I trained the very simple CNN from the Darknet tutorial for training a classifier on the CIFAR-10 dataset:
https://pjreddie.com/darknet/train-cifar/
After training the network, I import the model exactly as explained in the ELL C++ tutorial, but when I try to recognize images from the CIFAR-10 test set I obtain the following:
-nan -nan -nan
I have also observed that this happens when the activation of the convolutional layers is set to 'leaky'. When I change them to 'relu' it no longer gives '-nan', but the results are very bad and always the same, even with very different test images. Here I attach the .cfg and .weights files so the problem can be tested:
cifar_small.cfg.zip
cifar_small.weights.zip
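For reference on why 'leaky' and 'relu' behave differently here: Darknet's 'leaky' activation keeps a small slope (0.1) for negative inputs instead of clamping them to zero, so an importer that mishandles it would produce different values wherever pre-activations go negative. A minimal sketch of the two functions (illustrative only, not importer code):

```python
def leaky(x, slope=0.1):
    """Darknet's 'leaky' activation: identity for positive inputs,
    a small negative slope (0.1 in Darknet) for negative inputs."""
    return x if x > 0 else slope * x

def relu(x):
    """Plain ReLU: clamps negative inputs to zero."""
    return max(0.0, x)

# The two agree for x >= 0 and differ for every negative input,
# e.g. leaky(-2.0) = -0.2 while relu(-2.0) = 0.0.
```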

For more details: I've tested this network using Darknet and it works as expected, so I don't know whether I'm making some kind of mistake when using the darknet_import.py tool, because I don't know what else it could be.
Please, I've been dealing with this issue for like 2 months now and any help would be highly appreciated.

Thanks a lot in advance.

Cheers.

@lovettchris
Member

Thanks for the bug report, I have filed this internally to make sure we take a look and fix it.
