This repository has been archived by the owner on Jan 7, 2025. It is now read-only.

Accuracy show the wrong result on graph #625

Closed
ssilphf opened this issue Mar 10, 2016 · 17 comments

@ssilphf

ssilphf commented Mar 10, 2016

Hi,
I have a problem when I use DIGITS 3.3dev.
I trained a model using GoogLeNet.
The graph shows the accuracy going to 100%, but when I use "Test a list of images" on the same job's val.txt file, every image ends up in a single category (my model has 4 categories, so the accuracy should then be about 25%).

I also used the same dataset and the same parameters on DIGITS 2.3dev, and it worked well. I don't know what is different between DIGITS 2.3dev and DIGITS 3.3dev.
Can anyone help me, please?

I have also tested with the "classification" example under ~/digits/example/; its result differs from that of "Test a list of images".

@lukeyeager
Member

Hi @ssilphf, I can't reproduce your problem yet.

I just merged #608, which should give you some more information when you classify a list of images. Try it out and see if you can get any more information.

Also, if you can give me some standard information about your machine I may be able to spot something fishy:

  • DIGITS version (use git describe)
  • Caffe version
  • cuDNN version
  • CUDA version
  • NVIDIA driver version
  • Operating system

@ssilphf
Author

ssilphf commented Mar 10, 2016

Thank you for the speedy reply.
Here is my information:
DIGITS version = 3.3 (downloaded yesterday)
Caffe version = 0.14 (downloaded 2016-02-03)
cuDNN version = 4.0
CUDA version = 7.5
NVIDIA driver version = 352.79
OS version = Ubuntu 14.04.4

This is my training graph, which shows accuracy going to about 100%:
screenshot from 2016-03-11 08 39 08

And this is my result when using "Test a list of images":
screenshot from 2016-03-11 08 39 23

Finally, this is my result when using ~/digits/example/classification/use_archive.py.
I used the same pictures as in val.txt, but the result does not match "Test a list of images":
screenshot from 2016-03-11 08 35 05

@ssilphf ssilphf changed the title accuracy show the wrong result on graph Accuracy show the wrong result on graph Mar 10, 2016
@lukeyeager lukeyeager added the bug label Mar 11, 2016
@gheinrich
Contributor

Hi @ssilphf, can you be more specific about the versions you are using:

  • DIGITS: go to your DIGITS directory and do git describe
  • Caffe: go to your Caffe directory and do git describe
  • CuDNN: did you install from deb package? If so, do dpkg -s libcudnn4. Otherwise let us know the file name of your libcudnn.so.4.0.x library.

@ssilphf
Author

ssilphf commented Mar 14, 2016

I'm sorry, but perhaps because I didn't use the command git clone to get the files from GitHub (I just used "Download ZIP"), git describe tells me:

fatal: Not a git repository (or any of the parent directories): .git

CuDNN: the file name is libcudnn.so.4.0.7

I suspect the problem may not be in DIGITS but rather in Caffe, Torch, or iTorch,
so I have decided to reinstall my whole PC.

@ssilphf
Author

ssilphf commented Mar 18, 2016

I have reinstalled my whole system; my new versions are:

DIGITS: v3.2.0-55-ga13fdf8
Caffe: v0.14.3
CuDNN: libcudnn.so.4.0.7

but the problem is still there.

When I use DIGITS 2.3dev (with everything else the same as for DIGITS 3) on the same dataset,
I get a different result.

@rajulanand

rajulanand commented May 3, 2016

DIGITS: v3.1.0-49-g17668e3
Caffe: v0.14.2
CuDNN: libcudnn.so.4.0.4

I have encountered a similar issue: one of my models, trained using GoogLeNet (I modified the network to optimize for the top-2 categories instead of top-5) for a large number of epochs, shows high top-2 and top-1 accuracy (96.25%, 91.67%). However, when I test a list of images, the results look like those posted above: the top-1 prediction (and all the others too) is the same for every image (image 1).

Another bug I just noticed: not all the images were picked up from the list; instead, the results contained the same image being predicted again and again. This was the case even with another model whose prediction output was good (image 2).
image
image

@gheinrich
Contributor

Hello, can you try upgrading to more recent software? There is a bug in cuDNN 4.0.4 that causes issues when doing batched inference (#536 (comment)), so you should upgrade to version 4.0.7.


@rajulanand

rajulanand commented May 6, 2016

Can you please suggest how to upgrade cuDNN to 4.0.7? I broke my existing DIGITS and NV-Caffe installation while trying to do that. I am on Ubuntu 15.04, so I have to build everything from source (I can't get 14.04 to install on my machine, for reasons unknown to me). The CUDA 7.5 installation contains 4.0.4 by default, and even after installing 4.0.7 from the deb file, lib64 only contains the 4.0.4 files. cmake for NV-Caffe doesn't seem to find cuDNN, and the variables CUDA_cublas_LIBRARY and CUDA_curand_LIBRARY are also set to not found. Any suggestions?

Update: I was able to upgrade cuDNN to the latest version; however, now I can't compile NV-Caffe despite satisfying all the dependencies listed on GitHub for compiling it.

Update II: I was able to compile NV-Caffe, but now DIGITS fails with Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.

Update III: I was able to resolve all the issues and now have a working version of DIGITS and NV-Caffe with the latest cuDNN.

@rajulanand

rajulanand commented May 9, 2016

@gheinrich
Caffe: v0.14.4-4-ge67ce3a
DIGITS: v3.3.0-14-gb989c7b
CuDNN: 4.0.7

After upgrading everything, I am still getting wrong prediction results for that particular model.
Note that the wrong predictions have happened in only one of my models so far, one I trained very heavily on GoogLeNet (few images, many epochs).
image

@gheinrich
Contributor

Can you tell us what the validation accuracy was for this model and what the confusion matrix looks like when you are doing "classify many"? To get a confusion matrix you may use the val.txt file from your dataset job folder. The Top-1 accuracy you get when doing "classify many" should be exactly the same as the validation accuracy you get during training. Also let us know if you are using batch normalization in your model. Thanks!
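For reference, a confusion matrix of the kind requested here can be tallied from the per-image results with a few lines of Python. This is a minimal sketch, not DIGITS code; the label lists at the bottom are made-up stand-ins for labels parsed from val.txt and from the "classify many" output:

```python
from collections import Counter

def confusion_matrix(true_labels, pred_labels, classes):
    """Count (true, predicted) pairs into a class-by-class table."""
    counts = Counter(zip(true_labels, pred_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]

def top1_accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Hypothetical example: 4 classes, every prediction collapsed onto class 0,
# which is the failure mode reported in this thread.
classes = [0, 1, 2, 3]
true = [0, 1, 2, 3] * 5   # balanced validation set of 20 images
pred = [0] * 20           # every image predicted as class 0
m = confusion_matrix(true, pred, classes)
print(top1_accuracy(m))   # 0.25 -- the ~25% the reporter expected
```

If "classify many" is working correctly, this top-1 number should match the validation accuracy from the training graph.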

@gheinrich
Contributor

Closing due to inactivity.

@samansarraf

Dear Greg, I am getting a significant difference between the accuracy shown on the plot / in the final log file and the one from the Classify Many module. This discrepancy happens especially with GoogLeNet-trained models. I see the issue with LeNet models as well, but there it is tolerable. If you have found a solution, please advise, as I really need to get this job done.
FYI: I installed the latest version of DIGITS via these instructions: https://github.com/NVIDIA/DIGITS/blob/master/docs/UbuntuInstall.md, and please assume I'm not a beginner.
To work around the issue, I even resized my samples before using Classify Many, but the results still don't match. I also resized all my samples, retrained a new model from scratch, and then used the Classify Many module, but that didn't work either. I am pretty sure something is wrong in the feed-forward network or in the parameters DIGITS sends to it.
Your prompt help is much appreciated!

@gheinrich
Contributor

Does "classify one" show the expected result? Can you give numbers (accuracy on the validation set vs. accuracy during inference)?

@samansarraf

Greg, regarding "classify one" and "classify many": I have too many samples, so I tested a few of them at random, and the results are identical. Both seem to produce the same accuracy.
I attached the accuracy I see on the plot and what I get from Classify Many. There is a huge difference. I believe the number on the plot is correct, because I trained the model with the original Caffe and got pretty much the same number. During training the accuracy is 98.49%, but in the Classify Many results it is 83.14%.
(For some reason I removed the class and sample names.)

googlenet_03_05_validation
googlenet_03_05_classifymany

@samansarraf

@gheinrich Any update on your end?

@samansarraf

@gheinrich @rajulanand The issue comes from how Caffe or DIGITS crops the images. For a GoogLeNet model, when you use 256x256 images, Caffe implicitly handles the cropping during training and validation somehow, but when you use the same samples for testing/prediction it doesn't work. There are several sources of the problem: in addition to the cropping itself, the mean computed over the 256x256 training samples is not the same as the mean of their 224x224 crops, so subtracting a mean calculated at 256x256 from images processed at 224x224 may not be fully correct.
I tested several combinations of training and testing samples, and the only setup that worked perfectly was to train GoogLeNet on images already cropped/resized to 224x224 and to test with similar images (cropped or resized to 224x224).
This is the only approach that guarantees a correct mean of the training data and identical image preprocessing (sample dimension: 224x224). At least it worked for me; I hope it helps. I suggest the DIGITS folks update their documentation or double-check the code.
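The 256-vs-224 mean mismatch described above can be reproduced numerically. This is a toy sketch with synthetic NumPy images (no DIGITS or Caffe involved), assuming a centered 224x224 crop from 256x256 inputs:

```python
import numpy as np

def center_crop(img, size):
    """Take a centered size x size crop from an H x W x C image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

rng = np.random.default_rng(0)
# Synthetic dataset: 256x256 images whose borders are darker than the center,
# so pixel statistics depend on position (as in most natural photos).
imgs = rng.uniform(0, 50, (10, 256, 256, 3))
imgs[:, 16:240, 16:240, :] += 150  # bright central region

mean_256 = imgs.mean(axis=0)  # mean image over the full 256x256 frames
cropped = np.stack([center_crop(im, 224) for im in imgs])
mean_224 = cropped.mean(axis=0)  # mean image of the 224x224 crops

# The two means disagree wherever content depends on position, so subtracting
# the 256x256 mean from 224x224 test crops leaves a systematic bias:
print(abs(mean_256.mean() - mean_224.mean()))  # a large gap for this data
```

With real photos the gap is smaller than in this exaggerated example, but the bias is of the same kind.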

@gheinrich
Contributor

I am surprised that this has such a dramatic impact; it suggests your network may not generalize well to new examples. Could there be too much correlation between your training and validation sets? Either way, you might want to use mean pixel subtraction instead of mean image subtraction.
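For what it's worth, the reason mean pixel subtraction sidesteps the crop-size problem is that it reduces the mean to one value per channel, which is valid at any crop size. A small illustrative sketch with synthetic data (not DIGITS code):

```python
import numpy as np

rng = np.random.default_rng(1)
imgs = rng.uniform(0, 255, (10, 256, 256, 3))  # stand-in training set

mean_image = imgs.mean(axis=0)          # shape (256, 256, 3): tied to 256x256
mean_pixel = imgs.mean(axis=(0, 1, 2))  # shape (3,): one value per channel

# A 224x224 test crop can subtract the per-channel mean directly; the mean
# image would first need cropping or resizing to match, which is where
# training vs. inference preprocessing can silently diverge.
crop = imgs[0][16:240, 16:240]
normalized = crop - mean_pixel  # broadcasts over height and width
print(normalized.shape)         # (224, 224, 3)
```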
