
Training #25

Open
rustagiadi95 opened this issue Oct 19, 2018 · 9 comments

@rustagiadi95

Can you tell me the exact steps to train the model?
Which datasets should I use, how far should it be trained, and with which learning rates? Please help me out, brother.

@Bartzi
Owner

Bartzi commented Oct 19, 2018

Which dataset do you want to do experiments on? What did you try until now?

@rustagiadi95
Author

https://drive.google.com/open?id=16PwjdAR7UWrovgHNumuLE_9u6Q7uyVj9
https://drive.google.com/open?id=1gkORLYpovnIQ2FNSD6YfPhBYzmhqsMID

These are the two versions of the net you created for the stn-ocr paper. They are practically the same; you can open either of them.
I am working with all of the datasets: both the 32x32 dataset that has only labels and no bounding boxes, and the variable-size dataset with multiple bounding boxes. I have successfully extracted the data from the second one as well. Next, I want to work on the FSNS dataset that you mentioned.
I tried to train the net on the 32x32 SVHN dataset and the training losses are not good. I understand it is the first dataset this net has encountered; I have used only 20,000 images of it and trained for 5 epochs. The learning rate range (0.00001 - 0.0000005) and the optimizer (SGD) you suggested have not shown me good results so far. I am really curious whether training on the full training set (~73K images) would improve things, and if I go ahead with that, how many epochs should I use?
It requires a lot of computing power, which is why I am being very cautious about this.
Secondly, what should I do to make it almost completely accurate?
I know these are a lot of questions, but I think your research is really commendable and deserves appreciation. Please help out.
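For reference, the setup I describe above corresponds roughly to the following sketch (PyTorch and the placeholder model/data are used here only for illustration; they are not taken from your repository):

```python
# Rough, hypothetical sketch of the described setup: ~20,000 SVHN 32x32 images,
# 5 epochs, SGD with a learning rate decayed from about 1e-5 towards 5e-7.
# The model below is a trivial placeholder, not the network from the repository.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

images = torch.randn(20000, 3, 32, 32)              # stand-in for the SVHN subset
labels = torch.randint(0, 10, (20000,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # placeholder net
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-5, momentum=0.9)
# 0.55 ** 5 is roughly 0.05, so the learning rate ends near 5e-7 after 5 epochs
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.55)

for epoch in range(5):
    for batch_images, batch_labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_images), batch_labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```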

@Bartzi
Owner

Bartzi commented Oct 22, 2018

Hmm,
looking at your code I can only say the following:

  • try to use a lower learning rate like 0.0001 or even 0.00001
  • increase your batch size! Such a small batch size will never work, because the network uses BatchNorm. A batch size of 32 should work quite nicely.
  • try to use Adam instead of SGD. Adam converges more quickly.
  • try to create a tool similar to the BBoxPlotter that I created (you can find it in the insights folder). This tool lets you observe the progress of the training: it uses the network to do a prediction on a given image at each iteration of the training and saves the resulting image to the hard disk, so you can inspect the state of the network at a given time step. With such a tool you can very quickly determine whether the network diverges or not, which is something you cannot directly see from the loss values. So I highly recommend doing this! A rough sketch of such a tool follows below this list.
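
A minimal sketch of such a monitoring tool could look like this (this is not the actual BBoxPlotter from the insights folder; `predict_fn` and its return format are assumptions you would need to adapt to your network's actual interface):

```python
# Sketch of a BBoxPlotter-like training monitor: run the current model on one
# fixed probe image at every iteration and save the visualised prediction to
# disk, so divergence becomes visible long before the loss values show it.
import os
from PIL import Image, ImageDraw

class SimpleBBoxPlotter:
    def __init__(self, image_path, out_dir, predict_fn):
        # predict_fn is a hypothetical callable: it takes a PIL image and
        # returns a list of (x0, y0, x1, y1) boxes plus the predicted text.
        self.image = Image.open(image_path).convert("RGB")
        self.out_dir = out_dir
        self.predict_fn = predict_fn
        os.makedirs(out_dir, exist_ok=True)

    def __call__(self, iteration):
        boxes, text = self.predict_fn(self.image)
        canvas = self.image.copy()
        draw = ImageDraw.Draw(canvas)
        for box in boxes:
            draw.rectangle(box, outline=(255, 0, 0), width=2)   # predicted localizations
        draw.text((2, 2), text, fill=(0, 255, 0))                # predicted text
        canvas.save(os.path.join(self.out_dir, f"iter_{iteration:07d}.png"))

# usage inside the training loop:
#   plotter = SimpleBBoxPlotter("probe.png", "bbox_progress", predict_fn=my_predict)
#   for iteration, batch in enumerate(train_iterator):
#       ... do one training step ...
#       plotter(iteration)
```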

@rustagiadi95
Author

rustagiadi95 commented Oct 27, 2018 via email

@Bartzi
Owner

Bartzi commented Oct 29, 2018

Yes, I can have a look at some sample data, but you'll need to attach it 😉

@rustagiadi95
Author

Sorry about that... I mailed you the data at that time. I was also wondering: can we train the recognition part of the net individually, without the localisation net?

@Bartzi
Owner

Bartzi commented Nov 2, 2018

Oh, you sent me a mail with the data? I think I did not receive such a mail...
Could you send it again?
Of course you can train the recognition part without the localization part, but then your model will not be different from other recognition models. Or am I misunderstanding you?

@ArtifIQ

ArtifIQ commented Nov 2, 2018

You understood me correctly.
Regarding the data, there is no need to bother you with all the hassle of going through it. I understand that my model will not be different from any other model, but in my situation I am already getting the localized images, not at the character level, but at the word level within the whole image.
I still think I would need the localization part if I want to get the individual characters within a localized word.
Anyway, I have some questions; I think I know the answers, but I would like to hear yours:
q1) How will the LSTM network in the localization net be able to distinguish whether it has already detected the same character/word in a previous timestep? This matters because one has to choose the number of timesteps one thinks will be needed for an image.
q2) Will the WHOLE model work on the char74k dataset?

@Bartzi
Owner

Bartzi commented Nov 5, 2018

Okay, let me try to answer your questions:

  1. You cannot be entirely sure that the LSTM is able to distinguish that it already detected a character in a previous timestep, because there is no inhibition-of-return mechanism. We do know, however, that the LSTM is trained under a very harsh constraint: the loss for the whole network is the recognition loss. In the case of locating and recognizing single characters from an already cropped text line, we explicitly tell the network to use another character at each timestep if we train the system with SoftmaxCrossEntropy; if we use CTC loss, this constraint is not as harsh, and the network only learns that it should span its localizations over the text region (see the sketch after this list). So the number of timesteps is actually a hyperparameter, and it depends on the language you are dealing with. That's all I can tell you right now...
  2. The whole model should work on the char74K dataset. If you use one timestep for the localization network and only predict one character, it should be able to zoom in on a single character and maybe increase recognition accuracy, but I'm not sure it will make a huge difference.
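
To make the difference between the two training constraints concrete, here is a small illustrative sketch (PyTorch is used only for brevity; it is not the framework this repository is built on):

```python
# Per-timestep softmax cross-entropy forces every timestep to predict one
# specific character, whereas CTC only requires the label sequence as a whole
# to be covered, leaving the alignment of timesteps to characters free.
import torch
from torch import nn

T, N, C = 10, 1, 28                      # timesteps, batch size, classes (chars + blank)
logits = torch.randn(T, N, C, requires_grad=True)

# 1) per-timestep cross-entropy: one target character per timestep (harsh constraint)
per_step_targets = torch.randint(1, C, (T, N))
ce = nn.CrossEntropyLoss()(logits.view(T * N, C), per_step_targets.view(T * N))

# 2) CTC: only the label sequence is given; its alignment to timesteps is not
targets = torch.randint(1, C, (N, 5))    # a 5-character word
ctc = nn.CTCLoss(blank=0)(logits.log_softmax(2), targets,
                          torch.full((N,), T, dtype=torch.long),
                          torch.full((N,), 5, dtype=torch.long))
print(ce.item(), ctc.item())
```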
