
Loss and scoring changes #14

Merged (7 commits) on Jun 9, 2017

Conversation

leaprovenzano (Contributor)

Re parts 1 and 6 of #13...

Part 1: loss

I have added the option to optimize for validation loss. devol.run now has a param metric which defaults to 'accuracy' but can be set to 'loss', in which case devol.set_objective will set its properties accordingly to use a minimizing objective in various parts of the code, as follows:

  • when initializing a Population, the min flag is passed and after min/max scaling we flip the values. So we don't need to make changes to the other Population methods or worry too much about operators.
        scores = fitnesses - fitnesses.min()
        scores /= scores.max()
        if obj == 'min':
            scores = 1 - scores
  • devol.bssf is now in the format (model, loss, accuracy). (loss, accuracy) is the order keras returns from model.evaluate(), so this seemed like a logical choice. The demo notebook has been updated to reflect this.

  • Obviously run uses the loss to evaluate the best model, and the tqdm output tells us whether it's measuring loss or accuracy.
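The minimizing-objective handling described in the first bullet can be sketched end to end (a minimal sketch; score_fitnesses is a hypothetical name, and it assumes the population's fitnesses are not all identical):

```python
import numpy as np

def score_fitnesses(fitnesses, obj='max'):
    # Min-max scale raw fitnesses (accuracies or losses) into [0, 1].
    scores = np.asarray(fitnesses, dtype=float)
    scores = scores - scores.min()
    scores /= scores.max()  # assumes fitnesses are not all identical
    # For a minimizing objective (loss), flip so lower raw values score higher.
    if obj == 'min':
        scores = 1 - scores
    return scores
```

With obj='min', the highest-loss member ends up with score 0, so the rest of Population never needs to know which direction it is optimizing.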

The readme has been updated to mention this, and I have tried to add some docstrings to run while I was at it.


Part 2: changes to population scoring

As mentioned in #13, I have removed the default scoring method from the Population class in favor of just min-max scaling the fitness scores (losses or accuracies). As before, devol.run accepts a fitness parameter defaulting to None and passes it to Population.__init__, which will affect the scores if provided. The big change worth mentioning here is that the fitnesses passed to Population.__init__ are now a numpy array. I did this for a couple of reasons:

  1. a user-created scoring function that only looks at a single value may be less useful than one looking at the entire score space.
  2. most basic math stuff can be called on an array just as easily as on a single value.

I have mentioned in the docstring I wrote that any scoring function given to devol.run should expect a numpy array.
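As an illustration of a fitness function that uses the whole score space, here is a hypothetical rank-based scorer (rank_scores is not part of the PR; it just shows the numpy-array contract):

```python
import numpy as np

def rank_scores(fitnesses):
    # Hypothetical fitness function of the kind devol.run's `fitness`
    # parameter expects: it receives the full numpy array at once, so it
    # can score each member by its rank within the whole population.
    ranks = np.argsort(np.argsort(fitnesses))  # rank 0 = worst member
    return ranks / float(len(fitnesses) - 1)   # scale ranks into [0, 1]
```

Because the function sees every member at once, rank-based or percentile-based schemes like this are possible, which a per-value callback could not express.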

Also...

I also updated the notebook, as it was not checking the backend before setting the image shape. I have noticed there are also no backend checks for BatchNormalization in genome_handler.py, which could cause issues for theano people, but I will address that in a separate commit.
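The kind of backend check meant here can be sketched as a small helper (image_shape is a hypothetical name; the notebook inlines the conditional instead):

```python
def image_shape(data_format, rows=28, cols=28, channels=1):
    # Order MNIST-style image dimensions to match the backend setting:
    # tensorflow defaults to channels_last, theano to channels_first.
    if data_format == 'channels_first':
        return (channels, rows, cols)
    return (rows, cols, channels)
```

In the notebook this would be called as image_shape(K.image_data_format()) so the dataset is reshaped correctly on either backend.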

joeddav (Owner) left a comment:

I commented with a few minor revisions on comments, readme, etc., but the code looks good overall.

I'm also going to experiment a little and see how the different scoring method affects results. If you have any experience with that already I'd love to hear it.

readme.md Outdated
@@ -1,4 +1,4 @@
# DEvol - Deep Neural Network Evolution
f# DEvol - Deep Neural Network Evolution
joeddav (Owner):

Looks like you've got a typo here

devol.py Outdated
def run(self, dataset, num_generations, pop_size, epochs, fitness=None):
# Returns best model found in the form of (model, loss, accuracy)
def run(self, dataset, num_generations, pop_size, epochs, fitness=None, metric='accuracy'):
"""run genetic search on dataset given number of generations and population size
joeddav (Owner):

These comments are great - it'd probably be good to get rid of the preexisting ones on lines 43-44 since these take their place.

demo.ipynb Outdated
"outputs": [],
"source": [
"# set image format so that it's compatible with the backend\n",
"image_format = (28, 28, 1) if K.image_data_format() == 'channels_last' else (1, 28, 28)\n",
joeddav (Owner):

I think it might be cleaner here to just go K.set_image_data_format("channels_first") and keep (1, 28, 28) rather than changing the shape of the images based on the current setting.

leaprovenzano (Contributor, Author):

Wow, I was unaware K.set_image_data_format() actually changes what tf or theano expects; I was under the impression it only set the format within keras. But I can see that it works* in tf.

I have to very, very respectfully put my two cents in here and say that generally I think people use one main backend and are used to seeing dimensions ordered that way. I personally think a backend check is more polite to other backend users, and it's also the way it's handled in the keras examples. I think I'd find it a bit annoying if someone changed my expected dim ordering.

But of course it's just a demo, and if theano dims are your preference that's totally fine. Maybe we should just comment that we're changing the dims to theano style if we do that; since it's a one-liner it could be missed, and it could be a faff for someone used to tf who wants to play with the notebook and import some of their own images, etc. Also, for new users this backend dimensions thing can be confusing.

# setting backend dims to theano format because ...
K.set_image_data_format("channels_first")

what do you think?
If you want me to just shut up and do K.set_image_data_format("channels_first"), that's fine.

*I am concerned about batch normalization for theano, though, since genome_handler should be checking the backend and setting axis to 1 for theano users; it's currently using the default, which is -1 for tensorflow. Since you seem to use theano, I would think this would be messing up your results on batch-normalized networks? Have you had any issues?

I can open a separate PR for this... but I'm sure you'll agree that while changing dim ordering is fine in the demo, we should not be changing backend dim ordering in the actual library, since it would really mess people up; we'd need to implement a proper check and a global or something for the BN axis.

joeddav (Owner):

Well, K.set_image_data_format doesn't permanently change your backend configuration in your keras.conf file or anything - it just changes it within the scope you're in, so I don't think it'll be a problem. If people prefer it the other way, they can just swap them. I'm more concerned about the code looking simple and straightforward, and inferring the backend is much less conventional and straightforward-looking than setting it. That said, it may be best to set it to channels_last, as I believe that's tensorflow's default, and therefore more people will be using it that way. A comment is fine if you'd like, but I don't think it's needed.

Until you mentioned it, I've never heard of problems with BN with Theano... Is it not supported or something? It doesn't say anything like that in the Keras docs. I've used Theano and Tensorflow and have never had problems (at least not that I was aware of) with it.

leaprovenzano (Contributor, Author):

Cool, I shall update this to channels_last as you say.

Re BN: keras's default axis for BN is -1 (channels_last), so if you're using Theano and a channels_first config, it should be set to 1 to normalize along the feature axis.

see docstring at BatchNormalization
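A sketch of the backend-aware axis choice being proposed (bn_axis is a hypothetical helper; the real fix would live in genome_handler.py):

```python
def bn_axis(data_format):
    # Keras's BatchNormalization defaults to axis=-1, which is correct for
    # channels_last (tensorflow); channels_first (theano-style) configs
    # should instead normalize along the channel axis, which is axis 1.
    return 1 if data_format == 'channels_first' else -1
```

genome_handler could then build BatchNormalization(axis=bn_axis(K.image_data_format())) so channels_first users normalize along the feature axis as intended.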

joeddav (Owner):

Wow. Great catch. Yeah, we should add that in.

"source": [
"num_generations = 10\n",
"population_size = 20\n",
"num_epochs = 1\n",
"\n",
"devol = DEvol(genome_handler)\n",
"model, accuracy = devol.run(dataset, num_generations, population_size, num_epochs)\n",
"print model.summary()"
"model, accuracy, loss = devol.run(dataset, num_generations, population_size, num_epochs)\n",
joeddav (Owner):

This also needs to be added to demo.py

leaprovenzano (Contributor, Author):

Silly me, I didn't even look in there! I have updated it for the return values of devol.run and also updated the print function to be python3-compatible. What I haven't done is the backend thing... I'll await your response to the above and make whatever changes you think are right.

leaprovenzano (Contributor, Author):

> I'm also going to experiment a little and see how the different scoring method affects results. If you have any experience with that already I'd love to hear it.

I haven't experimented too much with different scoring functions but would be interested to hear your results. I was thinking last night it might be good to be able to access training loss/acc to do some heuristic based on overfitting... but this would require some changes, and I'm not sure it's worth it.

Basically, for me, using the loss made a big difference, especially as I was using a smaller dataset with imbalanced classes and a small number of labels, where accuracy tended not to change very much but loss had much more movement.

For a while I was kind of thinking that maybe the min-max scaling is a bit drastic, as it essentially kills a member of the population, but it's probably for the best... and it certainly speeds up the search, I would think.

After some changes in the GenomeHandler to use the functional API (see the last comment in #13), I have PReLU & LeakyReLU and I've been messing around a lot with network structure. I have added different kernel sizes and factorized convs, and I've found this gives me good results.

I'm wondering if it would be worth trying a version of the net that works with conv blocks in place of layers, particularly since implementing factorized convs has done so well... like a parameter for convs_per_block (like [1-3]). This would also allow sticking more complex stuff like residual blocks or even inception blocks in, as long as they are identified in some parameter mapping, so I may try it out.

joeddav (Owner) commented Jun 7, 2017:

I've messed around with it a little bit, and I think there is some difference in the function I used, but I'm not necessarily convinced that the way we did it before was better. I like your approach (with the option of providing a heuristic scoring function) better.

I'm 100% cool with using the functional API, especially if it does make it easier to add advanced activations. I think it'll be necessary to do that eventually anyway, honestly. That'd probably be good in a separate PR, however - this one's already pretty big.

If you can resolve the dimension ordering stuff, we should be good to merge I believe.

leaprovenzano (Contributor, Author):

Hey, I've sorted the last bits with tf channels etc. I'll do a new PR for the stuff in genome_handler with the functional API, advanced activations, global pooling etc. as soon as I get a chance, but I'm busy for a bit so it might be a couple of days.

@joeddav joeddav merged commit 29328e6 into joeddav:master Jun 9, 2017
@leaprovenzano leaprovenzano deleted the loss_and_scoring_changes branch June 9, 2017 06:36