-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add learning rate schedulers #605
Conversation
So I think we'll want to identify the scheduler using a string e.g. "cosine" from argparse, and inside the schedulers file should |
Note there's also a previous PR on learning rate schedulers, including one that is trapezoidal. |
Argh apologies i saw your todo list and just picked it up assuming no one has done it, plus it's such an orthogonal contribution i thought it would have been merged by now - will check the PRs before submitting the next time :) |
#508 <- is this the one? |
Yes i think it was mine, but your PR is much nicer nw aha. It can be nice to identify the schedule from argparse, and if we use trapezoidal, you also have to define the number of decay steps that you want. I can close mine and we can add this features to this one? |
Cool! I can quickly implement the identification via a string. And happy to add trapezoidal as well. Let's see what Andrej thinks about the next steps. |
d430c24
to
c23ad26
Compare
Pushed a very ugly switching between the schedulers based on the input argument. In order to make this closer to what we're used to in Python/C++ we basically have to implement a mini virtual function pointer logic in order to allow for object oriented features. Happy to do it, and won't be hard, just wanted to double check the decision makes sense because we'll be dealing with func pointers and i don't think we've done previously in the codebase. cc: @karpathy |
It would look something along the lines of:
too much pointers? :') |
@@ -1546,6 +1552,28 @@ int main(int argc, char *argv[]) { | |||
Tokenizer tokenizer; | |||
tokenizer_init(&tokenizer, "gpt2_tokenizer.bin"); | |||
|
|||
// set up learning rate scheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could move this new block inside the schedulers file. and the block below as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, simplified the code, it's ready.
llmc/schedulers.h
Outdated
|
||
#include <assert.h> | ||
#include <math.h> | ||
#include <cstring> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string.h, if you stick with C headers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
// switch to the appropriate learning rate scheduler | ||
float get_learning_rate(LRSchedulerType lr_scheduler_type, LearningRateScheduler *scheduler, int step) { | ||
float step_learning_rate; | ||
if (lr_scheduler_type == LR_SCHEDULER_COSINE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch/case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think there is any advantage using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with switch, you usually get IDE warnings if you forget one enum value, not sure if you get the same with ifs. Otherwise, its pretty much equivalent.
c19abeb
to
716808d
Compare
forked this work into new PR |
Refactored the learning rate schedulers code - we'll keep all definitions inside "schedulers.h" as per our offline discussion.
Supported LR schedulers: