Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add learning rate schedulers #605

Closed
wants to merge 9 commits into from

Conversation

gordicaleksa
Copy link
Contributor

@gordicaleksa gordicaleksa commented Jun 17, 2024

Refactored the learning rate schedulers code - we'll keep all definitions inside "schedulers.h" as per our offline discussion.

Supported LR schedulers:

@karpathy
Copy link
Owner

So I think we'll want to identify the scheduler using a string e.g. "cosine" from argparse, and inside the schedulers file should strcmp and build the right one.

@karpathy
Copy link
Owner

Note there's also a previous PR on learning rate schedulers, including one that is trapezoidal.

@gordicaleksa
Copy link
Contributor Author

Argh apologies i saw your todo list and just picked it up assuming no one has done it, plus it's such an orthogonal contribution i thought it would have been merged by now - will check the PRs before submitting the next time :)

@gordicaleksa
Copy link
Contributor Author

#508 <- is this the one?

@eliebak
Copy link
Contributor

eliebak commented Jun 19, 2024

Yes i think it was mine, but your PR is much nicer nw aha. It can be nice to identify the schedule from argparse, and if we use trapezoidal, you also have to define the number of decay steps that you want. I can close mine and we can add this features to this one?

@gordicaleksa
Copy link
Contributor Author

Cool!

I can quickly implement the identification via a string. And happy to add trapezoidal as well. Let's see what Andrej thinks about the next steps.

@gordicaleksa
Copy link
Contributor Author

Pushed a very ugly switching between the schedulers based on the input argument.

In order to make this closer to what we're used to in Python/C++ we basically have to implement a mini virtual function pointer logic in order to allow for object oriented features.

Happy to do it, and won't be hard, just wanted to double check the decision makes sense because we'll be dealing with func pointers and i don't think we've done previously in the codebase.

cc: @karpathy

@gordicaleksa
Copy link
Contributor Author

gordicaleksa commented Jun 19, 2024

It would look something along the lines of:

typedef struct LRScheduler {
    void (*initialize)(struct LRScheduler* self, double initial_rate);
    void (*step)(struct LRScheduler* self);
    double (*get_rate)(struct LRScheduler* self);
    double current_rate;
} LRScheduler;

too much pointers? :')

@@ -1546,6 +1552,28 @@ int main(int argc, char *argv[]) {
Tokenizer tokenizer;
tokenizer_init(&tokenizer, "gpt2_tokenizer.bin");

// set up learning rate scheduler
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could move this new block inside the schedulers file. and the block below as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, simplified the code, it's ready.


#include <assert.h>
#include <math.h>
#include <cstring>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string.h, if you stick with C headers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

// switch to the appropriate learning rate scheduler
float get_learning_rate(LRSchedulerType lr_scheduler_type, LearningRateScheduler *scheduler, int step) {
float step_learning_rate;
if (lr_scheduler_type == LR_SCHEDULER_COSINE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch/case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think there is any advantage using it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with switch, you usually get IDE warnings if you forget one enum value, not sure if you get the same with ifs. Otherwise, its pretty much equivalent.

@karpathy
Copy link
Owner

forked this work into new PR
#627
made simplifications
added wsd
closing this one

@karpathy karpathy closed this Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants