Training on Titanic has a NaN loss #615

Closed
JulienVig opened this issue Feb 1, 2024 · 6 comments · Fixed by #616

@JulienVig
Collaborator

When training on the Titanic task (and potentially other tasks), the loss is NaN at every epoch. The output of model.predict is composed only of NaN values, so there is no accuracy improvement throughout the epochs.
This happens for every training format: from the browser UI, local, federated, etc.
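
For context, a minimal sketch of how the symptom can be confirmed (assuming `model` is the compiled tf.LayersModel and `batch` a preprocessed input tensor, both placeholders here):

import * as tf from '@tensorflow/tfjs'

// Inspect the raw prediction values: once the weights have diverged,
// every entry comes back as NaN.
const predictions = model.predict(batch) as tf.Tensor
const allNaN = predictions.dataSync().every(Number.isNaN)
console.log(`all predictions are NaN: ${allNaN}`)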

@JulienVig JulienVig self-assigned this Feb 1, 2024
@JulienVig JulienVig added the bug and discojs labels Feb 1, 2024
@JulienVig
Collaborator Author

Here is a code snippet returning a NaN loss during training:

// Fetch the Titanic task from a local Disco server and load its dataset
const serverUrl = new URL('http://localhost:8080/')
const tasks = await fetchTasks(serverUrl)
const task = tasks.get('titanic') as Task
const dataset = await loadTitanicData(task)

// Compile and train: the loss reported for the epoch is NaN
const model = await getModel()
model.compile({
    optimizer: 'sgd',
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy']
});
await model.fitDataset(dataset.train.preprocess().batch().dataset, { epochs: 1 })
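
A hedged variant of the same call that logs the loss at the end of each epoch, which is where the NaN shows up (the callback API is standard TensorFlow.js; the rest of the setup is the snippet above):

await model.fitDataset(dataset.train.preprocess().batch().dataset, {
    epochs: 1,
    callbacks: {
        // Print the epoch loss so the NaN is visible as soon as it appears
        onEpochEnd: (epoch, logs) => console.log(`epoch ${epoch}: loss = ${logs?.loss}`)
    }
})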

@JulienVig
Collaborator Author

This commit refactored the data preprocessing structure and dropped support for tabular data preprocessing.

@martinjaggi
Member

Though the text loader is based on the CSV (tabular) loader, shouldn't it still work?

@JulienVig
Collaborator Author

I didn't see any dependency between text and tabular preprocessing. From what I've found, the text preprocessing is standalone (it only tokenizes and handles padding), while the tabular preprocessing doesn't have any functions implemented (text and image do have some).

I managed to solve the issue by implementing a very temporary preprocessing step that handles the missing values which were causing the NaNs. Since the last preprocessing refactor, preprocessing only handles data row by row, which makes it very impractical to drop rows or use dataset-wide aggregations (to standardize features, for example).
I'm thinking of leaving the implementation of a proper preprocessing pipeline for later and addressing the bugs first.
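
As an illustration only, a rough sketch of the kind of row-wise sanitization described above (the function name and row shape are hypothetical, not Disco's actual API): missing numeric fields are replaced with a fallback value, since a single NaN feature is enough to make the whole loss NaN.

// Hypothetical per-row sanitization: replace missing or NaN numeric
// values with a fallback so that no NaN reaches the model.
function sanitizeRow(row: (number | undefined)[], fallback = 0): number[] {
    return row.map(value =>
        value === undefined || Number.isNaN(value) ? fallback : value
    )
}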

In some cases the training also made the weights diverge to NaN because the default learning rate was too high, so I lowered it from 0.01 to 0.001.
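
A sketch of that fix in TensorFlow.js terms (assuming the same compile call as in the snippet above): passing an explicit SGD optimizer instead of the 'sgd' string selects the lower learning rate.

import * as tf from '@tensorflow/tfjs'

// Compile with an explicit SGD optimizer at a lower learning rate
// (the default of 0.01 was enough to make the weights diverge here).
model.compile({
    optimizer: tf.train.sgd(0.001),
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy']
})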

@peacefulotter
Collaborator

peacefulotter commented Feb 3, 2024

Could this be related to the issue I had when training on bigger GPT models, such as gpt2 and above, or did the sanitization preprocessing step fix it entirely? Even outside of Disco, I would consistently get NaN loss values.

@JulienVig
Collaborator Author

@peacefulotter the preprocessing was not enough, I also had to decrease the learning rate, which was making the weights diverge. Have you tried fine-tuning the learning rate? I've seen papers estimate the learning rate proportionally to the model's number of weights.
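
For what it's worth, a simple way to probe for a stable learning rate along those lines, written as a sketch (`trainAndGetFinalLoss` is a hypothetical helper that trains briefly and returns the last reported loss):

// Try a few learning rates and keep the largest one whose loss stays finite.
for (const lr of [0.01, 0.003, 0.001, 0.0003]) {
    const finalLoss = await trainAndGetFinalLoss(lr)
    const status = Number.isNaN(finalLoss) ? 'diverged (NaN)' : finalLoss.toString()
    console.log(`learning rate ${lr}: final loss ${status}`)
}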
