Training on Titanic has a NaN loss #615
Comments
Here is a code snippet returning a NaN loss during training:

```ts
const serverUrl = new URL('http://localhost:8080/')
const tasks = await fetchTasks(serverUrl)
const task = tasks.get('titanic') as Task
const dataset = await loadTitanicData(task)
const model = await getModel()

model.compile({
  optimizer: 'sgd',
  loss: 'categoricalCrossentropy',
  metrics: ['accuracy']
})

await model.fitDataset(dataset.train.preprocess().batch().dataset, { epochs: 1 })
```
This commit refactored the data preprocessing structure and dropped support for tabular data preprocessing.
Though the text loader is based on the CSV (tabular) loader, shouldn't it still work?
I didn't see any dependency between text and tabular preprocessing. From what I've found, the text preprocessing is standalone (it only tokenizes and handles padding), while the tabular preprocessing doesn't have any function implemented (text and image do have some).

I managed to solve the issue by implementing a very temporary preprocessing step that handles the missing values which were causing the NaNs. Since the last preprocessing refactor, preprocessing only handles data row by row, which makes it very impractical to drop rows or to use dataset-wide aggregations (to standardize features, for example).

The model training also made the weights diverge to NaN in some cases because the default learning rate was too high; I lowered it from 0.01 to 0.001.
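For reference, here is a minimal sketch of what such a row-wise sanitization step could look like, assuming rows arrive as plain objects of numeric fields; the function name and the zero-fill default are illustrative, not Disco's actual implementation:

```ts
// Hypothetical row-wise sanitization: replace missing or NaN numeric fields
// so they don't propagate into the loss. Zero-filling is only an example;
// imputing a column mean would require a dataset-wide pass, which the
// row-by-row preprocessing described above makes impractical.
function sanitizeRow (row: Record<string, number | undefined>): Record<string, number> {
  const clean: Record<string, number> = {}
  for (const [key, value] of Object.entries(row)) {
    clean[key] = (value === undefined || Number.isNaN(value)) ? 0 : value
  }
  return clean
}

// Applied row by row, e.g.: rawDataset.map(row => sanitizeRow(row))
```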
Could this be related to the issue I had when training on bigger GPT models, such as GPT-2 and above, or did the sanitization preprocessing step entirely fix it? Even outside of Disco, I would consistently get NaN loss values.
@peacefulotter the preprocessing was not enough; I also had to decrease the learning rate, which was making the weights diverge. Have you tried fine-tuning the learning rate? I saw papers estimating the learning rate proportionally to the model's number of weights.
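For what it's worth, in TF.js the learning rate can be set explicitly by passing an optimizer instance instead of the `'sgd'` string; a minimal sketch (the layer sizes are placeholders, only the compile step matters):

```ts
import * as tf from '@tensorflow/tfjs'

// Placeholder model; only the optimizer setting is the point here.
const model = tf.sequential()
model.add(tf.layers.dense({ inputShape: [8], units: 16, activation: 'relu' }))
model.add(tf.layers.dense({ units: 2, activation: 'softmax' }))

model.compile({
  optimizer: tf.train.sgd(0.001), // explicit rate, lowered from 0.01
  loss: 'categoricalCrossentropy',
  metrics: ['accuracy']
})
```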
When training on the titanic task (and potentially other tasks), the loss is NaN at every epoch. The output of `model.predict` is only composed of NaN values, so there are no accuracy improvements throughout the epochs. This happens for every training format: the browser UI, local, federated, etc.
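One illustrative way to confirm the symptom is to count NaN entries in the model output; a sketch using the TF.js API (the helper name is made up):

```ts
import * as tf from '@tensorflow/tfjs'

// Counts NaN entries in a model's predictions for a given input batch.
function countNaNPredictions (model: tf.LayersModel, inputs: tf.Tensor): number {
  const predictions = model.predict(inputs) as tf.Tensor
  return tf.isNaN(predictions).cast('int32').sum().dataSync()[0]
}
```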