BUG: Running vak 1.0.0a1 with device set to CPU crashes #687
Comments
Quoting myself from the forum post:
From the traceback above I can see that …
Thank you for the nice solution, this will be very easy to implement.
Thank you @JacquelineGoe for reporting this bug on the forum!
@all-contributors please add @JacquelineGoe for bug
I've put up a pull request to add @JacquelineGoe! 🎉
* WIP: Add config/trainer.py with TrainerConfig
* Rename common.device -> common.accelerator, return 'gpu' not 'cuda' if torch.cuda.is_available
* Fix config section in doc/api/index.rst
* Import trainer and TrainerConfig in src/vak/config/__init__.py, add to __all__
* Add pytorch-lightning to intersphinx in doc/conf.py
* Fix cross-ref in docstring in src/vak/prep/frame_classification/make_splits.py: :constant: -> :const:
* Make lightning a dependency, instead of pytorch_lightning; import lightning.pytorch everywhere instead of pytorch_lightning as lightning -- trying to make it so we can resolve API correctly in docstrings
* Fix in doc/api/index.rst: common.device -> common.accelerator
* Finish writing TrainerConfig class
* Add tests for TrainerConfig class
* Add trainer sub-table to all configs in tests/data_for_tests/configs
* Add trainer sub-table to all configs in doc/toml
* Add trainer sub-table in config/valid-version-1.0.toml, rename -> valid-version-1.1.toml
* Remove device key from top-level tables in config/valid-version-1.1.toml
* Remove device key from top-level tables in tests/data_for_tests/configs
* Remove 'device' key from configs in doc/toml
* Add 'trainer' attribute to EvalConfig, an instance of TrainerConfig; remove 'device' attribute
* Add 'trainer' attribute to PredictConfig, an instance of TrainerConfig; remove 'device' attribute
* Add 'trainer' attribute to TrainConfig, an instance of TrainerConfig; remove 'device' attribute
* Fix typo in docstring in src/vak/config/train.py
* Add 'trainer' attribute to LearncurveConfig, an instance of TrainerConfig; remove 'device' attribute. Also clean up docstring, removing attributes that no longer exist
* Remove device attribute from TrainConfig docstring
* Fix VALID_TOML_PATH in config/validators.py -> 'valid-version-1.1.toml'
* Fix how we instantiate TrainerConfig classes in from_config_dict method of EvalConfig/LearncurveConfig/PredictConfig/TrainConfig
* Fix typo in src/vak/config/valid-version-1.1.toml: predictor -> predict
* Fix unit tests after adding trainer attribute that is instance of TrainerConfig
* Change src/vak/train/frame_classification.py to take trainer_config argument
* Change src/vak/train/parametric_umap.py to take trainer_config argument
* Change src/vak/train/train_.py to take trainer_config argument
* Fix src/vak/cli/train.py to pass trainer_config.asdict() into vak.train.train_.train
* Replace 'device' with 'trainer_config' in vak/eval
* Fix cli.eval to pass trainer_config into eval.eval_.eval
* Replace 'device' with 'trainer_config' in vak/predict
* Fix cli.predict to pass trainer_config into predict.predict_.predict
* Replace 'device' with 'trainer_config' in vak/learncurve
* Fix cli.learncurve to pass trainer_config into learncurve.learncurve.learning_curve
* Rename/replace 'device' fixture with 'trainer' fixture in tests/
* Use config.table.trainer attribute throughout tests, remove config.table.device attribute that no longer exists
* Fix value for devices in fixtures/trainer.py: when device is 'cpu', devices must be > 0
* Fix default devices value for when accelerator is cpu in TrainerConfig
* Fix unit tests for TrainerConfig after fixing default devices for accelerator=cpu
* Fix default value for 'devices' set to -1 in some unit tests where we over-ride config in toml file
* fixup: use config.table.trainer attribute throughout tests -- missed one place in tests/test_eval/
* Add back 'device' fixture so we can use it to test Model class
* Fix unit tests in test_models/test_base.py that literally used device to put tensors on device, not to change a config
* Fix assertion in tests/test_models/test_tweetynet.py, from where we switched to using lightning as the dependency
* Fix test for DiceLoss, change trainer_type fixture back to device fixture
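For readers following the commit list above, here is a rough, hypothetical sketch of the shape of a TrainerConfig it describes. Only the attribute names `accelerator` and `devices` and the `asdict()` method come from the commits; everything else is illustrative and may differ from the actual class in src/vak/config/trainer.py.

```python
# Hypothetical sketch only -- not the actual vak source.
# It captures the behavior described in the commits above:
# 'gpu' is used instead of 'cuda', and `devices` must be a
# positive int when the accelerator is 'cpu'.
from dataclasses import dataclass


@dataclass
class TrainerConfig:
    accelerator: str = "cpu"  # 'gpu' or 'cpu', not 'cuda'
    devices: int = 1

    def __post_init__(self):
        if self.accelerator == "cpu" and self.devices < 1:
            # Lightning requires a positive number of devices on CPU
            raise ValueError(
                "devices must be a positive int when accelerator is 'cpu'"
            )

    def asdict(self) -> dict:
        """Key-value pairs to pass when instantiating the Lightning Trainer."""
        return {"accelerator": self.accelerator, "devices": self.devices}
```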
Fixed by #752
Describe the bug
Running `vak train` with version 1.0.0a1 and the device set to `'cpu'` causes a crash, as described in this forum post: https://forum.vocalpy.org/t/errors-in-training-and-predicting-when-the-newest-version-of-vak-is-installed/71
It produces the traceback shown in that post.
As reported in the forum post, this probably affects all CLI commands besides prep (predict, eval, learncurve), though I did not verify that.
To Reproduce
Steps to reproduce the behavior:
1. Create an environment with vak installed: `$ mamba create -n vak-env python vak -c pytorch -c conda-forge`
2. Use a config.toml file with a `[TRAIN]` table that specifies `device = 'cpu'` (full file is attached)
3. Run `vak train config.toml`
Expected behavior
`vak train` should run without crashing.
Additional context
What's going on here is that in `get_default_trainer` / `get_trainer`, if `device` is not set to `cuda`, we default to `None`, but `None` is not a valid option for `accelerator` (the argument used when instantiating `Trainer`):
https://github.com/vocalpy/vak/blob/3dcce70030ae9b1fd6d040e055def0d656a7512e/src/vak/trainer.py#L60
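To make the failure mode concrete, here is a minimal sketch of the pattern described above, under the assumption that the `device` value is mapped to the Trainer's `accelerator` argument as stated; this is simplified and hypothetical, not the actual code at the permalink.

```python
# Simplified, hypothetical sketch of the bug described above.
import lightning.pytorch as lp


def get_trainer_buggy(device: str) -> lp.Trainer:
    # 'cuda' maps to 'gpu'; anything else falls through to None
    accelerator = "gpu" if device == "cuda" else None
    # crashes for device='cpu': None is not a valid accelerator value
    return lp.Trainer(accelerator=accelerator)


def get_trainer_fixed(device: str) -> lp.Trainer:
    # one way to avoid the crash: map 'cpu' to the 'cpu' accelerator
    # with a positive number of devices
    accelerator = "gpu" if device == "cuda" else "cpu"
    return lp.Trainer(accelerator=accelerator, devices=1)
```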
TeenyTweetyNet_train_audio_cbin_annot_notmat.zip