
Version 1.0 #639

Merged: 64 commits merged into main from version-1.0 on Mar 6, 2023

Conversation

NickleDave (Collaborator)

Going to go ahead and merge this into main so we can

  • release an alpha
  • develop it directly, instead of having a long-running version-1.0 branch that we juggle

I made a version 0.8 branch and will do any maintenance work on that until 1.0 is ready for release.

Change `lbl_tb2labels` to an instance of
`transforms.labeled_timebins.ToLabels`.
Fix needed after rebasing version 1.0 on #621
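
For context, here is a minimal sketch of the pattern this commit adopts: a callable transform class replaces the `lbl_tb2labels` function. The class body is illustrative, not vak's exact implementation; only the names `lbl_tb2labels` and `ToLabels` come from the commit.

```python
import numpy as np

class ToLabels:
    """Illustrative stand-in for transforms.labeled_timebins.ToLabels:
    maps a vector of labeled timebins to a string of segment labels."""

    def __init__(self, labelmap: dict):
        # invert the (label -> integer class) mapping once, at init
        self.labelmap_inv = {v: k for k, v in labelmap.items()}

    def __call__(self, lbl_tb: np.ndarray) -> str:
        # keep the first timebin of each run of identical labels
        keep = np.concatenate(([True], lbl_tb[1:] != lbl_tb[:-1]))
        return "".join(self.labelmap_inv[int(v)] for v in lbl_tb[keep])

# before: labels = lbl_tb2labels(lbl_tb, labelmap)
# after: instantiate once, then call like a function
to_labels = ToLabels(labelmap={"a": 0, "b": 1})
labels = to_labels(np.array([0, 0, 1, 1, 0]))  # -> "aba"
```
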
Remove entry points, since they are basically unused
by anyone but us, but require maintenance, tests, etc.
(A sketch of the registry-style replacement follows the list below.)

Squash commits:
- Remove vak/entry_points.py
- Remove import of entry_points from vak/__init__.py
- Remove model and metric entry points from pyproject.toml
- Rewrite models/models.py to not use entry points
- Fix use of models.find in config/validators.py
- Fix use of find in config/models.py
- Remove entry point test in
  tests/test_models/test_teenytweetynet.py
- Fix entry point test in tests/test_models/test_tweetynet.py
  to hopefully get full log of failures -- not seeing them now
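
As a rough sketch of what "not use entry points" means here: model discovery via `importlib.metadata` entry points is replaced by an explicit in-module mapping that `models.find` consults. The mapping name and class bodies below are illustrative stand-ins, not the exact code.

```python
# models/models.py, illustrative sketch: an explicit dict of built-in
# models replaces discovery via importlib.metadata entry points.

class TweetyNet: ...        # stand-ins for the real model classes
class TeenyTweetyNet: ...

MODELS = {cls.__name__: cls for cls in (TweetyNet, TeenyTweetyNet)}

def find(name: str):
    """Look up a model class by name (as config/validators.py does)."""
    try:
        return MODELS[name]
    except KeyError as e:
        raise ValueError(
            f"unknown model {name!r}; known models: {list(MODELS)}"
        ) from e
```
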
Fix two unit tests to not look for TweetyNet datasets,
so they don't crash on CI

- Fix tests/test_models/test_base.py unit test that tests
  loading state dict from path, by using `model` fixture
  so it only runs with teenytweetynet.
- Fix tests/test_models/test_windowed_frame_classification_model.py
  unit test that tests definitions, by adding a MOCK_INPUT_SHAPE
  so we don't actually need to load a dataset to test (sketched below).
  We really only used the dataset to get its input shape.
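
A hedged sketch of the MOCK_INPUT_SHAPE idea, with a stand-in network and made-up shape values; only the constant's name comes from the commit message:

```python
import torch

# made-up (channels, frequency bins, time bins); the real test would
# use whatever shape the model definition expects
MOCK_INPUT_SHAPE = torch.Size([1, 513, 88])

class StandInNet(torch.nn.Module):
    """Stand-in for a model definition's network."""
    def __init__(self, num_input_channels: int):
        super().__init__()
        self.conv = torch.nn.Conv2d(num_input_channels, 8,
                                    kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

def test_definition():
    # no dataset loading: the shape is all we ever used it for
    net = StandInNet(num_input_channels=MOCK_INPUT_SHAPE[0])
    batch = torch.rand(2, *MOCK_INPUT_SHAPE)
    assert net(batch).shape[0] == 2
```
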
> The config allows instantiating multiple models
  for training / prediction but this was never fully implemented

Squash commits:
- Make config just use "model" option, not "models":
  - Remove `comma_separated_list` from converters
  - Change option name 'models' -> 'model' in config/valid.toml
  - Rewrite is_valid_model_name to test a single string,
    not a list of strings
  - Change attribute `models` -> `model` in config/eval.py
  - Change attribute `models` -> `model` in config/learncurve.py
  - Change attribute `models` -> `model` in config/predict.py
  - Change attribute `models` -> `model` in config/train.py
  - Rewrite/rename config.models -> model.config_from_toml_path,
    model.config_from_toml_dict
  - Fix option 'models' -> 'model', with a single string value,
    in all .toml files in tests/data_for_tests/configs
- Rewrite `models.from_model_config_map` as `models.get`
  (see the sketch after this list):
  - Add src/vak/models/_api.py with BUILTIN_MODELS and
    MODEL_NAMES to use for validation in `models.get`
  - Rewrite models/models.py `from_model_config_map` as `models.get`
  - Import get and _api in vak/models/__init__.py
  - Rewrite core/train.py to take model_name and model_config
    then use models.get
  - Fix cli/train.py to pass model_name and model_config into core.train
  - Rewrite core/eval.py to take model_name and model_config
    then use models.get
  - Fix cli/eval.py to pass model_name and model_config into core.eval
  - Rewrite core/learncurve.py to take model_name and model_config
  - Fix cli/learncurve.py to pass model_name and model_config
    into core.learncurve
  - Rewrite core/predict.py to take model_name and model_config
    then use models.get
  - Fix cli/predict.py to pass model_name and model_config
    into core.predict
  - Make 'model' not 'models' required in src/vak/config/parse.py
  - Use models.MODEL_NAMES in src/vak/config/validators.py
  - Use models.MODEL_NAMES in config/model.py
- Fix tests
  - Fix tests to use vak.config.model.config_from_toml_path:
    - tests/test_models/test_windowed_frame_classification_model.py
    - tests/test_core/test_train.py
    - tests/test_core/test_predict.py
    - tests/test_core/test_learncurve.py
    - tests/test_core/test_eval.py
  - Fix test to use 'model' option not 'models' in
    tests/test_config/test_parse.py
  - Fix assert helper function in tests/test_core
    - test_eval.py
    - test_learncurve.py
    - test_predict.py
    - test_prep.py
    - test_train.py
  - Rewrite fixture module with constants we can import
    in test modules to parametrize: tests/fixtures/config.py
  - Add tests/test_config/test_model.py
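
A minimal sketch of the `models.get` API these commits describe, assuming each model class has a `from_config` classmethod; the registry contents and config shape are illustrative:

```python
class TweetyNet:
    """Stand-in for a built-in model class."""
    @classmethod
    def from_config(cls, config: dict):
        return cls()

# mirrors src/vak/models/_api.py: a registry plus names for validation
BUILTIN_MODELS = {"TweetyNet": TweetyNet}
MODEL_NAMES = list(BUILTIN_MODELS.keys())

def get(model_name: str, model_config: dict):
    """Replaces from_model_config_map: build a single model by name."""
    if model_name not in MODEL_NAMES:
        raise ValueError(
            f"invalid model name: {model_name!r}; must be one of {MODEL_NAMES}"
        )
    return BUILTIN_MODELS[model_name].from_config(model_config)

# core/train.py, core/eval.py, etc. now receive model_name and
# model_config from the cli layer and call:
model = get(model_name="TweetyNet", model_config={})
```
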
After rebasing on top of main, where we added
the ability to run eval with and without
post-processing, we lost that ability in version 1.0,
because the transform was not passed in
to the new model abstraction / backend,
even though everything is in place to do so.
This fixes that.

- Add post_tfm parameter to models.get
  - Fix docstring to say PostProcess
  - Pass post_tfm into Model.from_config
- Fix post_tfm_kwargs definition in EvalConfig docstring
  in src/vak/config/eval.py to say PostProcess transform
- In core/eval.py, fix post_tfm_kwargs docstring
  and actually pass post_tfm into models.get
- Fix validation_step method of WindowedFrameClassificationModel
  to match what engine.Model._eval does with post_tfm in version 0.x
  (sketched below)
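
A hedged sketch of the validation_step change, under the assumption that each metric is computed once on raw predictions and again after post-processing when a post_tfm is present; batch keys, attribute names, and returning a dict instead of logging are all illustrative:

```python
import torch

class WindowedFrameClassificationModelSketch(torch.nn.Module):
    """Stand-in showing only how validation_step uses post_tfm."""

    def __init__(self, network, metrics: dict, post_tfm=None):
        super().__init__()
        self.network = network
        self.metrics = metrics    # maps metric name -> callable(pred, target)
        self.post_tfm = post_tfm  # e.g. a PostProcess transform, or None

    def validation_step(self, batch: dict) -> dict:
        x, y = batch["source"], batch["annot"]  # assumed batch keys
        y_pred = self.network(x).argmax(dim=1)
        logged = {}
        for name, metric in self.metrics.items():
            logged[f"val_{name}"] = metric(y_pred, y)
            if self.post_tfm is not None:
                # compute the same metric after post-processing,
                # as engine.Model._eval did in version 0.x
                logged[f"val_{name}_tfm"] = metric(self.post_tfm(y_pred), y)
        return logged
```
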
Multiprocessing fails when it tries to pickle
instances of a subclass made with the
`vak.models.model` decorator, i.e. TweetyNet
and TeenyTweetyNet.

This is because we were giving all attributes
from the family we were subclassing to the subclass,
including `__module__`.
So the subclass would appear to have the `__module__`
of the superclass, and when pickle tried to
do attribute lookup on that module, it would fail.

The fix is to directly set `subclass.__module__`
to `definition.__module__` after making the subclass
with `type`. After doing so, multiprocessing works.
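
A minimal runnable sketch of the fix; the decorator body here is reduced to the relevant line, and the real `vak.models.model` does much more:

```python
import pickle

class WindowedFrameClassificationModel:
    """Stand-in for the model family being subclassed."""

def model(definition):
    """Illustrative decorator, like vak.models.model."""
    subclass = type(definition.__name__,
                    (WindowedFrameClassificationModel,), {})
    # the fix: without this line the subclass can report the family's
    # __module__, and pickle's attribute lookup in that module fails
    subclass.__module__ = definition.__module__
    return subclass

@model
class TweetyNet:
    pass

pickle.dumps(TweetyNet())  # works: pickle finds TweetyNet in this module
```
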
Default for `post_tfm_kwargs` should be None, not an empty dict.
Setting it to an empty dict causes logic in core/eval
to instantiate a PostProcess transform
with majority_vote = False and min_segment_dur = None,
so that we end up applying post-processing that does nothing,
and the computed metric is the same as
without "post-processing".
This sets the default to None,
which fixes the issue.
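
A small sketch of the logic in question, with a stand-in PostProcess and a hypothetical helper name; it shows why a default of `{}` built a do-nothing transform while `None` skips post-processing entirely:

```python
class PostProcess:
    """Stand-in for the real post-processing transform."""
    def __init__(self, majority_vote=False, min_segment_dur=None):
        self.majority_vote = majority_vote
        self.min_segment_dur = min_segment_dur

def make_post_tfm(post_tfm_kwargs=None):  # fixed default: None, not {}
    # with the old default of {}, this test was always True, so eval
    # built PostProcess(majority_vote=False, min_segment_dur=None),
    # i.e. post-processing that does nothing
    if post_tfm_kwargs is not None:
        return PostProcess(**post_tfm_kwargs)
    return None

assert make_post_tfm() is None            # no kwargs -> no post-processing
assert make_post_tfm({"majority_vote": True}).majority_vote
```
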
Rename `csv_path` to `dataset_path`,
because `csv_path` is not very specific, and because
the path to a dataset may not always be
the path to a csv (although it is for now).

- Rename `csv_path` -> `dataset_path` in vak/config
- Rename `csv_path` -> `dataset_path` in vak/core
- Rename `csv_path` -> `dataset_path` in vak/cli
- Rename `csv_path` -> `dataset_path` in tests
@NickleDave NickleDave merged commit 877e55b into main Mar 6, 2023
@NickleDave NickleDave deleted the version-1.0 branch March 8, 2023 15:49