vocalpy · NickleDave · Mar 6, 2023 · Nov 24, 2022 · Nov 25, 2022 · Nov 25, 2022
diff --git a/.github/workflows/ci-linux.yml b/.github/workflows/ci-linux.yml
@@ -11,7 +11,7 @@ jobs:
       fail-fast: false
       matrix:
         python-version: [3.8, 3.9, "3.10"]
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-20.04
     steps:
       - uses: actions/checkout@v2
       - uses: actions/setup-python@v2
@@ -24,6 +24,6 @@ jobs:
         run: |
           nox -s test-data-download-source
           nox -s test-data-download-generated-ci
-          nox -s coverage -- running-on-ci
+          nox -s coverage --verbose -- running-on-ci
       - name: upload code coverage
         uses: codecov/codecov-action@v3
diff --git a/doc/CHANGELOG.md b/doc/CHANGELOG.md
@@ -4,6 +4,54 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## Unreleased (1.0.0)
+### Added
+- Use `lightning` framework as backend, replacing
+  `vak.engine.Model`
+  [#598](https://github.com/NickleDave/vak/pull/598).
+  Fixes [#597](https://github.com/NickleDave/vak/issues/597).
+  See discussion in [#359](https://github.com/NickleDave/vak/issues/359).
+- Make it easier to instantiate a model
+  [#605](https://github.com/NickleDave/vak/pull/605).
+  Fixes [#362](https://github.com/NickleDave/vak/issues/362).
+- Add ways to define models and families of models
+  [#605](https://github.com/NickleDave/vak/pull/605).
+  Fixes [#406](https://github.com/NickleDave/vak/issues/406),
+  [#536](https://github.com/NickleDave/vak/issues/536), and 
+  [#603](https://github.com/NickleDave/vak/issues/603).
+- Add built-in TweetyNet model
+  [#605](https://github.com/NickleDave/vak/pull/605).
+  Fixes [#596](https://github.com/NickleDave/vak/issues/596).
+- Add logging of training time
+  [#628](https://github.com/NickleDave/vak/pull/628).
+  Fixes [#2](https://github.com/NickleDave/vak/issues/2).
+
+### Changed
+- Rename config file option `csv_path` to `dataset_path`, 
+  since it is more specific and allows for the possibility 
+  that a dataset is not always a csv file
+  [#632](https://github.com/NickleDave/vak/pull/632).
+  Fixes [#549](https://github.com/NickleDave/vak/issues/549).
+
+### Removed
+- Remove entry points since they are not being used
+  outside the project but require maintenance and testing
+  [#621](https://github.com/NickleDave/vak/pull/621).
+  Fixes [#601](https://github.com/NickleDave/vak/issues/601).
+- Remove unused/incomplete functionality for training multiple models
+  [#625](https://github.com/NickleDave/vak/pull/625).
+  Fixes [#538](https://github.com/NickleDave/vak/issues/538).
+- Remove `engine` with `Model` class
+  [#627](https://github.com/NickleDave/vak/pull/627).
+  No longer used after switching to Lightning as backend in 
+  [#598](https://github.com/NickleDave/vak/pull/598).
+
+### Fixed
+- Fix functionality to evaluate model with and without
+  post-processing transform that was added in 
+  [#621](https://github.com/NickleDave/vak/pull/621).
+  Fixed in [#626](https://github.com/NickleDave/vak/pull/626).
+
 ## 0.8.1 -- 2023-03-02
 ### Fixed
 - Fix transform that converts labeled timebins to segments
@@ -95,8 +143,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Refactor and speed up logic for determining whether a 
   dataset with sequence annotations has unlabeled segments 
   that should be assigned a "background" label
- [#559](https://github.com/NickleDave/vak/pull/559).
- Fixes [#243](https://github.com/NickleDave/vak/issues/243).
+  [#559](https://github.com/NickleDave/vak/pull/559).
+  Fixes [#243](https://github.com/NickleDave/vak/issues/243).
   - Adds a new sub-sub-package, `datasets.seq`
     with a `validators` module, which is where the 
     re-written `has_unlabeled` function now lives. 
@@ -110,8 +158,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   so that the purpose of the functions is clearer, 
   and add clearer error messages with links to documentation 
   about file naming conventions 
- [#566](https://github.com/NickleDave/vak/pull/566).
- Fixes [#525](https://github.com/NickleDave/vak/issues/525).
+  [#566](https://github.com/NickleDave/vak/pull/566).
+  Fixes [#525](https://github.com/NickleDave/vak/issues/525).
 - Revise "autoannotate" tutorial to use .wav audio and .csv 
   annotation files from new release of Bengalese Finch Song 
   Repository, and to suggest that Windows users unpack 

diff --git a/doc/reference/models.md b/doc/reference/models.md
@@ -0,0 +1,36 @@
+(reference-models)=
+
+# Declaring models in vak
+
+This section of the reference explains the design 
+of the abstractions in vak for representing 
+deep learning and neural network models, 
+and the rationale behind that design.
+
+Goals for the design include:  
+- make it easy to test a particular model 
+that was developed for a specified task, 
+- make it easy to instantiate 
+and work with a model interactively,
+e.g. by feeding in a single input 
+and then visualizing the output 
+to directly inspect performance,
+- rely on a "backend" 
+that allows us to achieve these goals 
+and at the same time 
+provide more low-level, fine-grained control 
+when needed
+
+Since the last goal makes the first two possible,
+we discuss how we achieved it first:
+we chose to rely on the Lightning framework.
+
+## Declaring a model
+
+To make it easy to declare a model, we provide the following abstractions:
+- A model definition
+- Classes that represent a family of models, all developed for a specific task
+- A base model class that knows how to make an instance of a model given a definition
+
+## Instantiating a model
+
diff --git a/noxfile.py b/noxfile.py
@@ -26,7 +26,7 @@ def build(session: nox.Session) -> None:
     session.run("flit", "build")
 
 
-@nox.session
+@nox.session(python="3.10.7")
 def dev(session: nox.Session) -> None:
     """
     Sets up a python development environment for the project.
@@ -119,7 +119,7 @@ def test_data_download_source(session) -> None:
 TEST_DATA_GENERATE_SCRIPT = './tests/scripts/generate_data_for_tests.py'
 
 
-@nox.session(name='test-data-generate')
+@nox.session(name='test-data-generate', python="3.10")
 def test_data_generate(session) -> None:
     """
     Produced 'generated' test data, by running TEST_DATA_GENERATE_SCRIPT on 'source' test data.
@@ -151,10 +151,10 @@ def make_tarfile(name: str, to_add: list):
 
 PREP_CI = sorted(pathlib.Path(PREP_DIR).glob('*/*/teenytweetynet'))
 RESULTS_CI = sorted(pathlib.Path(RESULTS_DIR).glob('*/*/teenytweetynet'))
-GENERATED_TEST_DATA_CI_TAR = f'{GENERATED_TEST_DATA_DIR}generated_test_data-version-0.x.ci.tar.gz'
+GENERATED_TEST_DATA_CI_TAR = f'{GENERATED_TEST_DATA_DIR}generated_test_data-version-1.x.ci.tar.gz'
 GENERATED_TEST_DATA_CI_DIRS = [CONFIGS_DIR] + PREP_CI + RESULTS_CI
 
-GENERATED_TEST_DATA_ALL_TAR = f'{GENERATED_TEST_DATA_DIR}generated_test_data-version-0.x.tar.gz'
+GENERATED_TEST_DATA_ALL_TAR = f'{GENERATED_TEST_DATA_DIR}generated_test_data-version-1.x.tar.gz'
 GENERATED_TEST_DATA_ALL_DIRS = [CONFIGS_DIR, PREP_DIR, RESULTS_DIR]
 
 
@@ -176,7 +176,7 @@ def test_data_tar_generated_ci(session) -> None:
     make_tarfile(GENERATED_TEST_DATA_CI_TAR, GENERATED_TEST_DATA_CI_DIRS)
 
 
-GENERATED_TEST_DATA_ALL_URL = 'https://osf.io/532cs/download'
+GENERATED_TEST_DATA_ALL_URL = 'https://osf.io/uvgjt/download'
 
 
 @nox.session(name='test-data-download-generated-all')
@@ -191,12 +191,13 @@ def test_data_download_generated_all(session) -> None:
     with tarfile.open(GENERATED_TEST_DATA_ALL_TAR, "r:gz") as tf:
         tf.extractall(path='.')
     session.log('Fixing paths in .csv files')
+    session.install("pandas")
     session.run(
         "python", "./tests/scripts/fix_prep_csv_paths.py"
     )
 
 
-GENERATED_TEST_DATA_CI_URL = 'https://osf.io/g79sx/download'
+GENERATED_TEST_DATA_CI_URL = 'https://osf.io/un2zs/download'
 
 
 @nox.session(name='test-data-download-generated-ci')

diff --git a/pyproject.toml b/pyproject.toml
@@ -28,6 +28,7 @@ dependencies = [
     "dask >=2.10.1",
     "evfuncs >=0.3.4",
     "joblib >=0.14.1",
+    "pytorch-lightning >=1.8.4.post0",
     "matplotlib >=3.3.3",
     "numpy >=1.18.1",
     "scipy >=1.4.1",
@@ -49,7 +50,6 @@ dev = [
 test = [
     "pytest >=6.2.1",
     "pytest-cov >=2.11.1",
-    "tweetynet >=0.7.0",
 ]
 doc = [
     "furo >=2022.1.2",
@@ -68,18 +68,10 @@ Documentation = "https://vak.readthedocs.io"
 [project.scripts]
 vak = 'vak.__main__:main'
 
-[project.entry-points."vak.models"]
-TeenyTweetyNetModel = 'vak.models.teenytweetynet:TeenyTweetyNetModel'
-
-[project.entry-points."vak.metrics"]
-Accuracy = 'vak.metrics.Accuracy'
-Levenshtein = 'vak.metrics.Levenshtein'
-SegmentErrorRate = 'vak.metrics.SegmentErrorRate'
-
 [tool.flit.sdist]
 exclude = [
     "tests/data_for_tests"
 ]
 
 [tool.pytest.ini_options]
-filterwarnings = ["ignore:::.*torch.utils.tensorboard",]
+filterwarnings = ["ignore:::.*torch.utils.tensorboard",]
diff --git a/src/vak/__init__.py b/src/vak/__init__.py
@@ -20,15 +20,14 @@
     curvefit,
     datasets,
     device,
-    engine,
-    entry_points,
     files,
     io,
     labeled_timebins,
     labels,
     logging,
     metrics,
     models,
+    nets,
     nn,
     paths,
     plot,
@@ -37,13 +36,12 @@
     tensorboard,
     timebins,
     timenow,
+    trainer,
     transforms,
     typing,
     validators,
 )
 
-from .engine.model import Model
-
 
 __all__ = [
     "__main__",
@@ -55,15 +53,12 @@
     "csv",
     "datasets",
     "device",
-    "engine",
-    "entry_points",
     "files",
     "io",
     "labeled_timebins",
     "labels",
     "logging",
     "metrics",
-    "Model",
     "models",
     "nn",
     "paths",
@@ -73,6 +68,7 @@
     "tensorboard",
     "timebins",
     "timenow",
+    "trainer",
     "transforms",
     "typing",
     "validators",

diff --git a/src/vak/cli/eval.py b/src/vak/cli/eval.py
@@ -43,18 +43,20 @@ def eval(toml_path):
 
     logger.info("Logging results to {}".format(cfg.eval.output_dir))
 
-    model_config_map = config.models.map_from_path(toml_path, cfg.eval.models)
+    model_name = cfg.eval.model
+    model_config = config.model.config_from_toml_path(toml_path, model_name)
 
-    if cfg.eval.csv_path is None:
+    if cfg.eval.dataset_path is None:
         raise ValueError(
-            "No value is specified for 'csv_path' in this .toml config file."
+            "No value is specified for 'dataset_path' in this .toml config file. "
             f"To generate a .csv file that represents the dataset, "
             f"please run the following command:\n'vak prep {toml_path}'"
         )
 
     core.eval(
-        cfg.eval.csv_path,
-        model_config_map,
+        model_name=model_name,
+        model_config=model_config,
+        dataset_path=cfg.eval.dataset_path,
         checkpoint_path=cfg.eval.checkpoint_path,
         labelmap_path=cfg.eval.labelmap_path,
         output_dir=cfg.eval.output_dir,

diff --git a/src/vak/cli/learncurve.py b/src/vak/cli/learncurve.py
@@ -50,20 +50,22 @@ def learning_curve(toml_path):
     log_version(logger)
     logger.info("Logging results to {}".format(results_path))
 
-    model_config_map = config.models.map_from_path(toml_path, cfg.learncurve.models)
+    model_name = cfg.learncurve.model
+    model_config = config.model.config_from_toml_path(toml_path, model_name)
 
-    if cfg.learncurve.csv_path is None:
+    if cfg.learncurve.dataset_path is None:
         raise ValueError(
-            "No value is specified for 'csv_path' in this .toml config file."
+            "No value is specified for 'dataset_path' in this .toml config file. "
             f"To generate a .csv file that represents the dataset, "
             f"please run the following command:\n'vak prep {toml_path}'"
         )
 
     core.learning_curve(
-        model_config_map,
+        model_name=model_name,
+        model_config=model_config,
         train_set_durs=cfg.learncurve.train_set_durs,
         num_replicates=cfg.learncurve.num_replicates,
-        csv_path=cfg.learncurve.csv_path,
+        dataset_path=cfg.learncurve.dataset_path,
         labelset=cfg.prep.labelset,
         window_size=cfg.dataloader.window_size,
         batch_size=cfg.learncurve.batch_size,

diff --git a/src/vak/cli/predict.py b/src/vak/cli/predict.py
@@ -38,20 +38,22 @@ def predict(toml_path):
     log_version(logger)
     logger.info("Logging results to {}".format(cfg.prep.output_dir))
 
-    model_config_map = config.models.map_from_path(toml_path, cfg.predict.models)
+    model_name = cfg.predict.model
+    model_config = config.model.config_from_toml_path(toml_path, model_name)
 
-    if cfg.predict.csv_path is None:
+    if cfg.predict.dataset_path is None:
         raise ValueError(
-            "No value is specified for 'csv_path' in this .toml config file."
+            "No value is specified for 'dataset_path' in this .toml config file. "
             f"To generate a .csv file that represents the dataset, "
             f"please run the following command:\n'vak prep {toml_path}'"
         )
 
     core.predict(
-        csv_path=cfg.predict.csv_path,
+        model_name=model_name,
+        model_config=model_config,
+        dataset_path=cfg.predict.dataset_path,
         checkpoint_path=cfg.predict.checkpoint_path,
         labelmap_path=cfg.predict.labelmap_path,
-        model_config_map=model_config_map,
         window_size=cfg.dataloader.window_size,
         num_workers=cfg.predict.num_workers,
         spect_key=cfg.spect_params.spect_key,