v2.0.0.dev1

@chschroeder chschroeder released this 24 Nov 19:15
· 10 commits to main since this release

This intermediate release serves as a preliminary version of the upcoming v2.0.0. Consider it an alpha release, where interface changes are still possible.

Added

  • General
    • Raised the minimum required Python version to 3.8, since Python 3.7 reached end of life on 2023-06-27.
    • Dropped torchtext as an integration dependency. For individual use cases it can of course still be used.
    • Added environment variables SMALL_TEXT_PROGRESS_BARS and SMALL_TEXT_OFFLINE to control the default behavior for progress bars and model downloading.
  • PoolBasedActiveLearner:
    • initialize_data() has been replaced by initialize() which can now also be used to provide an initial model in cold start scenarios. (#10)
  • Classification:
    • All PyTorch-classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) now support torch.compile(), which can be enabled on demand (requires PyTorch >= 2.0.0).
    • All PyTorch-classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) now support Automatic Mixed Precision.
    • SetFitClassification.__init__() now has a verbosity parameter (similar to TransformerBasedClassification) through which you can control the progress bar output of SetFitClassification.fit().
    • TransformerBasedClassification:
      • Removed unnecessary token_type_ids keyword argument in model call.
      • Additional keyword args for config, tokenizer, and model can now be configured.
  • Embeddings:
    • Prevented unnecessary gradient computations for some embedding types and unified code structure.
  • PyTorch:
    • Added an inference_mode() context manager that applies torch.inference_mode, or torch.no_grad for older PyTorch versions.
  • Query Strategies:
  • Vector Index Functionality:
    • A new vector index API provides a unified interface to different implementations of k-nearest neighbor search.
    • Existing strategies that used a hard-coded vector search (ContrastiveActiveLearning, SEALS, AnchorSubsampling) have been adapted and can now be used with different vector index implementations.
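
The fallback behavior described for inference_mode() above can be pictured with a minimal sketch. This is not small-text's implementation: the helper name inference_mode_compat and the choice to pass the torch module as an argument are assumptions made only so the snippet runs without importing PyTorch. It relies on one known fact: torch.inference_mode is missing in older PyTorch versions, whereas torch.no_grad is available.

```python
from contextlib import contextmanager


@contextmanager
def inference_mode_compat(torch_module):
    """Hypothetical helper: enter torch.inference_mode() when available,
    otherwise fall back to torch.no_grad()."""
    # Older PyTorch versions have no `inference_mode` attribute.
    factory = getattr(torch_module, "inference_mode", None)
    if factory is None:
        factory = torch_module.no_grad
    # Both factories return a context manager when called without arguments.
    with factory():
        yield
```

With PyTorch installed, this would be used as `with inference_mode_compat(torch): ...` around a forward pass, so that no gradients are tracked during prediction.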

Fixed

  • Fixed a bug where the clone() operation wrapped the labels, which then raised an error. This affected the single-label scenario for PytorchTextClassificationDataset and TransformersDataset. (#35)
  • Fixed a bug where the batching in greedy_coreset() and lightweight_coreset() resulted in incorrect batch sizes. (#50)
  • Fixed a bug where lightweight_coreset() failed when computing the norm of the elementwise mean vector.

Changed

  • General
    • Moved the split_data() function from small_text.data.datasets to small_text.data.splits.
  • Dependencies
    • Raised setfit version to 1.1.0.
  • Classification:
    • The initialize() methods of all PyTorch-classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) are now more unified. (#57)
    • KimCNNClassifier / TransformerBasedClassification: model selection is now disabled by default. When disabled, models are no longer saved, which greatly reduces the runtime.
  • Utils
    • init_kmeans_plusplus_safe() now supports weighted kmeans++ initialization for scikit-learn>=1.3.0.
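
For readers unfamiliar with the term, weighted kmeans++ initialization can be sketched in plain Python as follows. This is an illustration of the general technique only, not the code of init_kmeans_plusplus_safe() or of scikit-learn; the function name weighted_kmeans_plusplus and its parameters are hypothetical. The first center is drawn proportionally to the sample weights, and each further center proportionally to weight times squared distance to the nearest center chosen so far.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans_plusplus(points, weights, k, seed=0):
    """Return the indices of k initial centers (hypothetical helper).

    Assumes k does not exceed the number of distinct points that have
    a positive weight; otherwise no candidate remains to draw from.
    """
    rng = random.Random(seed)
    indices = range(len(points))

    # First center: sampled proportionally to the sample weights alone.
    first = rng.choices(indices, weights=weights, k=1)[0]
    centers = [first]

    # Squared distance of every point to its nearest chosen center.
    sq_dist = [_sq_dist(p, points[first]) for p in points]

    while len(centers) < k:
        # Sampling score: weight times squared distance to nearest center.
        scores = [w * d for w, d in zip(weights, sq_dist)]
        nxt = rng.choices(indices, weights=scores, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances with the newly added center.
        sq_dist = [min(d, _sq_dist(p, points[nxt])) for p, d in zip(points, sq_dist)]

    return centers
```

Points with weight zero are never selected, and already chosen points have distance zero, so each index is drawn at most once. Production implementations typically add further refinements, such as evaluating several candidate draws per step, which this sketch omits.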

Removed

  • Deprecated functionality
    • Removed default_tensor_type() method.
    • Removed small_text.utils.labels.get_flattened_unique_labels().
    • Removed small_text.integrations.pytorch.utils.labels.get_flattened_unique_labels().
    • Classification
      • Removed early stopping legacy arguments in __init__() for KimCNN and TransformerBasedClassification. (Use fit() keyword arguments instead.)
      • Removed model selection legacy argument in TransformerBasedClassification.__init__().
  • The explicit installation instruction for conda was removed, but the small-text conda-forge package will remain.