[MNT, ENH, DOC] Rework similarity search #2473

Draft
wants to merge 5 commits into
base: main

baraline
Member

Reference Issues/PRs

Fixes #2341, #2236, #2028, #2020, #1806

What does this implement/fix? Explain your changes.

The previous structure of the similarity search module was not in line with what we would expect given other aeon modules. The lack of distinct base classes for some tasks, together with the initial design choices (made without much practical experience of using and expanding the module), led to some really complex code when working on #2341 to make everything work together. Expanding the module further would have made things worse.

To make the module more flexible and comprehensible, the following rework is proposed in this PR (AEP to be updated accordingly):

  • Focus the module on two distinct tasks, find_neighbors and find_motifs, for all types of similarity search estimators
  • Distinguish between two kinds of similarity search tasks with two submodules, SeriesSearch and SubsequencesSearch. SubsequencesSearch focuses on tasks whose goal is to find motifs or neighbors among subsequences of time series (e.g. Matrix Profiles, Motiflets, etc.). SeriesSearch focuses on tasks that use whole series (e.g. indexes such as LSH, iSAX, etc.)
  • Keep the flexibility of the k, threshold and inverse_distance parameters to customise search outputs
  • Have base classes for families of methods to limit code duplication (e.g. BaseMatrixProfile and STOMP, where most of the existing code was ported)
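
To make the proposed structure easier to discuss, here is a minimal sketch of how the two tasks and the shared parameters could fit together. It is not the code in this PR: `BaseSimilaritySearch`, `BruteForceSubsequenceSearch`, `_select` and `_euclidean` are hypothetical names, the search is a plain brute-force Euclidean scan rather than a matrix profile, and the parameter semantics are assumptions (here `threshold` keeps matches with distance at or below it, and `inverse_distance=True` simply ranks the farthest matches first).

```python
import math
from abc import ABC, abstractmethod
from itertools import combinations


def _euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def _select(candidates, k, threshold, inverse_distance):
    """Shared output customisation (assumed semantics, not aeon's)."""
    kept = [c for c in candidates if c[0] <= threshold]
    # inverse_distance=True ranks the farthest matches first.
    kept.sort(key=lambda c: c[0], reverse=inverse_distance)
    return kept[:k]


class BaseSimilaritySearch(ABC):
    """Hypothetical base class: every estimator exposes both tasks."""

    def fit(self, X):
        # X is a collection of univariate series, stored as the search database.
        self.X_ = [list(series) for series in X]
        return self

    @abstractmethod
    def find_neighbors(self, query, k=1, threshold=math.inf, inverse_distance=False):
        """Return up to k (distance, index) matches for a query."""

    @abstractmethod
    def find_motifs(self, k=1, threshold=math.inf, inverse_distance=False):
        """Return up to k (distance, (index_a, index_b)) motif pairs."""


class BruteForceSubsequenceSearch(BaseSimilaritySearch):
    """Hypothetical subsequence estimator; indexes are (series, offset)."""

    def __init__(self, length):
        self.length = length

    def _subsequences(self):
        # Enumerate every window of the configured length in every series.
        for i, series in enumerate(self.X_):
            for j in range(len(series) - self.length + 1):
                yield (i, j), series[j : j + self.length]

    def find_neighbors(self, query, k=1, threshold=math.inf, inverse_distance=False):
        candidates = [(_euclidean(query, sub), idx) for idx, sub in self._subsequences()]
        return _select(candidates, k, threshold, inverse_distance)

    def find_motifs(self, k=1, threshold=math.inf, inverse_distance=False):
        candidates = []
        for (idx_a, sub_a), (idx_b, sub_b) in combinations(self._subsequences(), 2):
            # Skip trivial matches: overlapping subsequences of the same series.
            if idx_a[0] == idx_b[0] and abs(idx_a[1] - idx_b[1]) < self.length:
                continue
            candidates.append((_euclidean(sub_a, sub_b), (idx_a, idx_b)))
        return _select(candidates, k, threshold, inverse_distance)


# Usage: both tasks share the same k / threshold / inverse_distance handling.
search = BruteForceSubsequenceSearch(length=3).fit([[0, 1, 2, 3, 0, 1, 2, 3], [5, 5, 5, 5]])
neighbors = search.find_neighbors([1, 2, 3], k=2)
motifs = search.find_motifs(k=1)
```

Routing both tasks through one selection helper is the part of the sketch that matters for the design: a whole-series estimator could reuse the same base class and helper while changing only how candidates are generated.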

Does your contribution introduce a new dependency? If yes, which one?

No.

Any other comments?

As this is still a WIP, I would love some input on the structure (notably from @patrickzib !) to make the module easier to use and more robust to future additions.

TODO list:

  • Finish including the testing suite for base estimators in the testing module for the SubsequenceSearch part
  • Implement LSH index as a simple first case for BaseIndexSearch
  • Implement tests for SeriesSearch base class and estimators
  • Update API docs / doc pages
  • Update notebooks
  • Update aeon's CODEOWNERS to receive notifications about future changes to these files

@baraline baraline linked an issue Dec 26, 2024 that may be closed by this pull request
@aeon-actions-bot aeon-actions-bot bot added the documentation, enhancement, maintenance, and similarity search labels Dec 26, 2024
@aeon-actions-bot
Contributor

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ documentation, enhancement, maintenance ].
I have added the following labels to this PR based on the changes made: [ similarity search ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run mypy typecheck tests
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Disable numba cache loading
  • Push an empty commit to re-run CI checks

Labels
documentation, enhancement, maintenance, similarity search
1 participant