Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tolerance #11

Merged
merged 6 commits into from
Jul 14, 2022
Merged

Tolerance #11

merged 6 commits into from
Jul 14, 2022

Conversation

mateonunez
Copy link
Collaborator

@mateonunez mateonunez commented Jul 13, 2022

This PR is a proposal to resolve the problem of #2.

The changes don't implement a boolean to manage the typo errors. Instead of that, a new parameter has been added to the search method.

As the documentation shows: tolerance is the maximum distance between the searched term and a word inside a document.

I've added a few tests for this implementation.

Coverage

Without Tolerance implementation:

File % Stmts % Branch % Funcs % Lines Uncovered Line #s
All files 85.6 65.38 89.47 85.82
...refix-tree 95.18 78.57 100 97.5
trie.ts 94.02 78.57 100 96.87 104,129

With Tolerance implementation:

File % Stmts % Branch % Funcs % Lines Uncovered Line #s
All files 86.44 67.41 90 87.14
src 82.43 58.18 84.61 82.38
...shtein.ts 91.66 60 100 100 2-3
...refix-tree 95.55 82.35 100 97.7
trie.ts 94.59 82.35 100 97.18 115,140

Time elapsed

Without Tolerance implementation:

{
  elapsed: '135μs',
  hits: [
    {
      id: 'ECbOj5pixIOJc9KfQpwss',
      quote: 'Be yourself; everyone else is already taken.',
      author: 'Oscar Wilde',
      isFavorite: false,
      rating: 4
    },
    {
      id: 'jMfGCEX1zf-TX0Tc5-e-8',
      quote: 'So many books, so little time.',
      author: 'Frank Zappa',
      isFavorite: true,
      rating: 5
    }
  ],
  count: 2
}

With Tolerance implementation:

{
  elapsed: '147μs',
  hits: [
    {
      id: 'FFU-21lklMCkviNQgSPCc',
      quote: 'Be yourself; everyone else is already taken.',
      author: 'Oscar Wilde',
      isFavorite: false,
      rating: 4
    },
    {
      id: '4FAnVKAP2wygWEbnZOUhT',
      quote: 'So many books, so little time.',
      author: 'Frank Zappa',
      isFavorite: true,
      rating: 5
    }
  ],
  count: 2
}

@micheleriva
Copy link
Member

Thank you @mateonunez, this is gold. I'll review the code tomorrow morning and let you know how we want to proceed.

I actually planned to introduce a tolerance-like property in one of the next releases of Lyra, but if we get it merged sooner, it's better.

@mateonunez
Copy link
Collaborator Author

Sounds great @micheleriva! Can't wait for the new release and the new features. 👌

packages/lyra/src/prefix-tree/trie.ts Outdated Show resolved Hide resolved
packages/lyra/tests/lyra.test.ts Outdated Show resolved Hide resolved
packages/lyra/tests/lyra.test.ts Outdated Show resolved Hide resolved
@micheleriva
Copy link
Member

@mateonunez this PR looks really good! I just left a few minor comments, it would be awesome if you could address them.

In a future release, as an improvement, we should aim at running the Levenshtein algorithm against the tree instead of words.

Let me give you an example:

  • If you're searching for "hello", but you mistype and write "hllo" instead, the current algorithm will work just fine.
  • If you're searching for "hello", but you mistype and write "yellow" (with a tolerance of 2), this won't work, as we discard words with y as a prefix.

This is something to consider for future releases, as for now, I am happy with the current PR

@mateonunez mateonunez requested a review from micheleriva July 14, 2022 09:10
@mateonunez
Copy link
Collaborator Author

mateonunez commented Jul 14, 2022

I got the point about the next Levenshtein algorithm implementation @micheleriva. It might be fun to work on it.

@micheleriva
Copy link
Member

LGTM

@micheleriva micheleriva merged commit 0d210d3 into oramasearch:main Jul 14, 2022
@mateonunez mateonunez deleted the feat/tolerance branch July 14, 2022 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants