Fuzzy Matching V1 #2099

mikiher · 2023-09-14T23:05:49Z

Problem:
Audiobookshelf requires a fairly strict folder structure. However, users often have many books in existing folders that follow different standards (or none), and they may be reluctant to restructure their directories. As a result, book titles and authors are read incorrectly, so matching usually returns no results or wrong results, and users have to fix the title and author manually before matching.

The option to prefer audio metadata over folder names somewhat improves the situation, but does not fix it, and is also not enabled by default.

Proposal:
As a first step, I'd like to suggest heuristic fuzzy matching that kicks in if the initial title and author search returns no results. (A rudimentary version of this already exists in the code, potentially sending one additional search request with a "clean" version of the title and author; the new proposal subsumes it.)

If the initial search returns no results, we first further clean the title, and then heuristically split it into hyphen-separated parts.
We then create a Set of title candidates, and add each part to the set. We also try to generate additional title candidates by applying various heuristics on each part, and add those candidates to the set as well.
The resulting list of unique candidates is then heuristically sorted to minimize the number of additional search requests while still keeping the request as specific as possible.
Additional search requests are then sent until one returns results, or until maxFuzzySearches (the maximum number of allowed additional search requests) has been reached.
If there are still no results, the search requests are repeated without the author (again, until maxFuzzySearches has been reached).
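
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `titleCandidates` and `fuzzySearch`, the specific heuristics (dropping bracketed text and a leading track number), the longest-first ordering, the default budget of 5, and the choice to share one budget across the with-author and without-author passes are all assumptions. Only `maxFuzzySearches` comes from the description. `searchFn` is a hypothetical stand-in for whatever performs a provider request.

```javascript
// Hypothetical sketch; names, heuristics, and defaults are illustrative assumptions.

// Build a de-duplicated, ordered list of title candidates from a raw title.
function titleCandidates(rawTitle) {
  const candidates = new Set()
  // Split into hyphen-separated parts, e.g. "Author - Title (Unabridged)".
  const parts = rawTitle.split(/\s+-\s+/).map((p) => p.trim()).filter(Boolean)
  for (const part of parts) {
    candidates.add(part)
    // Example heuristic: drop bracketed or parenthesized text and a leading number.
    const stripped = part
      .replace(/[\[(][^\])]*[\])]/g, ' ')
      .replace(/^\d+[.\s]+/, '')
      .replace(/\s+/g, ' ')
      .trim()
    if (stripped) candidates.add(stripped)
  }
  // Assumed ordering: longer (more specific) candidates first; ties keep insertion order.
  return [...candidates].sort((a, b) => b.length - a.length)
}

// Try candidates until one returns results or the request budget is spent.
// searchFn(title, author) must return a promise of an array of results.
async function fuzzySearch(searchFn, title, author, maxFuzzySearches = 5) {
  let results = await searchFn(title, author)
  if (results.length) return results

  const candidates = titleCandidates(title)
  // First pass keeps the author; the second pass drops it.
  const authors = author ? [author, ''] : ['']
  let used = 0 // additional requests sent so far (shared across both passes)
  for (const candidateAuthor of authors) {
    for (const candidate of candidates) {
      if (used >= maxFuzzySearches) return []
      used++
      results = await searchFn(candidate, candidateAuthor)
      if (results.length) return results
    }
  }
  return []
}
```

For a title such as `'Andy Weir - The Martian (Unabridged)'`, this sketch would try `'The Martian (Unabridged)'`, then `'The Martian'`, then `'Andy Weir'`, first with the author and then without it, stopping as soon as a request returns results or the budget runs out.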

This proposal is implemented in this pull request.
I've evaluated it on 50 books that have audible.com metadata, taken from my unmodified audiobook torrents directory, which has no standard folder structure. The existing matching finds the correct result for only 24% of the books. Fuzzy matching V1 finds the correct result for 96% of the books, and returns it as the top result (@1) for 92%. I have not calculated the average number of additional search requests, but it appears to usually be between 0 and 3.

mikiher · 2023-09-15T09:40:30Z

If we want to make quick-match more conservative than manual match, we can give maxFuzzySearches a lower value than the default by setting the appropriate option in the quick-match call to BookFinder.search().
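
As a rough illustration of the idea (not the actual BookFinder code), a per-call option could override a default request budget. The helper `resolveSearchOptions`, the default of 5, and the quick-match value of 2 are assumptions made for this sketch; only the option name `maxFuzzySearches` comes from the discussion.

```javascript
// Hypothetical stand-in; this is not the real BookFinder.search() signature.
const DEFAULT_MAX_FUZZY_SEARCHES = 5 // assumed default, for illustration only

// Use the caller's budget when it is a non-negative integer, else the default.
function resolveSearchOptions(options = {}) {
  const requested = options.maxFuzzySearches
  const valid = Number.isInteger(requested) && requested >= 0
  return { maxFuzzySearches: valid ? requested : DEFAULT_MAX_FUZZY_SEARCHES }
}

// Manual match: no override, so the default budget applies.
const manualOptions = resolveSearchOptions()
// Quick match: a smaller budget means fewer fuzzy retries (more conservative).
const quickMatchOptions = resolveSearchOptions({ maxFuzzySearches: 2 })
```

A lower budget trades recall for fewer speculative requests, which suits quick-match because nobody reviews its result before it is applied.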

The second commit demonstrates that.

advplyr · 2023-09-22T21:07:14Z

I ran some tests on this and it matched well. We'll see if it gets too many false positives and adjust from there. Thanks!

Fuzzy Matching V1

ac746f1

mikiher marked this pull request as ready for review September 14, 2023 23:10

Make quick-match more conservative

67bbe21

mikiher and others added 2 commits September 20, 2023 13:12

Merge branch 'advplyr:master' into Fuzzy-Matching

81a9b8d

Add jsdocs to BookFinder search functions

61c4860

advplyr merged commit a11fc21 into advplyr:master Sep 22, 2023
1 check passed

advplyr mentioned this pull request Sep 26, 2023

[Enhancement]: Fuzzy Matching #396

Closed

mikiher mentioned this pull request Oct 5, 2023

Fuzzy matching continued #2186

Merged

mikiher deleted the Fuzzy-Matching branch July 12, 2024 18:32
