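N-gram indexing is a simple and powerful technique for approximate (fuzzy) string lookup: texts are split into short overlapping character sequences (n-grams), and documents are matched by the n-grams they share with the query.
The package offers the following advantages: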
- Document type agnostic, thanks to generics.
- Rune based and Unicode friendly.
- Adjustable text normalization to control case sensitivity, whitespace and punctuation handling, extra typo tolerance, etc.
- Simple ranking algorithm out of the box.
- Fully customizable ranking: supply your own less-function for sorting.
- Ability to associate one document with several texts and to look up by several texts.

Limitations:
- Index modification is not thread safe.
- The index is held entirely in memory.
- There is no way to import, export, save, or restore the index.
- Documents cannot be removed from the index.
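
To make the idea concrete, here is a minimal sketch of n-gram indexing in Go. It is not this package's API: the names `Index`, `NewIndex`, `Add`, `Search`, `Match`, `normalize`, and `ngrams` are all hypothetical and chosen only for illustration. The sketch assumes a normalization that lower-cases text and drops everything except letters and digits, and it ranks by the number of shared n-grams using a caller-supplied less-function.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Index is a hypothetical, minimal n-gram index (illustration only).
// It maps each n-gram to the set of documents whose texts contain it.
type Index[D comparable] struct {
	n        int
	postings map[string]map[D]struct{}
}

// NewIndex creates an empty index over n-grams of n runes.
func NewIndex[D comparable](n int) *Index[D] {
	return &Index[D]{n: n, postings: make(map[string]map[D]struct{})}
}

// normalize lower-cases the text and keeps only letters and digits.
// This particular normalization is an assumption of the sketch.
func normalize(text string) []rune {
	var out []rune
	for _, r := range strings.ToLower(text) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			out = append(out, r)
		}
	}
	return out
}

// ngrams returns the set of overlapping rune n-grams of the normalized text.
func ngrams(text string, n int) map[string]struct{} {
	runes := normalize(text)
	grams := make(map[string]struct{})
	for i := 0; i+n <= len(runes); i++ {
		grams[string(runes[i:i+n])] = struct{}{}
	}
	return grams
}

// Add associates doc with one or more texts. Like the package described
// above, this sketch is not safe for concurrent modification.
func (ix *Index[D]) Add(doc D, texts ...string) {
	for _, text := range texts {
		for g := range ngrams(text, ix.n) {
			if ix.postings[g] == nil {
				ix.postings[g] = make(map[D]struct{})
			}
			ix.postings[g][doc] = struct{}{}
		}
	}
}

// Match pairs a document with the number of distinct n-grams it shares
// with the query.
type Match[D comparable] struct {
	Doc    D
	Shared int
}

// Search counts shared n-grams per document and sorts the matches
// with the caller-supplied less-function.
func (ix *Index[D]) Search(query string, less func(a, b Match[D]) bool) []Match[D] {
	counts := make(map[D]int)
	for g := range ngrams(query, ix.n) {
		for doc := range ix.postings[g] {
			counts[doc]++
		}
	}
	matches := make([]Match[D], 0, len(counts))
	for doc, c := range counts {
		matches = append(matches, Match[D]{Doc: doc, Shared: c})
	}
	sort.Slice(matches, func(i, j int) bool { return less(matches[i], matches[j]) })
	return matches
}

func main() {
	ix := NewIndex[int](3)
	ix.Add(1, "Gopher", "Go mascot") // one document, several texts
	ix.Add(2, "Python")

	byShared := func(a, b Match[int]) bool { return a.Shared > b.Shared }
	// The query has a typo and different case and punctuation, yet it
	// still shares the trigrams "gop" and "oph" with document 1.
	for _, m := range ix.Search("Gophr!", byShared) {
		fmt.Printf("doc %d shares %d trigrams\n", m.Doc, m.Shared)
	}
}
```

Note that a plain shared-n-gram count tolerates missing or extra characters better than transpositions: swapping two adjacent letters changes every trigram that spans them, so a real implementation may need additional normalization or a smaller n to catch such typos.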