michurin/ngramindex

N-gram Indexing and Searching

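N-gram indexing is a simple and powerful lookup technique for approximate (fuzzy) string matching: texts are split into short overlapping character sequences (n-grams), and a query matches the documents that share n-grams with it.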
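To make the idea concrete, here is a minimal sketch of trigram indexing and lookup. It is not this package's implementation or API: the names `ngrams`, `index`, `add`, and `search` are hypothetical, and the ranking (count of shared distinct trigrams) is an assumption chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// ngrams splits s into overlapping n-grams, counted in runes rather than bytes.
func ngrams(s string, n int) []string {
	r := []rune(s)
	if len(r) < n {
		return nil
	}
	out := make([]string, 0, len(r)-n+1)
	for i := 0; i+n <= len(r); i++ {
		out = append(out, string(r[i:i+n]))
	}
	return out
}

// index (hypothetical) maps each trigram to the IDs of documents containing it.
type index map[string][]int

// add registers every distinct trigram of text under the document ID.
func (ix index) add(id int, text string) {
	seen := map[string]bool{}
	for _, g := range ngrams(text, 3) {
		if !seen[g] {
			seen[g] = true
			ix[g] = append(ix[g], id)
		}
	}
}

// search returns document IDs ordered by the number of distinct trigrams
// they share with the query; ties are broken by the smaller ID.
func (ix index) search(query string) []int {
	hits := map[int]int{}
	seen := map[string]bool{}
	for _, g := range ngrams(query, 3) {
		if seen[g] {
			continue
		}
		seen[g] = true
		for _, id := range ix[g] {
			hits[id]++
		}
	}
	ids := make([]int, 0, len(hits))
	for id := range hits {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if hits[ids[i]] != hits[ids[j]] {
			return hits[ids[i]] > hits[ids[j]]
		}
		return ids[i] < ids[j]
	})
	return ids
}

func main() {
	ix := index{}
	ix.add(1, "golang")
	ix.add(2, "gopher")
	ix.add(3, "erlang")
	// The misspelled query still shares "gol" and "ola" with document 1.
	fmt.Println(ix.search("golagn")) // [1]
}
```

Because matching works on shared fragments rather than whole words, a transposed or missing letter only removes a few n-grams from the overlap instead of breaking the match entirely.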

Motivation

The package offers the following advantages:

  • Document type agnostic, thanks to generics.
  • Rune-based and Unicode-friendly.
  • Adjustable text normalization to control case sensitivity, handling of spaces and punctuation, extra typo tolerance, etc.
  • Simple ranking algorithm out of the box.
  • Fully customizable ranking: supply your own less function for sorting.
  • Ability to associate one document with several texts and to look up by several texts.
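As an illustration of two of these points, the sketch below shows one possible normalization step and a ranking driven by a caller-supplied less function over a generic document type. All names here (`normalize`, `match`, `rank`, `book`) are hypothetical and are not taken from this package's API; the `Shared` score is assumed to be the number of n-grams a document has in common with the query.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// normalize lowercases letters and digits and collapses every run of
// other characters (spaces, punctuation) into a single space.
func normalize(s string) string {
	var b strings.Builder
	pendingSpace := false
	for _, r := range s {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			if pendingSpace && b.Len() > 0 {
				b.WriteRune(' ')
			}
			pendingSpace = false
			b.WriteRune(unicode.ToLower(r))
		} else {
			pendingSpace = true
		}
	}
	return b.String()
}

// match (hypothetical) pairs a document of any type with its score.
type match[T any] struct {
	Doc    T
	Shared int // assumed: number of n-grams shared with the query
}

// rank orders matches using a caller-supplied less function.
func rank[T any](ms []match[T], less func(a, b match[T]) bool) {
	sort.SliceStable(ms, func(i, j int) bool { return less(ms[i], ms[j]) })
}

// book is an example document type.
type book struct {
	Title string
	Year  int
}

func main() {
	fmt.Println(normalize("  Hello,   WORLD! ")) // hello world

	ms := []match[book]{
		{Doc: book{"Go in Action", 2015}, Shared: 4},
		{Doc: book{"The Go Programming Language", 2016}, Shared: 6},
		{Doc: book{"Learning Go", 2021}, Shared: 4},
	}
	// Custom ranking: more shared n-grams first, newer books break ties.
	rank(ms, func(a, b match[book]) bool {
		if a.Shared != b.Shared {
			return a.Shared > b.Shared
		}
		return a.Doc.Year > b.Doc.Year
	})
	for _, m := range ms {
		fmt.Println(m.Doc.Title)
	}
}
```

Applying the same normalization to both indexed texts and queries is what makes the matching insensitive to case and punctuation; a stricter or looser normalizer changes how tolerant the lookup is.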

Examples

Known issues

  • Beware: index modification is not thread-safe.
  • The index is in-memory only.
  • There is no way to import, export, save, or restore the index.
  • It is impossible to remove a document from the index.

Related links