This pull request aims to address issue #19 by adding a Tantivy index.
While there is still room for improvement, it might be a first step.
Configuration
Add a `[search]` section with a `directory` field that specifies where Tantivy should store its files.
Search
Since it uses QueryParser, you can use the full Tantivy query language.
By default, search uses all fields except `name.prefix` (see below), and the suggester searches among `name`, `name.full` and `name.prefix` (see below).
Implementation
`fts` module
This module contains all full-text search related structures.
`Tantivy` is a structure that handles all the boilerplate needed to set up an index, search and suggest. It also delegates the methods used to index and commit documents.
`TantivyDocument` is a structure that represents a crate and can be converted into a Tantivy Document.
Indices
A crate's name is indexed multiple times to improve the relevance of both suggester and search results:
- `name`: a simple tokenized version of the crate's name
- `name.full`: not tokenized, only lower-cased. Its main purpose is to increase relevance when the searched text exactly matches a crate name
- `name.prefix`: indexes word prefixes to support the suggester
Other fields that are indexed:
- `categories` are indexed using the same pipeline as `name.full`, as they should come from a precise list
- `keywords` are indexed using the same pipeline as `name`, as they are free text
- `description` and `readme` use the same pipeline as `name`
Note that at search time, we should not apply the edge ngram filter, in order to reduce noise.
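To make the three name pipelines more concrete, here is a minimal standard-library sketch of what each one could emit for a crate name. It does not use Tantivy and is not the code from this pull request: the function names and the `min_len` parameter are hypothetical, and the real analyzers may split or filter differently.

```rust
/// Sketch of the `name` pipeline: split on non-alphanumeric characters
/// and lower-case each token. (Hypothetical helper, not Tantivy's API.)
fn tokenize_simple(name: &str) -> Vec<String> {
    name.split(|c: char| !c.is_alphanumeric())
        .filter(|token| !token.is_empty())
        .map(|token| token.to_lowercase())
        .collect()
}

/// Sketch of the `name.full` pipeline: no tokenization, only lower-casing,
/// so the whole crate name is a single term.
fn tokenize_full(name: &str) -> Vec<String> {
    vec![name.to_lowercase()]
}

/// Sketch of the `name.prefix` pipeline: edge n-grams, i.e. every prefix of
/// each simple token that has at least `min_len` characters.
/// `min_len` is an assumed parameter; use a value of 1 or more.
fn tokenize_prefix(name: &str, min_len: usize) -> Vec<String> {
    let mut prefixes = Vec::new();
    for token in tokenize_simple(name) {
        // Work on chars so that multi-byte characters are never cut in half.
        let chars: Vec<char> = token.chars().collect();
        for len in min_len..=chars.len() {
            prefixes.push(chars[..len].iter().collect());
        }
    }
    prefixes
}
```

With these helpers, `tokenize_prefix("Serde-JSON", 2)` yields `se`, `ser`, `serd`, `serde`, `js`, `jso`, `json`. A suggester query such as `ser` is then only run through `tokenize_simple` at search time: it already matches the indexed prefix `ser`, whereas expanding the query into its own prefixes (`se`, `ser`) would also match unrelated names that start with `se`, which is the noise mentioned above.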
How to index
When Alexandrie starts, it indexes everything.
Things that still need work
Currently, running the indexer endpoint causes an HTTP 500 error when trying to access the UI. This comes from a lock on the database, since I browse all crates for indexing in a single transaction.
Use the run method and index at startup instead of in an endpoint.