Compute an ngram field for all street names #364

missinglink · 2019-05-20T10:56:20Z

This PR adds one new field called address_parts.street.ngram.

It takes advantage of the fields mapping to generate ngrams for street names.
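For readers unfamiliar with multi-fields, the mapping change is roughly of the following shape, written here as a Python dict purely for illustration. Only the field name and the analyzer name come from this PR; the parent field's type is an assumption, and the actual mapping may differ.

```python
# Illustrative sketch only: the parent "type" below is an assumption,
# not something taken from this PR.
STREET_MAPPING = {
    "type": "text",
    "fields": {
        # Sub-field, addressed in queries as address_parts.street.ngram
        "ngram": {
            "type": "text",
            "analyzer": "peliasIndexStreetOneEdgeGram",
        }
    },
}


def subfield_path(parent, name):
    """Build the dotted path used to address a multi-field sub-field."""
    return f"{parent}.{name}"


print(subfield_path("address_parts.street", "ngram"))
```

The same source value is analyzed once per entry under `fields`, which is why no importer changes are needed to populate the new sub-field.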

The ngram field is analyzed using a new analyzer called peliasIndexStreetOneEdgeGram, which is mostly the same as the peliasIndexOneEdgeGram analyzer except that it uses different synonyms files. It produces prefix ngrams which can be used for autocomplete.
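To make "prefix-ngrams" concrete, here is a minimal sketch of edge ngram expansion in plain Python. It is not the Pelias analyzer: synonym substitution and the other filters are not modelled, and the `min_gram` / `max_gram` defaults are assumptions chosen for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=18):
    """Return the prefixes of `token` from min_gram up to max_gram characters.

    The default lengths are assumptions, not the values used by Pelias.
    """
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def street_prefixes(street):
    """Lowercase a street name, split on whitespace, expand each word into prefixes."""
    return {word: edge_ngrams(word) for word in street.lower().split()}


print(street_prefixes("Main Street"))
```

Because every prefix is stored at index time, a partial input such as "mai" can match "Main Street" with a plain term lookup rather than a slower prefix query.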

The motivation here is to be able to, quite simply and efficiently, improve autocomplete queries which contain street names.
We currently do autocomplete on the name.default field, which mixes names and addresses. Using this new field with a multi_match will have benefits over the single-field approach:

Allow per-field analysis including synonym substitutions which are specific to streets
Allow for the fields to be included or excluded at query-time using a multi_match query

I suspect the changes required for the queries will be minimal.
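As a rough idea of what the query-side change might look like, the sketch below builds a multi_match query body as a Python dict. The helper name and its `include_street` flag are hypothetical, and the real Pelias queries carry more clauses (boosts, filters, search-time analyzers) that are omitted here.

```python
def autocomplete_query(text, include_street=True):
    """Build a minimal multi_match body over name.default and, optionally,
    address_parts.street.ngram. Hypothetical helper for illustration only."""
    fields = ["name.default"]
    if include_street:
        fields.append("address_parts.street.ngram")
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


# Street field included (the default) versus excluded at query time.
print(autocomplete_query("main st"))
print(autocomplete_query("main st", include_street=False))
```

Including or excluding the street field is then just a matter of which entries appear in `fields`.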

It's likely that the index size will increase after merging this PR because the street names will be indexed using an edge ngram filter twice, once for name.default and once for street.ngram.

The plan, following this PR, is to remove the street names from the name.default field. Once this has been done the index will return to its previous size (or a very similar on-disk size).

The analysis that I've configured in this PR is likely not perfect, but it mirrors what we already have, so the integration will be easier.
Once we've merged this PR and switched the queries we will be much freer to improve individual fields and analysis.

related: #347
related: #359

missinglink · 2019-05-20T11:10:19Z

This is the plan to roll this out without breaking backwards compatibility:

Update schema to add street.ngram field
Update queries to use multi_match
Queries should now work with street data either in name.default or street.ngram (backwards and forwards compatible)
Check the codebase to ensure that Documents with no name are considered valid
Remove the osm and oa code which concatenates the housenumber and street name together.
Remove name.default from multi_match where applicable (remove backwards compatibility)

missinglink · 2019-06-03T11:05:15Z

This is a really cool feature but I didn't realise how much work would be involved in order to change the way name.default works.
In particular, there would need to be changes to how search works and also to how labels are generated.

Let's leave this open for discussion. I'd still very much like to merge this one day because it's a big step forward for modernizing our schema design based on learnings over the last few years.

missinglink · 2019-06-03T11:05:40Z

The increase in disk space for the whole planet was <20GB

feat(street_ngram): add street ngram field

e9f0d24

This was referenced May 24, 2019

do not set name field for addresses pelias/openaddresses#424

Draft

do not set name field for addresses pelias/openstreetmap#488

Draft
