feat(mapping): Remove store mapping parameter
#329
Merged
Background
I always thought that it was important to use the store parameter to specify whether a field should be stored, in addition to being indexed, and that the default was to not store a field for later retrieval.

It turns out this isn't true: all fields are copied to the _source field by default.

Setting "store": "yes" is only needed if, in addition to getting a field back as part of the _source (which contains every field in the document), we wanted to be able to return a single field. Pelias doesn't currently do this; we always ask Elasticsearch for the entire _source field.

In addition, Elasticsearch has a source filtering feature, so if we ever wanted to return only some of _source (which might someday be the case with something like pelias/api#1121), the only reason we would want to bother with "store": "yes" is if the size of the _source field was so prohibitive we didn't even want Elasticsearch to fetch all of it from disk. That might be a concern some day, but not today.

Changes
This PR removes all "store": "yes" parameters for all of our fields.

Effectively, we were storing a lot of fields on disk twice, which was wasting space.
In my testing of the Portland, Oregon Docker project, which has about 1.8 million documents, this change reduces the disk space usage from 551MB to 492MB, or about 10%!
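To make the change concrete, here is a minimal sketch of what stripping the store parameter from a mapping amounts to. The removeStore helper and the field names in the sample mapping are hypothetical and for illustration only; the actual PR edits the mapping definitions directly.

```javascript
// Hypothetical helper: return a copy of a mapping with every `store`
// parameter removed. A property that happens to be *named* "store" is
// kept, because its value is an object (a field definition) rather than
// a parameter value such as "yes" or true.
function removeStore(node) {
  if (Array.isArray(node)) {
    return node.map(removeStore);
  }
  if (node === null || typeof node !== 'object') {
    return node;
  }
  const result = {};
  for (const [key, value] of Object.entries(node)) {
    const isStoreParam = key === 'store' && (value === null || typeof value !== 'object');
    if (isStoreParam) {
      continue; // drop the parameter; the field still ends up in _source
    }
    result[key] = removeStore(value);
  }
  return result;
}

// Illustrative mapping fragment (field names are assumptions).
const before = {
  properties: {
    source: { type: 'string', index: 'not_analyzed', store: 'yes' },
    name: {
      type: 'object',
      properties: {
        default: { type: 'string', store: 'yes' }
      }
    }
  }
};

const after = removeStore(before);
console.log(JSON.stringify(after, null, 2));
```

The helper returns a new object and leaves its input untouched, so the two versions can be compared side by side.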
Sidenote: If there are other fields we do want to keep out of the _source field, _source.exclude in our document mapping is how we can do it.

Bonus: as part of eventually supporting Elasticsearch 6, we would have had to change all these fields from "yes" to true. So the upgrade will be slightly easier now.
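For reference, the two _source mechanisms mentioned above could look roughly like the following. This is a sketch under assumptions: the withSourceFilter helper and the field names are hypothetical, and the mapping key is written as excludes, which is how the Elasticsearch documentation spells it as far as I know.

```javascript
// Hypothetical helper: add request-level source filtering to a search body,
// so Elasticsearch returns only the listed parts of _source.
function withSourceFilter(body, fields) {
  return Object.assign({}, body, { _source: fields });
}

// Search request that asks for two fields instead of the entire _source.
// Field names are illustrative assumptions.
const searchBody = withSourceFilter(
  { query: { match_all: {} } },
  ['name.default', 'center_point']
);

// Mapping-level exclusion: fields matching these patterns are still indexed
// and searchable, but are never written into _source.
const mappingFragment = {
  _source: {
    excludes: ['phrase.*']
  }
};

console.log(JSON.stringify(searchBody));
console.log(JSON.stringify(mappingFragment));
```

Request-level filtering only trims what is returned, while the mapping-level setting changes what is kept in _source in the first place, so the second is the one that would affect disk usage.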