feat(mapping): Remove store mapping parameter #329

Merged (1 commit) on Nov 3, 2018

@orangejulius (Member) commented Oct 23, 2018

Background

I always thought the store parameter was needed to specify that a field should be stored in addition to being indexed, and that by default a field was not stored for later retrieval.

It turns out this isn't true: all fields are copied to the _source field by default (see https://www.elastic.co/guide/en/elasticsearch/reference/2.4/mapping-store.html).

Setting "store": "yes" is only needed if, in addition to getting a field back as part of the _source (which contains every field in the document), we wanted to be able to return that single field on its own. Pelias doesn't currently do this; we always ask Elasticsearch for the entire _source field.

In addition, Elasticsearch has a source filtering feature (https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-request-source-filtering.html). So even if we wanted to return only some of _source (which might someday be the case with something like pelias/api#1121), the only reason to bother with "store": "yes" would be if the _source field were so large that we didn't want Elasticsearch to fetch all of it from disk. That might be a concern some day, but not today.
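
For illustration, here is a minimal sketch of what a source-filtered search body could look like. The helper name `buildSearchBody` and the field names are hypothetical and not taken from the Pelias API code; the only Elasticsearch feature assumed is the `_source` key in the search body accepting a list of field names or wildcard patterns.

```javascript
// Hypothetical helper: builds a search body that asks Elasticsearch to
// return only part of _source. Field names below are illustrative only.
function buildSearchBody(query, fields) {
  const body = { query };
  if (Array.isArray(fields) && fields.length > 0) {
    // Source filtering: only these fields (or patterns) are returned,
    // no "store" setting is required on the individual fields.
    body._source = fields;
  }
  return body;
}

const body = buildSearchBody(
  { match: { 'name.default': 'portland' } },
  ['name.*', 'center_point']
);
console.log(JSON.stringify(body, null, 2));
```

When no field list is passed, the body has no `_source` key and Elasticsearch returns the whole `_source`, which matches what Pelias does today.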

Changes

This PR removes the "store": "yes" parameter from all of our fields.

Effectively, we were storing a lot of fields on disk twice, which was wasting space.
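
As a rough sketch of the kind of change involved (the actual PR edits the mapping files by hand), the function below strips every scalar `store` setting from a mapping object. The function name `removeStore` and the example mapping are assumptions made for illustration, using Elasticsearch 2.x style field definitions.

```javascript
// Hypothetical helper: returns a copy of a mapping with all "store"
// parameters removed. The input object is not modified.
function removeStore(node) {
  if (Array.isArray(node)) return node.map(removeStore);
  if (node === null || typeof node !== 'object') return node;

  const out = {};
  for (const [key, value] of Object.entries(node)) {
    // Only drop scalar values such as "yes" or true, so that a document
    // field that happens to be named "store" (an object) is kept.
    const isScalar = value === null || typeof value !== 'object';
    if (key === 'store' && isScalar) continue;
    out[key] = removeStore(value);
  }
  return out;
}

// Illustrative mapping fragment, not the real Pelias schema.
const before = {
  properties: {
    source: { type: 'string', index: 'not_analyzed', store: 'yes' },
    name: {
      type: 'object',
      properties: { default: { type: 'string', store: 'yes' } }
    }
  }
};

const after = removeStore(before);
console.log(JSON.stringify(after, null, 2));
```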

In my testing of the Portland, Oregon Docker project, which has about 1.8 million documents, this change reduces the disk space usage from 551MB to 492MB, or about 10%!
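
The arithmetic behind that figure, as a small sketch (the function name is made up for this example; the two sizes are the ones quoted above):

```javascript
// Computes the absolute and relative disk savings between two index sizes.
function diskReduction(beforeMb, afterMb) {
  const savedMb = beforeMb - afterMb;
  return { savedMb, percent: (savedMb / beforeMb) * 100 };
}

const result = diskReduction(551, 492);
// prints "59 MB saved (10.7%)"
console.log(`${result.savedMb} MB saved (${result.percent.toFixed(1)}%)`);
```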

Sidenote: If there are other fields we do want to keep out of the _source field, _source.exclude in our document mapping (https://github.com/pelias/schema/blob/master/mappings/document.js#L158-L159) is how we can do it.
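
A hedged sketch of how such an exclusion could be added to a mapping object follows. The helper name `excludeFromSource` and the field names are hypothetical. The sketch also assumes the mapping-level key is spelled `excludes`, as in the Elasticsearch `_source` field documentation; check the linked document.js for the exact spelling the schema uses.

```javascript
// Hypothetical helper: returns a copy of a mapping with extra field names
// added to the _source exclusion list, without duplicating entries.
function excludeFromSource(mapping, fields) {
  const current = (mapping._source && mapping._source.excludes) || [];
  const merged = Array.from(new Set([...current, ...fields]));
  return { ...mapping, _source: { ...mapping._source, excludes: merged } };
}

// Illustrative only: these field names are assumptions, not the real schema.
const mapping = { _source: { excludes: ['shape'] }, properties: {} };
const updated = excludeFromSource(mapping, ['shape', 'phrase']);
console.log(JSON.stringify(updated._source));
```

Fields excluded this way are still indexed and searchable, but they are not returned as part of `_source`.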

Bonus: as part of eventually supporting Elasticsearch 6, we would have had to change all these fields from "yes" to true. So the upgrade will be slightly easier now.

@orangejulius force-pushed the remove-extra-stores branch 3 times, most recently from 7c04cb7 to 8f5cf10, on October 23, 2018 16:23
@missinglink (Member) commented:

👍 this makes total sense

docs here: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-store.html

@orangejulius (Member, Author) commented:

I even have a follow up to this, should be more disk savings :)

After this change, I'm now pretty confident we are doing the right thing
for all our fields when it comes to storing and analyzers, so this
closes #99