This repository has been archived by the owner on Apr 5, 2021. It is now read-only.

Delta Updates #50

Merged
merged 8 commits into from
Sep 6, 2018

@kynetiv kynetiv commented Jun 19, 2018

This PR adds a workflow for applying delta updates to the documents of an existing Elasticsearch index, including root-level document fields (the file fields listed in the :only array in data.yaml).

This story comes from the need to update only the latest data in the index (nested fields and/or root-level fields) more frequently, which is currently impractical with several large CSV files. Previously, any update to the data required a full re-index of all data files. This PR adds a delta rake task that takes an update file and maps it to one of the original data.yaml files entries. That way we can reuse the data.yaml config that was stored in ES to update specific root-level and nested fields. The one limitation is that each run can update only one of the API's nested keys; to update several, run the rake command multiple times with different :nest keys.
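To illustrate what a delta update means for a single document, here is a minimal sketch in plain Ruby. It is not the code from this PR: `apply_delta`, the field names, and the year-style nest keys are hypothetical, and the PR description does not say how the update is actually sent to ES. The sketch only shows the intended effect, where fields named in the update file change and everything else (including other nested keys) is left alone.

```ruby
# Hypothetical helper: recursively merge a delta into an existing document.
# Keys the delta does not mention are left untouched.
def apply_delta(existing, delta)
  existing.merge(delta) do |_key, old_val, new_val|
    if old_val.is_a?(Hash) && new_val.is_a?(Hash)
      apply_delta(old_val, new_val) # descend into nested objects
    else
      new_val                       # scalar or array: the delta wins
    end
  end
end

# Illustrative data only; these fields are assumptions, not the real schema.
existing = {
  id: 1,
  name: "Old Name",
  "2017" => { cost: 8_000 },
  "2018" => { cost: 9_000, earnings: 40_000 }
}

# One run touches root-level fields and a single nest key ("2018" here).
delta = { name: "New Name", "2018" => { earnings: 42_000 } }

updated = apply_delta(existing, delta)
puts updated.inspect
```

In this sketch the "2017" key and the `cost` value under "2018" survive the update, which is the property that makes a full re-index unnecessary.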

NOTE: A key API change in the document builder is that a config.files entry can now specify both an :only fields array and a :nest fields array; previously it was one or the other. As a side effect, you can create the root-level document and append nested fields in the same pass of the file importer.
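A rough sketch of how one files entry carrying both :only and :nest could yield a single document in one pass. The shape of the :nest value (`key` and `contents`), the method name `build_document`, and the sample fields are assumptions made for illustration; they are not taken from the actual document builder.

```ruby
# Hypothetical builder: turn one parsed CSV row (a Hash) into a document,
# honoring both :only (root-level fields) and :nest (fields grouped under a key).
def build_document(row, entry)
  doc = {}

  # Root-level fields: copy only the fields listed under :only.
  Array(entry[:only]).each { |field| doc[field] = row[field] if row.key?(field) }

  # Nested fields: group the listed fields beneath the nest key.
  if (nest = entry[:nest])
    nested = {}
    Array(nest[:contents]).each { |field| nested[field] = row[field] if row.key?(field) }
    doc[nest[:key]] = nested
  end

  doc
end

# Illustrative entry; in practice this would come from data.yaml.
entry = {
  name: "latest.csv",
  only: [:id, :name],
  nest: { key: "2018", contents: [:earnings, :cost] }
}

row = { id: 1, name: "Example", earnings: 42_000, cost: 9_000, ignored: "x" }
puts build_document(row, entry).inspect
```

Fields that appear in neither list (`ignored` above) are dropped, and an entry with only one of the two options still works because each branch is independent.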

@kynetiv kynetiv merged commit 25e8f61 into dev Sep 6, 2018
@kynetiv kynetiv deleted the dev-delta branch September 6, 2018 15:55