Feature Name: KFDRP Index Naming Schema

Summary

The Kids First Data Portal will need a service to manage which indices are made available for display, faceted and text-based search, and data exploration.

To manage these indices in a way that allows for growth and consistency, their names need to follow a defined schema.

Motivation

As the volume of data searchable through the Kids First Data Resource Portal (DRP) grows, an ever-increasing number of Elasticsearch indices will be in use at any given time.

In order to meet the feature requirements of the DRP, the data is sharded into multiple indices based on the study that the data is coming from.

As the number of studies and entities tracked by the portal grows, the number of indices in use grows as #studies * #entities. If we also account for indices with different purposes, such as centric indices for faceted search and text indices for text search, the potential growth is #studies * #entities * #types. Publishing different indices by manipulating aliases will become a large and important task, and it will be greatly aided by a schema for naming Elasticsearch indices.
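To make the growth concrete, here is a small sketch of the count. The study identifiers and entity names are hypothetical placeholders, not actual Kids First values.

```python
from itertools import product

# Hypothetical inputs, chosen only to illustrate the arithmetic.
studies = ["studya", "studyb", "studyc"]
entities = ["participant", "file"]
index_types = ["centric", "text"]

# One index per (study, entity, type) combination.
combinations = list(product(studies, entities, index_types))

# #studies * #entities * #types = 3 * 2 * 2
print(len(combinations))  # 12
```

Adding a single study in this sketch adds #entities * #types = 4 indices, which is why alias management scales with every new study.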

Elasticsearch

Detailed design

The schema for index naming in terms of a Context-Free Grammar is as follows:

index_name --> <entity>_<index_type>_<shard_prefix>_<shard_id>_<release_id>

<entity> --> "^[a-z]*$"
<index_type> --> centric | text | entity
<shard_prefix> --> "^[a-z]{0,2}$"
<shard_id> --> "^[a-zA-Z0-9]*$"
<release_id> --> "^[a-zA-Z0-9]*$"
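
The grammar above can be sketched as a single regular expression that both validates and parses an index name. This is a minimal illustration, not the portal's implementation: the names `INDEX_NAME_RE`, `IndexName`, `parse_index_name`, and `format_index_name` are hypothetical, and the sample values in the usage lines are made up. The per-component `^...$` anchors are folded into one full-string match.

```python
import re
from typing import NamedTuple, Optional

# Transcribed from the grammar; no component may contain "_",
# so the underscore separators make the parse unambiguous.
INDEX_NAME_RE = re.compile(
    r"(?P<entity>[a-z]*)_"
    r"(?P<index_type>centric|text|entity)_"
    r"(?P<shard_prefix>[a-z]{0,2})_"
    r"(?P<shard_id>[a-zA-Z0-9]*)_"
    r"(?P<release_id>[a-zA-Z0-9]*)"
)


class IndexName(NamedTuple):
    """Components of an index name (hypothetical helper type)."""
    entity: str
    index_type: str
    shard_prefix: str
    shard_id: str
    release_id: str


def parse_index_name(name: str) -> Optional[IndexName]:
    """Return the parsed components, or None if the name violates the schema."""
    match = INDEX_NAME_RE.fullmatch(name)
    return IndexName(**match.groupdict()) if match else None


def format_index_name(parts: IndexName) -> str:
    """Join the components back into an index name."""
    return "_".join(parts)


# Usage with made-up values:
parsed = parse_index_name("participant_centric_sd_46sk55a3_re000001")
print(parsed)
print(parse_index_name("participant_summary_sd_46sk55a3_re000001"))  # None: unknown index type
```

Because no component can contain an underscore, a name always splits into exactly five parts. Note that Elasticsearch itself rejects index names containing uppercase letters, so callers would likely want to lowercase `shard_id` and `release_id` before creating an index.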

Drawbacks

The primary drawback is that if a need arises to generate versioned indices that are feature-specific and do not conform to the entity/shard design, our schema will be unable to describe them.

Alternatives

The schema is simply an aid: it keeps the source of truth about what an index describes with the data itself. An alternative would be to build a service that tracks metadata about releases and studies and acts as the source of truth for which index pertains to a particular dataset or use case.

Unresolved questions

The release ID format needs to be formalized with CHOP.
