Infer schema from JSON data

In this recipe we'll learn how to infer a Pinot schema from JSON data.

Pinot Version 0.9.3

This is the code for the following recipe: https://dev.startree.ai/docs/pinot/recipes/infer-schema-json-data


Clone this repository and navigate to this recipe:

git clone git@github.com:startreedata/pinot-recipes.git
cd pinot-recipes/recipes/infer-schema-json-data

Infer schema from data/github.json:

docker run \
  -v ${PWD}/data/github.json:/data/github.json \
  -v ${PWD}/config:/config \
  apachepinot/pinot:0.9.3 JsonToPinotSchema \
  -jsonFile /data/github.json \
  -pinotSchemaName="github" \
  -outputDir="/config" \
  -dimensions=""

This will write the schema file to config/github.json.
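
To check what was inferred, the generated file can be summarized with a few lines of Python. This is a minimal sketch, assuming the schema file uses the usual `dimensionFieldSpecs`, `metricFieldSpecs`, and `dateTimeFieldSpecs` arrays with `name` and `dataType` on each entry; the helper name `summarize_schema` is made up for illustration and is not part of Pinot.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

# Field-spec groups assumed to appear in a Pinot schema file.
FIELD_GROUPS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def summarize_schema(schema: dict) -> Dict[str, List[Tuple[str, str]]]:
    """Map each field-spec group to a list of (name, dataType) pairs.

    Groups missing from the schema are reported as empty lists.
    """
    return {
        group: [(spec["name"], spec["dataType"]) for spec in schema.get(group, [])]
        for group in FIELD_GROUPS
    }


if __name__ == "__main__":
    # Path written by the docker command above; skipped if it does not exist.
    path = Path("config/github.json")
    if path.exists():
        schema = json.loads(path.read_text())
        print("schema:", schema.get("schemaName"))
        for group, fields in summarize_schema(schema).items():
            print(f"{group}: {len(fields)} field(s)")
            for name, data_type in fields:
                print(f"  {name}: {data_type}")
```

Run it from the recipe directory after the docker command has finished. The same script can be pointed at the other schema files produced in this recipe by changing `path`.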

Infer a schema that uses created_at as the time column:

docker run \
  -v ${PWD}/data/github.json:/data/github.json \
  -v ${PWD}/config:/config \
  apachepinot/pinot:0.9.3 JsonToPinotSchema \
  -jsonFile /data/github.json \
  -pinotSchemaName="github_with_ts" \
  -outputDir="/config" \
  -timeColumnName=created_at

This will write the schema file to config/github_with_ts.json.

Infer a schema that also unnests the payload.commits field:

docker run \
  -v ${PWD}/data/github.json:/data/github.json \
  -v ${PWD}/config:/config \
  apachepinot/pinot:0.9.3 JsonToPinotSchema \
  -jsonFile /data/github.json \
  -pinotSchemaName="github_unnest" \
  -outputDir="/config" \
  -timeColumnName=created_at \
  -fieldsToUnnest=payload.commits

This will write the schema file to config/github_unnest.json.
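
To get a feel for what unnesting a field such as `payload.commits` means, the sketch below turns one record with an array of objects into one flattened row per array element, joining nested keys with a dot. It is only a rough illustration of the idea, not Pinot's implementation: the functions `flatten` and `unnest` and the sample record are invented for this example, and the exact column names Pinot generates may differ.

```python
from typing import Any, Dict, List


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys; lists are left as values."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def unnest(record: Dict[str, Any], field: str) -> List[Dict[str, Any]]:
    """Return one flattened row per element of the list at dotted path `field`."""
    flat = flatten(record)
    items = flat.get(field)
    if not isinstance(items, list):
        # Nothing to unnest: keep the record as a single flattened row.
        return [flat]
    base = {k: v for k, v in flat.items() if k != field}
    rows = []
    for item in items:
        row = dict(base)
        if isinstance(item, dict):
            # Object elements contribute their own dotted columns.
            row.update(flatten(item, field))
        else:
            # Scalar elements keep the original column name.
            row[field] = item
        rows.append(row)
    # An empty list yields a single row without the unnested field.
    return rows or [base]


if __name__ == "__main__":
    # Invented sample record, loosely shaped like a GitHub push event.
    event = {
        "id": "1",
        "payload": {"size": 2, "commits": [{"sha": "a"}, {"sha": "b"}]},
    }
    for row in unnest(event, "payload.commits"):
        print(row)
```

Under these assumptions, the single sample event becomes two rows, each carrying `id`, `payload.size`, and its own `payload.commits.sha`.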