In this recipe we'll learn how to infer a Pinot schema from JSON data.
Pinot version: 0.9.3
This is the code for the following recipe: https://dev.startree.ai/docs/pinot/recipes/infer-schema-json-data
Clone this repository and navigate to this recipe:
git clone git@github.com:startreedata/pinot-recipes.git
cd pinot-recipes/recipes/infer-schema-json-data
Infer a schema from data/github.json:
docker run \
-v ${PWD}/data/github.json:/data/github.json \
-v ${PWD}/config:/config \
apachepinot/pinot:0.9.3 JsonToPinotSchema \
-jsonFile /data/github.json \
-pinotSchemaName="github" \
-outputDir="/config" \
-dimensions=""
This will write the schema file to config/github.json.
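To sanity-check the result, you can list the column names that ended up in the generated file. The helper below is a minimal sketch, not part of the recipe: `list_schema_fields` is a hypothetical name, and it assumes each field spec in the schema file carries a `"name"` key, as Pinot schema files normally do.

```shell
# Hypothetical helper (not part of the recipe): print the column names
# found in a generated schema file, one per line.
list_schema_fields() {
  schema_file="$1"
  # Fail early if the file is missing or empty.
  if [ ! -s "$schema_file" ]; then
    echo "schema file not found or empty: $schema_file" >&2
    return 1
  fi
  # Assumption: every field spec has a "name" key; extract each value.
  grep -o '"name"[[:space:]]*:[[:space:]]*"[^"]*"' "$schema_file" \
    | sed 's/.*:[[:space:]]*"\(.*\)"/\1/'
}

# Usage, after running the command above:
# list_schema_fields config/github.json
```

It relies only on `grep` and `sed`, so it should work without extra tools such as `jq`.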
Infer a schema that uses created_at as the time column:
docker run \
-v ${PWD}/data/github.json:/data/github.json \
-v ${PWD}/config:/config \
apachepinot/pinot:0.9.3 JsonToPinotSchema \
-jsonFile /data/github.json \
-pinotSchemaName="github_with_ts" \
-outputDir="/config" \
-timeColumnName=created_at
This will write the schema file to config/github_with_ts.json.
Infer a schema that uses created_at as the time column and unnests the payload.commits field:
docker run \
-v ${PWD}/data/github.json:/data/github.json \
-v ${PWD}/config:/config \
apachepinot/pinot:0.9.3 JsonToPinotSchema \
-jsonFile /data/github.json \
-pinotSchemaName="github_unnest" \
-outputDir="/config" \
-timeColumnName=created_at \
-fieldsToUnnest=payload.commits
This will write the schema file to config/github_unnest.json.
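The commands in this recipe differ only in the schema name and a few trailing flags, so they can be wrapped in a small shell function. This is a sketch under that assumption: `infer_schema` is a hypothetical name, and it uses only the image, mounts, and flags already shown in this recipe.

```shell
# Hypothetical wrapper (not part of the recipe): the first argument is the
# schema name, any remaining arguments are passed to JsonToPinotSchema as-is.
infer_schema() {
  schema_name="$1"
  shift
  # Volume paths are quoted so that directories containing spaces still work.
  docker run \
    -v "${PWD}/data/github.json:/data/github.json" \
    -v "${PWD}/config:/config" \
    apachepinot/pinot:0.9.3 JsonToPinotSchema \
    -jsonFile /data/github.json \
    -pinotSchemaName="$schema_name" \
    -outputDir="/config" \
    "$@"
}

# Usage, equivalent to the commands in this recipe:
# infer_schema github -dimensions=""
# infer_schema github_with_ts -timeColumnName=created_at
# infer_schema github_unnest -timeColumnName=created_at -fieldsToUnnest=payload.commits
```

Run it from the recipe directory, since the volume mounts are resolved relative to `${PWD}`.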