Recipe Manual (draft)

What is a recipe

A recipe is a set of instructions for constructing a new dataset from existing ones. By running a recipe through the recipe executor, one can generate a new dataset from the ingredients and procedures described in the recipe.

structure of a recipe

A recipe is made of the following parts:

basic info
configuration
includes
cooking procedures

A recipe file can be in either JSON or YAML format. See recipe_unpop.yaml for an example.

basic info

All basic info is stored in the info section of the recipe. An id field is required inside this section. Any other information about the new dataset can also be stored here, such as name, provider, description and so on.
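As a minimal sketch of the rule above, the snippet below reads a recipe written in JSON and checks that the info section carries an id. The helper name read_info and all field values are made up for illustration; this is not how chef itself validates a recipe.

```python
import json

# Hypothetical recipe text in JSON form. Only the `info` section and its
# required `id` field come from this manual; the values are invented.
RECIPE_TEXT = """
{
  "info": {
    "id": "example-dataset",
    "name": "Example dataset",
    "provider": "Example provider",
    "description": "Built from existing datasets."
  }
}
"""


def read_info(recipe_text):
    """Return the info section, raising if the required id is missing."""
    recipe = json.loads(recipe_text)
    info = recipe.get("info", {})
    if "id" not in info:
        raise ValueError("the info section must contain an 'id' field")
    return info


info = read_info(RECIPE_TEXT)
print(info["id"])
```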

Note on yaml format

In this repo we use base to indicate where the ingredients come from. We set a YAML anchor on each of them (&d1 etc.) so that we can reference them later in the recipe (*d1 etc.).

config

The configuration section defines the directories a recipe works with. Currently the following paths can be set:

ddf_dir: the directory that contains all ddf csv repos. This must be set in the main recipe in order to run it with chef.
recipes_dir: the directory that contains all recipes to include. This must be set if the recipe has an include section.
dictionary_dir: the directory that contains all translation files. This must be set if a json file is used in the options of any procedure. (translation will be discussed later)
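The three requirements above can be sketched as a small check over a recipe that has already been loaded into a Python dict. The function name check_config is hypothetical, and so is the layout assumed for a procedure entry (an options dict holding a dictionary value); only the three directory names and the conditions under which they are needed come from this manual.

```python
def check_config(recipe):
    """Return a list of missing-setting messages (empty if none)."""
    config = recipe.get("config", {})
    problems = []

    # ddf_dir is always needed in the main recipe.
    if "ddf_dir" not in config:
        problems.append("ddf_dir is required in the main recipe")

    # recipes_dir is only needed when other recipes are included.
    if recipe.get("include") and "recipes_dir" not in config:
        problems.append("recipes_dir is required when include is used")

    # Assumption: a json file shows up as options["dictionary"] = "x.json".
    uses_json_file = any(
        str(proc.get("options", {}).get("dictionary", "")).endswith(".json")
        for procedures in recipe.get("cooking", {}).values()
        for proc in procedures
    )
    if uses_json_file and "dictionary_dir" not in config:
        problems.append("dictionary_dir is required when an option names a json file")

    return problems


# Hypothetical recipe: ddf_dir is set, the other two directories are not.
recipe = {
    "config": {"ddf_dir": "/path/to/ddf"},
    "include": ["other_recipe.yaml"],
    "cooking": {
        "datapoints": [
            {"procedure": "translate_header",
             "options": {"dictionary": "header.json"}},
        ]
    },
}
problems = check_config(recipe)
print(problems)
```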

include

A recipe can include other recipes. To include a recipe, simply append its filename to the include section. Note that it should be either an absolute path or the name of a file inside recipes_dir.

In the chef module, generating a dataset takes the following steps:

read the main recipe
if there is an include section, read each file in the include list and expand the main recipe with it
if the dictionary option of a procedure is a filename, read that file and expand the option with its contents
run the procedures one by one.
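The expansion steps above can be sketched roughly as follows. This is not the chef implementation: files are replaced by an in-memory dict so the snippet runs on its own, the file names are invented, and the way an included recipe is merged (its procedures are placed before those of the main recipe) is an assumption made only for this illustration.

```python
import copy
import json

# In-memory stand-ins for files on disk; names and contents are hypothetical.
FILES = {
    "recipes/common.json": json.dumps(
        {"cooking": {"datapoints": [{"procedure": "identity"}]}}
    ),
    "dictionaries/header.json": json.dumps({"pop": "population"}),
}


def load(path):
    """Read one of the stand-in files and parse it as JSON."""
    return json.loads(FILES[path])


def expand(recipe):
    """Return a copy of the recipe with includes and dictionary files expanded."""
    config = recipe.get("config", {})
    expanded = {"info": recipe.get("info", {}), "config": config, "cooking": {}}

    # Read each included recipe, then the main recipe itself.
    sources = [load(config["recipes_dir"] + "/" + name)
               for name in recipe.get("include", [])]
    sources.append(recipe)
    for source in sources:
        for key, procedures in source.get("cooking", {}).items():
            expanded["cooking"].setdefault(key, []).extend(copy.deepcopy(procedures))

    # Replace a dictionary option given as a filename with the file contents.
    for procedures in expanded["cooking"].values():
        for proc in procedures:
            options = proc.get("options", {})
            value = options.get("dictionary")
            if isinstance(value, str):
                options["dictionary"] = load(config["dictionary_dir"] + "/" + value)
    return expanded


main_recipe = {
    "info": {"id": "example-dataset"},
    "config": {"ddf_dir": "ddf", "recipes_dir": "recipes",
               "dictionary_dir": "dictionaries"},
    "include": ["common.json"],
    "cooking": {"datapoints": [
        {"procedure": "translate_header", "options": {"dictionary": "header.json"}},
    ]},
}
expanded = expand(main_recipe)
print([proc["procedure"] for proc in expanded["cooking"]["datapoints"]])
```

After expansion the procedures would be run one by one, in the order they appear in each list.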

cooking procedures

The cooking section is a dictionary containing one or more lists of procedures used to build a dataset. Valid keys for the cooking section are datapoints, entities and concepts.

Currently supported procedures:

translate_header
- translate the headers of the datapoints
translate_column
- translate the values in a column
identity
- identity function: the ingredient is returned unchanged
merge
- merge ingredients together on the keys
align
- align two columns in two ingredients
- discussion: semio/ddf_utils#3
groupby
- group ingredient data by keys
- discussion: semio/ddf_utils#4
filter_row
- filter ingredient data by values
- discussion: semio/ddf_utils#2
filter_item
- filter ingredient data by concepts
- discussion: semio/ddf_utils#14
run_op
- run math operations on an ingredient
- discussion: semio/ddf_utils#7
accumulate
- run cumulative functions over an ingredient
copy
- make copies of indicators of an ingredient
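To give a feel for what some of these procedures do, here is a plain-Python illustration of translate_header, filter_row and accumulate applied to a small list of rows. It only mirrors the one-line descriptions above: the function signatures, column names and data are invented, and the real procedures work on ingredients and take their own options, which may behave differently (see the linked discussions).

```python
from itertools import accumulate

# Rows stand in for the datapoints of an ingredient; all values are made up.
rows = [
    {"geo": "swe", "year": 2000, "pop": 10},
    {"geo": "swe", "year": 2001, "pop": 12},
    {"geo": "nor", "year": 2000, "pop": 5},
]


def translate_header(rows, dictionary):
    """Rename columns found in dictionary; other columns keep their names."""
    return [{dictionary.get(k, k): v for k, v in row.items()} for row in rows]


def filter_row(rows, **conditions):
    """Keep only the rows whose columns equal the given values."""
    return [r for r in rows if all(r[c] == v for c, v in conditions.items())]


def accumulate_column(rows, column):
    """Replace a column with its running total, in row order."""
    totals = accumulate(r[column] for r in rows)
    return [{**r, column: t} for r, t in zip(rows, totals)]


renamed = translate_header(rows, {"pop": "population"})
sweden = filter_row(renamed, geo="swe")
running = accumulate_column(sweden, "population")
print(running)
```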

General guideline for writing recipes

If you need to use translate_header, translate_column, align or copy in your recipe, place them at the beginning of the recipe. This can improve the performance of running the recipe.
