How to define ETL jobs with Kiba
Kiba provides a DSL to let you define ETL jobs.
The recommended way to declare a job is to create a dedicated module that uses the `Kiba.parse` API:
```ruby
module ETL
  module SyncJob
    module_function

    def setup(config)
      Kiba.parse do
        # called only once per run
        pre_process do
          # ...
        end

        # responsible for reading the data
        source SomeSource, source_config

        # then transforming it
        transform SomeTransform, transform_config
        transform SomeOtherTransform, transform_config

        # alternate block form
        transform do |row|
          # return the row, modified (returning nil drops it)
          row
        end

        # responsible for writing the data
        destination SomeDestination, destination_config

        # a final block which will be called only if the pipeline succeeded
        post_process do
          # ...
        end
      end
    end
  end
end
```
When one writes `source SomeClass, some_config`, it instructs Kiba to register the source at this point in the pipeline.
At runtime (see the next section, Running jobs), Kiba will instantiate the class with the provided arguments. The same goes for transforms and destinations.
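To make the runtime step concrete, here is a plain-Ruby sketch of classes following Kiba's usual conventions: a source yields rows from `each`, a transform returns the row (or `nil` to drop it) from `process`, and a destination receives rows in `write` and may define `close`. The class names and their arguments are hypothetical, and the hand-written loop at the end only approximates what Kiba does when it runs a job; it is not Kiba's internal code.

```ruby
# Hypothetical source: `source ArraySource, rows` would end up
# calling ArraySource.new(rows) at run time.
class ArraySource
  def initialize(rows)
    @rows = rows
  end

  # Yields one row (a Hash here) at a time.
  def each
    @rows.each { |row| yield row.dup }
  end
end

# Hypothetical transform: upcases one field. Returning nil would drop the row.
class UpcaseField
  def initialize(field)
    @field = field
  end

  def process(row)
    row.merge(@field => row[@field].to_s.upcase)
  end
end

# Hypothetical destination: collects rows into the array it was given.
class MemoryDestination
  attr_reader :rows

  def initialize(store)
    @rows = store
  end

  def write(row)
    @rows << row
  end

  def close
    # a real destination would flush or close a file or connection here
  end
end

# Rough approximation of the runtime step (not Kiba's code): instantiate
# each class with its arguments, then push every row through the chain.
output = []
source = ArraySource.new([{ name: "alice" }, { name: "bob" }])
transforms = [UpcaseField.new(:name)]
destination = MemoryDestination.new(output)

source.each do |row|
  transforms.each do |transform|
    row = transform.process(row)
    break if row.nil? # a nil row skips the remaining transforms
  end
  destination.write(row) unless row.nil?
end
destination.close

p output
```

Because the classes are only instantiated at run time, the same job definition can be parsed once and each component still receives its own fresh arguments when the pipeline starts.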
For convenience, transforms can also be declared in block form.
Pre-processors and post-processors are simple blocks, each called once per run (post-processors only if the pipeline succeeded).
The combination of pre-processors, sources, transforms, destinations and post-processors defines your data processing pipeline for this job.
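The overall order can be illustrated with a deliberately simplified, self-contained model. `ToyPipeline` below is a hypothetical stand-in written for illustration, not Kiba's implementation: registration only records classes and arguments, and `run` calls the pre-processors, instantiates everything, streams rows through the transforms (a block transform is wrapped in an object responding to `process`, and a `nil` result drops the row), then calls the post-processors only if nothing raised. Kiba's real runner handles more cases than this sketch.

```ruby
# Toy model of a Kiba-style pipeline (NOT Kiba's implementation).
class ToyPipeline
  # Wraps a block so it responds to #process like a transform class.
  class BlockTransform
    def initialize(block)
      @block = block
    end

    def process(row)
      @block.call(row)
    end
  end

  # Evaluates the definition block against a new pipeline, loosely mimicking Kiba.parse.
  def self.parse(&definition)
    pipeline = new
    pipeline.instance_eval(&definition)
    pipeline
  end

  def initialize
    @pre_processes, @post_processes = [], []
    @sources, @transforms, @destinations = [], [], []
  end

  def pre_process(&block) = @pre_processes << block
  def post_process(&block) = @post_processes << block

  # Registration only stores the class and its arguments.
  def source(klass, *args) = @sources << [klass, args]
  def destination(klass, *args) = @destinations << [klass, args]

  def transform(klass = nil, *args, &block)
    @transforms << (block ? [BlockTransform, [block]] : [klass, args])
  end

  def run
    @pre_processes.each(&:call)
    # Instantiation happens here, at run time.
    sources = @sources.map { |klass, args| klass.new(*args) }
    transforms = @transforms.map { |klass, args| klass.new(*args) }
    destinations = @destinations.map { |klass, args| klass.new(*args) }
    sources.each do |src|
      src.each do |row|
        transforms.each do |t|
          row = t.process(row)
          break if row.nil?
        end
        destinations.each { |d| d.write(row) } unless row.nil?
      end
    end
    destinations.each { |d| d.close if d.respond_to?(:close) }
    # Only reached if nothing above raised.
    @post_processes.each(&:call)
  end
end

# Hypothetical source and destination for the usage example.
class RangeSource
  def initialize(total)
    @total = total
  end

  def each
    (1..@total).each { |i| yield({ id: i }) }
  end
end

class ArrayDestination
  def initialize(out)
    @out = out
  end

  def write(row)
    @out << row
  end
end

log = []
job = ToyPipeline.parse do
  pre_process { log << :pre_process }
  source RangeSource, 3
  transform { |row| row[:id].odd? ? row.merge(kept: true) : nil }
  destination ArrayDestination, log
  post_process { log << :post_process }
end
job.run
p log
```

Here `log` ends up holding the pre-processor marker, the two odd-numbered rows, and then the post-processor marker. The one-line method definitions (`def ... = ...`) require Ruby 3.0 or later.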