Core Concepts
Kiba is an ETL Ruby framework.
If you are unfamiliar with the notion of ETL, you will find introductions here:
- The Wikipedia page on Extract, Transform, Load
- The following article: "Rubyists - are you doing ETL unknowingly?" on the Kiba author's blog
Kiba "core" (the kiba gem) does not implement sources, transforms and destinations itself.
Instead, it provides:
- A way for you to declare ETL jobs
- A structure & conventions to implement sources/transforms/destinations
- A "runner" able to execute the job
You can either implement those components yourself, or tap into the ones provided in kiba-common (open source) or Kiba Pro (commercial extension).
A data pipeline or job is schematically organised as a chain: sources, then transforms, then destinations.
In detail:
- Sources are responsible for reading the data (generally row by row); they typically implement some file reading, database connection, or API calls to extract the data.
- Kiba then passes each row along to each transform (in order). A transform can return the modified row, generate multiple output rows, or return no row at all.
- Finally, the rows are sent to the destinations, which are responsible for sending the rows wherever you see fit (database, file system, API storage, etc.).
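The flow described above can be sketched in plain Ruby, without the kiba gem. This is only an illustration of the idea: the class names (ArraySource, KeepAdults, UpcaseName, ArrayDestination) and the run_pipeline helper are hypothetical, the helper is a simplified stand-in and not Kiba's actual runner, and the "multiple output rows" case is left out for brevity.

```ruby
# Plain-Ruby sketch of the source -> transforms -> destinations flow.
# All names here are hypothetical; this is NOT Kiba's own runner.

# Source: reads the data and yields it row by row.
class ArraySource
  def initialize(rows)
    @rows = rows
  end

  def each
    @rows.each { |row| yield row }
  end
end

# Transform: returns the row to keep it, or nil to drop it.
class KeepAdults
  def process(row)
    row[:age] >= 18 ? row : nil
  end
end

# Transform: returns a modified copy of the row.
class UpcaseName
  def process(row)
    row.merge(name: row[:name].upcase)
  end
end

# Destination: receives each row that survived the transforms.
class ArrayDestination
  attr_reader :rows

  def initialize
    @rows = []
  end

  def write(row)
    @rows << row
  end

  # Nothing to release for an in-memory destination.
  def close
  end
end

# Simplified stand-in for a runner: passes each row through the
# transforms in order, then hands it to every destination.
def run_pipeline(source, transforms, destinations)
  source.each do |row|
    transforms.each do |transform|
      row = transform.process(row)
      break if row.nil? # a dropped row skips the remaining transforms
    end
    next if row.nil?

    destinations.each { |destination| destination.write(row) }
  end
  destinations.each(&:close)
end

destination = ArrayDestination.new
run_pipeline(
  ArraySource.new([{ name: "ann", age: 30 }, { name: "bob", age: 12 }]),
  [KeepAdults.new, UpcaseName.new],
  [destination]
)
```

In this sketch the order of the transforms matters: a row dropped by KeepAdults never reaches UpcaseName or the destination.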
It is perfectly possible to run multiple jobs sequentially, with each job generating an output that the next job uses as its input.
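As a rough sketch of such chaining, again in plain Ruby rather than with the kiba gem: the first job writes an intermediate file, and the second job reads that file as its input. The helpers run_job, read_rows and run_two_jobs, the JSON-lines format, and the file names are all assumptions made for this example.

```ruby
require "json"
require "tmpdir"

# Hypothetical helper: one "job" takes rows, applies the given block to
# each of them, and writes the non-nil results as JSON lines.
def run_job(input_rows, output_path)
  File.open(output_path, "w") do |file|
    input_rows.each do |row|
      result = yield(row)
      file.puts(JSON.generate(result)) unless result.nil?
    end
  end
end

# Hypothetical helper: reads back the rows written by a previous job.
def read_rows(path)
  File.foreach(path).map { |line| JSON.parse(line) }
end

# Runs two jobs in sequence; the output of the first is the input of the second.
def run_two_jobs(rows, dir)
  step1 = File.join(dir, "step1.jsonl")
  step2 = File.join(dir, "step2.jsonl")

  # Job 1: keep only rows with a positive amount.
  run_job(rows, step1) { |row| row["amount"] > 0 ? row : nil }

  # Job 2: enrich the rows produced by job 1.
  run_job(read_rows(step1), step2) do |row|
    row.merge("amount_cents" => row["amount"] * 100)
  end

  read_rows(step2)
end

Dir.mktmpdir do |dir|
  run_two_jobs([{ "amount" => 5 }, { "amount" => -1 }], dir)
end
```

Keeping an intermediate output between the jobs means each job can be rerun or inspected on its own.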