Implementing ETL sources
Sources are the components responsible for extracting data.
You can either implement them yourself or pick them from other projects (such as Kiba Common and Kiba Pro).
Sources are classes implementing:
- a constructor (to which Kiba will pass the provided arguments in the DSL)
- the each method (which should yield rows one by one)
Rows are usually Hash instances, but could be other structures as long as the next steps of your pipeline know how to handle them.
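To make that contract concrete, here is a minimal sketch of a source that yields rows from an in-memory array. The name ArraySource is made up for this illustration and is not part of Kiba; the only things Kiba relies on are the constructor and the each method.

```ruby
# Hypothetical in-memory source, for illustration only (not part of Kiba).
class ArraySource
  # Receives whatever arguments were given after the class name in the DSL.
  def initialize(rows)
    @rows = rows
  end

  # Yields the rows one by one, which is what a source is expected to do.
  def each
    @rows.each do |row|
      yield(row)
    end
  end
end

# Exercising the source by hand, without Kiba:
collected = []
ArraySource.new([{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }]).each do |row|
  collected << row
end
```

Here the rows are Hash instances, but the same class would yield any other kind of object unchanged.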
Since sources are classes, you can (and are encouraged to) unit test them and reuse them.
Here is a simple CSV source:
require 'csv'

class MyCsvSource
  attr_reader :input_file

  def initialize(input_file)
    @input_file = input_file
  end

  def each
    CSV.open(input_file, headers: true, header_converters: :symbol) do |csv|
      csv.each do |row|
        yield(row.to_hash)
      end
    end
  end
end
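Because the CSV source is a plain Ruby class, it can be exercised without running a Kiba job. The sketch below shows one possible way to check it against a temporary fixture file. The helper rows_from_csv and the fixture content are assumptions made for this example; in a real project the check would typically live in your test framework of choice.

```ruby
require 'csv'
require 'tempfile'

# Same class as in the example above, repeated so this snippet runs on its own.
class MyCsvSource
  attr_reader :input_file

  def initialize(input_file)
    @input_file = input_file
  end

  def each
    CSV.open(input_file, headers: true, header_converters: :symbol) do |csv|
      csv.each do |row|
        yield(row.to_hash)
      end
    end
  end
end

# Hypothetical helper: writes the CSV content to a temporary file,
# then reads it back through the source and returns the rows as an array.
def rows_from_csv(content)
  Tempfile.create(['source_test', '.csv']) do |file|
    file.write(content)
    file.flush
    # enum_for(:each) wraps the yielding method in an Enumerator.
    MyCsvSource.new(file.path).enum_for(:each).to_a
  end
end

rows = rows_from_csv("first_name,last_name\nJohn,Doe\nMary,Smith\n")

expected = [
  { first_name: 'John', last_name: 'Doe' },
  { first_name: 'Mary', last_name: 'Smith' }
]
raise 'unexpected rows' unless rows == expected
```

Using enum_for(:each) avoids having to include Enumerable in the source just for testing purposes.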
Once implemented, you can use your source within Kiba.parse:
job = Kiba.parse do
  source MyCsvSource, filename
  # SNIP
end
The first argument for source is the class name. The other arguments will be passed to the source constructor (initialize) when Kiba runs your pipeline.
Ideally, open and close resources inside each, using the block form (as seen in this example), so that the resources are closed even if the pipeline is interrupted.
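When a resource offers no block form, an explicit ensure clause can give a similar guarantee. The sketch below rewrites the CSV source that way purely for comparison; the class name EnsureCsvSource is hypothetical, and the block form remains the simpler option whenever it is available.

```ruby
require 'csv'

# Hypothetical variant of the CSV source, for illustration only.
class EnsureCsvSource
  def initialize(input_file)
    @input_file = input_file
  end

  def each
    # Opened inside each, without a block, so closing is our responsibility.
    csv = CSV.open(@input_file, headers: true, header_converters: :symbol)
    begin
      csv.each do |row|
        yield(row.to_hash)
      end
    ensure
      # Runs whether iteration completes or an error is raised downstream.
      csv.close
    end
  end
end
```

The ensure clause runs even when a later step of the pipeline raises, which is the same protection the block form of CSV.open provides.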
A couple of sources are available in kiba-common, if you want to see how they are implemented.