Implementing pre and post processors

Pre-processors and post-processors are currently implemented as blocks, each of which gets called only once per ETL run:

Pre-processors get called before the ETL starts reading rows from the sources.
Post-processors get invoked after the ETL has successfully processed all the rows.

Note that post-processors won't get called if an error was raised earlier in the run.
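The ordering described above can be sketched in plain Ruby. This is a simplified stand-in written for illustration only, not Kiba's actual implementation: the `TinyEtl` class and its `run` method are hypothetical, and the source is reduced to an array of rows.

```ruby
# Hypothetical, simplified ETL runner used only to illustrate the call order.
# It is NOT Kiba's implementation.
class TinyEtl
  def initialize
    @pre_processors = []
    @post_processors = []
    @transforms = []
    @rows = []
  end

  def pre_process(&block)
    @pre_processors << block
  end

  def post_process(&block)
    @post_processors << block
  end

  def transform(&block)
    @transforms << block
  end

  # Assumption: the "source" is just an array of rows.
  def source(rows)
    @rows = rows
  end

  def run
    # 1. Pre-processors run once, before any row is read.
    @pre_processors.each(&:call)

    # 2. Each row goes through every transform in order.
    @rows.each do |row|
      @transforms.each { |t| row = t.call(row) }
    end

    # 3. Post-processors run once, and only if nothing above raised.
    @post_processors.each(&:call)
  end
end

events = []
etl = TinyEtl.new
etl.pre_process { events << :pre }
etl.source([1, 2])
etl.transform do |row|
  raise "boom" if row == 2
  events << row
  row
end
etl.post_process { events << :post }

begin
  etl.run
rescue RuntimeError
  events << :error
end

p events
```

Because the second row raises, the post-processor block is never reached, which mirrors the behaviour described in the note above.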

# local variable, shared with the blocks below through closure
count = 0

# run a shell command and raise if it fails
def system!(cmd)
  fail "Command #{cmd} failed" unless system(cmd)
end

file = 'my_file.csv'
sample_file = 'my_file.sample.csv'

pre_process do
  # It's handy to work with a reduced data set. Here we keep only
  # the header (line 1) and one data line (line 25706) of the CSV file.
  system! "sed -n \"1p;25706p\" #{file} > #{sample_file}"
end

source MyCsv, file: sample_file

transform do |row|
  count += 1
  row
end

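post_process do
  # Email and supervisor_address are not defined in this snippet;
  # they stand for your own notification code.
  Email.send(supervisor_address, "#{count} rows successfully processed")
end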
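If `sed` is not available on the machine running the ETL, the same sampling step could be written in plain Ruby. The sketch below is one possible approach; the `write_sample` helper is hypothetical and not part of Kiba. Like the `sed` command, it counts physical lines, so it assumes no CSV field contains an embedded newline.

```ruby
# Hypothetical helper: copy the header line plus one chosen line
# (1-based line number, as with sed) into a smaller sample file.
def write_sample(file, sample_file, line_number)
  File.open(sample_file, "w") do |out|
    File.foreach(file).with_index(1) do |line, index|
      out.write(line) if index == 1 || index == line_number
      # Stop reading once the requested line has been seen.
      break if index >= line_number
    end
  end
end
```

Inside the job it could then be called from the pre-processor, for instance `pre_process { write_sample(file, sample_file, 25_706) }`.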
