
Implementing pre and post processors

Thibaut Barrère edited this page Apr 16, 2017 · 2 revisions

Pre-processors and post-processors are currently declared as blocks, each of which gets called only once per ETL run:

  • Pre-processors get called before the ETL starts reading rows from the sources.
  • Post-processors get invoked after the ETL has successfully processed all the rows.

Note that post-processors won't get called if an error was raised earlier in the run.
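The ordering described above can be sketched with a toy model in plain Ruby. This is not Kiba's implementation: `ToyJob` and `run_toy_job` are hypothetical names made up for this illustration, and the only behaviour it mirrors is what is stated above (pre-processor first, then the rows, then the post-processor, which is skipped if anything raised).

```ruby
# Toy model of the run order. NOT Kiba's code; ToyJob and run_toy_job
# are hypothetical helpers used only to illustrate the lifecycle.
ToyJob = Struct.new(:pre, :rows, :transform, :post)

def run_toy_job(job, log)
  job.pre.call(log)                                   # once, before any row is read
  job.rows.each { |row| job.transform.call(row, log) }
  job.post.call(log)                                  # only reached if nothing raised above
end

# A run where every row is processed successfully.
ok_log = []
ok_job = ToyJob.new(
  ->(log) { log << :pre },
  [1, 2],
  ->(row, log) { log << [:row, row] },
  ->(log) { log << :post }
)
run_toy_job(ok_job, ok_log)

# A run where the second row raises: the post-processor is never reached.
failed_log = []
failing_job = ToyJob.new(
  ->(log) { log << :pre },
  [1, 2],
  lambda do |row, log|
    raise ArgumentError, "bad row #{row}" if row == 2
    log << [:row, row]
  end,
  ->(log) { log << :post }
)
begin
  run_toy_job(failing_job, failed_log)
rescue ArgumentError => e
  failed_log << [:error, e.message]
end
```

In the failing run the log contains the `:pre` entry and the first row, but no `:post` entry, which is why anything that must happen even on failure should not be placed in a post-processor.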

count = 0

# Run a shell command and abort the ETL run if it does not succeed
def system!(cmd)
  fail "Command #{cmd} failed" unless system(cmd)
end

file = 'my_file.csv'
sample_file = 'my_file.sample.csv'

pre_process do
  # It's handy to work with a reduced data set. You can,
  # e.g., keep just the header line plus a single data line of the CSV file.
  system! "sed -n \"1p;25706p\" #{file} > #{sample_file}"
end

source MyCsv, file: sample_file

transform do |row|
  count += 1
  row
end

post_process do
  # Email and supervisor_address are assumed to be defined elsewhere
  Email.send(supervisor_address, "#{count} rows successfully processed")
end