Add ETLv2 Transform Step #80

Merged: 17 commits merged on Mar 28, 2017

@smgallo smgallo commented Mar 28, 2017

Add a transform step to ETLv2 database ingestors.

Description

This PR adds the pdoIngestor::transform() method, which is used when the ingestor operates in multi-database ingest mode. While iterating over the source data, the ingestor passes each source record to transform(), which lets the fields of each record be transformed. A child class can add its own transformations by overriding this method. Because a single source record may be transformed into multiple destination records, pdoIngestor::transform() returns an array of records rather than a single record. If a child class overrides pdoIngestor::transform(), allowSingleDatabaseOptimization() detects this via reflection and puts the ingestor into multi-database mode.
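The override detection can be sketched roughly as follows. This is a minimal illustration, not the code in allowSingleDatabaseOptimization(): the class names PdoIngestorSketch and CustomIngestorSketch and the helper hasCustomTransform() are hypothetical. The idea is to compare the class that declares transform() for the current object against the base class. Any subclass that redefines the method, directly or through an intermediate parent, counts as an override.

```php
<?php
// Hypothetical stand-in for the base ingestor; names are illustrative only.
class PdoIngestorSketch
{
    protected function transform(array $srcRecord)
    {
        return array($srcRecord);
    }

    // Hypothetical helper: true when some subclass redefines transform(),
    // meaning the per-record (multi-database) path must be used.
    public function hasCustomTransform()
    {
        $method = new ReflectionMethod($this, 'transform');
        return $method->getDeclaringClass()->getName() !== self::class;
    }
}

// Hypothetical child that overrides transform() and defers to the parent.
class CustomIngestorSketch extends PdoIngestorSketch
{
    protected function transform(array $srcRecord)
    {
        return parent::transform($srcRecord);
    }
}

var_dump((new PdoIngestorSketch())->hasCustomTransform());    // bool(false)
var_dump((new CustomIngestorSketch())->hasCustomTransform()); // bool(true)
```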

By default, the following transformation is applied. A child class that overrides transform() should call parent::transform() before applying any local transformations.

```php
protected function transform(array $srcRecord)
{
    foreach ( $srcRecord as &$value ) {
        if ( null === $value ) {
            // Transform NULL values to \N, the NULL marker for MySQL LOAD DATA INFILE
            $value = '\N';
        } elseif ( '' === $value ) {
            // Enclose empty strings; empty() is avoided because it also treats "0" as empty
            $value = $this->stringEnclosure . $this->stringEnclosure;
        } else {
            // Escape backslashes so that source data containing them is preserved
            $value = str_replace('\\', '\\\\', $value);
        }
    }
    // Break the reference left over from the by-reference foreach
    unset($value);

    return array($srcRecord);
}  // transform()
```
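To illustrate the array return value and the call to parent::transform(), here is a sketch of a hypothetical child ingestor that fans one source record out into one destination record per host. PdoIngestorSketch, PerHostIngestorSketch, the public run() wrapper, and the comma-separated hosts field are all assumptions made for this example. The base class is a simplified copy of the default transform, not the real pdoIngestor.

```php
<?php
// Hypothetical base class with a simplified copy of the default transform.
class PdoIngestorSketch
{
    protected $stringEnclosure = '"';

    protected function transform(array $srcRecord)
    {
        foreach ($srcRecord as &$value) {
            if (null === $value) {
                $value = '\N';  // NULL marker for LOAD DATA INFILE
            } elseif ('' === $value) {
                $value = $this->stringEnclosure . $this->stringEnclosure;
            } else {
                $value = str_replace('\\', '\\\\', $value);  // escape backslashes
            }
        }
        unset($value);
        return array($srcRecord);
    }

    // Hypothetical public entry point so the protected hook can be exercised.
    public function run(array $srcRecord)
    {
        return $this->transform($srcRecord);
    }
}

// Hypothetical child: emits one destination record per host listed in the
// comma-separated "hosts" field, after the default transform has run.
class PerHostIngestorSketch extends PdoIngestorSketch
{
    protected function transform(array $srcRecord)
    {
        $records = array();
        // parent::transform() returns an array of records, so iterate over it.
        foreach (parent::transform($srcRecord) as $record) {
            foreach (explode(',', $record['hosts']) as $host) {
                $out = $record;
                $out['hosts'] = $host;
                $records[] = $out;
            }
        }
        return $records;
    }
}

$ingestor = new PerHostIngestorSketch();
$result = $ingestor->run(array('job_id' => '42', 'hosts' => 'c1,c2', 'account' => null));
print_r($result);
```

Because the child iterates over whatever parent::transform() returns instead of assuming a single record, it keeps working if the parent itself starts emitting more than one record.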

Also renamed "row" to "record" throughout the code and log messages, and updated the comments at the top of the file.

Note that only commit 6694905 is relevant; the rest were pulled in by a merge and will be squashed.

Motivation and Context

@plessbd asked nicely.

Tests performed

Re-ran the ingestion and compared the resulting job tables against the baseline:

```shell
./etl_overseer.php -c ../../../etc/etl/etl.json -v info -s "2016-10-01 00:00:00" -e "2016-12-31 23:59:59" -k none -p xdcdb-jobs -o "experimental_enable_batch_aggregation=true" -o "truncate_destination=true"
php verify_table_data.php -s modw_baseline -d modw_etltest -v info -x last_modified -n 2 -t job_records -t job_tasks
```

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project as found in the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@smgallo smgallo added enhancement Enhancement of the functionality of an existing feature Category:ETL Extract Transform Load labels Mar 28, 2017
@smgallo smgallo added this to the v6.6.0 milestone Mar 28, 2017
@smgallo smgallo requested a review from plessbd March 28, 2017 18:58
@smgallo smgallo merged commit b033b0d into ubccr:xdmod6.6 Mar 28, 2017
@smgallo smgallo deleted the etl/add-transform-step branch March 28, 2017 19:35
ryanrath pushed a commit to ryanrath/xdmod that referenced this pull request Apr 27, 2017
* Add transform() method