Changes to accompany issues found in OSG modifications #84

Merged
merged 8 commits into from
Apr 5, 2017

@smgallo smgallo commented Apr 4, 2017

See ubccr/xdmod-xsede#23

Description

The aggregation query for job records and tasks now uses DECIMAL(36,4) rather than DOUBLE for the sum_wallduration_squared, sum_waitduration_squared, sum_cpu_time_squared, sum_local_charge_xdsu_squared, and sum_node_time_squared columns. The waitduration column now defaults to NULL to support cases where this information is unavailable (e.g., OSG).

Several enhancements to the table comparison tool were made:

  • Ability to ignore the number of columns when comparing source and destination tables. This is useful for comparing tables with added columns to baseline tables.
  • Ability to ignore column types when comparing source and destination tables. This is useful for comparing data when migrating column types from double to decimal(m,n).
  • Ability to round values during comparison. This is useful when comparing data that may differ in the number of significant digits.
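
The three comparison options above can be pictured with a small sketch. It is written in Python purely for illustration and is not the table comparison tool's code: `rows_match`, its parameters, and the sample column names are hypothetical.

```python
from decimal import Decimal

def rows_match(source, dest, round_digits=None, ignore_column_count=False):
    """Compare two rows given as {column: value} dicts (hypothetical helper).

    round_digits maps a column name to the number of decimal places to
    round to before comparing. With ignore_column_count, only columns
    present in the source row are compared.
    """
    round_digits = round_digits or {}
    if not ignore_column_count and set(source) != set(dest):
        return False
    for column, src_value in source.items():
        if column not in dest:
            return False
        dest_value = dest[column]
        if column in round_digits and None not in (src_value, dest_value):
            digits = round_digits[column]
            # Converting both sides to float ignores the column type
            # (DOUBLE vs DECIMAL) and compares only the rounded value.
            src_value = round(float(src_value), digits)
            dest_value = round(float(dest_value), digits)
        if src_value != dest_value:
            return False
    return True

baseline = {"job_id": 1, "sum_sq": 2.0000000001}
migrated = {"job_id": 1, "sum_sq": Decimal("2.0000"), "new_column": None}

print(rows_match(baseline, migrated))
print(rows_match(baseline, migrated, ignore_column_count=True))
print(rows_match(baseline, migrated, {"sum_sq": 4}, ignore_column_count=True))
```

Only the last call reports a match: the extra column is ignored and both values are rounded to four decimal places before they are compared.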

Motivation and Context

When comparing new data produced by a change against baseline data, it helps to use the DECIMAL(M,N) data type rather than DOUBLE, which may store only an approximate representation of a floating point number. This makes data verification much easier.
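
A minimal sketch of why the exact type helps, written in Python rather than SQL. The function names and the sample duration are assumptions made for illustration; Python's `float` stands in for DOUBLE and `decimal.Decimal` with a precision of 36 digits and a scale of 4 stands in for DECIMAL(36,4).

```python
from decimal import Decimal, localcontext

QUANTUM = Decimal("0.0001")  # four digits after the decimal point

def sum_squares_double(values):
    """Accumulate squares in binary floating point, as a DOUBLE would."""
    return sum(float(v) ** 2 for v in values)

def sum_squares_decimal(values):
    """Accumulate squares exactly, then fix the scale to four places."""
    with localcontext() as ctx:
        ctx.prec = 36  # total significant digits, as in DECIMAL(36,4)
        total = sum((Decimal(v) ** 2 for v in values), Decimal(0))
        return total.quantize(QUANTUM)

durations = ["202580605"]  # hypothetical wall duration
exact = 202580605 ** 2     # Python integers are exact

print(sum_squares_decimal(durations) == exact)
print(int(sum_squares_double(durations)) == exact)
```

The decimal sum equals the exact integer square, while the floating point sum cannot: the square is an odd number above 2**53, where adjacent doubles are more than 1 apart.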

Tests performed

The XSEDE job pipeline was run before and after the changes to the aggregation query. According to the table verification tool, the data was identical except in cases where the baseline used an approximate representation of the data (e.g., 4.1038701667445064e16). In these cases the squared values were off by up to .00000000000000017%.
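
A percentage difference of this kind can be computed as in the sketch below. It is an assumption about how such a figure might be derived, not the procedure used for the numbers above; `percent_difference` and the sample value are hypothetical.

```python
from decimal import Decimal

def percent_difference(baseline, exact):
    """Return |baseline - exact| / exact as a percentage, using decimal math."""
    baseline = Decimal(baseline)  # Decimal(float) keeps the stored binary value exactly
    exact = Decimal(exact)
    return abs(baseline - exact) / exact * 100

exact_value = 41038901522166025      # hypothetical exact sum of squares
baseline_value = float(exact_value)  # nearest value a DOUBLE can hold

print(percent_difference(baseline_value, exact_value))
```

Doing the subtraction in decimal avoids introducing a second rounding error while measuring the first one.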

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Enhancement to existing functionality

Checklist:

  • My code follows the code style of this project as found in the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@smgallo smgallo added the enhancement and Category:ETL labels Apr 4, 2017
@smgallo smgallo added this to the v6.6.0 milestone Apr 4, 2017
@smgallo smgallo requested review from plessbd and tyearke April 4, 2017 20:17

@tyearke tyearke left a comment

Otherwise, LGTM.

- LEFT OUTER JOIN $destTableName dest ON (" . join(' AND ', $constraints) . ")"
- . ( 0 != count($where) ? "\nWHERE " . implode(' AND ', $where) : "" )
+ LEFT OUTER JOIN $destTableName dest ON (" . join("\nAND ", $constraints) . ")"
+ . ( 0 != count($where) ? "\nWHERE " . implode("\nAND", $where) : "" )

The lack of whitespace after AND could cause problems.
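
The concern can be reproduced with a short sketch. It uses Python's `str.join`, which concatenates with a separator in the same way as PHP's `implode`; the sample conditions are hypothetical.

```python
# Hypothetical WHERE conditions, for illustration only.
where = ["dest.id IS NULL", "src.year = 2017"]

without_space = "\nAND".join(where)  # separator as written in the new line
with_space = "\nAND ".join(where)    # separator with a trailing space

print("WHERE " + without_space)
print("WHERE " + with_space)
```

Without the trailing space, the keyword fuses with the next condition (`ANDsrc.year = 2017`), which a SQL parser would not read as an AND operator.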

@@ -541,7 +541,7 @@ function usage_and_exit($msg = null)
-n, --num-missing-rows <number_of_rows>
Display this number of missing rows. If not specified, all missing rows are displayed.

- -r, --round-column <column>[=<digits>]
+ -r, --round-column <column>[,<digits>]

The indentation of this line is a little funky.

@smgallo smgallo merged commit 17ee95e into ubccr:xdmod6.6 Apr 5, 2017
@smgallo smgallo deleted the etl/osg branch April 5, 2017 15:10
ryanrath pushed a commit to ryanrath/xdmod that referenced this pull request Apr 27, 2017
Table Verifier:
* Added --ignore-column-count to only compare columns that exist in the source table
* Add ability to round columns before comparing
* Added ability to ignore column types in comparison

ETLv2:
* Improved log formatting
* Catch error when FROM table does not exist yet
* Change squared column type to decimal(36,4) and default waitduration to NULL instead of 0 in HPC Jobs aggregator