Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Oracle ETL DataEndpoint (readonly) #34

Merged
merged 19 commits into from
Jan 23, 2017

Conversation

smgallo
Copy link
Contributor

@smgallo smgallo commented Jan 20, 2017

Implement an ETL DataEndpoint for reading from an Oracle database.

Description

Implements support for importing data from an Oracle database. In addition to adding an Oracle DataEndpoint, the following changes were made as a result of necessary features for importing

  • More flexible database parameters in .ini file (not all databases require all parameters). For example, Oracle supports Easy Connect and Local database connection methods and a port does not always need to be specified.
  • Include support for specifying ORDER BY in ETL queries
  • The PDOOCI wrapper for OCI8 is included. The existing PDO code included with PHP is no longer supported.
  • Minor code cleanup and style fixes
  • Improve error messages
  • Removes the old Sybase code

Note that Oracle requires downloading the Oracle Instant Client libraries.

Motivation and Context

We will need to pull organizational hierarchy and user affiliations from UB's data warehouse (Infosource). This will also assist SDSC with migration from their existing Oracle system.

Tests performed

Imported the organizational hierarchy from UB's Oracle instance.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project as found in the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

- More flexible database parameters in .ini file (not all databases require all parameters)
- Improve handling of DB Engine subclasses (e.g., PostgresDB, MySQLDB)
- Improved parameter validation on a per-engine basis
- Added methods to iDatabase that should have been there before
- General code cleanup
- Improve comments
@smgallo smgallo added the enhancement Enhancement of the functionality of an existing feature label Jan 20, 2017
@smgallo smgallo added this to the v6.6.0 milestone Jan 20, 2017
$this->handle = DB::factory($this->config);
} catch (Exception $e) {
$msg = "Error connecting to data endpoint '" . $this->name . "'. " . $e->getMessage();
$this->logAndThrowException($msg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra whitespace on line 136?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting that phpcs didn't complain about that.

{
if ( empty($schemaName) ) {
if ( null === $schemaName ) {
$schemaName = $this->getSchema();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 44-49 (from Postgres.php) are seen frequently in these classes that extend aRdbmsEndpoint--is it worth putting a function in aRdbmsEndpoint to handle the test for SchemaName and the logging action?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense.

// See http://www.postgresql.org/docs/current/static/catalogs.html

$sql = "SELECT
username AS name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting seems off for these SELECT strings (lines 54, 114, 160).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My editor doesn't indent bare strings on the line. Plus, the extra indentation makes it more difficult to read in the debug output.

@@ -0,0 +1,219 @@
# Oracle Data Endpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make more sense to put this file under configuration/ or docs/, perhaps?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, our documentation needs to be updated to reflect Oracle support (and emphasize that it is read-only for ETL), as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it to docs/etl/data-endpoint-oracle.md. I also need to update the ETL documentation overall and move it out of confluence. There are a lot of docs to move.

Copy link
Contributor

@plessbd plessbd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless the updates to dependencies are directly related to the new dependency I would want to move those updates to a different pull request

Copy link
Contributor

@plessbd plessbd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -689,8 +728,8 @@ function usage_and_exit($msg = null)

NOTE: Date and time options support "+1 day", "now", "now - 1 day", etc. notation.

EOMSG
);
EOMSG;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heredoc requires closing tag to be at the beginning of the line.

Warning

It is very important to note that the line with the closing identifier must contain no other characters, except a semicolon (;). That means especially that the identifier may not be indented, and there may not be any spaces or tabs before or after the semicolon. It's also important to realize that the first character before the closing identifier must be a newline as defined by the local operating system. This is \n on UNIX systems, including Mac OS X. The closing delimiter must also be followed by a newline.

If this rule is broken and the closing identifier is not "clean", it will not be considered a closing identifier, and PHP will continue looking for one. If a proper closing identifier is not found before the end of the current file, a parse error will result at the last line.

I believe this is what is now causing travis to fail with

PHP Parse error:  syntax error, unexpected $end in tools/etl/etl_overseer.php on line 737

@smgallo
Copy link
Contributor Author

smgallo commented Jan 21, 2017

OK, all style and unit test issues are fixed as well as changes requested by @jsperhac and @plessbd

@plessbd
Copy link
Contributor

plessbd commented Jan 22, 2017

Looks good to me once the composer.lock is fixed to not update dependencies.

@smgallo
Copy link
Contributor Author

smgallo commented Jan 22, 2017

Composer lock file reverted. The new file only has timestamps updated and the requirements for PDOOCI. I needed to clear my composer cache or else running composer require would update the packages to the new versions.

@smgallo smgallo merged commit feddfeb into ubccr:xdmod6.6 Jan 23, 2017
@smgallo smgallo deleted the etl/oracle-dataendpoint branch January 23, 2017 14:05
@tyearke tyearke mentioned this pull request Jan 23, 2017
2 tasks
@smgallo smgallo added the Category:ETL Extract Transform Load label Apr 26, 2017
ryanrath pushed a commit to ryanrath/xdmod that referenced this pull request Apr 27, 2017
* Prepare for adding Oracle support
- More flexible database parameters in .ini file (not all databases require all parameters)
- Improve handling of DB Engine subclasses (e.g., PostgresDB, MySQLDB)
- Improved parameter validation on a per-engine basis
- Added methods to iDatabase that should have been there before
- General code cleanup
- Improve comments

* Allow methods to override default schema name when checking if schema or table exists

* Add support for ORDER BY in ETL queries

* Improve debug readability and error messages

* Add modern PDO support for Oracle (PDOOCI wraps the OCI8 library)

* Support for Oracle ETL data endpoint (readonly)

* Style fixes as per phpcs

* Address style and unit test errors, rmove underscores from private class members
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category:ETL Extract Transform Load enhancement Enhancement of the functionality of an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants