Add partitioned reads to JDBC SchemaIO #25577
Conversation
Fix up the JDBCSchemaIO to support partitioned reads on a column (I believe numeric and datetime are currently supported). Start adding a JdbcPartitionedReadSchemaTransformProvider as a more generic SchemaTransform. This fits better with the SchemaTransform approach, as the partitioned read is actually a different transform entirely from the non-partitioned version.
Removed the PartitionedReadSchemaTransformProvider pending further discussion. Added a test on the Python side that should exercise this pathway (though it is difficult to fully verify). Verified that it is actually run during tests and that it will fail if something is very wrong.
Run Python 3.10 PostCommit
Force-pushed from db41b12 to 9d56d00
Assigning reviewers. If you would like to opt out of this review, comment accordingly. R: @AnandInguva for label python.
The PR bot will only process comments in the main thread (not review comments).
Force-pushed from 9d56d00 to aa37b06
Codecov Report
@@ Coverage Diff @@
## master #25577 +/- ##
==========================================
- Coverage 72.12% 72.09% -0.04%
==========================================
Files 771 772 +1
Lines 101870 101959 +89
==========================================
+ Hits 73474 73503 +29
- Misses 26970 27030 +60
Partials 1426 1426
Run Python_PVR_Flink PreCommit
Run Java_JDBC_IO_Direct PreCommit
Run Python 3.10 PostCommit
Integration test passed, though the postcommit has an unrelated test failing: https://ci-beam.apache.org/job/beam_PostCommit_Python310_PR/lastCompletedBuild/testReport/apache_beam.io.external.xlang_jdbcio_it_test/CrossLanguageJdbcIOTest/. Ready for review now.
R: @ahmedabu98
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
LGTM
Run Python_PVR_Flink PreCommit
Update: Jenkins is not triggering, so I kicked off a manual run at https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Phrase/298. The other tests not showing a status have passed.
Just one nit, but this LGTM
    readRows = readRows.withNumPartitions(partitions);
  }
  return input.apply(readRows);
} else {
nit: no need for this else block if you always return in the if block
When a partition is specified, the if {} block assembles a JdbcIO.ReadWithPartitions; when it is not, the else {} block assembles a JdbcIO.readRows (the current behavior).
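The branching described here, together with the reviewer's nit about the redundant else, can be sketched in Python. This is a hypothetical config dispatch for illustration only, not the actual Beam Java code; `build_read` and its dict outputs are my invention:

```python
def build_read(partition_column=None, num_partitions=None):
    """Hypothetical dispatch mirroring the branch under review: a
    partitioned read and a plain row read are built by different paths."""
    if partition_column is not None:
        read = {"kind": "ReadWithPartitions", "column": partition_column}
        if num_partitions is not None:
            read["num_partitions"] = num_partitions
        return read
    # Per the nit: no else block is needed, since the if-branch always returns.
    return {"kind": "readRows"}
```

The early return keeps the non-partitioned fallback at the top level, which is the shape the nit asks for.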
* Fix up the JDBCSchemaIO to support partitioned reads on a column (I believe numeric and datetime are currently supported). Start adding a JdbcPartitionedReadSchemaTransformProvider as a more generic SchemaTransform. This fits better with the SchemaTransform approach as the partitioned read is actually a different transform entirely from the non-partitioned version.
* Removed the PartitionedReadSchemaTransformProvider pending further discussion. Added a test to the Python side that should exercise this pathway (though this is difficult to fully verify). Verified that it is actually run during tests and that it will fail if something is very wrong though.
* address comments
* Support Int16 type in schema
* Fix pylint; fix unsafe cast; fix test

Co-authored-by: Byron Ellis <byronellis@google.com>
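For intuition on what a partitioned read does: the range of a numeric partition column is split into disjoint strides that can be queried in parallel, one SELECT per stride. A minimal sketch of the range-splitting step (my illustration, assuming even integer strides, not the JdbcIO implementation):

```python
def partition_ranges(lower, upper, num_partitions):
    """Split the inclusive integer range [lower, upper] into num_partitions
    contiguous half-open (start, end) ranges, spreading any remainder over
    the first partitions so every row lands in exactly one range."""
    total = upper - lower + 1
    stride, remainder = divmod(total, num_partitions)
    ranges = []
    start = lower
    for i in range(num_partitions):
        # The first `remainder` partitions absorb one extra row each.
        size = stride + (1 if i < remainder else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

Each (start, end) pair would then back a query like `WHERE col >= start AND col < end`, which is why the ranges must be non-overlapping and cover the full span.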
Continuing work of #25240 by @byronellis
Tested the change locally.
Found that the original xlang schema has a problem: the Beam primitive type INT16 was not yet supported in the Python SDK (!). So added support for it here.
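To illustrate why INT16 needs explicit, range-checked handling rather than a blind cast, here is a hypothetical encoder for a signed 16-bit big-endian value. The function `pack_int16` is my sketch, not the SDK's actual coder:

```python
import struct

def pack_int16(value):
    """Encode value as a signed 16-bit big-endian integer, raising on
    overflow instead of silently wrapping (the unsafe-cast pitfall)."""
    if not -(1 << 15) <= value < (1 << 15):
        raise ValueError(f"{value} does not fit in INT16")
    return struct.pack(">h", value)
```

The explicit bounds check is the point: without it, a value like 40000 would have to be truncated or wrapped, corrupting data that crosses the language boundary.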