Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add partitioned reads to JDBC SchemaIO #25577

Merged
merged 5 commits into from
Feb 22, 2023
Merged

Conversation

Abacn
Copy link
Contributor

@Abacn Abacn commented Feb 21, 2023

Continuing work of #25240 by @byronellis

Tested the change locally.

Found that the original xlang schema has a problem that the Beam primitive type INT16 is not yet supported in Python SDK (!). So added support here.

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

Byron Ellis and others added 4 commits February 21, 2023 14:20
…elieve numeric and datetime are currently supported). Start adding a JdbcPartitionedReadSchemaTransformProvider as a more generic SchemaTransform. This fits better with the SchemaTransform approach as the partitioned read is actually a different transform entirely from the non-partitioned version.
…scussion. Added a test to the Python side that should exercise this pathway (though this is difficult to fully verify). Verified that it is actually run during tests and that it will fail if something is very wrong though.
@Abacn
Copy link
Contributor Author

Abacn commented Feb 21, 2023

Run Python 3.10 PostCommit

@github-actions
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @AnandInguva for label python.
R: @apilloud for label java.
R: @johnjcasey for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@codecov
Copy link

codecov bot commented Feb 22, 2023

Codecov Report

Merging #25577 (aa37b06) into master (108d097) will decrease coverage by 0.04%.
The diff coverage is 65.38%.

@@            Coverage Diff             @@
##           master   #25577      +/-   ##
==========================================
- Coverage   72.12%   72.09%   -0.04%     
==========================================
  Files         771      772       +1     
  Lines      101870   101959      +89     
==========================================
+ Hits        73474    73503      +29     
- Misses      26970    27030      +60     
  Partials     1426     1426              
Flag Coverage Δ
python 81.94% <65.38%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ks/python/apache_beam/coders/coders_test_common.py 98.58% <ø> (ø)
sdks/python/apache_beam/coders/coders.py 87.04% <54.54%> (-0.44%) ⬇️
sdks/python/apache_beam/coders/coder_impl.py 93.53% <57.14%> (-0.24%) ⬇️
sdks/python/apache_beam/coders/row_coder.py 94.26% <66.66%> (-0.70%) ⬇️
sdks/python/apache_beam/coders/slow_stream.py 94.87% <100.00%> (+0.18%) ⬆️
sdks/python/apache_beam/io/jdbc.py 79.16% <100.00%> (+0.90%) ⬆️
.../apache_beam/runners/direct/transform_evaluator.py 90.04% <0.00%> (-0.28%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 94.33% <0.00%> (ø)
...n/apache_beam/ml/inference/tensorflow_inference.py 0.00% <0.00%> (ø)
...xamples/inference/tensorflow_mnist_with_weights.py 0.00% <0.00%> (ø)
... and 7 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Abacn
Copy link
Contributor Author

Abacn commented Feb 22, 2023

Run Python_PVR_Flink PreCommit

@Abacn
Copy link
Contributor Author

Abacn commented Feb 22, 2023

Run Java_JDBC_IO_Direct PreCommit

@Abacn
Copy link
Contributor Author

Abacn commented Feb 22, 2023

Run Python 3.10 PostCommit

@Abacn
Copy link
Contributor Author

Abacn commented Feb 22, 2023

integration test passed though postcommit has irrelevant test failing: https://ci-beam.apache.org/job/beam_PostCommit_Python310_PR/lastCompletedBuild/testReport/apache_beam.io.external.xlang_jdbcio_it_test/CrossLanguageJdbcIOTest/. ready for review now

@Abacn
Copy link
Contributor Author

Abacn commented Feb 22, 2023

R: @ahmedabu98
CC: @byronellis

@github-actions
Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Copy link
Contributor

@johnjcasey johnjcasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Abacn
Copy link
Contributor Author

Abacn commented Feb 22, 2023

Run Python_PVR_Flink PreCommit

update: jenkins not triggering, kick off a manual run on https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Phrase/298

or other tests not showing status were passed

Copy link
Contributor

@ahmedabu98 ahmedabu98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one nit, but this LGTM

readRows = readRows.withNumPartitions(partitions);
}
return input.apply(readRows);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need for this else block if you always return in the if block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When partition is specified, the if {} block assembles a JdbcIO.ReadWithPartitions; otherwise else {} block assembles a JdbcIO.readRows when partition is not specified (current behavior).

@Abacn Abacn merged commit 85817b6 into apache:master Feb 22, 2023
@Abacn Abacn deleted the beam25011-jdbc-java branch February 22, 2023 17:19
lostluck pushed a commit to lostluck/beam that referenced this pull request Feb 22, 2023
* Fix up the JDBCSchemaIO to support partitioned reads on a column (I believe numeric and datetime are currently supported). Start adding a JdbcPartitionedReadSchemaTransformProvider as a more generic SchemaTransform. This fits better with the SchemaTransform approach as the partitioned read is actually a different transform entirely from the non-partitioned version.

* Removed the PartitionedReadSchemaTransformProvider pending further discussion. Added a test to the Python side that should exercise this pathway (though this is difficult to fully verify). Verified that it is actually run during tests and that it will fail if something is very wrong though.

* address comments

* Support Int16 type in schema

* Fix pylint; fix unsafe cast; fix test

---------

Co-authored-by: Byron Ellis <byronellis@google.com>
ruslan-ikhsan pushed a commit to akvelon/beam that referenced this pull request Mar 10, 2023
* Fix up the JDBCSchemaIO to support partitioned reads on a column (I believe numeric and datetime are currently supported). Start adding a JdbcPartitionedReadSchemaTransformProvider as a more generic SchemaTransform. This fits better with the SchemaTransform approach as the partitioned read is actually a different transform entirely from the non-partitioned version.

* Removed the PartitionedReadSchemaTransformProvider pending further discussion. Added a test to the Python side that should exercise this pathway (though this is difficult to fully verify). Verified that it is actually run during tests and that it will fail if something is very wrong though.

* address comments

* Support Int16 type in schema

* Fix pylint; fix unsafe cast; fix test

---------

Co-authored-by: Byron Ellis <byronellis@google.com>
cushon pushed a commit to cushon/beam that referenced this pull request Mar 16, 2023
* Fix up the JDBCSchemaIO to support partitioned reads on a column (I believe numeric and datetime are currently supported). Start adding a JdbcPartitionedReadSchemaTransformProvider as a more generic SchemaTransform. This fits better with the SchemaTransform approach as the partitioned read is actually a different transform entirely from the non-partitioned version.

* Removed the PartitionedReadSchemaTransformProvider pending further discussion. Added a test to the Python side that should exercise this pathway (though this is difficult to fully verify). Verified that it is actually run during tests and that it will fail if something is very wrong though.

* address comments

* Support Int16 type in schema

* Fix pylint; fix unsafe cast; fix test

---------

Co-authored-by: Byron Ellis <byronellis@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants