Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test ORC multi column data generator. #15434

Merged
merged 5 commits into from
Jan 10, 2023

Conversation

mx123
Copy link
Contributor

@mx123 mx123 commented Dec 16, 2022

Description

Add test ORC multi column data generator.

Fixes: #15420

@cla-bot cla-bot bot added the cla-signed label Dec 16, 2022
@mx123 mx123 requested review from findinpath and ebyhr December 16, 2022 11:33
@mx123 mx123 changed the title Add test ORC multi column data generator. Add test ORC multi collumn data generator. Dec 16, 2022
@mx123 mx123 changed the title Add test ORC multi collumn data generator. Add test ORC multi column data generator. Dec 16, 2022
@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch from 368b1ed to 3798070 Compare December 16, 2022 12:26
@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch 3 times, most recently from f7e2420 to b7b3092 Compare December 19, 2022 18:21
@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch from b7b3092 to e1f348f Compare December 20, 2022 08:25
@ebyhr
Copy link
Member

ebyhr commented Dec 20, 2022

Could you rebase on upstream to resolve conflicts?

@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch from e1f348f to b2a97f8 Compare December 20, 2022 09:09
@mx123
Copy link
Contributor Author

mx123 commented Dec 20, 2022

Could you rebase on upstream to resolve conflicts?

done

@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch from afcc50b to 33dbf74 Compare December 20, 2022 10:22
Copy link
Contributor

@findinpath findinpath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the end result very good.

Please consider restructuring the commits.

I think that we can achieve the logical succession of the changes in fewer commits.
The refactoring related to working RecordWriterBuilder can be done as a preparatory commit and (if i'm not mistaken) the changes related to multi-columns orc writer support can be done in one commit.

@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch 5 times, most recently from b3fd5fa to 4cad5ed Compare December 21, 2022 11:34
@findinpath
Copy link
Contributor

LGTM % comments

@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch from 4cad5ed to 801d376 Compare December 21, 2022 13:41
@mx123
Copy link
Contributor Author

mx123 commented Dec 22, 2022

mvn clean install -P errorprone-compiler -DskipTests -nsu -pl :trino-orc

i've fix file reading to bytest and tried locally this one^^^ - build passed with no errors.

@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch from aff2479 to eff1424 Compare December 22, 2022 15:06
@findinpath
Copy link
Contributor

@ebyhr CPTAL?

@findinpath findinpath self-requested a review December 23, 2022 06:37
@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch from eff1424 to a3eeeed Compare December 23, 2022 09:13
@ebyhr
Copy link
Member

ebyhr commented Dec 23, 2022

@mx123 Could you avoid rebasing when there's no conflict? It makes it hard to see the diff since the last review in GitHub UI.

@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch from a3eeeed to c4e52f0 Compare December 23, 2022 11:16
@mx123
Copy link
Contributor Author

mx123 commented Dec 23, 2022

Could you avoid rebasing when there's no conflict? It makes it hard to see the diff since the last review in GitHub UI.

ok. sorry. will do.

@github-actions github-actions bot added the docs label Dec 23, 2022
@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch from c4e52f0 to ae28767 Compare December 23, 2022 11:54
@ebyhr
Copy link
Member

ebyhr commented Dec 23, 2022

Can we test both apache-lz4.orc and generated ORC file by the library? Removing the file loses test coverage for a specific version.

@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch from ae28767 to 0173236 Compare January 9, 2023 10:19
@mx123
Copy link
Contributor Author

mx123 commented Jan 9, 2023

Can we test both apache-lz4.orc and generated ORC file by the library? Removing the file loses test coverage for a specific version.

ok. done: https://github.com/trinodb/trino/compare/ae28767e093ebb4b1d9fd13bc91bfd63e18ff92a..0173236252814774e5b22d1fec46f4c40b317dec

@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch from 0173236 to 4ff0fd6 Compare January 9, 2023 11:04
@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch 2 times, most recently from 95a3089 to c219687 Compare January 9, 2023 11:54
mx123 added 2 commits January 9, 2023 14:38
Prior approach was to use pre-generated ORC file for multi column HIVE data.
This approach is replaced to runtime generated data OCR file for testing.
@mx123 mx123 force-pushed the refactor_orc_test_data_generator branch from c219687 to 6b33037 Compare January 9, 2023 12:39
@ebyhr ebyhr merged commit 07152c0 into trinodb:master Jan 10, 2023
@github-actions github-actions bot added this to the 406 milestone Jan 10, 2023
@mx123 mx123 deleted the refactor_orc_test_data_generator branch January 10, 2023 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Use Apache ORC library in OrcTester
3 participants