feat(script): generate tpch data set #6024

Merged: 5 commits, Jun 17, 2022

xudong963
Member

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Generate the tpch data set for Databend.

Changelog

  • New Feature

Related Issues

Fixes #5912

@mergify
Contributor

mergify bot commented Jun 16, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@mergify mergify bot added the pr-feature label (this PR introduces a new feature to the codebase) Jun 16, 2022
@PsiACE PsiACE requested a review from everpcpc June 16, 2022 08:22
@PsiACE
Member

PsiACE commented Jun 16, 2022

Do we need a workflow to publish the docker image for tpch? Alternatively, is there a readily available image, which would reduce maintenance work?

@xudong963
Member Author

Do we need a workflow to publish the docker image for tpch? Alternatively, is there a readily available image, which would reduce maintenance work?

We don't need a workflow to publish it. The image is only used to generate the data set locally; we then run the tpch-related benchmark against it (the related code is still in development).

@everpcpc
Member

We could install this tool in our build-tool image if it is needed in CI.

@xudong963
Member Author

Where would this tool be used?

It is used to generate the tpch data set. We then run the tpch benchmark against that data set.

@xudong963
Member Author

We could install this tool in our build-tool image if it is needed in CI.

I don't think it is needed in CI. cc @leiysky Do we have a plan to run tpch in CI to detect performance regression?

@Xuanwo
Member

Xuanwo commented Jun 16, 2022

How about putting them under scripts?

@Xuanwo
Member

Xuanwo commented Jun 16, 2022

Do we have a plan to run tpch in CI to detect performance regression?

Please consider running the benchmark via databend-perf. Our CI resources are very limited and not suitable for those workloads.

@xudong963
Member Author

How about putting them under scripts?

I considered putting it under scripts, but in the future I want to put tpch benchmark related code under tpch.

tpch/run-tpch-dbgen.sh (outdated, resolved)
tpch/tpchdata.dockerfile (outdated, resolved)
@Xuanwo
Member

Xuanwo commented Jun 16, 2022

I want to put tpch benchmark related code under tpch

I prefer adding a new folder called benchmarks or benches, so that we can add other bench-related tools.

@xudong963
Member Author

I want to put tpch benchmark related code under tpch

I prefer adding a new folder called benchmarks or benches, so that we can add other bench-related tools.

LGTM

tpch/tpchdata.dockerfile (outdated, resolved)
tpch/run-tpch-dbgen.sh (outdated, resolved)
@xudong963
Member Author

Updated, please take another look @Xuanwo @leiysky @everpcpc

benchmark/tpch/README.md (outdated, resolved)
Co-authored-by: BohuTANG <overred.shuttler@gmail.com>
Contributor

@leiysky leiysky left a comment

LGTM, maybe we can add this to https://github.com/leiysky/tpch-databend as well.

@BohuTANG BohuTANG merged commit d3618d0 into databendlabs:main Jun 17, 2022
@xudong963 xudong963 deleted the tpch-data branch June 17, 2022 05:50
Labels
need-review, pr-feature (this PR introduces a new feature to the codebase)
Development

Successfully merging this pull request may close these issues.

Prepared TPCH data set for databend
7 participants