feat(source): add ndjson source for streaming load #4561

Merged
merged 4 commits into from
Mar 24, 2022

Conversation

sundy-li
Member

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Summary about this PR

  • Add ndjson source for streaming load
  • Make parquet support size_limit option

Changelog

  • New Feature

Related Issues

Fixes #4531

Test Plan

Unit Tests

Stateless Tests

@vercel

vercel bot commented Mar 24, 2022

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/databend/databend/C2CdjAHtRkZrF4gyA4XSPdkmDbN3
✅ Preview: https://databend-git-fork-sundy-li-source-refactor-databend.vercel.app

[Deployment for e3b4166 canceled]

@mergify
Contributor

mergify bot commented Mar 24, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label Mar 24, 2022
@sundy-li sundy-li marked this pull request as ready for review March 24, 2022 10:05
@sundy-li sundy-li requested a review from BohuTANG as a code owner March 24, 2022 10:05
@BohuTANG
Member

Cool, this looks almost good to me, but the stateless test failed.

Member

@BohuTANG BohuTANG left a comment

lgtm

@BohuTANG BohuTANG merged commit 00e90e0 into databendlabs:main Mar 24, 2022
self.buffer.clear();
if self
.reader
.read_line(&mut self.buffer)
Contributor

@DCjanus DCjanus Mar 26, 2022

This is blocking I/O inside an async function, so it might block the runtime thread.

In this PR it would not cause a serious problem, because we are using std::io::Cursor, but in the future people may try to use this in other situations and shoot themselves in the foot.

Maybe we should replace R: std::io::BufRead with R: tokio::io::AsyncBufRead or R: futures::io::AsyncBufRead.
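
To make the concern concrete, here is a minimal std-only sketch of the pattern under discussion. `LineSource`, `next_line`, and `count_records` are hypothetical names invented for illustration, not code from this PR. With `std::io::Cursor` the `read_line` call only copies bytes out of memory, so it never waits on the OS. With a file or socket behind the same `R: BufRead` bound, the identical call could park the executor thread.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical stand-in for the reader touched in the diff: it owns a
/// reusable line buffer and any synchronous `BufRead` source.
struct LineSource<R> {
    reader: R,
    buffer: String,
}

impl<R: BufRead> LineSource<R> {
    fn new(reader: R) -> Self {
        Self { reader, buffer: String::new() }
    }

    /// Reads the next line into the reused buffer and strips the line ending.
    /// Returns `Ok(None)` at end of input.
    fn next_line(&mut self) -> io::Result<Option<&str>> {
        self.buffer.clear();
        // Synchronous call: harmless for an in-memory `Cursor`, but it would
        // block the calling thread if `R` were backed by a file or socket.
        if self.reader.read_line(&mut self.buffer)? == 0 {
            return Ok(None);
        }
        Ok(Some(self.buffer.trim_end_matches(|c: char| c == '\n' || c == '\r')))
    }
}

/// Counts the non-empty lines of an in-memory NDJSON payload.
fn count_records(payload: &str) -> io::Result<usize> {
    let mut source = LineSource::new(Cursor::new(payload.as_bytes()));
    let mut count = 0;
    while let Some(line) = source.next_line()? {
        if !line.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}
```

An async variant would presumably keep the same shape, change the bound to an async buffered-read trait, and await the read instead of calling it directly. That part is left out here because it depends on which runtime crate the project settles on.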

Member Author

Thanks, good catch, I will create a fix for that.

@BohuTANG BohuTANG mentioned this pull request May 12, 2022
55 tasks
@Xuanwo Xuanwo added this to the v0.8 milestone May 20, 2022
Labels
need-review pr-feature this PR introduces a new feature to the codebase
Development

Successfully merging this pull request may close these issues.

Add size_limit for Parquet file
5 participants