S3 batch export: Parallelize upload for increased performance #28129

Open
tomasfarias opened this issue Jan 31, 2025 · 0 comments
Labels
enhancement New feature or request

Feature request

Is your feature request related to a problem?

Not strictly a problem, but S3 batch exports are leaving performance gains on the table now that they no longer track progress. Since an export always restarts from the beginning, there is no resume point to preserve, so we can drop ordering constraints and parallelize the consumption of data from ClickHouse.
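A minimal sketch of what unordered, parallel consumption could look like, assuming an asyncio producer/consumer layout. The producer is a hypothetical stand-in for the ClickHouse reader, and `produce`, `consume`, and `run_export` are illustrative names, not PostHog's code. Because ordering no longer matters, any consumer can take the next batch from the shared queue:

```python
from __future__ import annotations

import asyncio


async def produce(queue: asyncio.Queue, batches: list[bytes], num_consumers: int) -> None:
    """Hypothetical producer standing in for the ClickHouse reader."""
    for batch in batches:
        await queue.put(batch)
    # One sentinel per consumer so each one knows when to stop.
    for _ in range(num_consumers):
        await queue.put(None)


async def consume(queue: asyncio.Queue, sink: list[bytes]) -> None:
    """Hypothetical consumer: takes batches in whatever order they arrive."""
    while True:
        batch = await queue.get()
        if batch is None:
            break
        await asyncio.sleep(0)  # stand-in for serialization/upload work
        sink.append(batch)


async def run_export(batches: list[bytes], num_consumers: int = 4) -> list[bytes]:
    # A bounded queue keeps the producer from reading too far ahead.
    queue: asyncio.Queue = asyncio.Queue(maxsize=num_consumers * 2)
    sink: list[bytes] = []
    await asyncio.gather(
        produce(queue, batches, num_consumers),
        *(consume(queue, sink) for _ in range(num_consumers)),
    )
    return sink
```

For example, `asyncio.run(run_export([b"a", b"b", b"c"]))` returns all three batches, but the order of the result is not guaranteed to match the input, which is acceptable only because progress is not tracked.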

Describe the solution you'd like

The S3 multipart upload management would have to be separated from the S3 consumers. Multiple S3 consumers would then share a reference to the same S3 multipart upload manager, which would be in charge of deciding when to start a new upload. This will require some async synchronization primitives, such as a lock guarding the manager's shared state.
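One possible shape for the shared manager, sketched with an `asyncio.Lock`. `MultipartUploadManager`, `FakeUpload`, and `max_upload_size` are hypothetical; the fake upload keeps parts in memory instead of calling S3. The lock only serializes the bookkeeping (which upload is current, the next part number, when to roll over to a new upload), so the part uploads themselves can still run concurrently:

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeUpload:
    """In-memory stand-in for one S3 multipart upload (hypothetical)."""
    key: str
    parts: dict[int, bytes] = field(default_factory=dict)


class MultipartUploadManager:
    """Shared by all consumers; decides when to start a new upload.

    All names here are illustrative, not PostHog's actual API.
    """

    def __init__(self, max_upload_size: int) -> None:
        self.max_upload_size = max_upload_size
        self.uploads: list[FakeUpload] = []
        self._lock = asyncio.Lock()
        self._current: FakeUpload | None = None
        self._current_size = 0
        self._next_part = 1

    async def _reserve(self, size: int) -> tuple[FakeUpload, int]:
        # Only the bookkeeping is serialized; the upload itself is not.
        async with self._lock:
            if self._current is None or self._current_size >= self.max_upload_size:
                # Roll over to a new upload once the current one is "full".
                self._current = FakeUpload(key=f"export-{len(self.uploads)}")
                self.uploads.append(self._current)
                self._current_size = 0
                self._next_part = 1
            part_number = self._next_part
            self._next_part += 1
            self._current_size += size
            return self._current, part_number

    async def upload_part(self, data: bytes) -> None:
        upload, part_number = await self._reserve(len(data))
        await asyncio.sleep(0)  # stand-in for the network call to S3
        upload.parts[part_number] = data
```

A real implementation would also have to respect S3's limits (every part except the last must be at least 5 MiB, and an upload can have at most 10,000 parts) and, when rolling over, wait for the previous upload's in-flight parts before completing it.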

In the event of a failure, all pending uploads can be cancelled.
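For the failure case, a sketch using `asyncio.wait` with `FIRST_EXCEPTION`: as soon as one consumer fails, the remaining consumers are cancelled and every pending upload is aborted before the error is re-raised. `run_consumers` and `PendingUpload.abort` are hypothetical; against S3, `abort` would correspond to an AbortMultipartUpload request.

```python
from __future__ import annotations

import asyncio
from collections.abc import Coroutine, Sequence
from typing import Any


class PendingUpload:
    """In-memory stand-in for an S3 multipart upload (hypothetical)."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.aborted = False

    async def abort(self) -> None:
        # A real implementation would send AbortMultipartUpload here.
        self.aborted = True


async def run_consumers(
    consumers: Sequence[Coroutine[Any, Any, None]],
    uploads: Sequence[PendingUpload],
) -> None:
    """Run consumers concurrently; on the first failure, cancel the rest,
    abort every pending upload, then re-raise the original error."""
    tasks = [asyncio.create_task(c) for c in consumers]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    failed = next((t for t in done if t.exception() is not None), None)
    if failed is None:
        return  # every consumer finished successfully
    for task in pending:
        task.cancel()
    # Let the cancellations settle before cleaning up the uploads.
    await asyncio.gather(*pending, return_exceptions=True)
    await asyncio.gather(*(u.abort() for u in uploads))
    raise failed.exception()
```

On Python 3.11+, `asyncio.TaskGroup` gives similar cancel-on-first-failure behavior, although it raises an `ExceptionGroup` rather than the original exception.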

Describe alternatives you've considered

N/A

Additional context

Faster exports also help avoid timeouts in the event of instability.

Debug info

- [ ] PostHog Cloud, Debug information: [please copy/paste from https://us.posthog.com/settings/project-details#variables]
- [ ] PostHog Hobby self-hosted with `docker compose`, version/commit: [please provide]
- [ ] PostHog self-hosted with Kubernetes (deprecated, see [`Sunsetting Kubernetes support`](https://posthog.com/blog/sunsetting-helm-support-posthog)), version/commit: [please provide]
Projects
Status: Features