Feature request
Is your feature request related to a problem?
Not strictly a problem, but S3 batch exports are leaving performance gains on the table now that they no longer track progress. Because every run restarts from the beginning, we can drop the ordering constraints and consume data from ClickHouse in parallel.
Describe the solution you'd like
S3 multipart upload management would have to be separated from the S3 consumers. Multiple S3 consumers would then share a reference to a single S3 multipart upload manager, which would be in charge of deciding when to start a new upload. This will require some async synchronization primitives.
In the event of a failure, all pending uploads can be cancelled.
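A minimal sketch of how this could look, assuming an `asyncio`-based implementation. Every name here (`FakeMultipartStore`, `UploadManager`, `consume`, `export`, `max_parts`) is hypothetical and not taken from the existing batch export code. The store is an in-memory stand-in for S3, so the example only illustrates the coordination: consumers pull batches in no particular order, the shared manager hands out part numbers under a lock and decides when to start a new upload, and a failure cancels the consumers and aborts every pending upload.

```python
import asyncio


class FakeMultipartStore:
    """In-memory stand-in for S3 multipart uploads (hypothetical, not a real client)."""

    def __init__(self):
        self.completed = {}  # upload_id -> number of parts
        self.aborted = []    # upload_ids that were aborted
        self._next_id = 0

    async def create(self):
        self._next_id += 1
        return self._next_id

    async def upload_part(self, upload_id, part_number, data):
        await asyncio.sleep(0)  # yield control, as a network call would

    async def complete(self, upload_id, part_count):
        self.completed[upload_id] = part_count

    async def abort(self, upload_id):
        self.aborted.append(upload_id)


class UploadManager:
    """Shared by all consumers; decides when a new multipart upload starts."""

    def __init__(self, store, max_parts):
        self.store = store
        self.max_parts = max_parts
        self._lock = asyncio.Lock()
        self._current = None  # id of the upload currently accepting parts
        self._open = {}       # upload_id -> {"assigned": int, "done": int}

    async def _reserve(self):
        # Hand out the next part number, starting a new upload when needed.
        async with self._lock:
            state = self._open.get(self._current)
            if state is None or state["assigned"] >= self.max_parts:
                self._current = await self.store.create()
                state = self._open[self._current] = {"assigned": 0, "done": 0}
            state["assigned"] += 1
            return self._current, state["assigned"]

    async def upload(self, data):
        upload_id, part_number = await self._reserve()
        # The part itself is uploaded outside the lock, so consumers overlap.
        await self.store.upload_part(upload_id, part_number, data)
        async with self._lock:
            state = self._open[upload_id]
            state["done"] += 1
            if state["done"] == self.max_parts:  # last part of a full upload
                await self.store.complete(upload_id, state["done"])
                del self._open[upload_id]

    async def finish(self):
        # Complete whatever is still open once all consumers are done.
        async with self._lock:
            for upload_id, state in list(self._open.items()):
                await self.store.complete(upload_id, state["done"])
                del self._open[upload_id]

    async def abort_all(self):
        async with self._lock:
            for upload_id in list(self._open):
                await self.store.abort(upload_id)
                del self._open[upload_id]


async def consume(queue, manager):
    while True:
        batch = await queue.get()
        if batch is None:  # sentinel: no more data
            return
        await manager.upload(batch)


async def export(batches, store, consumers=3, max_parts=2):
    manager = UploadManager(store, max_parts)
    queue = asyncio.Queue()
    for batch in batches:
        queue.put_nowait(batch)
    for _ in range(consumers):
        queue.put_nowait(None)
    tasks = [asyncio.create_task(consume(queue, manager)) for _ in range(consumers)]
    try:
        await asyncio.gather(*tasks)
    except Exception:
        # On failure: stop the remaining consumers, then cancel pending uploads.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        await manager.abort_all()
        raise
    await manager.finish()
```

For example, `asyncio.run(export(["a", "b", "c", "d", "e"], FakeMultipartStore()))` spreads five batches over three consumers and three uploads. Holding the lock only while reserving a part number and while recording completion keeps the actual part uploads concurrent. A real implementation would also need to respect S3's minimum part size and collect part ETags, which this sketch leaves out.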
Describe alternatives you've considered
N/A
Additional context
Faster exports also help avoid timeouts during periods of instability.
Debug info
- [ ] PostHog Cloud, Debug information: [please copy/paste from https://us.posthog.com/settings/project-details#variables]
- [ ] PostHog Hobby self-hosted with `docker compose`, version/commit: [please provide]
- [ ] PostHog self-hosted with Kubernetes (deprecated, see [`Sunsetting Kubernetes support`](https://posthog.com/blog/sunsetting-helm-support-posthog)), version/commit: [please provide]