Improve ELT performance by running multiple tap stream processes in parallel ("Melturbo") #2677
Comments
This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the |
Still relevant |
Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/2727
Originally created by @kgpayne on 2021-04-26 19:13:52
For integrations with large numbers of streams (e.g. monolithic application databases), it is often advantageous to break the integration up into multiple Meltano pipelines. Today this is supported (using tap inheritance and selection criteria), but it is very manual. In an ideal world, Meltano would be able to use a combination of context collected in discovery, context from previous pipeline runs (e.g. records transferred), and hints/overrides provided by end users in config to optimise the 'sharding' of pipelines. Like the turbocharger on a diesel engine, this is intended to eke out every bit of available performance, and it may also serve as a way of parallelising taps and targets that do not implement multithreading.
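To illustrate the manual approach described above, here is a minimal sketch that generates one inherited extractor definition per shard, using the `inherit_from` and `select` keys that Meltano plugin definitions accept. The function name, the `--shard-N` naming scheme, and the stream names are assumptions made for illustration only, and the `<stream>.*` patterns assume every property of a stream should be selected.

```python
def shard_plugin_definitions(parent, shards):
    """Build one inherited extractor definition per shard of stream names.

    `parent` is the name of the base extractor; `shards` is a list of lists
    of stream names. The output mirrors entries under `plugins.extractors`
    in meltano.yml (hypothetical naming scheme, not a Meltano feature).
    """
    definitions = []
    for index, streams in enumerate(shards):
        definitions.append(
            {
                "name": f"{parent}--shard-{index}",
                "inherit_from": parent,
                # Select every property of each stream assigned to this shard.
                "select": [f"{stream}.*" for stream in streams],
            }
        )
    return definitions


# Hypothetical example: two shards of a database tap.
definitions = shard_plugin_definitions(
    "tap-postgres", [["orders", "order_items"], ["users"]]
)
```

Each resulting definition could then be run as its own pipeline, which is the step that is currently done by hand.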
Discovery Context:
created_at field is available)
Run-history Context:
Hint/Override Context:
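As a rough illustration of how run-history context (e.g. records transferred per stream) might drive sharding, the sketch below assigns streams to a fixed number of shards with a greedy "largest first" heuristic. This is only one possible strategy, not something the proposal specifies; the function name, the record counts, and the shard count are all assumptions.

```python
import heapq


def shard_streams(record_counts, shard_count):
    """Greedily balance streams across shards by historical record count.

    Streams are taken largest first and each one is placed on the shard
    with the smallest running total so far.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")

    shards = [[] for _ in range(shard_count)]
    # Min-heap of (records assigned so far, shard index).
    heap = [(0, index) for index in range(shard_count)]

    # Sort by descending count, then by name so the result is deterministic.
    ordered = sorted(record_counts.items(), key=lambda item: (-item[1], item[0]))
    for stream, count in ordered:
        load, index = heapq.heappop(heap)
        shards[index].append(stream)
        heapq.heappush(heap, (load + count, index))
    return shards


# Hypothetical run history: records transferred per stream in the last run.
history = {"events": 900, "orders": 500, "order_items": 450, "users": 50}
shards = shard_streams(history, 2)
# shards == [["events", "users"], ["orders", "order_items"]]
```

Hints or overrides from config could be layered on top, for example by pinning certain streams to a shard before the greedy pass runs.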
Initially I would expect to have to run a
meltano optimise extractor <extractor-name>
command. In future, it would be amazing to delegate optimisation to Meltano in some sort of 'auto-pilot' mode, where optimisation happens unsupervised as run history accumulates and in response to upstream changes (changes in data volumes and velocities, and new tables being added).