Improve ELT performance by running multiple tap stream processes in parallel ("Melturbo") #2677

MeltyBot · 2021-04-26T19:13:52Z

Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/2727

Originally created by @kgpayne on 2021-04-26 19:13:52

For integrations with large numbers of streams (e.g. monolithic application databases), it is often advantageous to break the integration up into multiple Meltano pipelines. Today this is supported (using tap inheritance and selection criteria), but it is very manual. In an ideal world, Meltano would be able to use a combination of context collected in discovery, context from previous pipeline runs (e.g. records transferred) and hints/overrides provided by end-users in config to optimise the 'sharding' of pipelines. Like the turbocharger on a diesel engine, the aim is to eke out every bit of available performance; it may also serve as a way of parallelising taps and targets that do not implement multithreading.

Discovery Context:

Availability of Primary Keys (large FULL_TABLE replications in their own pipelines)
Data Volume (count of records)
Data Velocity (records per hour, if a created_at field is available)

Run-history Context:

Avg. time taken for Stream sync
Records per Stream sync (could be many more than records per hour above for e.g. weekly schedules)
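
As a rough illustration of how run-history context could drive sharding, here is a minimal sketch that spreads streams across a fixed number of pipelines so that the expected total sync time per pipeline is roughly even. It uses a greedy "longest first" heuristic, which is my own choice for the example and not something Meltano implements; the function name `balance_streams` and the sample figures are hypothetical.

```python
import heapq
from typing import Dict, List


def balance_streams(avg_sync_seconds: Dict[str, float], n_pipelines: int) -> List[List[str]]:
    """Greedily assign streams to pipelines, longest average sync first.

    avg_sync_seconds maps a stream name to its average sync time, as might
    be collected from previous runs (hypothetical input format).
    """
    if n_pipelines < 1:
        raise ValueError("n_pipelines must be at least 1")

    # Min-heap of (accumulated seconds, pipeline index): the least-loaded
    # pipeline is always popped first.
    heap = [(0.0, i) for i in range(n_pipelines)]
    heapq.heapify(heap)
    groups: List[List[str]] = [[] for _ in range(n_pipelines)]

    # Slowest streams first; ties are broken by name for determinism.
    ordered = sorted(avg_sync_seconds.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, seconds in ordered:
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + seconds, idx))
    return groups


# Made-up run history, in seconds per sync.
history = {"orders": 900, "events": 600, "users": 300, "invoices": 200, "tags": 20}
print(balance_streams(history, 2))
```

Each resulting group could then become the selection criteria for one inherited tap. The heuristic does not guarantee an optimal split, but it is cheap to re-run as run history accumulates.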

Hint/Override Context:

Composable Table/Stream groups (we have several pipelines that collect subsets of tables that correspond to a 'domain' in our monolith)
Custom grouping rules (e.g. all FULL_TABLE replication Streams under 10k rows in one pipeline)
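
To make the hint/override idea more concrete, here is a minimal sketch of rule-based grouping. It assumes an explicit domain hint takes precedence, that small FULL_TABLE streams share one pipeline, and that large FULL_TABLE streams each get their own. The `StreamInfo` class, the `plan_pipelines` function, the pipeline names and the precedence order are all hypothetical and only meant to illustrate the shape of the rules, not an existing Meltano API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StreamInfo:
    name: str
    replication_method: str        # e.g. "FULL_TABLE" or "INCREMENTAL"
    row_count: int                 # from discovery (hypothetical field)
    domain: Optional[str] = None   # user-supplied grouping hint


def plan_pipelines(streams: List[StreamInfo], small_limit: int = 10_000) -> Dict[str, List[str]]:
    """Map a pipeline name to the stream names it should select."""
    plan: Dict[str, List[str]] = {}
    for s in streams:
        if s.domain:
            # Explicit user hint wins (assumed precedence).
            key = f"domain-{s.domain}"
        elif s.replication_method == "FULL_TABLE" and s.row_count < small_limit:
            # All small FULL_TABLE streams share one pipeline.
            key = "small-full-table"
        elif s.replication_method == "FULL_TABLE":
            # Large FULL_TABLE streams run in their own pipeline.
            key = f"full-table-{s.name}"
        else:
            key = "default"
        plan.setdefault(key, []).append(s.name)
    return plan


streams = [
    StreamInfo("countries", "FULL_TABLE", 250),
    StreamInfo("currencies", "FULL_TABLE", 180),
    StreamInfo("audit_log", "FULL_TABLE", 5_000_000),
    StreamInfo("orders", "INCREMENTAL", 2_000_000, domain="sales"),
    StreamInfo("order_items", "INCREMENTAL", 9_000_000, domain="sales"),
    StreamInfo("sessions", "INCREMENTAL", 100_000),
]
for pipeline, names in plan_pipelines(streams).items():
    print(pipeline, names)
```

In practice the rules themselves would presumably live in config rather than code, so that end-users can override whatever the optimiser proposes.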

Initially I would expect to have to run a meltano optimise extractor <extractor-name> command. In future, it would be amazing to delegate optimisation to Meltano in some sort of 'auto-pilot' mode, where optimisation happens unsupervised as run history is accumulated and in response to upstream changes (changes in data volumes and velocities, and new tables being added).

MeltyBot · 2022-05-30T07:12:39Z

View 7 previous comments from the original issue on GitLab

stale · 2023-06-25T00:03:49Z

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

tayloramurphy · 2023-06-26T13:14:46Z

Still relevant

stale · 2024-12-25T13:22:17Z

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

MeltyBot mentioned this issue May 30, 2022

Refresh catalog on every invoke (fresh_catalog: true) #2848

Closed

labelsync-manager bot added the kind/Feature label Jun 23, 2022

tayloramurphy removed the kind/Feature label Jun 24, 2022

tayloramurphy mentioned this issue Aug 3, 2022

Is there a way run meltano tap+targets with multiple cpu cores? #6377

Closed

aaronsteers moved this to Up Next in Office Hours Aug 17, 2022

aaronsteers added this to Office Hours Aug 17, 2022

aaronsteers moved this from Up Next to Discussed in Office Hours Aug 17, 2022

aaronsteers mentioned this issue Aug 23, 2022

Feature: Add end_date support in generic tap config meltano/sdk#922

Open

tayloramurphy changed the title from Improve ELT performance by running multiple tap processes in parallel ("Melturbo") to Improve ELT performance by running multiple tap stream processes in parallel ("Melturbo") Feb 24, 2023

tayloramurphy removed flow::triage migrated from gitlab labels Feb 24, 2023

stale bot added the stale label Jun 25, 2023

stale bot removed the stale label Jun 26, 2023

stale bot added the stale label Dec 25, 2024

stale bot closed this as completed Jan 21, 2025
