feat: optimize performance, add batch support #4

Merged
aaronsteers merged 5 commits into main from batch-messaging-support
Apr 1, 2023

@aaronsteers aaronsteers commented Mar 31, 2023

This addresses some odd performance profile characteristics:

  1. Running jafgen alone took approximately 20 seconds.
  2. Running tap-jaffle-shop and echoing output to a text file took approximately 2 minutes.
  3. Running tap-jaffle-shop in batch mode (as this PR enables) took approximately 20 seconds.

This PR fixes a few things:

  • Adds batch support.
  • Caches PandasStream.schema results internally to reduce per-record overhead. (The property was previously being evaluated twice per record streamed.)
  • Bumps the batch size to 100000, reducing batch frequency to one batch per stream for 1 year of data.
  • Properly defines tap properties in meltano.yml.
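
For readers unfamiliar with the schema-caching change listed above, here is a minimal sketch of the general idea. It is not the code from this PR: `CachedSchemaStream`, its `rows` argument, and the `schema_builds` counter are hypothetical names used only for illustration, and plain dicts stand in for the pandas data. It assumes the schema is derived from the data and was being read twice per record.

```python
from functools import cached_property


class CachedSchemaStream:
    """Hypothetical stand-in for a stream whose schema is costly to derive."""

    def __init__(self, rows):
        self.rows = rows
        self.schema_builds = 0  # counts how many times the schema is derived

    @cached_property
    def schema(self):
        # Runs once on first access; later accesses reuse the stored dict.
        self.schema_builds += 1
        first = self.rows[0] if self.rows else {}
        return {
            "properties": {
                name: {"type": type(value).__name__}
                for name, value in first.items()
            }
        }

    def get_records(self):
        for row in self.rows:
            # Two schema reads per record, mirroring the pattern described above.
            fields = self.schema["properties"]
            known = set(self.schema["properties"])
            yield {name: row.get(name) for name in fields if name in known}


stream = CachedSchemaStream([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
records = list(stream.get_records())
```

With `cached_property`, the derivation cost is paid once per stream instance rather than on every record, at the price of assuming the schema does not change while the stream is being read.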

@aaronsteers aaronsteers changed the title from "feat: optimize performance with batch" to "feat: optimize performance, add batch support" Apr 1, 2023
@aaronsteers aaronsteers merged commit 60a0851 into main Apr 1, 2023
@aaronsteers aaronsteers deleted the batch-messaging-support branch April 1, 2023 04:08