-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider exposing config like dry_run_record_limit
#1366
Comments
@aaronsteers you make good points here but is this different from #1333? It sounds like a proposal for implementing that feature. I'll put my thoughts here but it might fragment our discussion if this is in two issues. Around naming when I hear
It seems like there could be 2 different implementations that we can either chose from or implement both and let the user chose whats best for their use case:
I guess my initial thought of how this would work is closer to My initial idea for this in #1333 was for testing where I want it to stop after X records because my start date isnt sufficient to limit the size of my dataset. Using What do you think? I'm not married to these ideas but wanted to share my thoughts in case its helpful context. |
This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the |
Still relevant |
We could theoretically expose a record limit in config for SDK-based taps. This would basically be a more advanced implementation of
--test
CLI flag.However, it is important to properly disclaim the following caveats for any users that opt into this behavior:
orders
table, for instance, might not match all - or even any - of the customer IDs from the first 100 records of the customers table.n
parents don't have any children, then no records for the child stream would be synced. Example: I'm syncing with max count of10
. My parent stream isIssues
and my child stream isIssue comments
. If my first 10 issues do not have comments, then exactly zeroIssue comments
records will be synced, even though the max for that stream type was never met.For these reasons, I suggest we pay special attention to naming. I propose above a
dry_run_
prefix but there are other ways this could be communicated as well.A similar approach would be something like
dry_run: true
to enable a developer-specified default (or the SDK default of something like 1000), possibly also allowing an integer instead of the boolean valuedry_run: 10
- with docs clearly explaining thattrue
uses the built-in defaults and any othern
will override the limit at the users' request.Alternative using the CLI
An alternative using just the CLI would be to accept this integer as an optional input to
--test
. So,--test
today tries to sync zero or one records per stream, where-as if--test=N
were specified, we'd try to sync (at least)N
records per stream.cc @kgpayne, @edgarrmondragon
Related:
_MAX_RECORDS_LIMIT
#1349The text was updated successfully, but these errors were encountered: