Create table concurrently with subscribing to stream of events #77

Merged
merged 1 commit into from
Aug 8, 2024

Conversation

istreeter
Collaborator

For KCL apps, we want the KCL to initialize as early as possible, so that the worker can claim shard leases before other workers steal them.

Before this PR, the loader initialized the destination table first and only then subscribed to the stream. Initializing the destination table can be fairly slow, especially because it involves syncing to the external catalog and possibly cleaning up aborted commits.

After this PR, the loader subscribes to the stream concurrently with initializing the destination table, so the KCL can claim leases before they get stolen.
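
To illustrate the ordering change described above, here is a minimal sketch in Python with `asyncio`. It is not the loader's actual code: the names `init_table`, `subscribe`, `run_loader` and `table_ready` are hypothetical, the sleep stands in for the slow table setup, and the step where event processing waits for the table to be ready is an assumption about how the two tasks would be coordinated.

```python
import asyncio


async def init_table(log):
    # Hypothetical stand-in for slow table setup
    # (catalog sync, cleanup of aborted commits).
    await asyncio.sleep(0.05)
    log.append("table ready")


async def subscribe(log, table_ready):
    # Hypothetical stand-in for KCL initialization: leases are claimed
    # right away, without waiting for the table.
    log.append("leases claimed")
    # Assumption: events are not written until the table exists.
    await table_ready.wait()
    log.append("processing events")


async def run_loader():
    log = []
    table_ready = asyncio.Event()

    async def init_then_signal():
        await init_table(log)
        table_ready.set()

    # Start table initialization and the stream subscription concurrently,
    # instead of running them one after the other.
    await asyncio.gather(init_then_signal(), subscribe(log, table_ready))
    return log


startup_log = asyncio.run(run_loader())
print(startup_log)
```

In this sketch the lease claim is recorded before the table is ready, which is the effect the PR is after. Under the old sequential ordering, the claim would only happen after the slow setup had finished.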

@istreeter istreeter merged commit b67cb86 into develop Aug 8, 2024
2 checks passed
@istreeter istreeter deleted the create-table-in-background branch August 8, 2024 11:54
zhaow-de added a commit to alloy-ch/rcplus-alloy-snowplow-lake-loader that referenced this pull request Oct 4, 2024
…patch-for-alloy

* commit '7ab2edc3fd4d81ffb4d5f3285d02330def7672b1':
  Upgrade common-streams to 0.8.0-M5
  Delete files asynchronously (snowplow-incubator#82)
  Upgrade common-streams 0.8.0-M4 (snowplow-incubator#81)
  Avoid error on duplicate view name (snowplow-incubator#80)
  Add option to exit on missing Iglu schemas (snowplow-incubator#79)
  common-streams 0.8.x with refactored health monitoring (snowplow-incubator#78)
  Create table concurrently with subscribing to stream of events (snowplow-incubator#77)
  Iceberg fail fast if missing permissions on the catalog (snowplow-incubator#76)
  Make alert messages more human-readable (snowplow-incubator#75)
  Hudi loader should fail early if missing permissions on Glue catalog (snowplow-incubator#72)
  Add alert & retry for delta/s3 initialization (snowplow-incubator#74)
  Implement alerting and retrying mechanisms
  Bump aws-hudi to 1.0.0-beta2 (snowplow-incubator#71)
  Bump hudi to 0.15.0 (snowplow-incubator#70)
  Allow disregarding Iglu field's nullability when creating output columns (snowplow-incubator#66)
  Extend health probe to report unhealthy on more error scenarios (snowplow-incubator#69)
  Fix bad rows resizing (snowplow-incubator#68)
oguzhanunlu pushed a commit that referenced this pull request Nov 1, 2024