Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-37278] Optimize regular schema evolution topology's performance #3912

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

yuxiqian
Copy link
Contributor

@yuxiqian yuxiqian commented Feb 8, 2025

This closes FLINK-37278.

Currently, regular SE topology uses the following process to drain existing DataChangeEvents in the pipeline:

  1. SchemaOperator ("client") emits FlushEvent to downstream.
  2. The "client" keeps polling the SchemaCoordinator ("server") with 1-second interval.
  3. The "server" rejects all requests from clients until it has collected enough FlushSuccessEvent notifications from Sink.

As a result, all schema change requests will took at least 1 second to finish, after at least one polling interval.


This PR replaces the polling code with maintaining a pending schema change request queue, where SchemaCoordinator could manage all pending clients and effectively blocking them from handling upstream events. Schema evolution process could start immediately after FlushSuccessEvent got reported, needless to wait for polling requests from clients.


With this change, time consumption of testRegularTablesSourceInMultipleParallelism test case has been reduced from ~6 minutes to ~50 seconds.

@yuxiqian
Copy link
Contributor Author

yuxiqian commented Feb 8, 2025

Would @hiliuxg like to take a look?

@yuxiqian yuxiqian marked this pull request as ready for review February 8, 2025 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant