
Adjust wait time after replication restart #1422

Open: wants to merge 2 commits into master

andyedison

@andyedison andyedison commented Jun 11, 2024

A Pull Request should be associated with an Issue.

We wish to have discussions in Issues. A single issue may be targeted by multiple PRs.
If you're offering a new feature or fixing anything, we'd like to know beforehand in Issues,
and potentially we'll be able to point development in a particular direction.

Related issue:

Further notes in https://github.com/github/gh-ost/blob/master/.github/CONTRIBUTING.md
Thank you! We are open to PRs, but please understand if, for technical reasons, we are unable to accept every PR.

Description

This PR increases startSlavePostWaitMilliseconds because, when running gh-ost in some cloud environments, we see an error reporting that Slave_IO_Running is Connecting rather than the expected Yes.

We found an old PR, #337, that describes the issue we're having. As a first step, we are doubling the value. If this works in testing, we'll look into making the wait configurable.

In case this PR introduced Go code changes:

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

@timvaillancourt
Collaborator

@andyedison I wonder if this fix is good enough for all cases. The sleep that exists now is a bit hacky.

> in some cloud environments that the Slave_IO_Running is Connecting rather than Yes as expected.

What should we be waiting for? I think you mean for the IO thread to be running. Whatever the answer, it would be safer if gh-ost waited and checked that the state we want has actually been reached, rather than relying on a fixed time.Sleep().

@andyedison
Author

No, I agree; I doubt this is good enough for all cases. This was a bit of an experiment to see whether increasing this time would stop the errors we were seeing in a particular environment over a period of time. I believe it has; we just haven't had time to come back to this and dig into a more permanent solution.


3 participants