Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: use same initialcluster config to restart joined member #1279

Merged
merged 10 commits into from
Oct 25, 2018

Conversation

nolouch
Copy link
Contributor

@nolouch nolouch commented Oct 18, 2018

What problem does this PR solve?

detail in: pingcap/tidb-operator#126
because some problem in etcd when joining a member. it may prepare to join failed. and we dynamically generate the join config for join member. and that's may cause the problem:

2018/10/18 03:43:15.239 main.go:92: [fatal] run server failed: couldn't find local name "demo-cluster-pd-3" in the initial cluster configuration

What is changed and how it works?

use the same join config after join success.

Check List

Tests

  • Unit test
  • Integration test(tidb-operator)

@nolouch nolouch changed the title server: use the same initcluster to restart joined member server: use same initalcluster config to restart joined member Oct 18, 2018
server/join.go Outdated
@@ -73,8 +83,20 @@ func PrepareJoinCluster(cfg *Config) error {
return errors.New("join self is forbidden")
}

// Cases with data directory.
filePath := cfg.DataDir + "/join"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
filePath := cfg.DataDir + "/join"
filePath := filepath.Join(cfg.DataDir, "join")

server/join.go Show resolved Hide resolved
server/join.go Outdated
@@ -138,7 +161,20 @@ func PrepareJoinCluster(cfg *Config) error {
initialCluster = strings.Join(pds, ",")
cfg.InitialCluster = initialCluster
cfg.InitialClusterState = embed.ClusterStateFlagExisting
return nil
err = os.Mkdir(cfg.DataDir, privateDirMode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to use MkdirAll here?

@nolouch nolouch added DNM and removed DNM labels Oct 23, 2018
@nolouch
Copy link
Contributor Author

nolouch commented Oct 23, 2018

PTAL

server/join.go Outdated
@@ -103,6 +126,9 @@ func PrepareJoinCluster(cfg *Config) error {

existed := false
for _, m := range listResp.Members {
if len(m.Name) == 0 {
return errors.New("exsist a member that the join is not completed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about there is a member that has not been joined successfully?
PTAL @CaitinChen

Copy link

@CaitinChen CaitinChen Oct 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@disksing there is a member that has not joined successfully

server/join.go Outdated
return nil
err = os.MkdirAll(cfg.DataDir, privateDirMode)
if err != nil && !os.IsExist(err) {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithStack?

server/join.go Outdated
return err
}

for i := 0; i < retryTimes; i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need to retry. It's ok to exit directly.

server/join.go Outdated
}
break
}
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithStack?

@nolouch nolouch changed the title server: use same initalcluster config to restart joined member server: use same initialcluster config to restart joined member Oct 24, 2018
@gregwebs
Copy link
Contributor

gregwebs commented Nov 9, 2018

@disksing are there any plans to put this into the next RC (assuming there is one)? This issue shows up frequently in our DinD setup, so we could also use this to produce a docker image specifically for that setup.

@disksing
Copy link
Contributor

disksing commented Nov 9, 2018

I think we can include it in RC5. /cc @nolouch

@nolouch nolouch added the needs-cherry-pick-release-2.1 The PR needs to cherry pick to release-2.1 branch. label Nov 12, 2018
@nolouch nolouch deleted the fix-join branch November 12, 2018 03:37
nolouch added a commit that referenced this pull request Nov 12, 2018
* server: use same initialcluster config to restart joined member (#1279)

* server/leader: use the last modify revision to watch leader (#1317)
@nolouch nolouch mentioned this pull request Nov 12, 2018
@nolouch nolouch added the needs-cherry-pick-release-2.0 The PR needs to cherry pick to release-2.0 branch. label Nov 13, 2018
disksing pushed a commit to oh-my-tidb/pd that referenced this pull request Nov 14, 2018
disksing added a commit that referenced this pull request Nov 14, 2018
* server, client: fix hanging problem when etcd failed to start (#1267)

* server: use same initialcluster config to restart joined member (#1279)

* fix server build

* pdctl: cherry pick bugfixes (#1298, #1299, #1308)

* server/api: fix the issue about `regions/check` API (#1311)

* fix join build

* fix pdctl build

* fix region test

* fix warnings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-cherry-pick-release-2.0 The PR needs to cherry pick to release-2.0 branch. needs-cherry-pick-release-2.1 The PR needs to cherry pick to release-2.1 branch.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants