raft: fix leaderID error when state changed #2415
Thanks @xiaost -- I'll take a look. In the meantime, can you please sign the CLA here: http://influxdb.com/community/cla.html We can't merge this change until then.
@otoolep LGTM, signed. 😃
Can you describe the problems you had with the system before this change?
@otoolep A -> Leader
After A restarted:
After the election, B should be the leader, but log.Printf("isLeader : %v, State: %v", l.IsLeader(), l.State()) outputs:
We have discovered a leader-election bug and are fixing it in PR #2418. How easily can you reproduce the error you are seeing? I'd like to know if you can reproduce the issue after we merge #2418.
#2418 seems to be a different bug, I think. :-( ping @jwilder
Test code:
package main
import (
"encoding/binary"
"fmt"
"io"
"log"
"net/http"
"net/url"
"os"
"sync"
"time"
"github.com/influxdb/influxdb/raft"
)
func main() {
os.RemoveAll("/tmp/testraft1")
os.RemoveAll("/tmp/testraft2")
os.RemoveAll("/tmp/testraft3")
log1 := StartRaft("/tmp/testraft1", 3771, "")
log2 := StartRaft("/tmp/testraft2", 3772, "http://127.0.0.1:3771")
log3 := StartRaft("/tmp/testraft3", 3773, "http://127.0.0.1:3771")
t := time.Tick(1 * time.Second)
failover := false
for range t {
// Stop querying log1 once it has been closed for the failover.
if !failover {
log.Printf("log1 isLeader : %v, State: %v", log1.IsLeader(), log1.State())
}
log.Printf("log2 isLeader : %v, State: %v", log2.IsLeader(), log2.State())
log.Printf("log3 isLeader : %v, State: %v", log3.IsLeader(), log3.State())
// Close the initial leader only once to force a new election.
if !failover && log1.IsLeader() {
log1.Close()
failover = true
}
}
}
func StartRaft(path string, port int, join string) *raft.Log {
bindstr := fmt.Sprintf("127.0.0.1:%d", port)
urlstr := fmt.Sprintf("http://%s/", bindstr)
myurl, _ := url.Parse(urlstr)
l := raft.NewLog()
//l.DebugEnabled = true
l.SetURL(*myurl)
l.FSM = NewIndexFSM()
l.Open(path)
index, _ := l.LastLogIndexTerm()
if index == 0 {
if join != "" {
joinURL, err := url.Parse(join)
if err != nil {
log.Fatal(err)
}
if err := l.Join(*joinURL); err != nil {
if err != raft.ErrInitialized {
log.Fatal(err)
}
}
} else {
if err := l.Initialize(); err != nil {
if err != raft.ErrInitialized {
log.Fatal(err)
}
}
}
}
h := raft.Handler{Log: l}
mux := http.NewServeMux()
mux.HandleFunc("/raft/", h.ServeHTTP)
go func() {
log.Println(http.ListenAndServe(bindstr, mux))
}()
return l
}
// IndexFSM represents a state machine that only records the last applied index.
type IndexFSM struct {
mu sync.Mutex
index uint64
}
func NewIndexFSM() *IndexFSM {
fsm := &IndexFSM{}
return fsm
}
// Apply updates the index.
func (fsm *IndexFSM) Apply(entry *raft.LogEntry) error {
fsm.mu.Lock()
fsm.index = entry.Index
fsm.mu.Unlock()
return nil
}
// Index returns the highest applied index.
func (fsm *IndexFSM) Index() uint64 {
fsm.mu.Lock()
defer fsm.mu.Unlock()
return fsm.index
}
// WriteTo writes a snapshot of the FSM to w.
func (fsm *IndexFSM) WriteTo(w io.Writer) (n int64, err error) {
fsm.mu.Lock()
defer fsm.mu.Unlock()
return 0, binary.Write(w, binary.BigEndian, fsm.index)
}
// ReadFrom reads an FSM snapshot from r.
func (fsm *IndexFSM) ReadFrom(r io.Reader) (n int64, err error) {
fsm.mu.Lock()
defer fsm.mu.Unlock()
return 0, binary.Read(r, binary.BigEndian, &fsm.index)
}
Thanks for the test program @xiaost -- we'll check it out.
I was able to reproduce this issue in a different way as well.
OK, we looked at the code and this does seem to be a real issue. Good catch, @xiaost. From looking at the code, the following would occur if
👍 Verified this fixed the issue for my test as well.
Thanks again @xiaost -- merging now.
raft: fix leaderID error when state changed
👍
I'm trying to use the raft module in my project and found a bug: after a follower becomes leader, the IsLeader() API does not return the correct value. 😭 😢
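The class of bug named in the PR title can be illustrated with a toy model: if a leadership check is answered from a cached leader ID, and a state change does not update that ID, the state and the check disagree. The sketch below is an assumption-laden illustration only. The `node` type and the `setStateBuggy` and `setStateFixed` methods are hypothetical and do not reproduce the actual raft package code or the actual patch.

```go
package main

import "fmt"

// State is a simplified node state for this toy model.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// node is a hypothetical stand-in for a raft log; it is not the influxdb type.
type node struct {
	id       uint64
	state    State
	leaderID uint64
}

// isLeader answers from the cached leaderID rather than from state.
func (n *node) isLeader() bool { return n.id == n.leaderID }

// setStateBuggy changes only the state and leaves leaderID stale.
func (n *node) setStateBuggy(s State) { n.state = s }

// setStateFixed keeps leaderID consistent with the new state.
func (n *node) setStateFixed(s State) {
	n.state = s
	switch s {
	case Leader:
		n.leaderID = n.id
	case Candidate:
		n.leaderID = 0 // assumption: 0 means "no known leader"
	}
}

func main() {
	b := &node{id: 2, state: Follower, leaderID: 1}
	b.setStateBuggy(Leader)
	fmt.Println(b.state == Leader, b.isLeader()) // prints "true false"

	b = &node{id: 2, state: Follower, leaderID: 1}
	b.setStateFixed(Leader)
	fmt.Println(b.state == Leader, b.isLeader()) // prints "true true"
}
```

With the buggy transition, node 2 is in the Leader state while isLeader() still reports false because leaderID points at the old leader; updating leaderID in the same step as the state removes the mismatch.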