Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: read-write splitting #402

Merged
merged 10 commits into from
Nov 21, 2023
Merged

Conversation

proost
Copy link
Contributor

@proost proost commented Nov 9, 2023

previous discussion: #400

add read-write splitting when cluster mode

@codecov-commenter
Copy link

codecov-commenter commented Nov 9, 2023

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (98aa541) 97.52% compared to head (d115a78) 97.51%.
Report is 10 commits behind head on main.

Files Patch % Lines
cluster.go 97.10% 4 Missing and 2 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
- Coverage   97.52%   97.51%   -0.02%     
==========================================
  Files          76       76              
  Lines       30868    30999     +131     
==========================================
+ Hits        30105    30229     +124     
- Misses        639      644       +5     
- Partials      124      126       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

cluster.go Outdated
Comment on lines 524 to 537
var p conn
isSendToReplicas := c.opt.SendToReplicas(cmd)
if isSendToReplicas {
p = c.rslots[rand.Intn(int(cmds.InitSlot))]
} else {
p = c.pslots[rand.Intn(int(cmds.InitSlot))]
}

if p == nil {
return nil, 0
}

count.m[p]++
connIndexes[i] = p
Copy link
Collaborator

@rueian rueian Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part looks incorrect and inconsistent with the previous implementation where we only do

init = true
continue

because we would like to defer slot selection when we have commands with no slot in DoMulti().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, when pipelining the MULTI, SET k v, EXIPRE k 1000, EXEC through DoMulti, we would choose the slot by SET k v and EXIPRE k 1000.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is test cases for this:

	t.Run("DoMulti Single Slot Read Operation And Write Operation + Init Slot", func(t *testing.T) {
		c1 := client.B().Get().Key("K1{a}").Build()
		c2 := client.B().Set().Key("K2{a}").Value("V2{a}").Build()
		c3 := client.B().Info().Build()
		resps := client.DoMulti(context.Background(), c1, c2, c3)
		if v, err := resps[0].ToString(); err != nil || v != "GET K1{a}" {
			t.Fatalf("unexpected response %v %v", v, err)
		}
		if v, err := resps[1].ToString(); err != nil || v != "SET K2{a} V2{a}" {
			t.Fatalf("unexpected response %v %v", v, err)
		}
		if v, err := resps[2].ToString(); err != nil || v != "INFO" {
			t.Fatalf("unexpected response %v %v", v, err)
		}
	})

Above case, "INFO" command is ignored. so i add connection to init slot command

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get what you mean by providing the test case.

I mean, previously we intentionally did not select connections for cmd.Slot() == cmds.InitSlot. Therefore, I expect you would also write the same code of

if cmd.Slot() == cmds.InitSlot {
	init = true
	continue
}

here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible if multiple connections is selected and there is "InitSlot" command, then "InitSlot" command is ignored.

For exmaple,
"GET K1{a}" command uses connection "A", "SET K2{a} V2{a}" command uses connection "B", and there is "InitSlot" command likes "INFO" together. original code ignore "INFO" because none of connections is assigned.

So if you want to changed to original code, above ignoring command is what you intended?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the problem. But, we should only pick one connection if "InitSlot" commands are involved, otherwise commands like MULTI and EXEC will fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. How about this?
eac17d5
When "InitSlot" command exist among multiple commands, use one of primary node only.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. How about this? eac17d5 When "InitSlot" command exist among multiple commands, use one of primary node only.

Hi @proost,

Sure. Using one of the primary nodes only sounds good to me.

However, the eac17d5 makes the _pickMulti becomes too complex, IMHO. How about we refactor it a little bit to first detect if there are "InitSlot" commands? If yes, we then just go to the original logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1178a7c

Sure! I changed it more simple

cluster.go Outdated
Comment on lines 717 to 730
isSendToReplicas := false
if c.opt.SendToReplicas != nil {
isSendToReplicas = true
for _, cmd := range multi {
if cmd.Slot() == cmds.InitSlot {
continue
}

isSendToReplicas = isSendToReplicas && c.opt.SendToReplicas(cmd)
if !isSendToReplicas {
break
}
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this part. All the works have been done in the _pickMulti.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part also changed it in here. eac17d5

@proost proost requested a review from rueian November 13, 2023 11:39
cluster.go Outdated
Comment on lines 524 to 537
var p conn
isSendToReplicas := c.opt.SendToReplicas(cmd)
if isSendToReplicas {
p = c.rslots[rand.Intn(int(cmds.InitSlot))]
} else {
p = c.pslots[rand.Intn(int(cmds.InitSlot))]
}

if p == nil {
return nil, 0
}

count.m[p]++
connIndexes[i] = p
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get what you mean by providing the test case.

I mean, previously we intentionally did not select connections for cmd.Slot() == cmds.InitSlot. Therefore, I expect you would also write the same code of

if cmd.Slot() == cmds.InitSlot {
	init = true
	continue
}

here.

@proost proost requested a review from rueian November 14, 2023 11:06
cluster.go Outdated
p = c.slots[slot]
switch {
case slot == cmds.InitSlot:
p = c.pslots[rand.Intn(int(cmds.InitSlot))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that we should not choose a random connection by choosing a random slot because it is possible that the chosen slot is not covered by the user's cluster. We should choose randomly from known connections instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1178a7c

Oh, I missed that point. how about this? use pconns.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @proost,

Your pconns reminds me that the original implementation is wrong. The original implementation is possible to choose a replica connection for “InitSlot” commands.

However, I wonder if there is a simpler solution than introducing pconns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that introducing "pconns" makes code more complex. but when choose primary node connection randomly among available primary node connections, pconn is simple & efficient solution.
How about replace "conns" with "pconns" and "rconns" like i did to replace "slots" to "pslots" and "rslots"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the pconns and rconns proposal.

But, after reviewing the access pattern of the conns field, I think we still need this unified map. If we replace it with pconns and rconns, many places will become much more complex, for example:

rueidis/cluster.go

Lines 387 to 390 in 4532630

func (c *clusterClient) redirectOrNew(addr string, prev conn) (p conn) {
c.mu.RLock()
p = c.conns[addr]
c.mu.RUnlock()

Here, we must check both fields to find an existing connection.

Another example is here:

rueidis/cluster.go

Lines 222 to 232 in 4532630

var removes []conn
c.mu.RLock()
for addr, cc := range c.conns {
if _, ok := conns[addr]; ok {
conns[addr] = cc
} else {
removes = append(removes, cc)
}
}
c.mu.RUnlock()

where we reuse living connections and find which connections to remove after refreshing the cluster information. If we split the conns into two fields, this logic will become far more complex, especially when handling living connections switching their roles from one to the other.

I think the simplest solution to this problem could be extending the conns field to be a map[string]struct{conn, bool} where the new bool flag indicates it is a primary connection or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I changed it 3139c13

@proost proost requested a review from rueian November 17, 2023 15:51
cluster.go Outdated
// NOTE: clusterConnection and conn must be initialized at the same time
type clusterConnection struct {
conn conn
isPrimaryNodeConn bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
isPrimaryNodeConn bool
replica bool

I think it is better to consider a conn not a replica connection by default whereas the current isPrimaryNodeConn treats a conn not a primary connection by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also changed it too. 3bb4fb5

cluster.go Outdated
@@ -226,31 +253,54 @@ func (c *clusterClient) _refresh() (err error) {
if _, ok := conns[addr]; ok {
conns[addr] = cc
Copy link
Collaborator

@rueian rueian Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The role of a connection can change overtime, therefore we should not override the whole field from the old map, we should keep the new role instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, i missed that point. d115a78 Thank you!

cluster.go Outdated
func newClusterClient(opt *ClientOption, connFn connFn) (client *clusterClient, err error) {
client = &clusterClient{
cmd: cmds.NewBuilder(cmds.InitSlot),
opt: opt,
connFn: connFn,
conns: make(map[string]conn),
conns: make(map[string]*clusterConnection),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
conns: make(map[string]*clusterConnection),
conns: make(map[string]connrole),

clusterConnection is too long for me. And I think it is better to use struct directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3bb4fb5 OK. I changed it.

@proost proost requested a review from rueian November 21, 2023 07:30
@rueian
Copy link
Collaborator

rueian commented Nov 21, 2023

Hi @proost,

Thank you for your hard work! It looks good to me.

@rueian rueian merged commit cb76d15 into redis:main Nov 21, 2023
@rueian rueian added the feature label Nov 21, 2023
@proost proost deleted the feat-read-write-splitting branch November 22, 2023 05:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants