Invalid memory address or nil pointer when write commands in function _backgroundWrite #702

gj3233 · 2024-12-18T07:46:32Z

Hello,
Yesterday when testing with redis cluster, my application had been panic at line ( (of rueidis v1.0.46) :
return c.cs.s which in function Commands():
func (c *Completed) Commands() []string { return c.cs.s }
which is call by err = writeCmd(p.w, cmd.Commands())

The complete code is:
`func (p *pipe) _backgroundWrite() (err error) {
var (
ones = make([]Completed, 1)
multi []Completed
ch chan RedisResult
flushDelay = p.maxFlushDelay
flushStart = time.Time{}
)

for err == nil {
	if ones[0], multi, ch = p.queue.NextWriteCmd(); ch == nil {
		if flushDelay != 0 {
			flushStart = time.Now()
		}
		if p.w.Buffered() != 0 {
			if err = p.w.Flush(); err != nil {
				break
			}
		}
		ones[0], multi, ch = p.queue.WaitForWrite()
		if flushDelay != 0 && atomic.LoadInt32(&p.waits) > 1 { // do not delay for sequential usage
			// Blocking commands are executed in dedicated client which is acquired from pool.
			// So, there is no sense to wait other commands to be written.
			// https://github.com/redis/rueidis/issues/379
			var blocked bool
			for i := 0; i < len(multi) && !blocked; i++ {
				blocked = multi[i].IsBlock()
			}
			if !blocked {
				time.Sleep(flushDelay - time.Since(flushStart)) // ref: https://github.com/redis/rueidis/issues/156
			}
		}
	}
	if ch != nil && multi == nil {
		multi = ones
	}
	for _, cmd := range multi {
		err = writeCmd(p.w, cmd.Commands())
	}
}
return

}`

Redis topology:
cluster enabled
3 master 3 replica

Action: Node is starting.

Could you help on this? Thanks. It seems cmd passed to writeCmd() has something wrong that lead to the crash. It confused because cmd is read from range multi, so I think something wrong happened to multi.

The text was updated successfully, but these errors were encountered:

rueian · 2024-12-18T08:40:55Z

Hi @gj3233, do you have a reproducible program on this? And what do you mean by Action: Node is starting?

gj3233 · 2024-12-19T02:54:26Z

Hi @gj3233, do you have a reproducible program on this? And what do you mean by Action: Node is starting?

HI @rueian, I do not have a reproducible program on this, Action: Node is starting means the crash happened when the node just restarted, during the time the system is very busy for querying data from Redis DB with Client.DoMulti(ctx, cmds...), the crash is seen when there are amount of querying DB request send but not seen when querying is not that frequent. I attach the complete back trace file for your reference, thanks.
go-panic.txt

rueian · 2024-12-19T15:01:14Z

Hi @gj3233,

From the clues you gave me:

It panicked on c.cs.s where the only possibility is that the cs is invalid and this can only happen with false recycling.
There was a lot of doretry in back your traces.

I think the panic was caused by an incorrect error checking that falsely recycled an intermediate struct here:

rueidis/cluster.go

Lines 703 to 705 in 9ff4c60

    
           if ctx.Err() == nil { 
        
           	retryp.Put(re) 
        
           }

#706 should fix the issue. You can try it with v1.0.52-alpha.3

gj3233 · 2024-12-20T08:06:40Z

HI @rueian much thanks! I will test with the fix later and update to you.

rueian · 2024-12-31T09:00:24Z

v1.0.52 was released last week. I am going to close the issue for now. Please feel free to reopen if you are still suffering from this.

nikhillimaje · 2025-02-20T18:18:03Z

Hi @rueian, as mentioned in this thread, our application also recently encountered this issue. We're trying to understand the trigger scenario for this bug. In our case, read qps wasn't high neither the redis node/cluster were restarting. However, there was a connection storm to the Redis cluster (as the pods were crash looping) and we're assuming that should be the primary reason. Is there a way to confirm this? Just to ensure that this bug doesn't occur again in case of connection storm in the future.

rueian · 2025-02-20T18:54:48Z

Hi @nikhillimaje,

As you know, non-pinned commands built by the command builder can’t be shared across multiple Do() and DoMulti() because they will be recycled by rueidis for saving memory allocations. The bug was caused by a wrong check for recycling errored commands internally.

Connection storms, depending on your use cases and setup, might be an expected behavior. But it indeed could cause network errors on the client side and further triggered the previous wrong check and made those commands be falsely recycled.

I can only say that was the only wrong check we know so far and has been fixed in the newer version.

rueian mentioned this issue Dec 19, 2024

fix: do not recycle the intermediate commands slice of cluster DoMulti when network errors #706

Merged

rueian closed this as completed Dec 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Invalid memory address or nil pointer when write commands in function _backgroundWrite #702

Invalid memory address or nil pointer when write commands in function _backgroundWrite #702

gj3233 commented Dec 18, 2024

rueian commented Dec 18, 2024 •

edited

Loading

gj3233 commented Dec 19, 2024

rueian commented Dec 19, 2024

gj3233 commented Dec 20, 2024

rueian commented Dec 31, 2024

nikhillimaje commented Feb 20, 2025

rueian commented Feb 20, 2025

Invalid memory address or nil pointer when write commands in function _backgroundWrite #702

Invalid memory address or nil pointer when write commands in function _backgroundWrite #702

Comments

gj3233 commented Dec 18, 2024

rueian commented Dec 18, 2024 • edited Loading

gj3233 commented Dec 19, 2024

rueian commented Dec 19, 2024

gj3233 commented Dec 20, 2024

rueian commented Dec 31, 2024

nikhillimaje commented Feb 20, 2025

rueian commented Feb 20, 2025

rueian commented Dec 18, 2024 •

edited

Loading