Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested serialize.Executor.AsyncExec calls may deadlock when overwhelmed #512

Closed
xzfc opened this issue Oct 8, 2020 · 4 comments
Closed
Assignees
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@xzfc
Copy link
Contributor

xzfc commented Oct 8, 2020

Example of code that deadlocks:

func TestDeadlock(t *testing.T) {
        var exec serialize.Executor

        exec.AsyncExec(func() {
                time.Sleep(time.Second)
                exec.AsyncExec(func() {})
        })

        for i := 0; i < 105; i++ { // 105 is a number that slightly more than channelSize
                exec.AsyncExec(func() {})
        }

        <-exec.AsyncExec(func() {})
}

The issue is that AsyncExec() may block on channel write.

We have code that uses nested AsyncExec() and thus may potentially deadlock:

We should either document this limitation and rewrite the code that uses nested AsyncExec() OR rework serialize.Executor to support unlimited nested calls.

@denis-tingaikin denis-tingaikin added the bug Something isn't working label Oct 8, 2020
@denis-tingaikin
Copy link
Member

denis-tingaikin commented Oct 8, 2020

Tested locally it is reproducible.
Note: This dead lock can be achieved in the real world only by Server chain elements which can receive in one moment 100(channelSize) or more requests in case of using nested AsyncExec().

@denis-tingaikin
Copy link
Member

denis-tingaikin commented Oct 8, 2020

To fix the issue we can use a concurrent queue OR change channelSize to the max count of request per one moment for GRPC.

Fix via list + lock:

diff --git a/pkg/tools/serialize/serialize.go b/pkg/tools/serialize/serialize.go
index 8dc6ec5..23bb782 100644
--- a/pkg/tools/serialize/serialize.go
+++ b/pkg/tools/serialize/serialize.go
@@ -20,6 +20,7 @@
 package serialize
 
 import (
+	"container/list"
 	"sync"
 	"sync/atomic"
 )
@@ -30,9 +31,9 @@ const (
 
 // Executor - a struct that can be used to guarantee exclusive, in order execution of functions.
 type Executor struct {
-	execCh chan func()
-	init   sync.Once
-	count  int32
+	mutex sync.Mutex
+	queue list.List
+	count int32
 }
 
 // NewExecutor - provides a new Executor
@@ -44,15 +45,19 @@ func NewExecutor() Executor {
 // AsyncExec - guarantees f() will be executed Exclusively and in the Order submitted.
 //        It immediately returns a channel that will be closed when f() has completed execution.
 func (e *Executor) AsyncExec(f func()) <-chan struct{} {
-	// Initialize *once*
-	e.init.Do(func() {
-		e.execCh = make(chan func(), channelSize)
-	})
 	// Start go routine if we don't have one
 	if atomic.AddInt32(&e.count, 1) == 1 {
 		go func() {
-			for g := range e.execCh {
-				g()
+			for {
+				e.mutex.Lock()
+				if e.queue.Len() == 0 {
+					e.mutex.Unlock()
+					continue
+				}
+				f := e.queue.Front().Value.(func())
+				e.queue.Remove(e.queue.Front())
+				e.mutex.Unlock()
+				f()
 				if atomic.AddInt32(&e.count, -1) == 0 {
 					return
 				}
@@ -60,9 +65,11 @@ func (e *Executor) AsyncExec(f func()) <-chan struct{} {
 		}()
 	}
 	done := make(chan struct{})
-	e.execCh <- func() {
+	e.mutex.Lock()
+	e.queue.PushBack(func() {
 		f()
 		close(done)
-	}
+	})
+	e.mutex.Unlock()
 	return done
 }
diff --git a/pkg/tools/serialize/serialize_test.go b/pkg/tools/serialize/serialize_test.go
index 08fcd69..9f03ed3 100644
--- a/pkg/tools/serialize/serialize_test.go
+++ b/pkg/tools/serialize/serialize_test.go
@@ -22,12 +22,33 @@ package serialize_test
 
 import (
 	"testing"
+	"time"
 
 	"github.com/stretchr/testify/assert"
 
 	"github.com/networkservicemesh/sdk/pkg/tools/serialize"
 )
 
+func TestNestedAsyncExec(t *testing.T) {
+	var exec serialize.Executor
+
+	exec.AsyncExec(func() {
+		time.Sleep(time.Millisecond * 50)
+		exec.AsyncExec(func() {
+			time.Sleep(time.Millisecond * 50)
+			exec.AsyncExec(func() {
+				time.Sleep(time.Millisecond * 50)
+			})
+		})
+	})
+
+	for i := 0; i < 1e3; i++ {
+		exec.AsyncExec(func() {})
+	}
+
+	<-exec.AsyncExec(func() {})
+}
+
 func TestASyncExec(t *testing.T) {
 	exec := serialize.NewExecutor()
 	count := 0

@denis-tingaikin
Copy link
Member

denis-tingaikin commented Oct 13, 2020

@edwarnicke , @haiodo

I've found additional deadlock with executor:

func TestDeadlock2(t *testing.T)  {
	e := serialize.Executor{}
	e.AsyncExec(func() {
		time.Sleep(time.Millisecond * 100)
		<-e.AsyncExec(func() {
		})
	})
	<-e.AsyncExec(func() {
		logrus.Info("Deadlock")
	})
}

It looks like we need t rework serialize.Executor to support this kind of scenario.
I think we could add a NOTE into the description for this case: Do not call sync exec inside of async exec statement

Thoughts?

@edwarnicke
Copy link
Member

I don't think the issue in this one is calling AsyncExec inside AsyncExec ... its blocking on the nested AsyncExec call. And yes, that's one probably best addressed by documentation.

edwarnicke pushed a commit that referenced this issue Oct 18, 2020
* use remote serialize executor to solve issue 512

Signed-off-by: denis-tingajkin <denis.tingajkin@xored.com>

* make format

Signed-off-by: denis-tingajkin <denis.tingajkin@xored.com>
illbegood pushed a commit to illbegood/sdk that referenced this issue Dec 20, 2020
…etworkservicemesh#540)

* use remote serialize executor to solve issue 512

Signed-off-by: denis-tingajkin <denis.tingajkin@xored.com>

* make format

Signed-off-by: denis-tingajkin <denis.tingajkin@xored.com>
Signed-off-by: Sergey Ershov <sergey.ershov@xored.com>
illbegood pushed a commit to illbegood/sdk that referenced this issue Dec 23, 2020
…etworkservicemesh#540)

* use remote serialize executor to solve issue 512

Signed-off-by: denis-tingajkin <denis.tingajkin@xored.com>

* make format

Signed-off-by: denis-tingajkin <denis.tingajkin@xored.com>
Signed-off-by: Sergey Ershov <sergey.ershov@xored.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants