Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[中文] help! websocket write: broken pipe, and ws.cleanupConnection() Unlock() not work #290

Open
AndrewYEEE opened this issue Aug 31, 2024 · 1 comment

Comments

@AndrewYEEE
Copy link

OCPP version:
[x] 1.6
[ ] 2.0.1

I'm submitting a ...

[x] bug report
[ ] feature request

Current behavior:
ChargingPoint因為異常同時發送大量Heartbeat,我們發生以下錯誤:
time="2024-08-31T06:45:32+08:00" level=info msg=CCCCCC____Heartbeat
time="2024-08-31T06:45:32+08:00" level=info msg=BBBBBB____Heartbeat
time="2024-08-31T06:45:33+08:00" level=info msg=AAAAAA____Heartbeat
time="2024-08-31T06:45:33+08:00" level=info msg=AAAAAA____Heartbeat
time="2024-08-31T06:45:33+08:00" level=info msg="write failed for AAAAAA: %!w(*net.OpError=&{write tcp 0xc0007a24b0 0xc0007a24e0 0xc000682a60})" logger=websocket
time="2024-08-31T06:45:33+08:00" level=error msg="write failed for AAAAAA: write tcp 172.18.0.xx:9001->172.18.0.xx:57006: write: broken pipe" logger=websocket
time="2024-08-31T06:45:33+08:00" level=info msg=AAAAAA____Heartbeat
time="2024-08-31T06:45:33+08:00" level=info msg=EEEEEE____Heartbeat

但下一步沒有印出
"closed connection to AAAAAA"

依照ws/websocket.go的writePump(),
當發生"write failed"時,下一步會執行 server.cleanupConnection(ws) 關閉連線,並且印出"closed connection to AAAAAA",然而並沒有。

我們判斷程序被卡住在server.cleanupConnection(ws) 並沒有釋放 server.connMutex.Unlock()。

導致後續所有訊息都因server.connMutex而被卡住無法回應給ChargingPoint。

有人可以給出解答嗎?

是否有可能發生Race condition在cleanupConnection(),或是server.cleanupConnection(ws) 裡的 close(ws.outQueue) and close(ws.closeC) 被卡住了呢?

謝謝

@AndrewYEEE AndrewYEEE changed the title websocket write: broken pipe, and Unlock() in ws.cleanupConnection() not work websocket write: broken pipe, and ws.cleanupConnection() Unlock() not work Aug 31, 2024
@AndrewYEEE AndrewYEEE changed the title websocket write: broken pipe, and ws.cleanupConnection() Unlock() not work [中文] websocket write: broken pipe, and ws.cleanupConnection() Unlock() not work Aug 31, 2024
@AndrewYEEE AndrewYEEE changed the title [中文] websocket write: broken pipe, and ws.cleanupConnection() Unlock() not work [中文] help! websocket write: broken pipe, and ws.cleanupConnection() Unlock() not work Aug 31, 2024
@AndrewYEEE
Copy link
Author

AndrewYEEE commented Sep 1, 2024

Maybe I found problem

The problem is, using RLock() in a process that may require Lock() :

    func (server *Server) Write(webSocketId string, data []byte) error {
	    server.connMutex.RLock()
	    defer server.connMutex.RUnlock()  \\<=============
	    ws, ok := server.connections[webSocketId]
	    if !ok {
		    return fmt.Errorf("couldn't write to websocket. No socket with id %v is open", webSocketId)
	    }
	    log.Debugf("queuing data for websocket %s", webSocketId)
	    ws.outQueue <- data \\<=============
	    return nil
    }

We know that when several messages enter at the same time, several RLock() will be generated,
And if one of the messages is wrong, cleanupConnection() in writePump() will be triggered.

            case ping := <-ws.pingMessage:
	            _ = conn.SetWriteDeadline(time.Now().Add(server.timeoutConfig.WriteWait))
	            err := conn.WriteMessage(websocket.PongMessage, ping)
	            if err != nil {
		            server.error(fmt.Errorf("write failed for %s: %w", ws.ID(), err))
		            // Invoking cleanup, as socket was forcefully closed
		            server.cleanupConnection(ws) //<============= here
		            return
	            }

There is Lock() in cleanupConnection(), which will cause Lock() to be blocked by other RLock() and generate Deadlock.

          func (server *Server) cleanupConnection(ws *WebSocket) {
	          _ = ws.connection.Close()
	          server.connMutex.Lock() //<============== here
	          close(ws.outQueue)
	          close(ws.closeC)
	          delete(server.connections, ws.id)
	          server.connMutex.Unlock()
	          log.Infof("closed connection to %s", ws.ID())
	          if server.disconnectedHandler != nil {
		          server.disconnectedHandler(ws)
	          }
          }

So, we need to modify the RUnLock position so that it does not occur at the same time as Lock():

      func (server *Server) Write(webSocketId string, data []byte) error {
	      server.connMutex.RLock()
	      // defer server.connMutex.RUnlock() //<======== change here
	      ws, ok := server.connections[webSocketId] 
              server.connMutex.RUnlock()  //<======== to here
	      if !ok {
		      return fmt.Errorf("couldn't write to websocket. No socket with id %v is open", webSocketId)
	      }
	      log.Debugf("queuing data for websocket %s", webSocketId)
	      ws.outQueue <- data
	      return nil
      }

Not sure if it is correct, please help me confirm, thks.
@lorenzodonini

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant