Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CS confuses error responses between requests in case of timeout #294

Open
andig opened this issue Sep 14, 2024 · 11 comments
Open

CS confuses error responses between requests in case of timeout #294

andig opened this issue Sep 14, 2024 · 11 comments

Comments

@andig
Copy link
Contributor

andig commented Sep 14, 2024

OCPP version:
[x] 1.6
[ ] 2.0.1

I'm submitting a ...

[x] bug report
[ ] feature request

Current behavior:
At evcc-io/evcc#15991 we're experiencing an issue where a message timeout on one message is returned for a different message.

Expected behavior:
When CP does not reply to a CS request, a following (or concurrent) CS request (that the CP replies to) should not receive an error result.

Consider this log:

enqueued CALL [3655400180, ChangeAvailability] for test-4
dispatched request 3655400180 for test-4
send test-4: [2,"3655400180","ChangeAvailability",{"connectorId":0,"type":"Operative"}]
started timeout timer for test-4
recv server: [2,"3655400180","ChangeAvailability",{"connectorId":0,"type":"Operative"}]
handling incoming CALL [3655400180, ChangeAvailability]
enqueued CALL [484949078, GetConfiguration] for test-4

Steps to reproduce:

Related code:

// send availability request and log error if cp doesn't reply
	if err := Instance().ChangeAvailabilityRequest(cp.ID(), 0, core.AvailabilityTypeOperative); err != nil {
		cp.log.DEBUG.Printf("failed configuring availability: %v", err)
	}

// send get configuration request - this finally errors with timeout which it MUST NOT
	resp, err := cp.GetConfiguration()
	if err != nil {
		return err
	}

Other information:

@andig andig changed the title CS confuses error responses between requests CS confuses error responses between requests in case of timeout Sep 14, 2024
@andig
Copy link
Contributor Author

andig commented Sep 14, 2024

Here's another log from testing. All timeouts are default except for dispatcher timeout set to 5s for quicker test results:

2024-09-14 12:42:04 enqueued CALL [2784858784, ChangeAvailability] for test-4
2024-09-14 12:42:04 dispatched request 2784858784 for test-4
2024-09-14 12:42:04 sent JSON message to test-4: [2,"2784858784","ChangeAvailability",{"connectorId":0,"type":"Operative"}]
2024-09-14 12:42:04 started timeout timer for test-4
2024-09-14 12:42:04 received JSON message from server: [2,"2784858784","ChangeAvailability",{"connectorId":0,"type":"Operative"}]
2024-09-14 12:42:04 handling incoming CALL [2784858784, ChangeAvailability]
2024-09-14 12:42:09 timeout for client test-4, canceling message
2024-09-14 12:42:09 completed request 2784858784 for test-4
2024-09-14 12:42:09 request 2784858784 for test-4 timed out
2024-09-14 12:42:09 test-4 ready to transmit again
2024-09-14 12:42:09 enqueued CALL [3284185250, GetConfiguration] for test-4
2024-09-14 12:42:09 dispatched request 3284185250 for test-4
2024-09-14 12:42:09 sent JSON message to test-4: [2,"3284185250","GetConfiguration",{}]
2024-09-14 12:42:09 started timeout timer for test-4
2024-09-14 12:42:14 timeout for client test-4, canceling message
2024-09-14 12:42:14 completed request 3284185250 for test-4
2024-09-14 12:42:14 request 3284185250 for test-4 timed out
2024-09-14 12:42:14 test-4 ready to transmit again

The client timing out is simulated by letting handler wait in a delay:

func (handler *ChargePointHandler) OnChangeAvailability(request *core.ChangeAvailabilityRequest) (confirmation *core.ChangeAvailabilityConfirmation, err error) {
	time.Sleep(time.Minute)
	return core.NewChangeAvailabilityConfirmation(core.AvailabilityStatusAccepted), nil
}

At this point I'm no longer sure if what I'm seeing here is really the real issue or if this is an artefact from testing setup. Log shows that GetConfiguration times out. However, the actual OnGetConfiguration handler of the charge point is never called. At this point I'm no longer sure if this is proper simulation of the actual charge point or if this specific behaviour is due to requests to charge point are being serialised in the library implementation.

@andig
Copy link
Contributor Author

andig commented Sep 14, 2024

If the behaviour here is due to the simulated CP during testing, it would be great to have a different/better way of simulating CP timeouts. For example, the CP implementation could allow returning a sentinel error that is then not relayed to the CS.

@andig
Copy link
Contributor Author

andig commented Sep 14, 2024

@lorenzodonini if you would consider accepting a bounty for resolving this issue it would be great if you could reach out to info@evcc.io. Thank you!

@andig
Copy link
Contributor Author

andig commented Sep 14, 2024

Indeed it seems that this is a testing artefact. I have changed the CP handleIncomingRequest to simply return when it receives a ChangeAvailability:

2024-09-14 13:47:57 enqueued CALL [1904156176, ChangeAvailability] for test-4
2024-09-14 13:47:57 dispatched request 1904156176 for test-4
2024-09-14 13:47:57 sent JSON message to test-4: [2,"1904156176","ChangeAvailability",{"connectorId":0,"type":"Operative"}]
2024-09-14 13:47:57 started timeout timer for test-4
2024-09-14 13:47:57 received JSON message from server: [2,"1904156176","ChangeAvailability",{"connectorId":0,"type":"Operative"}]
2024-09-14 13:47:57 handling incoming CALL [1904156176, ChangeAvailability]
2024-09-14 13:48:02 timeout for client test-4, canceling message
2024-09-14 13:48:02 completed request 1904156176 for test-4
2024-09-14 13:48:02 request 1904156176 for test-4 timed out
2024-09-14 13:48:02 test-4 ready to transmit again
2024-09-14 13:48:02 enqueued CALL [3405965410, GetConfiguration] for test-4
2024-09-14 13:48:02 dispatched request 3405965410 for test-4
2024-09-14 13:48:02 sent JSON message to test-4: [2,"3405965410","GetConfiguration",{}]
2024-09-14 13:48:02 started timeout timer for test-4
2024-09-14 13:48:02 received JSON message from server: [2,"3405965410","GetConfiguration",{}]
2024-09-14 13:48:02 handling incoming CALL [3405965410, GetConfiguration]
2024-09-14 13:48:02 sent CALL RESULT [3405965410]
2024-09-14 13:48:02 sent JSON message to server: [3,"3405965410",{"configurationKey":[{"key":"AuthorizationKey","readonly":false},{"key":"NumberOfConnectors","readonly":false,"value":"1"},{"key":"ChargeProfileMaxStackLevel","readonly":false,"value":"1"},{"key":"ChargingScheduleMaxPeriods","readonly":false,"value":"1"},{"key":"MaxChargingProfilesInstalled","readonly":false,"value":"1"},{"key":"ChargingScheduleAllowedChargingRateUnit","readonly":false,"value":"1"},{"key":"MeterValuesSampledData","readonly":false,"value":"Power.Active.Import    

It's now obvious that the test continues without blocking.

@andig
Copy link
Contributor Author

andig commented Sep 14, 2024

That said: it would be great if we had a means of simulating missing request acceptance or missing reply from either CS or CP. It seems this is currently not possible?

@sc-atompower
Copy link
Contributor

sc-atompower commented Sep 16, 2024

Not totally sure if this is related but we are starting to see panics due to on checked casting of ocpp resonses.

It looks like something can happen and any call can get another calls response and panic.

Examples:

interface conversion: ocpp.Response is *core.HeartbeatConfirmation, not *core.StartTransactionConfirmation

panic: interface conversion: ocpp.Response is *core.HeartbeatConfirmation, not *core.MeterValuesConfirmation

@lorenzodonini this may interest you

@lorenzodonini
Copy link
Owner

@sc-atompower did this start at a specific point in time? Are you on master branch or latest release?

It's tricky to reproduce but I'll look into it, since this sounds like a major issue.

@sc-atompower
Copy link
Contributor

@lorenzodonini it started as soon as we went from

v0.16.1-0.20230417193019-3ecaf00b33b6

to

github.com/lorenzodonini/ocpp-go v0.18.0

and it just happened again.

@sc-atompower
Copy link
Contributor

@lorenzodonini we reverted to version 17 and still had this issue.

We did note that it occurred after a very long hanging request.

We had a client take 57 seconds to process a meter values call. Then we panic.

{"level":"error","ts":1726756089.3155274,"msg":"recovered from panic","stationId":"deb120c4-0b6f-4c48-ab4f-6d84602b5c92","ocppStationId":"KHOBKLMDSO","recover":"interface conversion: ocpp.Response is *core.HeartbeatConfirmation, not *core.MeterValuesConfirmation"}

{"level":"warn","ts":1726756089.3156137,"msg":"long duration","stationId":"deb120c4-0b6f-4c48-ab4f-6d84602b5c92","ocppStationId":"KHOBKLMDSO","duration":57.144220828,"method":"mqttMeterDataCallback"}

we are going to revert back to 16 tomorrow.

@lorenzodonini
Copy link
Owner

Just to clear up the base functionality:

  • There cannot be two requests sent out at the same time from one endpoint to the other (regardless which one). Everything is processed sequentially as the protocol supports only 1 message at a time
    • a second request is only sent out if a response to the previous one was received OR if there was a timeout
  • Whenever a request is sent out, it's saved within a state object, since it is awaiting a response
    1. when a response/error is received, the state object is queried for the original request
    2. the original request is matched against the response, meaning the message ID is verified
    3. if the response has the same message ID as the original request, the type is extracted from the original request
    4. the response is then unmarshaled using reflection, based on the extracted type

Even assuming there is some glitch caused by timeouts, I don't see how the response can be parsed using the incorrect type. The ParseMessage function returns an error if the incoming response doesn't match a pending (already sent) request.

@lorenzodonini
Copy link
Owner

@sc-atompower The behavior you're experience would only make sense if two requests of different types have the same ID, and there is some glitch (which I'm investigating).

Could you please confirm that:

  • you are using the default (not customized) ocpp-j and websocket client/server
  • the message IDs are indeed unique
  • gently provide the full debug log of the event?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants