Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

system_tests_topology_disposition: KeyError: 'name' #1041

Closed
jiridanek opened this issue Apr 2, 2023 · 3 comments · Fixed by #1043
Closed

system_tests_topology_disposition: KeyError: 'name' #1041

jiridanek opened this issue Apr 2, 2023 · 3 comments · Fixed by #1043
Assignees
Labels
test-bug It's a test bug unless proven otherwise
Milestone

Comments

@jiridanek
Copy link
Contributor

https://github.com/skupperproject/skupper-router/actions/runs/4585208141/jobs/8097248918#step:10:2557

36:             if self.state == 'topo checking' :
36:                 # In the 'topo checking' state, we send management messages to
36:                 # ask the 4 routers about their connections. Then, parsing the
36:                 # replies, we make sure that we count the expected 6 connections.
36:                 # (The 4 routers are completely connected.)
36:                 if 'OK' == event.message.properties['statusDescription']:
36:                     try:
36:                         # debug for
36:                         # https://github.com/skupperproject/skupper-router/issues/962
36: >                       connection_name = event.message.body['name']
36: E                       KeyError: 'name'
36: 
36: /home/runner/work/skupper-router/skupper-router/skupper-router/tests/system_tests_topology_disposition.py:1023: KeyError

This is in the new code merged by @kgiusti recently in #963

@jiridanek
Copy link
Contributor Author

(this is a flaky fail, it passed upon rerun, https://github.com/skupperproject/skupper-router/actions/runs/4585208141/jobs/8101560431)

@kgiusti
Copy link
Contributor

kgiusti commented Apr 3, 2023

Perfect! That new code I added to the test dumps the message that is failing to parse. Now we have something to look at!

Looks like a test bug. The part of the code that throws the key error is expecting the response to a query for "io.skupper.router.connector" in map form. Specifically:

{'type': 'io.skupper.router.connector', 'name': 'AB_connector',

But when the failure happens the message is not a response to a connection query. Instead it is a response to a link query. From the test failure logs:

body={'attributeNames': ['linkType', 'linkDir', 'linkName', 'owningAddr', 'capacity', 'undeliveredCount', 'unsettledCount', 'acceptedCount', 'rejectedCount', 'releasedCount', 'modifiedCount'],'results': [['router-control', 'out',

So there's a bug in the test state machine where it expects a different type of error.

@kgiusti kgiusti added the test-bug It's a test bug unless proven otherwise label Apr 3, 2023
@kgiusti kgiusti assigned kgiusti and mgoulish and unassigned kgiusti and mgoulish Apr 3, 2023
@kgiusti
Copy link
Contributor

kgiusti commented Apr 3, 2023

And here's the problem with the test:

2023-04-01T21:12:48.4123419Z 36: 1680383568.028513 topo check found connector AD_connector
2023-04-01T21:12:48.4123722Z 36: 1680383568.029111 topo check found connector BD_connector
2023-04-01T21:12:48.4124061Z 36: 1680383568.029652 topo check found connector AC_connector
2023-04-01T21:12:48.4124385Z 36: 1680383568.029907 topo check found connector BC_connector
2023-04-01T21:12:48.4124675Z 36: 1680383568.030450 topo check found connector AB_connector
2023-04-01T21:12:48.4125026Z 36: 1680383568.214657 sent: 0 received: 0 accepted: 0 released: 0 confirmed kills: 0
2023-04-01T21:12:48.4125457Z 36: 1680383568.215063 topo check found connector CD_connector
2023-04-01T21:12:48.4125922Z 36: 1680383568.215071 state transition to : link checking -- because topo check successful
2023-04-01T21:12:48.4126378Z 36: 1680383568.216147 state transition to : topo checking -- because on_link_opened
2023-04-01T21:12:48.4129544Z 36: ERROR: see skupper-router ISSUE #9621

The test has found all the connectors and proceeds to the link check state where it issues link queries. Then a new link comes up. The on_link_opened event fires which unconditionally (and incorrectly) transitions the test back to topo checking. This causes the test to assume the management query response is for a connection query, where in fact the response is from a link query.

IIUC the real fix is to not transition to topo checking until after all the management links have finished coming up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test-bug It's a test bug unless proven otherwise
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants