Provisioning Beaker machines w/count > 1 causes XML RPC errors #1693

Dannyb48 · 2020-03-16T18:56:29Z

Describe the bug
On March 11th, Carbon was provisioning Beaker systems with a count of 2, and both create and destroy were working fine. Since then, provisioning has been failing on destroy. The error from Beaker is the following.

xmlrpclib.Fault: <Fault 1: "<class 'sqlalchemy.exc.OperationalError'>:(OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'INSERT INTO job_activity (id, job_id) VALUES (%s, %s)' (92678998L, 4138086L)">

But the issue seems to be cosmetic, in the sense that when I look up the Beaker job ID, the job was indeed cancelled. So it looks like some type of race condition.
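Since the server message itself says "try restarting transaction", one possible client-side workaround is to retry the call when the fault mentions a deadlock. The following is only a minimal sketch under stated assumptions: `call_with_deadlock_retry` is a hypothetical helper, not part of linchpin or the Beaker client, and it uses Python 3's `xmlrpc.client` rather than the Python 2.7 `xmlrpclib` seen in the traceback.

```python
import time
import xmlrpc.client


def call_with_deadlock_retry(func, *args, retries=3, delay=1.0):
    """Call func(*args), retrying when the server reports a DB deadlock.

    Hypothetical helper for illustration only.
    """
    for attempt in range(1, retries + 1):
        try:
            return func(*args)
        except xmlrpc.client.Fault as fault:
            # MySQL error 1213 shows up in the fault string, as in the report.
            is_deadlock = "Deadlock found" in fault.faultString
            if not is_deadlock or attempt == retries:
                raise  # unrelated fault, or out of attempts
            time.sleep(delay * attempt)  # simple linear backoff
```

Here `func` would be whatever bound XML-RPC method performs the cancel. A retry only hides the symptom; it does not explain why two cancel requests hit the same job.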

I was able to reproduce this outside of Carbon with the following PinFile and a count greater than 1. If I use the same PinFile and specify a count of 1, there are no issues:

---
beaker-test:
  topology:
    resource_groups:
    - resource_definitions:
      - job_group: ci-ops-central
        recipesets:
        - arch: x86_64
          count: 2
          distro: RHEL-7.5
          hostrequires:
          - op: '='
            tag: pool
            value: ci-ops-central-qe
          - op: '>'
            tag: memory
            value: 15000
          - op: <
            tag: memory
            value: 400000
          - op: '>'
            tag: cpu_count
            value: 2
          - op: <
            tag: cpu_count
            value: 13
          name: carbon-beaker-node
          ssh_key_file:
          - demo.pub
          variant: Server
        role: bkr_server
        ssh_keys_path: /home/dbaez/projects/carbon-py3/carbon_include_scenario_example/keys
        whiteboard: Danny_Test
      resource_group_name: carbon
      resource_group_type: beaker
    topology_name: carbon

STACKTRACE

Traceback (most recent call last):
  File "/home/dbaez/.ansible/tmp/ansible-tmp-1584371936.9-84960481172744/AnsiballZ_bkr_server.py", line 102, in <module>
    _ansiballz_main()
  File "/home/dbaez/.ansible/tmp/ansible-tmp-1584371936.9-84960481172744/AnsiballZ_bkr_server.py", line 94, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File "/home/dbaez/.ansible/tmp/ansible-tmp-1584371936.9-84960481172744/AnsiballZ_bkr_server.py", line 40, in invoke_module
    runpy.run_module(mod_name='ansible.modules.bkr_server', init_globals=None, run_name='__main__', alter_sys=False)
  File "/usr/lib64/python2.7/runpy.py", line 192, in run_module
    fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/tmp/ansible_bkr_server_payload_rzY4Xe/ansible_bkr_server_payload.zip/ansible/modules/bkr_server.py", line 333, in <module>
  File "/tmp/ansible_bkr_server_payload_rzY4Xe/ansible_bkr_server_payload.zip/ansible/modules/bkr_server.py", line 329, in main
  File "/tmp/ansible_bkr_server_payload_rzY4Xe/ansible_bkr_server_payload.zip/ansible/modules/bkr_server.py", line 268, in cancel_jobs
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1243, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1602, in __request
    verbose=self.__verbose
  File "/home/dbaez/.virtualenvs/linchpin/lib/python2.7/site-packages/bkr/common/xmlrpc2.py", line 478, in request
    result = transport_class.request(self, *args, **kwargs)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1283, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/home/dbaez/.virtualenvs/linchpin/lib/python2.7/site-packages/bkr/common/xmlrpc2.py", line 386, in _single_request
    return self.parse_response(response)
  File "/usr/lib64/python2.7/xmlrpclib.py", line 1493, in parse_response
    return u.close()
  File "/usr/lib64/python2.7/xmlrpclib.py", line 800, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault: <Fault 1: "<class 'sqlalchemy.exc.OperationalError'>:(OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'INSERT INTO job_activity (id, job_id) VALUES (%s, %s)' (92678998L, 4138086L)">

To Reproduce
Steps to reproduce the behavior:

1. Create a PinFile to provision a Beaker resource using a count > 1
2. Run linchpin -vvvv up
3. Run linchpin -vvvv destroy
4. See error


samvarankashyap · 2020-03-16T19:46:11Z

@Dannyb48
I am assuming you are using the develop branch of linchpin.
Were you able to reproduce the bug in previous versions too, such as 1.9.2?
I think the error is due to some unknown change on the Beaker server.
Further, I see the above is running Python 2.7; it would be great if you could run it on Python 3.x.


Dannyb48 · 2020-03-16T21:53:30Z

@samvarankashyap

I agree; I think this is an unknown change on the Beaker server side. Yes, I am using the latest develop branch.

Good catch, it looks like it's not reproducible on Python 3. However, I have identified and tested a fix for Python 2. I just finished testing it with Python 3 as well, and it worked fine. The fix keeps Python 2 working with whatever changed on the Beaker server side.

I just submitted it. I'll let you guys review to see if you want to merge it in.
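For readers following along, the idea described in the commit message (filtering resources that belong to the same job so the job is cancelled only once) could look roughly like the sketch below. This is an assumption-laden illustration, not the code from the pull request: `unique_job_ids`, the `"id"` key, the `"J:"` prefix, and the host names are all hypothetical.

```python
def unique_job_ids(resources):
    """Return each Beaker job ID once, preserving first-seen order.

    `resources` is an assumed shape: a list of dicts whose "id" key
    holds the job ID of the job that provisioned the system.
    """
    seen = set()
    job_ids = []
    for resource in resources:
        job_id = resource["id"]
        if job_id not in seen:
            seen.add(job_id)
            job_ids.append(job_id)
    return job_ids


# Two systems provisioned by one job (count: 2) yield a single cancel target.
resources = [
    {"id": "J:4138086", "system": "host-a.example.com"},
    {"id": "J:4138086", "system": "host-b.example.com"},
]
print(unique_job_ids(resources))  # ['J:4138086']
```

With the duplicates removed, only one cancel request is sent per job, so two requests no longer compete for the same job rows on the server.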


Dannyb48 self-assigned this Mar 16, 2020

Dannyb48 added a commit to Dannyb48/linchpin that referenced this issue Mar 16, 2020

Fix an xmlrpc beaker error when count > 1

e512320

This should fix CentOS-PaaS-SIG#1693 where two resources belong to the same job are filtered to avoid xmlrpc condition

Dannyb48 added a commit to Dannyb48/linchpin that referenced this issue Mar 16, 2020

Fix an xmlrpc beaker error when count > 1

1bc5a4b

This should fix CentOS-PaaS-SIG#1693 where two resources belong to the same job are filtered to avoid xmlrpc condition

Dannyb48 mentioned this issue Mar 16, 2020

Fix an xmlrpc beaker error when count > 1 #1694

Merged

Dannyb48 added a commit to Dannyb48/linchpin that referenced this issue Mar 17, 2020

Fix an xmlrpc beaker error when count > 1

2fddbad

This should fix CentOS-PaaS-SIG#1693 where two resources belong to the same job are filtered to avoid xmlrpc condition

samvarankashyap added this to the v2.0.0 milestone Mar 17, 2020

adl-bot closed this as completed in #1694 Mar 17, 2020
