[BUG] offline installation fails with error 'yum lockfile is held by another process' (Azure/RHEL) #1824

Closed
przemyslavic opened this issue Nov 4, 2020 · 3 comments

Comments

@przemyslavic
Collaborator

przemyslavic commented Nov 4, 2020

Describe the bug
Epicli: offline installation fails with the error 'yum lockfile is held by another process' in an Azure/RHEL environment.
Reproduced on version 0.5.x and on develop, so it probably applies to other versions as well. The failure is intermittent and shows up in different tasks from run to run.

To Reproduce
Run offline installation on Azure RedHat:

  1. Prepare offline requirements: epicli prepare --os redhat-7
  2. Download offline requirements: /tmp/offline_requirements_rhel/prepare_scripts_rhel/download-requirements.sh /tmp/offline_requirements_rhel/downloads
  3. Block Internet access on the VMs, e.g. by adding a new rule for each machine in epicli/data/azure/defaults/infrastructure/virtual-machine.yml:
      - name: deny-internet
        description: Deny Internet
        priority: 100
        direction: Outbound
        access: Deny
        protocol: "*"
        source_port_range: "*"
        destination_port_range: "*"
        source_address_prefix: "*"
        destination_address_prefix: "Internet"
  4. Execute epicli apply -f /shared/05offazurrhelflannel/05offazurrhelflannel.yml --offline-requirements /shared/build/offline_requirements_rhel

Expected behavior
The cluster has been deployed successfully with no Internet access.

Config files
Sample configuration for 0.5:

---
kind: epiphany-cluster
name: default
provider: azure
specification:
  admin_user:
    key_path: /shared/ssh/testenvs/id_rsa
    name: operations
  cloud:
    region: North Europe
    subscription_name: XXX
    use_public_ips: true
    use_service_principal: true   
  components:
    kafka:
      count: 1
      machine: kafka-machine-rhel
    kubernetes_master:
      count: 1
      machine: kubernetes-master-machine-rhel
    kubernetes_node:
      count: 3
      machine: kubernetes-node-machine-rhel
    load_balancer:
      count: 1
      machine: lb-machine-rhel  
    logging:
      count: 2
      machine: logging-machine-rhel
    monitoring:
      count: 1
      machine: monitoring-machine-rhel
    postgresql:
      count: 2
      machine: postgresql-machine-rhel
    rabbitmq:
      count: 2
      machine: rabbitmq-machine-rhel
    ignite:
      count: 2
      machine: ignite-machine-rhel
    opendistro_for_elasticsearch:
      count: 2
      machine: opendistro-machine-rhel
  name: test
  prefix: qa
title: Epiphany cluster Config
---
kind: infrastructure/virtual-machine
name: kafka-machine-rhel
provider: azure
based_on: kafka-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: 7-RAW
    version: "7.7.2019090418"
---
kind: infrastructure/virtual-machine
name: kubernetes-master-machine-rhel
provider: azure
based_on: kubernetes-master-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: 7-RAW
    version: "7.7.2019090418"
---
kind: infrastructure/virtual-machine
name: kubernetes-node-machine-rhel
provider: azure
based_on: kubernetes-node-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: 7-RAW
    version: "7.7.2019090418"
---
kind: infrastructure/virtual-machine
name: logging-machine-rhel
provider: azure
based_on: logging-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: 7-RAW
    version: "7.7.2019090418"
---
kind: infrastructure/virtual-machine
name: monitoring-machine-rhel
provider: azure
based_on: monitoring-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: 7-RAW
    version: "7.7.2019090418"
---
kind: infrastructure/virtual-machine
name: postgresql-machine-rhel
provider: azure
based_on: postgresql-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: 7-RAW
    version: "7.7.2019090418"
---
kind: infrastructure/virtual-machine
name: lb-machine-rhel
provider: azure
based_on: load-balancer-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: 7-RAW
    version: "7.7.2019090418"
---
kind: infrastructure/virtual-machine
name: rabbitmq-machine-rhel
provider: azure
based_on: rabbitmq-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: 7-RAW
    version: "7.7.2019090418" 
---
kind: infrastructure/virtual-machine
name: ignite-machine-rhel
provider: azure
based_on: ignite-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: 7-RAW
    version: "7.7.2019090418" 
---
kind: infrastructure/virtual-machine
name: opendistro-machine-rhel
provider: azure
based_on: logging-machine
specification:
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: 7-RAW
    version: "7.7.2019090418" 
---
kind: configuration/kubernetes-master
name: default
provider: azure
specification:
  advanced:
    networking:
      plugin: flannel
    certificates:
      location: /etc/kubernetes/pki
      expiration_days: 24855
      renew: true
---
kind: configuration/postgresql
name: default
provider: azure
specification:
  replication:
    enable: true
    user: postgresql-replication-user
    password: postgresql-replication-password
    max_wal_senders: 5
    wal_keep_segments: 32
  additional_components:
    pgbouncer:
      enabled: yes
  extensions:
    pgaudit:
      enabled: yes
title: Postgresql
---        
kind: configuration/rabbitmq
title: "RabbitMQ"
provider: azure
name: default
specification:
  version: 3.7.10
  rabbitmq_user: rabbitmq
  rabbitmq_group: rabbitmq
  logrotate_period: weekly
  logrotate_number: 10
  ulimit_open_files: 65535
  amqp_port: 5672
  rabbitmq_use_longname: true
  rabbitmq_policies: []
  rabbitmq_plugins:
    - rabbitmq_management_agent
    - rabbitmq_management
  custom_configurations: []
  cluster:
    is_clustered: true

OS (please complete the following information):

  • OS: [RHEL 7.7]
  storage_image_reference:
    publisher: RedHat
    offer: RHEL
    sku: 7-RAW
    version: "7.7.2019090418" 

Cloud Environment (please complete the following information):

  • Cloud Provider [MS Azure]

Additional context

2020-11-03T17:13:11.2734160Z 17:13:11 INFO cli.engine.ansible.AnsibleCommand - TASK [filebeat : Install filebeat package] *************************************
2020-11-03T17:13:15.9248547Z 17:13:15 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-kubernetes-node-vm-2]
2020-11-03T17:13:16.0344446Z 17:13:16 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-kubernetes-node-vm-1]
2020-11-03T17:13:16.2696101Z 17:13:16 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-kubernetes-node-vm-0]
2020-11-03T17:13:16.3005881Z 17:13:16 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-logging-vm-1]
2020-11-03T17:13:16.5105481Z 17:13:16 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-kubernetes-master-vm-0]
2020-11-03T17:13:20.3192256Z 17:13:20 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-logging-vm-0]
2020-11-03T17:13:20.3454136Z 17:13:20 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-monitoring-vm-0]
2020-11-03T17:13:20.8867028Z 17:13:20 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-kafka-vm-1]
2020-11-03T17:13:21.4149429Z 17:13:21 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-postgresql-vm-1]
2020-11-03T17:13:23.1440597Z 17:13:23 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-kafka-vm-0]
2020-11-03T17:13:24.1600243Z 17:13:24 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-load-balancer-vm-0]
2020-11-03T17:13:24.4720026Z 17:13:24 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-postgresql-vm-0]
2020-11-03T17:13:25.0109931Z 17:13:25 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-rabbitmq-vm-0]
2020-11-03T17:13:25.1123580Z 17:13:25 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-rabbitmq-vm-1]
2020-11-03T17:13:26.8644658Z 17:13:26 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-ignite-vm-0]
2020-11-03T17:13:28.4782499Z 17:13:28 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-opendistro-for-elasticsearch-vm-0]
2020-11-03T17:13:31.6700345Z 17:13:31 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-devofazurrhelflannel-repository-vm-0]
2020-11-03T17:13:55.0207717Z 17:13:55 ERROR cli.engine.ansible.AnsibleCommand - fatal: [ci-devofazurrhelflannel-ignite-vm-1]: FAILED! => {"changed": false, "msg": "yum lockfile is held by another process"}
2020-11-03T17:13:55.9104887Z 17:13:55 ERROR cli.engine.ansible.AnsibleCommand - fatal: [ci-devofazurrhelflannel-opendistro-for-elasticsearch-vm-1]: FAILED! => {"changed": false, "msg": "yum lockfile is held by another process"}

Log from a different installation:

2020-11-03T20:07:45.9938552Z 20:07:45 INFO cli.engine.ansible.AnsibleCommand - TASK [common : Install RedHat family packages] *********************************
2020-11-03T20:08:00.3987629Z 20:08:00 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-kubernetes-node-vm-1]
2020-11-03T20:08:01.0151939Z 20:08:01 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-logging-vm-0]
2020-11-03T20:08:01.3886645Z 20:08:01 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-kubernetes-node-vm-0]
2020-11-03T20:08:03.3419859Z 20:08:03 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-kubernetes-node-vm-2]
2020-11-03T20:08:14.2637261Z 20:08:14 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-kafka-vm-0]
2020-11-03T20:08:15.8061648Z 20:08:15 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-logging-vm-1]
2020-11-03T20:08:15.9455542Z 20:08:15 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-monitoring-vm-0]
2020-11-03T20:08:16.3888319Z 20:08:16 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-postgresql-vm-0]
2020-11-03T20:08:29.3615662Z 20:08:29 ERROR cli.engine.ansible.AnsibleCommand - FAILED - RETRYING: Install RedHat family packages (3 retries left).
2020-11-03T20:08:29.3654830Z 20:08:29 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-postgresql-vm-1]
2020-11-03T20:08:29.4864636Z 20:08:29 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-rabbitmq-vm-1]
2020-11-03T20:08:30.6854416Z 20:08:30 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-load-balancer-vm-0]
2020-11-03T20:08:30.8796880Z 20:08:30 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-rabbitmq-vm-0]
2020-11-03T20:08:43.3762056Z 20:08:43 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-ignite-vm-1]
2020-11-03T20:08:43.6684533Z 20:08:43 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-ignite-vm-0]
2020-11-03T20:08:48.9300708Z 20:08:48 ERROR cli.engine.ansible.AnsibleCommand - FAILED - RETRYING: Install RedHat family packages (2 retries left).
2020-11-03T20:08:49.6192349Z 20:08:49 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-opendistro-for-elasticsearch-vm-1]
2020-11-03T20:08:50.1129305Z 20:08:50 INFO cli.engine.ansible.AnsibleCommand - changed: [ci-05offazurrhelcanal-opendistro-for-elasticsearch-vm-0]
2020-11-03T20:09:20.7467690Z 20:09:20 ERROR cli.engine.ansible.AnsibleCommand - FAILED - RETRYING: Install RedHat family packages (1 retries left).
2020-11-03T20:09:52.5492329Z 20:09:52 ERROR cli.engine.ansible.AnsibleCommand - fatal: [ci-05offazurrhelcanal-kubernetes-master-vm-0]: FAILED! => {"attempts": 3, "changed": false, "msg": "yum lockfile is held by another process"}
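
Since the failing task and host differ from run to run, it can help to summarize the affected hosts per task across several logs. Below is a minimal sketch of such a scan, assuming the log format shown above; the function name find_yum_lock_failures and the regular expressions are illustrative and not part of epicli.

```python
import re
from collections import defaultdict

# Patterns are assumptions based on the log excerpts in this issue.
TASK_RE = re.compile(r"TASK \[(?P<task>[^\]]+)\]")
FATAL_RE = re.compile(r"fatal: \[(?P<host>[^\]]+)\]: FAILED! => (?P<result>\{.*\})")
LOCK_MSG = "yum lockfile is held by another process"


def find_yum_lock_failures(lines):
    """Return {task name: [hosts]} for hosts that failed with the yum lock error."""
    failures = defaultdict(list)
    current_task = "(unknown task)"
    for line in lines:
        task_match = TASK_RE.search(line)
        if task_match:
            # Remember the most recent task header so failures can be attributed to it.
            current_task = task_match.group("task")
            continue
        fatal_match = FATAL_RE.search(line)
        if fatal_match and LOCK_MSG in fatal_match.group("result"):
            failures[current_task].append(fatal_match.group("host"))
    return dict(failures)
```

It could be fed the lines of a saved log, e.g. find_yum_lock_failures(open(path)) with path pointing at a hypothetical log file.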

I think people mostly use offline mode for on-prem installations rather than in the cloud, but something is still not working here.

@przemyslavic przemyslavic added this to the S20201119 milestone Nov 5, 2020
@sk4zuzu sk4zuzu self-assigned this Nov 17, 2020
@mkyc mkyc modified the milestones: S20201119, S20201203 Nov 19, 2020
@sk4zuzu sk4zuzu changed the title [BUG] Epicli: offline installation fails with error 'yum lockfile is held by another process' (Azure/RHEL) [BUG] offline installation fails with error 'yum lockfile is held by another process' (Azure/RHEL) Nov 30, 2020
@mkyc mkyc removed this from the S20201203 milestone Dec 4, 2020
@mkyc
Contributor

mkyc commented Dec 17, 2020

@sk4zuzu can you please explain why this is blocked?

@sk4zuzu
Contributor

sk4zuzu commented Dec 17, 2020

Well, I created a workaround for this issue (only a partial solution) in a PR, but I'm not happy with what I did. If we upgraded Ansible to at least 2.9 (we use 2.8), the code would be much simpler. On top of that, this issue is not critical, so I believe it can wait. We have two options: I revisit the latest changes in develop, adjust the PR, and we merge it now; or we upgrade Ansible first and refine it afterwards.
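
For illustration only (this is not the code from the PR): the general idea of waiting for the yum lock to be released, instead of failing as soon as it is found held, could look roughly like the sketch below. It assumes the lock file is /var/run/yum.pid and contains the PID of the owning process, which is the usual yum behaviour on RHEL 7; the function names and the timeout values are hypothetical.

```python
import os
import time

YUM_LOCK_PATH = "/var/run/yum.pid"  # assumed default yum lock location on RHEL 7


def lock_is_held(lock_path=YUM_LOCK_PATH):
    """Return True if the lock file names a process that is still alive."""
    try:
        with open(lock_path) as lock_file:
            pid = int(lock_file.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # no lock file, or no readable PID in it
    if pid <= 0:
        return False  # not a valid single-process PID
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False  # stale lock: the owning process is gone
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def wait_for_yum_lock(timeout=300.0, interval=5.0, lock_path=YUM_LOCK_PATH):
    """Poll until the lock is free; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while lock_is_held(lock_path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A caller would run wait_for_yum_lock() on the target host before starting a package installation and treat a False result as a failure worth reporting. This only narrows the race window: another process can still take the lock between the check and the install.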

@sk4zuzu sk4zuzu added this to the S20210128 milestone Jan 25, 2021
@przemyslavic przemyslavic self-assigned this Jan 28, 2021
@mkyc mkyc modified the milestones: S20210128, S20210211 Jan 28, 2021
@przemyslavic
Collaborator Author

przemyslavic commented Jan 29, 2021

✔ Offline installation tested. No issues found.

@mkyc mkyc closed this as completed Feb 2, 2021