Docker compose fails to start service that use option network_mode: service. Cannot join network of a non running container #6329

Closed
abilous-ti opened this issue Nov 7, 2018 · 8 comments

abilous-ti commented Nov 7, 2018

Description of the issue

I have a single docker-compose.yml file with three services; two of them depend on the third:
"depends_on": ["service3"],
"network_mode": "service:service3",
"restart": "always"

If I restart the OS, the first and second services sometimes do not start, and the status of the containers is
Exited (255)

The docker inspect command shows the error:
"ExitCode": 255,
"Error": "cannot join network of a non running container: cd8b33ccf2a0cc7b84d06302d641676a05f7a2fbba2cf72f77a8d339e29fd76f",
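
The two fields quoted above sit under `State` in the JSON printed by `docker inspect`. A minimal Python sketch for pulling them out, assuming the usual output shape (a JSON array with one object per container). The function name `failed_network_join` and the sample data are illustrative only, not part of the original report:

```python
import json

JOIN_ERROR = "cannot join network of a non running container"

def failed_network_join(inspect_json):
    """Return (name, exit_code, error) for every container whose State.Error
    reports the network-join failure described in this issue."""
    hits = []
    for container in json.loads(inspect_json):
        state = container.get("State", {})
        error = state.get("Error", "")
        if JOIN_ERROR in error:
            # docker inspect reports names with a leading slash, e.g. "/service1"
            name = container.get("Name", "").lstrip("/")
            hits.append((name, state.get("ExitCode"), error))
    return hits

# Hypothetical sample shaped like `docker inspect service1` output;
# "<id>" stands in for the real container ID.
sample = json.dumps([{
    "Name": "/service1",
    "State": {
        "ExitCode": 255,
        "Error": "cannot join network of a non running container: <id>",
    },
}])
print(failed_network_join(sample))
```

In practice the input would come from something like `docker inspect service1 service2`, captured as text.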

Context information (for bug reports)

Output of docker-compose version

1.22.0

Output of docker version

17.12.0-ce

Output of docker-compose config
(Make sure to add the relevant -f and other flags)

services:
  service1:
    container_name: service1
    depends_on:
    - service3
    image: repository/service1
    network_mode: service:service3
    restart: always
    volumes:
    - /folder/device:/folder/device:rw
    - /var/run/docker.sock:/var/run/docker.sock:rw
    - /etc/logrotate.conf:/etc/logrotate.conf:rw
    - /etc/crontab:/etc/crontab:rw
  service3:
    cap_add:
    - NET_ADMIN
    command: --config /vpn/openvpn.conf --auth-nocache
    container_name: service3
    devices:
    - /dev/net/tun
    image: repository/service3
    network_mode: bridge
    restart: always
    volumes:
    - /folder/vpn:/vpn:rw
  service2:
    command: folder
    container_name: service2
    depends_on:
    - service3
    image: repository/service2
    network_mode: service:service3
    restart: always
    volumes:
    - /folder/device/persistent_storage:/folder/device/persistent_storage:rw
    - /folder/device/qos:/folder/device/qos:rw
version: '3.3'

Steps to reproduce the issue

  1. docker-compose -f docker-compose.yml up -d
  2. sudo reboot

Observed result

service1 and service2 failed to start
"ExitCode": 255,
"Error": "cannot join network of a non running container: cd8b33ccf2a0cc7b84d06302d641676a05f7a2fbba2cf72f77a8d339e29fd76f",

Expected result

service1 and service2 must be running after reboot and must be started after service3
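
Until the daemon enforces this ordering itself, it could be approximated by a small script run once at boot. The sketch below is only an assumption about how such a script might look: it polls `docker inspect --format '{{.State.Running}}'` for the provider and then calls `docker start` on the dependents. The helper names `docker` and `start_after`, the retry count, and the delay are hypothetical choices, not something from this report.

```python
import subprocess
import time

def docker(*args):
    """Run a docker CLI command and return its stripped stdout."""
    result = subprocess.run(["docker", *args], capture_output=True, text=True)
    return result.stdout.strip()

def start_after(provider, dependents, run=docker, attempts=30, delay=2.0,
                sleep=time.sleep):
    """Wait until `provider` is running, then start each dependent.

    Returns True if the dependents were started, False on timeout."""
    for _ in range(attempts):
        # The format string makes docker print "true" or "false"
        if run("inspect", "--format", "{{.State.Running}}", provider) == "true":
            for name in dependents:
                # Starting an already running container is harmless
                run("start", name)
            return True
        sleep(delay)
    return False

# Example call for the setup in this issue:
# start_after("service3", ["service1", "service2"])
```

The `run` and `sleep` parameters exist only so the logic can be exercised without a Docker daemon.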

Stacktrace / full error message

"ExitCode": 255,
"Error": "cannot join network of a non running container: cd8b33ccf2a0cc7b84d06302d641676a05f7a2fbba2cf72f77a8d339e29fd76f",

Additional information

OS version - Ubuntu 16.04

shin- commented Nov 7, 2018

Hi @abilous-ti

Is there any reason you've set up your networking in such a way? Using a user-defined network would generally make for a more robust architecture, getting rid of the issue you're seeing and significantly simplifying your setup.

abilous-ti (author) commented

Hi @shin-
service3 is a container running an OpenVPN client.
service1 and service2 must be connected to the OpenVPN network directly.
Installing an OpenVPN client inside service1 and service2 adds overhead. In production we have 10+ containers, and all of them must be connected to the container running the OpenVPN client.
Thanks

asi6611622 commented

I have the same issue as @abilous-ti. I hope you find an answer.

swrap commented Jan 9, 2019

STOPGAP MEASURE

I have this same issue! I found a stopgap measure until this issue is resolved.

You can add a HEALTHCHECK to monitor whether service3 is still running, and restart the other containers if service3 has been restarted and they have been disconnected. I realize this will NOT work in cases where restarting the other containers causes other issues.

I did not test with sudo reboot but I tested with restarting a container that has a similar openvpn connection setup.

Example:

version: '3'

services:
  service3:
    container_name: service3
    image: service3/image
    ports:
      - 9000:9000

  service2:
    container_name: service2
    depends_on:
      - service3
    healthcheck:
      # Put a test below that verifies this container is healthy by accessing something in service3 that should be running.
      test: ["CMD", "curl", "-f", "http://localhost:9000"]
      interval: 1m30s
      timeout: 2s
      retries: 1

  autoheal:
    restart: always
    image: willfarrell/autoheal
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Some references:

Restarting an unhealthy container: https://stackoverflow.com/questions/47088261/restarting-an-unhealthy-docker-container-based-on-healthcheck
Image github repo: https://github.com/willfarrell/docker-autoheal
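
For readers who prefer not to run an extra container, the same idea can be sketched in a few lines of Python. This is not the implementation of willfarrell/autoheal, only a rough equivalent under stated assumptions: it relies on the `health=unhealthy` filter of `docker ps` and on `docker restart`, and the helper names `docker` and `restart_unhealthy` are hypothetical.

```python
import subprocess

def docker(*args):
    """Run a docker CLI command and return its stdout."""
    result = subprocess.run(["docker", *args], capture_output=True, text=True)
    return result.stdout

def restart_unhealthy(run=docker):
    """Restart every container whose healthcheck currently reports unhealthy.

    Returns the list of container names that were restarted."""
    # One container name per line; split() also drops the trailing newline
    names = run("ps", "--filter", "health=unhealthy",
                "--format", "{{.Names}}").split()
    for name in names:
        run("restart", name)
    return names
```

Calling `restart_unhealthy()` periodically (for example from cron) would mimic the stopgap, with the same caveat that restarting containers may not be acceptable in every setup.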

stale bot commented Oct 9, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 9, 2019

stale bot commented Oct 16, 2019

This issue has been automatically closed because it had no recent activity during the stale period.

stephen304 commented Apr 20, 2023

I'm having a similar issue. I use a wireguard container to connect to a VPS, and rely on network_mode: "service:wgcontainer" to attach a traefik container which can listen on the wireguard interface, and receive reverse proxy traffic from the VPS. I could connect wireguard to traefik using a network and then add routing configuration to let the VPS reach traefik through the tunnel or add another reverse proxy to a chain of reverse proxies, but this way is simpler and more compact. Despite using restart: unless-stopped or always, the startup race when rebooting leaves traefik stopped until manual intervention (docker restart traefik).

Edit: Oddly, setting the wireguard container to restart: always instead of unless-stopped seems to fix the traefik container starting back up correctly after reboot. Previously I had wireguard as unless-stopped and traefik as restart: always. I rebooted several times and it seems to work now, unless I'm just getting lucky. Every time I set wireguard back to unless-stopped, the wireguard container starts successfully after reboot but the restart: always traefik container stays down.
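
When comparing setups like this, it can help to confirm which restart policy each container actually ended up with. A small sketch, assuming the policy is exposed under `HostConfig.RestartPolicy.Name` in parsed `docker inspect` output; the function name `restart_policies` is hypothetical, and treating an empty name as "no" is an assumption:

```python
def restart_policies(inspect_entries):
    """Map container name to restart policy name for parsed
    `docker inspect` entries (a list of dicts)."""
    policies = {}
    for entry in inspect_entries:
        name = entry.get("Name", "").lstrip("/")
        policy = (entry.get("HostConfig", {})
                       .get("RestartPolicy", {})
                       .get("Name", ""))
        # Assumption: an empty policy name means no restart policy was set
        policies[name] = policy or "no"
    return policies
```

The input would be `json.loads(...)` applied to the output of `docker inspect wireguard traefik`, using the container names from the setup above.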

zwimer commented May 28, 2024

I still have this issue with a similar setup, where my serviceX container uses network_mode: service:ovpn. When the daemon restarts, the serviceX container sometimes fails and does not restart, even though all containers are marked restart: always. I wonder whether it tries to use the old ovpn container's network before that container is removed during daemon boot, so that the new ovpn container ends up on a different network than the one serviceX initially grabbed before it was deleted.
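
One way to probe this guess is to compare the container ID that a dependent container is configured to join with the ID of the provider container that currently exists. The sketch below works on entries parsed from `docker inspect` JSON and assumes that Compose records `network_mode: service:<name>` as `container:<id>` in `HostConfig.NetworkMode`. The helper names are hypothetical.

```python
from typing import Optional

PREFIX = "container:"

def joined_container_id(inspect_entry: dict) -> Optional[str]:
    """Return the container ID whose network namespace this container is
    configured to join, or None if it does not use `container:` networking."""
    mode = inspect_entry.get("HostConfig", {}).get("NetworkMode", "")
    return mode[len(PREFIX):] if mode.startswith(PREFIX) else None

def points_at(dependent: dict, provider: dict) -> bool:
    """True if `dependent` is configured to join `provider`'s network."""
    target = joined_container_id(dependent)
    # An empty or missing target never matches
    return bool(target) and provider.get("Id", "").startswith(target)
```

If `points_at` returns False after a reboot, the dependent still references an ID that does not match the current provider container, which would be consistent with the guess above.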
