Docker Compose fails to start services that use the option network_mode: service. Cannot join network of a non-running container #6329

abilous-ti · 2018-11-07T11:04:26Z

Description of the issue

Currently I have a single docker-compose.yml file that contains three services. Two of them (service1 and service2) depend on the third (service3) with these settings:
"depends_on": ["service3"],
"network_mode": "service:service3",
"restart": "always"

If I restart the OS, the first and second services sometimes do not start, and the status of their containers is
Exited (255)

The docker inspect command shows this error:
"ExitCode": 255,
"Error": "cannot join network of a non running container: cd8b33ccf2a0cc7b84d06302d641676a05f7a2fbba2cf72f77a8d339e29fd76f",

Context information (for bug reports)

Output of docker-compose version

1.22.0

Output of docker version

17.12.0-ce

Output of docker-compose config
(Make sure to add the relevant -f and other flags)

services:
  service1:
    container_name: service1
    depends_on:
    - service3
    image: repository/service1
    network_mode: service:service3
    restart: always
    volumes:
    - /folder/device:/folder/device:rw
    - /var/run/docker.sock:/var/run/docker.sock:rw
    - /etc/logrotate.conf:/etc/logrotate.conf:rw
    - /etc/crontab:/etc/crontab:rw
  service3:
    cap_add:
    - NET_ADMIN
    command: --config /vpn/openvpn.conf --auth-nocache
    container_name: service3
    devices:
    - /dev/net/tun
    image: repository/service3
    network_mode: bridge
    restart: always
    volumes:
    - /folder/vpn:/vpn:rw
  service2:
    command: folder
    container_name: service2
    depends_on:
    - service3
    image: repository/service2
    network_mode: service:service3
    restart: always
    volumes:
    - /folder/device/persistent_storage:/folder/device/persistent_storage:rw
    - /folder/device/qos:/folder/device/qos:rw
version: '3.3'

Steps to reproduce the issue

docker-compose -f docker-compose.yml up -d
sudo reboot

Observed result

service1 and service2 failed to start
"ExitCode": 255,
"Error": "cannot join network of a non running container: cd8b33ccf2a0cc7b84d06302d641676a05f7a2fbba2cf72f77a8d339e29fd76f",

Expected result

service1 and service2 must be running after reboot and must be started after service3

Stacktrace / full error message

"ExitCode": 255,
            "Error": "cannot join network of a non running container: cd8b33ccf2a0cc7b84d06302d641676a05f7a2fbba2cf72f77a8d339e29fd76f",

Additional information

OS version - Ubuntu 16.04


shin- · 2018-11-07T22:45:33Z

Hi @abilous-ti

Is there any reason you've set up your networking in such a way? Using a user-defined network would generally make for a more robust architecture, getting rid of the issue you're seeing and significantly simplifying your setup.
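For comparison, here is a minimal sketch of the user-defined-network layout suggested above. The image names and service3 settings come from the config in this issue; the network name `appnet` is a hypothetical placeholder. Each service keeps its own network namespace and reaches the others by service name, so no container has to join another container's namespace when the daemon restarts them at boot. Note that, unlike `network_mode: service:service3`, this layout does not by itself route service1/service2 traffic through service3's VPN tunnel.

```yaml
version: '3.3'

services:
  service1:
    image: repository/service1
    restart: always
    networks:
      - appnet            # hypothetical user-defined network
  service2:
    image: repository/service2
    restart: always
    networks:
      - appnet
  service3:
    image: repository/service3
    command: --config /vpn/openvpn.conf --auth-nocache
    cap_add:
      - NET_ADMIN         # OpenVPN needs to configure the tun interface
    devices:
      - /dev/net/tun
    restart: always
    volumes:
      - /folder/vpn:/vpn:rw
    networks:
      - appnet

networks:
  appnet:
    driver: bridge
```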

abilous-ti · 2018-11-08T06:20:46Z

Hi @shin-
service3 is a container running an OpenVPN client.
service1 and service2 must be connected to the OpenVPN network directly.
Installing an OpenVPN client inside service1 and service2 would be unnecessary overhead. In production we have 10+ containers, and all of them must be connected through the container with the OpenVPN client.
Thanks
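With 10+ containers sharing the VPN container's network, the repeated keys can be factored out with a YAML anchor. This is a sketch assuming compose file format 3.4 or later (required for top-level `x-` extension fields); the block name `x-vpn-client` is hypothetical. It only reduces repetition and does not change the startup race described in this issue.

```yaml
version: '3.4'

# Hypothetical extension field holding the settings every VPN client shares.
x-vpn-client: &vpn-client
  network_mode: service:service3
  depends_on:
    - service3
  restart: always

services:
  service3:
    image: repository/service3
    command: --config /vpn/openvpn.conf --auth-nocache
    cap_add:
      - NET_ADMIN
    devices:
      - /dev/net/tun
    network_mode: bridge
    restart: always
    volumes:
      - /folder/vpn:/vpn:rw

  service1:
    <<: *vpn-client       # merge the shared keys from the anchor
    image: repository/service1

  service2:
    <<: *vpn-client
    image: repository/service2
    command: folder
```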

asi6611622 · 2019-01-02T08:19:42Z

I have the same issue as @abilous-ti; I hope you find an answer.

swrap · 2019-01-09T21:32:26Z

STOPGAP MEASURE

I have this same issue! I found a stopgap measure until this issue is resolved.

You can add a HEALTHCHECK to monitor whether service3 is still running and restart the other containers if service3 has been restarted and they have been disconnected. I realize this will NOT work in cases where restarting the other containers causes other issues.

I did not test with sudo reboot, but I did test by restarting a container with a similar OpenVPN connection setup.

Example:

version: '3'

services:
  service3:
    container_name: service3
    image: service3/image
    ports:
      - 9000:9000

  service2:
    container_name: service2
    image: service2/image              # placeholder, like service3/image
    network_mode: service:service3     # shares service3's network, so localhost:9000 reaches service3
    depends_on:
      - service3
    healthcheck:
      # put some type of test below to verify whether this container is healthy by trying to access something in service3 that should be running
      test: ["CMD", "curl", "-f", "http://localhost:9000"]
      interval: 1m30s
      timeout: 2s
      retries: 1

  autoheal:
    restart: always
    image: willfarrell/autoheal
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Some references:

Restarting an unhealthy container: https://stackoverflow.com/questions/47088261/restarting-an-unhealthy-docker-container-based-on-healthcheck
Image github repo: https://github.com/willfarrell/docker-autoheal

stale · 2019-10-09T20:56:05Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2019-10-16T21:03:22Z

This issue has been automatically closed because it had no recent activity during the stale period.

stephen304 · 2023-04-20T02:41:34Z

I'm having a similar issue. I use a wireguard container to connect to a VPS, and rely on network_mode: "service:wgcontainer" to attach a traefik container which can listen on the wireguard interface, and receive reverse proxy traffic from the VPS. I could connect wireguard to traefik using a network and then add routing configuration to let the VPS reach traefik through the tunnel or add another reverse proxy to a chain of reverse proxies, but this way is simpler and more compact. Despite using restart: unless-stopped or always, the startup race when rebooting leaves traefik stopped until manual intervention (docker restart traefik).

Edit: Oddly, setting the wireguard container to restart: always instead of unless-stopped seems to make the traefik container start back up correctly after a reboot. Previously I had wireguard as unless-stopped and traefik as restart: always. I have rebooted several times and it seems to work now, unless I'm just getting lucky: every time I set wireguard back to unless-stopped, the wireguard container starts after the reboot but the restart: always traefik container stays down.
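A sketch of the restart-policy arrangement described in the edit above. The service name `wgcontainer` comes from the comment; the WireGuard image name is a hypothetical placeholder. This reflects one user's observation over several reboots, not a documented guarantee about start order when the daemon restarts containers.

```yaml
services:
  wgcontainer:
    image: example/wireguard           # placeholder image name
    cap_add:
      - NET_ADMIN                      # commonly needed to create the WireGuard interface
    restart: always                    # was unless-stopped; always reportedly let traefik come back after reboot
  traefik:
    image: traefik
    network_mode: "service:wgcontainer"
    depends_on:
      - wgcontainer
    restart: always
```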

zwimer · 2024-05-28T14:37:46Z

I still have this issue with a similar setup: my serviceX container uses network_mode: service:ovpn, and when the daemon restarts, the serviceX container sometimes fails and is not restarted, even though all containers are marked restart: always. I wonder whether it tries to use the old ovpn container's network before that container is removed during daemon startup, so the new ovpn container ends up on a different network from the one serviceX grabbed initially?

shin- added kind/question area/networking labels Nov 7, 2018

DavHau mentioned this issue Apr 1, 2019

Restart/Reconnect containers connected via 'network_mode: service' automatically when main service is restarted #6626

Closed

stale bot added the stale label Oct 9, 2019

stale bot closed this as completed Oct 16, 2019

robflate mentioned this issue Dec 10, 2020

network_mode and routing traffic through vpn anandslab/docker-traefik#35

Closed

rakbladsvalsen mentioned this issue Sep 25, 2021

Bug: Connectivity is lost once gluetun container is restarted qdm12/gluetun#641

Open

zwimer mentioned this issue May 31, 2024

[BUG] Reopen: #6329 : Docker compose fails to start service that use option network_mode: service. Cannot join network of a non running container #6329 #11872

Closed
