
service: service.Start() can block forever #382

Closed
ilgooz opened this issue Aug 22, 2018 · 11 comments · Fixed by #598
Comments

@ilgooz
Contributor

ilgooz commented Aug 22, 2018

Similar to #381.

If dependencies of the same service listen on the same port, service.Start() will block forever because it waits for the dependency containers to reach the RUNNING status. However, N-1 of them will never reach RUNNING because their ports collide.

Any other misconfiguration of the Service or of the service file (mesg.yml) can cause the same issue.

ilgooz added the bug label on Aug 22, 2018
ilgooz changed the title from "service. service.Start() can block forever" to "service: service.Start() can block forever" on Aug 22, 2018
@antho1404
Member

In this case, one of the tasks created by the Docker service will never run, so we never get the error from it. We need to find a way to surface the errors that occur between the creation of the service (which succeeds) and the creation of the task.
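
For reference, a rough sketch of where Docker exposes that information, using the upstream Docker Go client directly rather than the MESG container package; the exact import paths depend on the client version, and the service ID below is a placeholder. It lists the tasks of a swarm service and collects any error message attached to tasks that never reached RUNNING:

package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/api/types/swarm"
	"github.com/docker/docker/client"
)

// taskErrors lists the tasks of a swarm service and collects the error
// messages Docker attached to tasks that are not running. It is only a
// stand-in for whatever the MESG container package would expose.
func taskErrors(ctx context.Context, cli *client.Client, serviceID string) ([]string, error) {
	args := filters.NewArgs()
	args.Add("service", serviceID)
	tasks, err := cli.TaskList(ctx, types.TaskListOptions{Filters: args})
	if err != nil {
		return nil, err
	}
	var errs []string
	for _, task := range tasks {
		// A task that never reaches RUNNING may carry the reason in Status.Err
		// (it can also just stay in "new" with no error, as noted later in this thread).
		if task.Status.State != swarm.TaskStateRunning && task.Status.Err != "" {
			errs = append(errs, fmt.Sprintf("task %s (%s): %s", task.ID, task.Status.State, task.Status.Err))
		}
	}
	return errs, nil
}

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}
	// "my-service-id" is a placeholder service ID.
	fmt.Println(taskErrors(context.Background(), cli, "my-service-id"))
}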

@ilgooz
Contributor Author

ilgooz commented Aug 22, 2018

Future note: afterwards, add a test like this to service/start_test.go:

package service

import (
	"fmt"
	"testing"
)

func TestX(t *testing.T) {
	service := &Service{
		Name: "TestServiceDependenciesListensFromSamePort",
		Dependencies: map[string]*Dependency{
			"test": {
				Image: "nginx",
				Ports: []string{"80"},
			},
			"test2": {
				Image: "nginx",
				Ports: []string{"80"},
			},
		},
	}
	fmt.Println(service.Start()) // currently blocks forever
}

NicolasMahe added this to the v0.5.0 milestone on Nov 27, 2018
@NicolasMahe
Member

Let's check if the error is still here

@krhubert
Contributor

The problem still exists.

@NicolasMahe
Member

So:
The service that fails to start has one task stuck in the "new" status.

What about adding a timeout to the container.waitForStatus function?

Otherwise, we could add a more sophisticated system that considers a task as failed if its status hasn't changed for, let's say, 1 minute.
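
A minimal sketch of the timeout idea, using a polling loop driven by a context deadline; waitForStatus here is a stand-in, not the real container.waitForStatus signature:

package main

import (
	"context"
	"fmt"
	"time"
)

// waitForStatus polls a status function until the wanted status is reached
// or the context deadline expires. It only illustrates the timeout idea;
// the real container.waitForStatus works on Docker tasks.
func waitForStatus(ctx context.Context, current func() (string, error), want string) error {
	ticker := time.NewTicker(500 * time.Millisecond)
	defer ticker.Stop()
	for {
		status, err := current()
		if err != nil {
			return err
		}
		if status == want {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("timed out waiting for status %q (last seen %q): %w", want, status, ctx.Err())
		case <-ticker.C:
		}
	}
}

func main() {
	// Simulate a task stuck in the "new" status, as described above.
	stuck := func() (string, error) { return "new", nil }

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	fmt.Println(waitForStatus(ctx, stuck, "running")) // times out after 2s
}

With such a deadline, Start() would return a timeout error for the stuck "new" task instead of blocking forever.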

@ilgooz
Contributor Author

ilgooz commented Nov 28, 2018

Maybe service.Start() should return an error after the first task exits with code 1. But keep in mind that this is a very strict rule, because it requires services to contain more logic: for example, a service should not exit with code 1 if it cannot connect to its dependency database; it should retry instead. Since Docker services are started without a guaranteed order, this can happen.

In the future we may give users more flexibility by allowing a config like restart=always for services, but for now let's go strict and remove any margin for error. If we adopt restart=always, it's the users' responsibility to check that their services are running correctly, so we shouldn't block on Start() like we do now.

@NicolasMahe
Member

I really think we should keep restart always. It's a really nice feature for a service to restart if any error happens.
We have pretty good error handling so far, but for this issue Docker has a weird behavior.

@NicolasMahe
Member

NicolasMahe commented Nov 28, 2018

Actually, I have a solution.
This problem occurs only when the two Docker services are started at the same time. If the services are started one by one, the second one returns an error. E.g.:

package service

import (
	"fmt"
	"testing"

	"github.com/mesg-foundation/core/container"
)

func TestX(t *testing.T) {
	c, _ := container.New()
	service, _ := FromService(&Service{
		Name: "TestServiceDependenciesListensFromSamePort1",
		Dependencies: []*Dependency{
			{
				Key:   "1",
				Image: "nginx",
				Ports: []string{"80"},
			},
		},
	}, ContainerOption(c))
	service2, _ := FromService(&Service{
		Name: "TestServiceDependenciesListensFromSamePort2",
		Dependencies: []*Dependency{
			{
				Key:   "1",
				Image: "nginx",
				Ports: []string{"80"},
			},
		},
	}, ContainerOption(c))

	fmt.Println(service.Start()) // start correctly
	fmt.Println(service2.Start()) // return an error
}

Error returned:

[] Error response from daemon: rpc error: code = InvalidArgument desc = port '80' is already in use by service 'core-dc12e74f476172dd9a69948259e8e44666c0b12d-1' (wjqankzlrsggpvi28dp1hhqyj) as an ingress port

So the fix is to start the Docker services one by one.
We could also add an optional order parameter to mesg.yml to specify in which order the services need to be started (in some cases it could be: main service, then first dependency, etc.; in other cases: first dependency, then main service).

What do you think @mesg-foundation/core ?
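
For illustration, a minimal sketch of the one-by-one idea with stand-in types (the real Service, Dependency and start logic in the service package differ):

package main

import (
	"errors"
	"fmt"
)

// Dependency is a stand-in for the MESG dependency definition.
type Dependency struct {
	Key   string
	Image string
	Ports []string
}

// startSequentially starts each dependency in order and stops at the first
// error, instead of launching them all in parallel and then waiting for
// every container to reach RUNNING.
func startSequentially(deps []*Dependency, start func(*Dependency) error) error {
	for _, dep := range deps {
		if err := start(dep); err != nil {
			return fmt.Errorf("dependency %q: %w", dep.Key, err)
		}
	}
	return nil
}

func main() {
	deps := []*Dependency{
		{Key: "1", Image: "nginx", Ports: []string{"80"}},
		{Key: "2", Image: "nginx", Ports: []string{"80"}},
	}

	// Fake starter that fails on a port collision, mimicking the Docker error.
	used := map[string]bool{}
	start := func(d *Dependency) error {
		for _, p := range d.Ports {
			if used[p] {
				return errors.New("port '" + p + "' is already in use")
			}
			used[p] = true
		}
		return nil
	}

	fmt.Println(startSequentially(deps, start)) // dependency "2": port '80' is already in use
}

Starting sequentially surfaces the port collision on the second dependency instead of leaving its task stuck.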

@ilgooz
Contributor Author

ilgooz commented Nov 28, 2018

Maybe we should make an exception for ports that are already in use by handling this error on our side, so we can return an error from Start(). Note that colliding ports may happen both in the private network of a MESG service and for public ports that are mapped to the host machine.
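
A minimal sketch of handling that case on our side, assuming we only have the error text reported above to match against (which is brittle, but enough to make Start() fail fast):

package main

import (
	"errors"
	"fmt"
	"strings"
)

// isPortInUse reports whether an error returned by the Docker daemon looks
// like the ingress port collision reported earlier in this thread.
func isPortInUse(err error) bool {
	return err != nil && strings.Contains(err.Error(), "is already in use by service")
}

func main() {
	// Error text copied from the Docker response above.
	err := errors.New("rpc error: code = InvalidArgument desc = port '80' is already in use by service 'core-dc12e74f476172dd9a69948259e8e44666c0b12d-1' (wjqankzlrsggpvi28dp1hhqyj) as an ingress port")
	fmt.Println(isPortInUse(err)) // true
}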

@NicolasMahe
Member

NicolasMahe commented Nov 28, 2018

Maybe we should make an exception for ports that are already in use by handling this error on our side, so we can return an error from Start(). Note that colliding ports may happen both in the private network of a MESG service and for public ports that are mapped to the host machine.

Port collision can only happen on the ingress and host networks, because they share the same IP.
On any other network, each service/container has a different IP, so they can use the same port without collision.

@krhubert
Contributor

I like the idea of starting the services one after another. What we must also do is create an issue in the Docker repo and wait for the fix, then switch back to parallel service deployment.
