
service: service.Start() can block forever #382

Closed
ilgooz opened this issue Aug 22, 2018 · 11 comments · Fixed by #598
Comments

@ilgooz
Contributor

ilgooz commented Aug 22, 2018

Similar to #381.

If dependencies of the same service listen on the same port, service.Start() will block forever because it waits for the dependency containers to reach the RUNNING status. However, N-1 of them will never reach RUNNING because their ports collide.

Any other misconfiguration of the Service or of the service file (mesg.yml) can cause the same issue.

ilgooz added the bug label on Aug 22, 2018
ilgooz changed the title from "service. service.Start() can block forever" to "service: service.Start() can block forever" on Aug 22, 2018
@antho1404
Member

In this case, one of the tasks created by the Docker service will never run, so we never get the error from it. We need to find a way to surface the errors that occur between the creation of the service (which succeeds) and the creation of the task.
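
For reference, a rough sketch of where Docker exposes that information, using the upstream Docker Go client directly rather than the MESG container package; the exact import paths depend on the client version, and the service ID below is a placeholder. It lists the tasks of a swarm service and collects any error message attached to tasks that never reached RUNNING:

package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/api/types/swarm"
	"github.com/docker/docker/client"
)

// taskErrors lists the tasks of a swarm service and collects the error
// messages Docker attached to tasks that are not running. It is only a
// stand-in for whatever the MESG container package would expose.
func taskErrors(ctx context.Context, cli *client.Client, serviceID string) ([]string, error) {
	args := filters.NewArgs()
	args.Add("service", serviceID)
	tasks, err := cli.TaskList(ctx, types.TaskListOptions{Filters: args})
	if err != nil {
		return nil, err
	}
	var errs []string
	for _, task := range tasks {
		// A task that never reaches RUNNING may carry the reason in Status.Err
		// (it can also just stay in "new" with no error, as noted later in this thread).
		if task.Status.State != swarm.TaskStateRunning && task.Status.Err != "" {
			errs = append(errs, fmt.Sprintf("task %s (%s): %s", task.ID, task.Status.State, task.Status.Err))
		}
	}
	return errs, nil
}

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}
	// "my-service-id" is a placeholder service ID.
	fmt.Println(taskErrors(context.Background(), cli, "my-service-id"))
}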

@ilgooz
Contributor Author

ilgooz commented Aug 22, 2018

Future note: afterwards, add a test like this to service/start_test.go:

package service

import (
	"fmt"
	"testing"
)

func TestX(t *testing.T) {
	service := &Service{
		Name: "TestServiceDependenciesListensFromSamePort",
		Dependencies: map[string]*Dependency{
			"test": {
				Image: "nginx",
				Ports: []string{"80"},
			},
			"test2": {
				Image: "nginx",
				Ports: []string{"80"},
			},
		},
	}
	fmt.Println(service.Start()) // currently blocks forever
}

NicolasMahe added this to the v0.5.0 milestone on Nov 27, 2018
@NicolasMahe
Member

Let's check if the error is still here

@krhubert
Contributor

The problem still exists.

@NicolasMahe
Member

So:
The service that fails to start has one task stuck in the "new" status.

What about adding a timeout to the container.waitForStatus function?

Otherwise, we could add a more sophisticated system that considers a task as failed if its status hasn't changed for, let's say, 1 minute.
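
A minimal sketch of the timeout idea, using a polling loop driven by a context deadline; waitForStatus here is a stand-in, not the real container.waitForStatus signature:

package main

import (
	"context"
	"fmt"
	"time"
)

// waitForStatus polls a status function until the wanted status is reached
// or the context deadline expires. It only illustrates the timeout idea;
// the real container.waitForStatus works on Docker tasks.
func waitForStatus(ctx context.Context, current func() (string, error), want string) error {
	ticker := time.NewTicker(500 * time.Millisecond)
	defer ticker.Stop()
	for {
		status, err := current()
		if err != nil {
			return err
		}
		if status == want {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("timed out waiting for status %q (last seen %q): %w", want, status, ctx.Err())
		case <-ticker.C:
		}
	}
}

func main() {
	// Simulate a task stuck in the "new" status, as described above.
	stuck := func() (string, error) { return "new", nil }

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	fmt.Println(waitForStatus(ctx, stuck, "running")) // times out after 2s
}

With such a deadline, Start() would return a timeout error for the stuck "new" task instead of blocking forever.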

@ilgooz
Contributor Author

ilgooz commented Nov 28, 2018

Maybe service.Start() should return an error after the first task exits with code 1. But keep in mind that this is a very strict rule, because it requires services to contain more logic: for example, a service should not exit with code 1 if it cannot connect to its dependency database; it should retry instead. Since Docker services are started without a guaranteed order, this can happen.

In the future we may give users more flexibility by allowing a config like restart=always for services, but for now let's go strict and remove any margin for error. If we adopt restart=always, it's the users' responsibility to check that their services are running correctly, so we shouldn't block on Start() like we do now.

@NicolasMahe
Member

I really think we should keep restart always. It's a really nice feature for a service to restart if any error happens.
We have pretty good error handling so far, but for this issue Docker has a weird behavior.

@NicolasMahe
Member

NicolasMahe commented Nov 28, 2018

Actually, I have a solution.
This problem occurs only when the two Docker services are started at the same time. If the services are started one by one, the second one returns an error. E.g.:

package service

import (
	"fmt"
	"testing"

	"github.com/mesg-foundation/core/container"
)

func TestX(t *testing.T) {
	c, _ := container.New()
	service, _ := FromService(&Service{
		Name: "TestServiceDependenciesListensFromSamePort1",
		Dependencies: []*Dependency{
			{
				Key:   "1",
				Image: "nginx",
				Ports: []string{"80"},
			},
		},
	}, ContainerOption(c))
	service2, _ := FromService(&Service{
		Name: "TestServiceDependenciesListensFromSamePort2",
		Dependencies: []*Dependency{
			{
				Key:   "1",
				Image: "nginx",
				Ports: []string{"80"},
			},
		},
	}, ContainerOption(c))

	fmt.Println(service.Start()) // start correctly
	fmt.Println(service2.Start()) // return an error
}

Error returned:

[] Error response from daemon: rpc error: code = InvalidArgument desc = port '80' is already in use by service 'core-dc12e74f476172dd9a69948259e8e44666c0b12d-1' (wjqankzlrsggpvi28dp1hhqyj) as an ingress port

So the fix is to start the Docker services one by one.
We could also add an optional order parameter to mesg.yml to specify in which order the services need to be started (in some cases it could be: main service, then first dependency, etc.; in other cases: first dependency, then main service).

What do you think @mesg-foundation/core ?
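
For illustration, a minimal sketch of the one-by-one idea with stand-in types (the real Service, Dependency and start logic in the service package differ):

package main

import (
	"errors"
	"fmt"
)

// Dependency is a stand-in for the MESG dependency definition.
type Dependency struct {
	Key   string
	Image string
	Ports []string
}

// startSequentially starts each dependency in order and stops at the first
// error, instead of launching them all in parallel and then waiting for
// every container to reach RUNNING.
func startSequentially(deps []*Dependency, start func(*Dependency) error) error {
	for _, dep := range deps {
		if err := start(dep); err != nil {
			return fmt.Errorf("dependency %q: %w", dep.Key, err)
		}
	}
	return nil
}

func main() {
	deps := []*Dependency{
		{Key: "1", Image: "nginx", Ports: []string{"80"}},
		{Key: "2", Image: "nginx", Ports: []string{"80"}},
	}

	// Fake starter that fails on a port collision, mimicking the Docker error.
	used := map[string]bool{}
	start := func(d *Dependency) error {
		for _, p := range d.Ports {
			if used[p] {
				return errors.New("port '" + p + "' is already in use")
			}
			used[p] = true
		}
		return nil
	}

	fmt.Println(startSequentially(deps, start)) // dependency "2": port '80' is already in use
}

Starting sequentially surfaces the port collision on the second dependency instead of leaving its task stuck.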

@ilgooz
Contributor Author

ilgooz commented Nov 28, 2018

Maybe we should make an exception for ports that are already in use by handling this error on our side, so we can return an error from Start(). Note that colliding ports may happen both in the private network of a MESG service and for public ports that are mapped to the host machine.
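
A minimal sketch of handling that case on our side, assuming we only have the error text reported above to match against (which is brittle, but enough to make Start() fail fast):

package main

import (
	"errors"
	"fmt"
	"strings"
)

// isPortInUse reports whether an error returned by the Docker daemon looks
// like the ingress port collision reported earlier in this thread.
func isPortInUse(err error) bool {
	return err != nil && strings.Contains(err.Error(), "is already in use by service")
}

func main() {
	// Error text copied from the Docker response above.
	err := errors.New("rpc error: code = InvalidArgument desc = port '80' is already in use by service 'core-dc12e74f476172dd9a69948259e8e44666c0b12d-1' (wjqankzlrsggpvi28dp1hhqyj) as an ingress port")
	fmt.Println(isPortInUse(err)) // true
}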

@NicolasMahe
Member

NicolasMahe commented Nov 28, 2018

Maybe we should make an exception for ports that are already in use by handling this error on our side, so we can return an error from Start(). Note that colliding ports may happen both in the private network of a MESG service and for public ports that are mapped to the host machine.

Port collision can only happen on the ingress and host networks, because they share the same IP.
On any other network, each service/container has a different IP, so they can use the same port without collision.

@krhubert
Contributor

I like the idea of starting the services one after another. What we must also do is create an issue in the Docker repo and wait for the fix, then switch back to parallel service deployment.
