Clustering nodes that are on different servers #116

Open
etranger7 opened this issue Sep 12, 2024 · 5 comments

Comments

@etranger7

etranger7 commented Sep 12, 2024

I'm using this docker image and trying to cluster 2 nodes that are on different servers, therefore 2 different public IPs.
Just for testing, I successfully clustered 2 docker containers that are on the same machine.

However, when I try to define a FQDN in ERLANG_NODE_ARG, I get an error that I don't know how to overcome.

This container starts without errors (I'm skipping unrelated lines):

services:
  ej1container:
    hostname: ej1container          # containername works here too
    environment:
      - ERLANG_NODE_ARG=ej1@ej1container

This setup gives me an error:

services:
  ej1container:
    hostname: ej1container          # containername works here too
    environment:
      - ERLANG_NODE_ARG=ej1@subdomain.domain.com

It looks like the container starts normally but when I do

docker exec ej1container ejabberdctl status

I get

Failed RPC connection to the node 'ej1@subdomain.domain.com': nodedown

I already pointed the A record of subdomain.domain.com to the public IP of the VPS where this is running.

There was a similar issue #106 but I don't see how the FQDN was integrated and what the solution was.

Any help would be much appreciated.

@etranger7
Author

Update:
While the main node is running on Server A as ej1@ej1container, I tried to add Server B to it to form a cluster and ran into these issues:

  • When I use an FQDN, I get
ej3con  | :> ejabberdctl join_cluster ej1@subdomain.domain.com
ej3con  | 
ej3con  | 21:31:47.574 [error] ** System NOT running to use fully qualified hostnames **
ej3con  | ** Hostname subdomain.domain.com is illegal **
ej3con  | 
ej3con  | Error: error
ej3con  | Error: "This node cannot reach that node."
ej3con  | :> FAILURE in command 'join_cluster ej1@subdomain.domain.com' !!! Stopping ejabberd...
  • When I use an IP instead, I get
ej3con  | :> ejabberdctl join_cluster ej1@xxx.xxx.xxx.xx
ej3con  | 
ej3con  | 20:17:39.761 [error] ** System NOT running to use fully qualified hostnames **
ej3con  | ** Hostname xxx.xxx.xxx.xx is illegal **
ej3con  | 
ej3con  | Error: error
ej3con  | Error: "This node cannot reach that node."
ej3con  | :> FAILURE in command 'join_cluster ej1@xxx.xxx.xxx.xx' !!! Stopping ejabberd...
ej3con  | [os_mon] memory supervisor port (memsup): Erlang has closed

@badlop
Member

badlop commented Sep 20, 2024

- ERLANG_NODE_ARG=ej1@subdomain.domain.com

That environment variable is read by the ejabberdctl script and passed to the Erlang virtual machine as the -sname argument (or as -name when the value contains a dot, that is, when it includes a domain). As a result, the Erlang virtual machine names itself ej1@subdomain.domain.com.
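
The naming rule described above can be sketched in a few lines of shell. This is only an illustration of the rule as stated in this comment, not the actual ejabberdctl code; the helper name `name_flag` is hypothetical.

```shell
#!/bin/sh
# Sketch only: choose the erl naming flag from a node name.
# name_flag is a hypothetical helper, not a function from ejabberdctl.
name_flag() {
    case "$1" in
        *.*) echo "-name" ;;   # value contains a dot: long node name
        *)   echo "-sname" ;;  # no dot: short node name
    esac
}

name_flag "ej1@ej1container"            # prints -sname
name_flag "ej1@subdomain.domain.com"    # prints -name
```

The point of the sketch is that the choice between short and long names follows from the value of ERLANG_NODE_ARG alone, so the two compose files earlier in this thread start the node in different naming modes.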


docker exec ej1container ejabberdctl status
Failed RPC connection to the node 'ej1@subdomain.domain.com': nodedown

I get that same problem with a similar compose file:

version: '3.7'

services:

  main:
    image: ghcr.io/badlop/ejabberd:dependabot
    container_name: ejabberd
    hostname: ej1container
    environment:
      - ERLANG_NODE_ARG=ejabberd@subdomain.domain.com
      - ERLANG_COOKIE=dummycookie123

The solution in my case is to add subdomain.domain.com to /etc/hosts inside the container. That way ejabberdctl is able to connect correctly to the running node and get the status.
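
A minimal sketch of that workaround follows. The helper `add_host_entry` is hypothetical, and the address 127.0.0.1 is an assumption (it fits the case where ejabberdctl runs in the same container as the node); this comment does not say which address was actually used.

```shell
#!/bin/sh
# Sketch only: append "IP NAME" to a hosts file unless NAME is already listed.
# add_host_entry is a hypothetical helper, not part of ejabberd or the image.
add_host_entry() {
    hosts_file="$1"; ip="$2"; name="$3"
    # -F treats the name as a literal string, -w requires a whole-word match
    if ! grep -qwF "$name" "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Demonstration on a temporary file, so nothing on the system is modified.
tmp="$(mktemp)"
add_host_entry "$tmp" 127.0.0.1 subdomain.domain.com
add_host_entry "$tmp" 127.0.0.1 subdomain.domain.com   # second call is a no-op
cat "$tmp"
rm -f "$tmp"
```

Inside the container the same call would target /etc/hosts and would need root. Changes made this way are normally lost when the container is recreated, so the Compose `extra_hosts` option may be a more durable way to get the same entry.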


ERLANG_NODE_ARG=ej1@ej1container
ejabberdctl join_cluster ej1@subdomain.domain.com
** System NOT running to use fully qualified hostnames **

Right, you started the node with a short Erlang node name (ej1@ej1container), so you cannot later address it with a long node name such as ej1@subdomain.domain.com.

Either use:

ERLANG_NODE_ARG=ej1@ej1container
ejabberdctl join_cluster ej1@ej1container

If you use this on different machines, make sure the second one can resolve ej1container (for example, by adding it to /etc/hosts).

Or use:

ERLANG_NODE_ARG=ej1@ej1container.domain.com
ejabberdctl join_cluster ej1@ej1container.domain.com

In that case, make sure Erlang can resolve ej1container.domain.com to the right address.
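
Both options come down to the same rule: the local node name and the name passed to join_cluster must be of the same kind, either both short or both long. A small pre-flight check along these lines could catch a mismatch before running join_cluster. The helpers `is_long` and `names_compatible` and the node name ej2@ej2container.domain.com are hypothetical, used only for illustration.

```shell
#!/bin/sh
# Sketch only: refuse to mix short and long Erlang node names.
# is_long and names_compatible are hypothetical helpers.
is_long() {
    case "$1" in
        *.*) return 0 ;;   # contains a dot: long name
        *)   return 1 ;;   # no dot: short name
    esac
}

# Succeeds only when both names are long or both are short.
names_compatible() {
    if is_long "$1"; then
        is_long "$2"
    else
        ! is_long "$2"
    fi
}

names_compatible "ej1@ej1container" "ej1@subdomain.domain.com" \
    || echo "mismatch: one name is short, the other is long"
names_compatible "ej2@ej2container.domain.com" "ej1@ej1container.domain.com" \
    && echo "ok: both names are long"
```

The check says nothing about whether the names resolve or whether the nodes can reach each other; it only covers the short versus long mismatch behind the "System NOT running to use fully qualified hostnames" error quoted above.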

@etranger7
Author

Thank you for your reply @badlop .
Here is what worked for me to move past the
"Failed RPC connection to the node 'ej1@subdomain.domain.com': nodedown"
message and get a positive STATUS message. In the docker compose file, I used

services:
  ejabberd:
    image: ejabberd/ecs:24.07
    container_name: ejabberd
    hostname: subdomain.domain.com
    environment:
      - CTL_ON_START=status
      - ERLANG_COOKIE=[removed]
      - ERLANG_NODE_ARG=ejabberd@subdomain.domain.com

However, when I try to connect to ejabberd@subdomain.domain.com that's on Server A, from Server B, I get

Error: error
Error: "This node cannot reach that node."

When I

docker exec ejabberd bin/ejabberdctl ping ejabberd@subdomain.domain.com

from Server B, I get pang.

When I ping Server A from Server B, I can reach it with no issues.

When I

docker exec -u root ejabberd ping subdomain.domain.com

from Server B to Server A, again Server A is reachable.

I feel like I'm missing something here.
Again, your help is much appreciated.

@etranger7
Author

Hi @badlop , should I re-submit this issue under the issues of https://github.com/processone/ejabberd/ ?
I'm wondering whether that's being more closely monitored and whether the issues with the containers should also be submitted there.
Thanks.

@badlop
Member

badlop commented Oct 10, 2024

This is a problem with that container image, so this seems a good place for the issue.

On the other hand, it may be a problem related to Docker and Erlang clustering in general, not only to ejabberd, so you may also want to search for related questions outside ejabberd venues.
