
[BUG] KeyCloak K8s deployments may not be clustered #732

Closed
przemyslavic opened this issue Nov 19, 2019 · 5 comments

przemyslavic commented Nov 19, 2019

Describe the bug
RabbitMQ/KeyCloak deployments do not seem to be clustered. After rescaling with the command kubectl scale --replicas=xxx, everything works fine. A restart of the kubelet service is probably needed.
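
For reference, the rescale workaround above looks roughly like this (a sketch; it assumes the RabbitMQ chart deploys a StatefulSet named rabbitmq-cluster in the queue namespace, as seen in the log below, and 2 replicas is only an example value):

  # re-scale the StatefulSet to the desired number of replicas
  kubectl scale statefulset rabbitmq-cluster --replicas=2 -n queue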

To Reproduce
1. Deploy an Epiphany cluster with the RabbitMQ/KeyCloak applications configured to be clustered (replicas: 2 or more).
2. Use the flannel CNI plugin.

Expected behavior
Applications should be clustered; for example, kubectl logs rabbitmq-cluster-0 -n=queue should contain confirmation that RabbitMQ has N nodes in the cluster (see the verification sketch below).
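
A minimal verification sketch (pod and namespace names match the log below; "peer discovery" is the phrase that appears in the startup log):

  # ask the broker itself which nodes it sees (rabbitmqctl is available inside the RabbitMQ image)
  kubectl exec rabbitmq-cluster-0 -n queue -- rabbitmqctl cluster_status

  # or check what the K8s peer discovery backend reported at startup
  kubectl logs rabbitmq-cluster-0 -n queue | grep -i "peer discovery"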

OS:

  • OS: RHEL7

Cloud Environment:

  • Cloud Provider: AWS

Additional context
See the log in the comment below: #732 (comment)

@przemyslavic (Collaborator, Author) commented:

queue                  rabbitmq-cluster-0                                                          0/1     CrashLoopBackOff   17         68m
[ec2-user@ec2-35-181-49-111 ~]$ kubectl logs rabbitmq-cluster-0 -n=queue

  ##  ##
  ##  ##      RabbitMQ 3.7.10. Copyright (C) 2007-2018 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See http://www.rabbitmq.com/
  ######  ##
  ##########  Logs: <stdout>

              Starting broker...
2020-04-03 08:29:00.163 [info] <0.211.0>
 Starting RabbitMQ 3.7.10 on Erlang 21.2.3
 Copyright (C) 2007-2018 Pivotal Software, Inc.
 Licensed under the MPL.  See http://www.rabbitmq.com/
2020-04-03 08:29:00.168 [info] <0.211.0>
 node           : rabbit@10.244.1.2
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : LHdDZeqDJjd4t89pwS5zHg==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rabbit@10.244.1.2
2020-04-03 08:29:01.534 [info] <0.219.0> Memory high watermark set to 1864 MiB (1955366912 bytes) of 3729 MiB (3910733824 bytes) total
2020-04-03 08:29:01.538 [info] <0.221.0> Enabling free disk space monitoring
2020-04-03 08:29:01.538 [info] <0.221.0> Disk free limit set to 50MB
2020-04-03 08:29:01.541 [info] <0.224.0> Limiting to approx 1048476 file handles (943626 sockets)
2020-04-03 08:29:01.541 [info] <0.225.0> FHC read buffering:  OFF
2020-04-03 08:29:01.541 [info] <0.225.0> FHC write buffering: ON
2020-04-03 08:29:01.542 [info] <0.211.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@10.244.1.2 is empty. Assuming we need to join an existing cluster or initialise from scratch...
2020-04-03 08:29:01.542 [info] <0.211.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2020-04-03 08:29:01.542 [info] <0.211.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2020-04-03 08:29:01.542 [info] <0.211.0> Peer discovery backend does not support locking, falling back to randomized delay
2020-04-03 08:29:01.542 [info] <0.211.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2020-04-03 08:29:09.545 [info] <0.211.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},
                 {inet,[inet],nxdomain}]}
2020-04-03 08:29:09.545 [error] <0.210.0> CRASH REPORT Process <0.210.0> with 0 neighbours exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n
 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164 in application_master:init/4 line 138
2020-04-03 08:29:09.546 [info] <0.43.0> Application rabbit exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,\"{failed_connect,[{to_address,{\\"kubernetes.default.svc.cluster.local\\",443}},\n
              {inet,[inet],nxdomain}]}\"}},[{rabbit_mnesia,init_from_config,0,[{file,\"src/rabbit_mnesia.erl\"},{line,164}]},{rabbit_mnesia,init_with_lock,3,[{file,\"src/rabbit_mnesia.erl\"},{line,144}]},{rabbit_mnesia,init,0,[{file,\"src/rabbit_mnesia.erl\"},{line,111}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit_boot_steps,run_boot_steps,1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit,start,2,[{file,\"src/rabbit.erl\"},{line,815}]}]}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,"{failed_connect,[{to_address,{\"kubernetes.defau

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
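
The failed_connect / nxdomain errors above mean the pod could not resolve kubernetes.default.svc.cluster.local, i.e. in-cluster DNS was not reachable from that pod when peer discovery ran. A minimal sketch for checking DNS from inside the cluster (the pod name and busybox tag are arbitrary):

  # try to resolve the API server's cluster DNS name from a throwaway pod
  kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default.svc.cluster.local

  # confirm the cluster DNS pods (CoreDNS/kube-dns) are running
  kubectl get pods -n kube-system -l k8s-app=kube-dns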

mkyc added this to the 0.7.1 milestone Jul 2, 2020

mkyc commented Jul 2, 2020

@przemyslavic can you update the title and description of this bug? The title mentions AWS/RedHat/flannel while the description talks about RabbitMQ/KeyCloak. Can you describe, point by point, what you are doing (steps to reproduce)?

to-bar changed the title from "[apply] AWS/RedHat/flannel - application replicas not talking to each other" to "[BUG] RabbitMQ/KeyCloak K8s deployments may not be clustered" on Jul 3, 2020

mkyc commented Jul 3, 2020

@to-bar @ar3ndt can you check if this is related to #1395?

toszo changed the title from "[BUG] RabbitMQ/KeyCloak K8s deployments may not be clustered" to "[BUG] KeyCloak K8s deployments may not be clustered" on Jul 16, 2020

mkyc commented Jul 16, 2020

The RabbitMQ part is being handled in #1395.

mkyc modified the milestones: 0.7.1 → S20200729 on Jul 17, 2020
sk4zuzu self-assigned this Jul 22, 2020

sk4zuzu commented Jul 24, 2020

I've been trying to reproduce this issue for Keycloak for 2 days without any luck. We are not sure this is a real problem, because nobody recollects ever seeing it happen.

This is what I've been seeing consistently in the logs (for 3 replicas in this case):

12:02:56,955 INFO  [org.infinispan.CLUSTER] (remote-thread--p13-t5) [Context=work] ISPN100010: Finished rebalance with members [as-testauthdb-0, as-testauthdb-1, as-testauthdb-2], topology id 7
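
For reference, a quick way to check the same thing on a live deployment (a sketch; the pod name comes from the log line above, while the keycloak namespace is an assumption):

  # ISPN100010 is the Infinispan "finished rebalance" event shown above
  kubectl logs as-testauthdb-0 -n keycloak | grep ISPN100010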

We think it's possible that Keycloak got "added" to the description of the RMQ issue by mistake.

The related RMQ issue #1395 is definitely real, which makes me think this is not really K8s-related but rather something specific to RMQ itself.

Closing for now.

sk4zuzu closed this as completed Jul 24, 2020