
[BUG] KeyCloak K8s deployments may not be clustered #732

Closed
przemyslavic opened this issue Nov 19, 2019 · 5 comments

przemyslavic commented Nov 19, 2019

Describe the bug
RabbitMQ/KeyCloak deployments do not seem to be clustered. After rescaling with the command kubectl scale --replicas=xxx, everything works fine. A restart of the kubelet service is probably needed.
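
For reference, the rescale workaround above looks roughly like this (a sketch; it assumes the RabbitMQ chart deploys a StatefulSet named rabbitmq-cluster in the queue namespace, as seen in the log below, and 2 replicas is only an example value):

  # re-scale the StatefulSet to the desired number of replicas
  kubectl scale statefulset rabbitmq-cluster --replicas=2 -n queue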

To Reproduce
1. Deploy an Epiphany cluster with the RabbitMQ/KeyCloak applications configured to be clustered (replicas: 2 or more).
2. Use the flannel CNI plugin.

Expected behavior
Applications should be clustered; for example, kubectl logs rabbitmq-cluster-0 -n=queue should contain confirmation that RabbitMQ has N nodes in the cluster (see the verification sketch below).
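
A minimal verification sketch (pod and namespace names match the log below; "peer discovery" is the phrase that appears in the startup log):

  # ask the broker itself which nodes it sees (rabbitmqctl is available inside the RabbitMQ image)
  kubectl exec rabbitmq-cluster-0 -n queue -- rabbitmqctl cluster_status

  # or check what the K8s peer discovery backend reported at startup
  kubectl logs rabbitmq-cluster-0 -n queue | grep -i "peer discovery"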

OS:

  • OS: RHEL7

Cloud Environment:

  • Cloud Provider: AWS

Additional context
See the log in the comment below: #732 (comment)

@przemyslavic (Collaborator, Author) commented:

queue                  rabbitmq-cluster-0                                                          0/1     CrashLoopBackOff   17         68m
[ec2-user@ec2-35-181-49-111 ~]$ kubectl logs rabbitmq-cluster-0 -n=queue

  ##  ##
  ##  ##      RabbitMQ 3.7.10. Copyright (C) 2007-2018 Pivotal Software, Inc.
  ##########  Licensed under the MPL.  See http://www.rabbitmq.com/
  ######  ##
  ##########  Logs: <stdout>

              Starting broker...
2020-04-03 08:29:00.163 [info] <0.211.0>
 Starting RabbitMQ 3.7.10 on Erlang 21.2.3
 Copyright (C) 2007-2018 Pivotal Software, Inc.
 Licensed under the MPL.  See http://www.rabbitmq.com/
2020-04-03 08:29:00.168 [info] <0.211.0>
 node           : rabbit@10.244.1.2
 home dir       : /var/lib/rabbitmq
 config file(s) : /etc/rabbitmq/rabbitmq.conf
 cookie hash    : LHdDZeqDJjd4t89pwS5zHg==
 log(s)         : <stdout>
 database dir   : /var/lib/rabbitmq/mnesia/rabbit@10.244.1.2
2020-04-03 08:29:01.534 [info] <0.219.0> Memory high watermark set to 1864 MiB (1955366912 bytes) of 3729 MiB (3910733824 bytes) total
2020-04-03 08:29:01.538 [info] <0.221.0> Enabling free disk space monitoring
2020-04-03 08:29:01.538 [info] <0.221.0> Disk free limit set to 50MB
2020-04-03 08:29:01.541 [info] <0.224.0> Limiting to approx 1048476 file handles (943626 sockets)
2020-04-03 08:29:01.541 [info] <0.225.0> FHC read buffering:  OFF
2020-04-03 08:29:01.541 [info] <0.225.0> FHC write buffering: ON
2020-04-03 08:29:01.542 [info] <0.211.0> Node database directory at /var/lib/rabbitmq/mnesia/rabbit@10.244.1.2 is empty. Assuming we need to join an existing cluster or initialise from scratch...
2020-04-03 08:29:01.542 [info] <0.211.0> Configured peer discovery backend: rabbit_peer_discovery_k8s
2020-04-03 08:29:01.542 [info] <0.211.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2020-04-03 08:29:01.542 [info] <0.211.0> Peer discovery backend does not support locking, falling back to randomized delay
2020-04-03 08:29:01.542 [info] <0.211.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2020-04-03 08:29:09.545 [info] <0.211.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},
                 {inet,[inet],nxdomain}]}
2020-04-03 08:29:09.545 [error] <0.210.0> CRASH REPORT Process <0.210.0> with 0 neighbours exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n
 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164 in application_master:init/4 line 138
2020-04-03 08:29:09.546 [info] <0.43.0> Application rabbit exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n                 {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 164
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,\"{failed_connect,[{to_address,{\\"kubernetes.default.svc.cluster.local\\",443}},\n
              {inet,[inet],nxdomain}]}\"}},[{rabbit_mnesia,init_from_config,0,[{file,\"src/rabbit_mnesia.erl\"},{line,164}]},{rabbit_mnesia,init_with_lock,3,[{file,\"src/rabbit_mnesia.erl\"},{line,144}]},{rabbit_mnesia,init,0,[{file,\"src/rabbit_mnesia.erl\"},{line,111}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit_boot_steps,run_boot_steps,1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit,start,2,[{file,\"src/rabbit.erl\"},{line,815}]}]}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,"{failed_connect,[{to_address,{\"kubernetes.defau

Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
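
The failed_connect / nxdomain errors above mean the pod could not resolve kubernetes.default.svc.cluster.local, i.e. in-cluster DNS was not reachable from that pod when peer discovery ran. A minimal sketch for checking DNS from inside the cluster (the pod name and busybox tag are arbitrary):

  # try to resolve the API server's cluster DNS name from a throwaway pod
  kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default.svc.cluster.local

  # confirm the cluster DNS pods (CoreDNS/kube-dns) are running
  kubectl get pods -n kube-system -l k8s-app=kube-dns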

mkyc added this to the 0.7.1 milestone Jul 2, 2020

mkyc commented Jul 2, 2020

@przemyslavic can you update the title and description of this bug? The title mentions AWS/RedHat/flannel while the description talks about RabbitMQ/KeyCloak. Can you describe, point by point, what you are doing (steps to reproduce)?

to-bar changed the title from "[apply] AWS/RedHat/flannel - application replicas not talking to each other" to "[BUG] RabbitMQ/KeyCloak K8s deployments may not be clustered" on Jul 3, 2020

mkyc commented Jul 3, 2020

@to-bar @ar3ndt can you check if this is related to #1395?

toszo changed the title from "[BUG] RabbitMQ/KeyCloak K8s deployments may not be clustered" to "[BUG] KeyCloak K8s deployments may not be clustered" on Jul 16, 2020

mkyc commented Jul 16, 2020

The RabbitMQ part is being handled in #1395.

mkyc modified the milestones: 0.7.1 → S20200729 on Jul 17, 2020
sk4zuzu self-assigned this Jul 22, 2020

sk4zuzu commented Jul 24, 2020

I've been trying to reproduce this issue for Keycloak for 2 days without any luck. We are not sure this is a real problem, because nobody recollects ever seeing it happen.

This is what I've been seeing consistently in the logs (for 3 replicas in this case):

12:02:56,955 INFO  [org.infinispan.CLUSTER] (remote-thread--p13-t5) [Context=work] ISPN100010: Finished rebalance with members [as-testauthdb-0, as-testauthdb-1, as-testauthdb-2], topology id 7
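
For reference, a quick way to check the same thing on a live deployment (a sketch; the pod name comes from the log line above, while the keycloak namespace is an assumption):

  # ISPN100010 is the Infinispan "finished rebalance" event shown above
  kubectl logs as-testauthdb-0 -n keycloak | grep ISPN100010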

We think it's possible that Keycloak got "added" to the description of the RMQ issue by mistake.

The related RMQ issue #1395 is definitely real, which makes me think this is not really K8s-related but rather something specific to RMQ itself.

Closing for now.

sk4zuzu closed this as completed Jul 24, 2020