
[BUG] Salt-Master Public IP Change #61482

Closed
Rus-sky opened this issue Jan 18, 2022 · 7 comments
Assignees
Labels
Bug broken, incorrect, or confusing behavior Confirmed Salt engineer has confirmed bug/feature - often including a MCVE doc-rework confusing, misleading, or wrong

Comments

@Rus-sky

Rus-sky commented Jan 18, 2022

Description

Given:
Salt-master behind a firewall with Dynamic IP address. Ports 4505 and 4506 are forwarded to the Salt Stack VM (CentOS 8).

Salt-minions that point to the FQDN salt.someprovider.com.

When the firewall's public IP changes, the DNS entry is updated accordingly. The minions pick up the DNS change automatically, yet even after the update they do not communicate with the master until the salt-minion service is restarted.

Setup
  • salt-master: VM (CentOS 8 on Proxmox (KVM))
  • salt-minions: VMs on ESXi 6.7 (CentOS 7)

Steps to Reproduce the behavior

Change the public IP of the Salt Master (so its DNS entry is updated) and try running any command.

Expected behavior

The expectation is that once the DNS entry is updated and propagated to all minions, communication with the master is restored without a restart.
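The failure mode can be illustrated with a small Python sketch (hypothetical names, not Salt's actual code): a client that resolves the master's hostname once at startup keeps using the stale address after the DNS record changes, while one that re-resolves on every reconnect recovers on its own.

```python
# Minimal sketch of the stale-resolution problem (illustrative only).

# A fake DNS zone we can mutate to simulate the firewall's public IP changing.
dns_zone = {"salt.someprovider.com": "203.0.113.7"}

def resolve(hostname):
    """Stand-in for a real DNS lookup: return the current record."""
    return dns_zone[hostname]

class CachingMinion:
    """Resolves the master once at startup and never again (the buggy behavior)."""
    def __init__(self, master):
        self.master_ip = resolve(master)   # cached forever
    def reconnect(self):
        return self.master_ip              # stale after a DNS change

class ReResolvingMinion:
    """Re-resolves the master's hostname on every reconnect (the desired behavior)."""
    def __init__(self, master):
        self.master = master
    def reconnect(self):
        return resolve(self.master)        # always current

caching = CachingMinion("salt.someprovider.com")
fresh = ReResolvingMinion("salt.someprovider.com")

dns_zone["salt.someprovider.com"] = "198.51.100.42"   # public IP changes

print(caching.reconnect())   # 203.0.113.7 -- minion appears dead
print(fresh.reconnect())     # 198.51.100.42 -- communication restored
```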

Example:

[root@localhost ~]# salt -v 'overcast*' pkg.version overcast
Executing job with jid 20220118191518213349
-------------------------------------------
overcast-shephardsbeachresort.hotelwifi.com:
    0.2.4-1
overcast-parkwayflorida.hotelwifi.com:
    3.12.0-1
overcast-shoshonecondominiumshotel.hotelwifi.com:
    4.4.0-1
overcast-mercurerockhampton.hotelwifi.com:
    4.4.0-1
overcast-southgatemotel.hotelwifi.com:
    3.12.0-1
overcast-therownyc.hotelwifi.com:
    4.4.0-1
overcast-hotelmetromilwaukee.hotelwifi.com:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:

    salt-run jobs.lookup_jid 20220118191518213349
overcast-dylansfo.hotelwifi.com:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:

    salt-run jobs.lookup_jid 20220118191518213349
overcast-reefoceanresort.hotelwifi.com:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:

    salt-run jobs.lookup_jid 20220118191518213349

Versions Report

[root@localhost ~]# salt --versions-report
Salt Version:
          Salt: 3004

Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.6.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.10.1
       libgit2: Not Installed
      M2Crypto: 0.35.2
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: Not Installed
  pycryptodome: Not Installed
        pygit2: Not Installed
        Python: 3.6.8 (default, Oct 19 2021, 05:14:06)
  python-gnupg: Not Installed
        PyYAML: 3.12
         PyZMQ: 19.0.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: centos 8
        locale: UTF-8
       machine: x86_64
       release: 4.18.0-348.el8.x86_64
        system: Linux
       version: CentOS Stream 8


@Rus-sky Rus-sky added Bug broken, incorrect, or confusing behavior needs-triage labels Jan 18, 2022

@dmurphy18
Contributor

This could be a new occurrence of glibc DNS caching not being refreshed; see #21397. Setting up a test environment to reproduce the issue.

@dmurphy18 dmurphy18 added Confirmed Salt engineer has confirmed bug/feature - often including a MCVE and removed needs-triage labels Feb 2, 2022
@dmurphy18 dmurphy18 added this to the Phosphorus v3005.0 milestone Feb 2, 2022
@dmurphy18
Contributor

dmurphy18 commented Feb 2, 2022

Confirmed this occurs in a test environment with master, minion, and a separate DNS server on another VM.
The minion VM resolves the master's new address just fine (10.10.10.7 -> 10.10.10.9):

[root@testlab-tc8 david]# ping -c2 testlab-tc8mstr
PING testlab-tc8mstr.testlab (10.10.10.9) 56(84) bytes of data.
64 bytes from 10.10.10.9 (10.10.10.9): icmp_seq=1 ttl=64 time=1.42 ms
64 bytes from 10.10.10.9 (10.10.10.9): icmp_seq=2 ttl=64 time=1.05 ms

--- testlab-tc8mstr.testlab ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.054/1.238/1.423/0.187 ms

But after the master's IP address change, the minion does not respond:

  1 [DEBUG   ] Reading configuration from /etc/salt/master
  2 [DEBUG   ] Missing configuration file: /root/.saltrc
  3 [DEBUG   ] Using importlib_metadata to load entry points
  4 [TRACE   ] The required configuration section, 'fluent_handler', was not found the in the configuration. Not loading the fluent logging handlers module.
  5 [TRACE   ] None of the required configuration sections, 'logstash_udp_handler' and 'logstash_zmq_handler', were found in the configuration. Not loading the Logstash logging handlers module.
  6 [DEBUG   ] Override  __grains__: <module 'salt.loaded.int.log_handlers.sentry_mod' from '/usr/lib/python3.6/site-packages/salt/log/handlers/sentry_mod.py'>
  7 [DEBUG   ] Configuration file path: /etc/salt/master
  8 [WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
  9 [DEBUG   ] Reading configuration from /etc/salt/master
 10 [DEBUG   ] Missing configuration file: /root/.saltrc
 11 [DEBUG   ] MasterEvent PUB socket URI: /var/run/salt/master/master_event_pub.ipc
 12 [DEBUG   ] MasterEvent PULL socket URI: /var/run/salt/master/master_event_pull.ipc
 13 [DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://127.0.0.1:4506
 14 [DEBUG   ] Trying to connect to: tcp://127.0.0.1:4506
 15 [TRACE   ] IPCClient: Connecting to socket: /var/run/salt/master/master_event_pub.ipc
 16 [DEBUG   ] Closing AsyncZeroMQReqChannel instance
 17 [TRACE   ] func get_cli_event_returns()
 18 [DEBUG   ] LazyLoaded local_cache.get_load
 19 [DEBUG   ] Reading minion list from /var/cache/salt/master/jobs/8c/f84e83e03d2e886508ffddbaea94f6b565746ac02efc2e29201db6c46fb474/.minions.p
 20 [DEBUG   ] get_iter_returns for jid 20220202185237542771 sent to {'testlab-tc8'} will timeout at 11:52:42.558478
 21 [TRACE   ] Get event. tag: salt/job/20220202185237542771
 22 [TRACE   ] _get_event() waited 0 seconds and received nothing
 23 [TRACE   ] Get event. tag: salt/job/20220202185237542771
 24 [TRACE   ] get_event() received = {'data': {'jid': '20220202185237542771', 'tgt_type': 'glob', 'tgt': 'testlab-tc8', 'user': 'sudo_david', 'fun': 'test.versions', 'arg': [], 'minions': ['testlab-tc8'], 'missing': [], '_stam    p': '2022-02-02T18:52:37.545127'}, 'tag': 'salt/job/20220202185237542771/new'}
 25 [TRACE   ] Get event. tag: salt/job/20220202185237542771
 26 [TRACE   ] _get_event() waited 0 seconds and received nothing
 27 [TRACE   ] Get event. tag: salt/job/20220202185237542771
 28 [TRACE   ] _get_event() waited 0 seconds and received nothing
.
.
.
[TRACE   ] data = {'testlab-tc8': 'Minion did not return. [No response]\nThe minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:\n\nsalt-run jobs.lookup_jid 20220202185237542771'}
[DEBUG   ] Closing IPCMessageSubscriber instance
ERROR: Minions returned with non-zero exit code
testlab-tc8:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
    
    salt-run jobs.lookup_jid 20220202185237542771
[root@testlab-tc8mstr david]#

If the minion is restarted, it picks up the new master IP address and communicates fine. Sample section of the minion's configuration file:

# Set the location of the salt master server. If the master server cannot be
# resolved, then the minion will fail to start.
#master: salt
master: testlab-tc8mstr
id: testlab-tc8

@dmurphy18 dmurphy18 added the Regression The issue is a bug that breaks functionality known to work in previous releases. label Feb 2, 2022
@dmurphy18
Contributor

dmurphy18 commented Feb 3, 2022

It appears that this has been broken since Salt 2015.5 (Python 2.7, tested on CentOS 7).
Perhaps it has gone unnoticed until now, given how quickly VMs are brought up and down these days compared to the servers of old, which could stay up for years (and probably had static IP addresses too).

@dmurphy18 dmurphy18 added doc-rework confusing, misleading, or wrong and removed Regression The issue is a bug that breaks functionality known to work in previous releases. labels Feb 3, 2022
@dmurphy18
Contributor

dmurphy18 commented Feb 3, 2022

The problem is incorrect documentation in the minion's configuration file, typically /etc/salt/minion; see:

salt/conf/minion

Lines 265 to 280 in 2d29d45

# If authentication fails due to SaltReqTimeoutError during a ping_interval,
# cause sub minion process to restart.
#auth_safemode: False
# Ping Master to ensure connection is alive (minutes).
#ping_interval: 0
# To auto recover minions if master changes IP address (DDNS)
# auth_tries: 10
# auth_safemode: False
# ping_interval: 2
#
# Minions won't know master is missing until a ping fails. After the ping fail,
# the minion will attempt authentication and likely fails out and cause a restart.
# When the minion restarts it will resolve the masters IP and attempt to reconnect.

The correct setting for auth_safemode is True; the file and documentation will be updated as follows:

david@david-laptop2:~/devcode/dgm_salt_master/salt/conf$ git diff --staged minion
diff --git a/conf/minion b/conf/minion
index b9cc26147d..ba20c8e905 100644
--- a/conf/minion
+++ b/conf/minion
@@ -271,7 +271,7 @@
 
 # To auto recover minions if master changes IP address (DDNS)
 #    auth_tries: 10
-#    auth_safemode: False
+#    auth_safemode: True
 #    ping_interval: 2
 #
 # Minions won't know master is missing until a ping fails. After the ping fail,
david@david-laptop2:~/devcode/dgm_salt_master/salt/conf$ 

For the salt-minion to recognize master IP changes, the minion's configuration file needs at least the following entries:

ping_interval: 2        # any non-zero value (minutes)
auth_safemode: True
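Putting the pieces together, a minimal /etc/salt/minion fragment with these settings (values taken from the commented example above; the hostname is illustrative) might look like:

```
# /etc/salt/minion -- auto-recover when the master's IP changes (DDNS)
master: salt.someprovider.com   # master by FQDN, not by IP
auth_tries: 10
auth_safemode: True             # restart the minion process on auth failure
ping_interval: 2                # ping the master every 2 minutes
```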

@dmurphy18
Contributor

Note the documentation mistake is from 55e38a9
which is Salt 2014.7, that is, 8 years old.

@dmurphy18
Contributor

Fix in PR #61577

dwoz added a commit to dwoz/salt that referenced this issue Apr 23, 2024
Check for a changing DNS record anytime a minion gets disconnected from
its master. See GitHub issues saltstack#63654 saltstack#61482.
dwoz added a commit that referenced this issue Apr 25, 2024
Check for a changing DNS record anytime a minion gets disconnected from
its master. See GitHub issues #63654 #61482.
tacerus pushed a commit to tacerus/salt that referenced this issue Jan 16, 2025
* Minions check dns when re-connecting to a master

Check for a changing DNS record anytime a minion gets disconnected from
its master. See GitHub issues saltstack#63654 saltstack#61482.

* Regression tests for dns defined masters

Adding tests to validate we check for changing dns anytime we're
disconnected from the currently connected master

* Update docs for master dns changes

Update docs to use master_alive_interval to detect master ip changes via
DNS.

* Remove comment which is not true anymore

* Make minion reconnecting on changing master IP

with zeromq transport

* Don't create schedule for alive if no master_alive_interval

* Skip the tests if running with non-root user

* Skip if unable to set additional IP address

* Set master_tries to -1 for minions

* Fix the tests

---------

Co-authored-by: Daniel A. Wozniak <daniel.wozniak@broadcom.com>
agraul pushed a commit to agraul/salt that referenced this issue Jan 27, 2025
* Minions check dns when re-connecting to a master

Check for a changing DNS record anytime a minion gets disconnected from
its master. See GitHub issues saltstack#63654 saltstack#61482.

* Regression tests for dns defined masters

Adding tests to validate we check for changing dns anytime we're
disconnected from the currently connected master

* Update docs for master dns changes

Update docs to use master_alive_interval to detect master ip changes via
DNS.

* Remove comment which is not true anymore

* Make minion reconnecting on changing master IP

with zeromq transport

* Don't create schedule for alive if no master_alive_interval

* Skip the tests if running with non-root user

* Skip if unable to set additional IP address

* Set master_tries to -1 for minions

* Fix the tests

---------

Co-authored-by: Daniel A. Wozniak <daniel.wozniak@broadcom.com>

BACKPORT-UPSTREAM=saltstack#66422
BACKPORT-UPSTREAM=saltstack#66757
BACKPORT-UPSTREAM=saltstack#66760
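Per the commit messages above, the eventual fix makes the minion re-resolve the master's DNS record whenever it is disconnected, tied to the master_alive_interval check. A rough, hypothetical sketch of one such check cycle (not Salt's actual implementation; resolve and reconnect are stand-ins):

```python
def maybe_reconnect(master_host, connected_ip, resolve, reconnect):
    """One alive-check cycle: re-resolve the master's hostname; if the DNS
    record now points elsewhere, reconnect to the new address.
    Returns the address the minion is connected to afterwards."""
    current_ip = resolve(master_host)
    if current_ip != connected_ip:
        reconnect(current_ip)   # tear down and re-establish the channel
        return current_ip
    return connected_ip

# Demo with a mutable fake DNS zone standing in for real resolution.
zone = {"testlab-tc8mstr": "10.10.10.7"}
reconnects = []

ip = maybe_reconnect("testlab-tc8mstr", "10.10.10.7", zone.get, reconnects.append)
zone["testlab-tc8mstr"] = "10.10.10.9"          # master's IP changes in DNS
ip = maybe_reconnect("testlab-tc8mstr", ip, zone.get, reconnects.append)
print(ip, reconnects)   # 10.10.10.9 ['10.10.10.9']
```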