
Refresh virtual machine storage information spamming #316

Closed
MarcusCaepio opened this issue Jan 12, 2022 · 12 comments

@MarcusCaepio

Hi all,
basically, I have the same problem as described in #296.
Sorry that I have to open a new issue for this, but I got no response in the original post.
Our admin team complains about the many frequent "Refresh virtual machine storage information" entries. We have a lot of VMs, and these messages hinder their work so much that they want to disable the module. Is there any way to reduce this?

Cheers,
Marcus

@MarcusCaepio MarcusCaepio changed the title Refresh virtual machine storage information Refresh virtual machine storage information spamming Jan 12, 2022
@slalomsk8er
Contributor

I also got some negative comments from my vCenter admins and the log spam has hindered acceptance in the past.

@Thomas-Gelf
Contributor

Are you running your vSphereDB daemon with --debug or --warning? This message is being logged with severity info, which shouldn't be logged unless at least --verbose has been requested.

@MarcusCaepio
Author

Indeed, it was running with --debug and --trace (my bad).
I changed my systemd unit file back to:

vi /etc/systemd/system/icinga-vspheredb.service
[Unit]
Description=Icinga vSphereDB Daemon
Documentation=https://icinga.com/docs/icinga-module-for-vsphere/latest/
Wants=network.target

[Service]
Type=notify
EnvironmentFile=-/etc/default/icinga-vspheredb
EnvironmentFile=-/etc/sysconfig/icinga-vspheredb
ExecStart=/usr/bin/icingacli vspheredb daemon run
; For debugging purposes:
; ExecStart=/usr/bin/icingacli vspheredb daemon run --trace --debug
User=icingavspheredb

NotifyAccess=main
WatchdogSec=30

Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target

and reloaded + restarted the daemon.
But the messages still appear.

@Thomas-Gelf
Contributor

Please share your (anonymized) log lines after the restart

@MarcusCaepio
Author

Sure, but it's not much (using journalctl -u icinga-vspheredb). The last two lines repeat every few minutes:

Jan 13 14:55:59 icingamaster.domain.com systemd[1]: Stopping Icinga vSphereDB Daemon...
Jan 13 14:55:59 icingamaster.domain.com icinga-vspheredb[51062]: Shutting down
Jan 13 14:55:59 icingamaster.domain.com icinga-vspheredb[51062]: [localdb] database connection has been closed
Jan 13 14:55:59 icingamaster.domain.com systemd[1]: icinga-vspheredb.service: Succeeded.
Jan 13 14:55:59 icingamaster.domain.com systemd[1]: Stopped Icinga vSphereDB Daemon.
Jan 13 14:55:59 icingamaster.domain.com systemd[1]: Starting Icinga vSphereDB Daemon...
Jan 13 14:55:59 icingamaster.domain.com systemd[1]: Started Icinga vSphereDB Daemon.
Jan 13 14:55:59 icingamaster.domain.com icinga-vspheredb[58020]: [configwatch] DB configuration loaded
Jan 13 14:55:59 icingamaster.domain.com icinga-vspheredb[58020]: [db] sending DB config to child process
Jan 13 14:55:59 icingamaster.domain.com icinga-vspheredb[58020]: [db] Running DB cleanup (this could take some time)
Jan 13 14:55:59 icingamaster.domain.com icinga-vspheredb[58020]: [db] DB has been cleaned up
Jan 13 14:55:59 icingamaster.domain.com icinga-vspheredb[58020]: [localdb] reconnecting
Jan 13 14:55:59 icingamaster.domain.com icinga-vspheredb[58020]: [api] launching server 4: vCenterId=2: https://xxx@vcenter1.domain.com
Jan 13 14:55:59 icingamaster.domain.com icinga-vspheredb[58020]: [api] launching server 2: vCenterId=1: https://xxx@vcenter2.domain.com
Jan 13 14:56:00 icingamaster.domain.com icinga-vspheredb[58020]: [api vcenter2.domain.com (id=4)] Storage Pods: Got no returnval / RetrieveResult
Jan 13 14:56:00 icingamaster.domain.com icinga-vspheredb[58020]: [api vcenter1.domain.com (id=2)] Storage Pods: Got no returnval / RetrieveResult

@Thomas-Gelf
Contributor

So the spam is gone? You mentioned "Refresh virtual machine storage information": it seems that it's no longer there (as it shouldn't be, w/o verbose/debug). The error regarding Storage Pods (in case there are none) is known; please follow #310 for related updates. If there is no spam apart from this, I'd close this issue.

@MarcusCaepio
Author

No, the spam is still there. It's not spamming in the vspheredb daemon log, but in vSphere itself under Recent Tasks, just as described in #296. vSphere shows this message for every VM every few minutes, which hides tasks run by an admin.

@Thomas-Gelf
Contributor

Now I got it. It's not the vspheredb log you're complaining about, but the task list in your vCenter, as we're calling RefreshStorageInfo() for outdated VMs.

We call this only on VMs where we discover outdated information. On some systems this information is always recent; on others it isn't. We trigger RefreshStorageInfo() only for VMs where it is obviously outdated, based on the timestamp we get from the vCenter.

Problem is: if we do not call this method, the storage usage information we fetch might become completely outdated - and therefore useless.

Workaround for you:

  • comment out $this->refreshOutdatedDatastores(); in ObjectSync.php (in the scheduleTasks() method)
  • better: raise the interval in $vms = VmDatastoreUsageSyncStore::fetchOutdatedVms($this->vCenter, 900, 30); in the same file, line 180. 900 (seconds) is the interval: VMs refreshed more than 15 minutes ago are considered outdated. Raise this to 7200 (two hours) to see whether this helps.

Both workarounds will result in partially outdated information. This affects the "VM disk on datastore" usage, not the disk usage shipped via guest utilities. Outdated VMs will look as follows:

[Screenshot: outdated_datastore]

If your vCenter admins have an idea on how to get this information refreshed on a regular interval, this would be very helpful. The required data lives in VirtualMachine.storage, RefreshStorageInfo() is the suggested refresh method.
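For illustration, the decision described above can be sketched in Python with stand-in objects. FakeVm, refresh_outdated and storage_timestamp are hypothetical names made up for this sketch; only RefreshStorageInfo() mirrors the name of the actual vSphere API method, and the real module does this in PHP via the SOAP API:

```python
from datetime import datetime, timedelta, timezone

class FakeVm:
    """Stand-in for a vSphere VirtualMachine: exposes a storage
    timestamp and a RefreshStorageInfo() method (names illustrative)."""
    def __init__(self, name, storage_timestamp):
        self.name = name
        self.storage_timestamp = storage_timestamp
        self.refresh_calls = 0

    def RefreshStorageInfo(self):
        # In the real API this asks the vCenter to refresh
        # VirtualMachine.storage; here we just record the call.
        self.refresh_calls += 1
        self.storage_timestamp = datetime.now(timezone.utc)

def refresh_outdated(vms, max_age_seconds=900):
    """Trigger a refresh only for VMs whose timestamp is older than
    the threshold (900 s = 15 minutes, the module's default)."""
    cutoff = datetime.now(timezone.utc) - timedelta(seconds=max_age_seconds)
    for vm in vms:
        if vm.storage_timestamp < cutoff:
            vm.RefreshStorageInfo()

now = datetime.now(timezone.utc)
fresh = FakeVm("vm-fresh", now - timedelta(seconds=60))
stale = FakeVm("vm-stale", now - timedelta(hours=3))
refresh_outdated([fresh, stale])
print(fresh.refresh_calls, stale.refresh_calls)  # 0 1
```

Raising max_age_seconds (the 900 in the workaround above) shrinks the set of VMs that pass the cutoff check, and with it the number of tasks appearing in vCenter.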

@Thomas-Gelf
Contributor

Possible alternative: calling RefreshDatastoreStorageInfo and checking Datastore.info for freshness on each Datastore, instead of refreshing single VMs.

According to the documentation, this "Refreshes all storage related information including free-space, capacity, and detailed usage of virtual machines. Updated values are available in summary and info."

  • Pros: fewer requests/tasks/spam
  • Cons: probably more work for the vCenter when there are only a few outdated VMs on a datastore (but that's a blind guess; at the end of the day, all of them need to be fresh)
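The trade-off can be sketched: with the proposed alternative, one refresh task per datastore that hosts at least one outdated VM would replace one task per outdated VM. A small Python sketch with illustrative names (the function names and sample data are made up for this comparison):

```python
def refresh_tasks_per_vm(outdated):
    """Current behaviour: one RefreshStorageInfo() task per outdated VM."""
    return len(outdated)

def refresh_tasks_per_datastore(outdated):
    """Proposed alternative: one RefreshDatastoreStorageInfo() task per
    datastore hosting at least one outdated VM."""
    return len({datastore for _vm, datastore in outdated})

# Example: six outdated VMs spread over two datastores
outdated = [
    ("vm1", "ds1"), ("vm2", "ds1"), ("vm3", "ds1"),
    ("vm4", "ds2"), ("vm5", "ds2"), ("vm6", "ds2"),
]
print(refresh_tasks_per_vm(outdated))         # 6 tasks in Recent Tasks
print(refresh_tasks_per_datastore(outdated))  # 2 tasks
```

The fewer-tasks gain grows with VM density per datastore, which is exactly the "spam" scenario reported here.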

@MarcusCaepio
Author

The only information I got so far was something about the PropertyCollector. But I guess this is already used? Calling RefreshDatastoreStorageInfo sounds like a good idea IMHO. If there is only one task running instead of one per VM, it would help a lot.

@MarcusCaepio
Author

Hi @Thomas-Gelf,
is there any news on this? I made your recommended change to the interval, but it did not help. We had to disable the service and the module for now.

@Thomas-Gelf Thomas-Gelf added this to the v1.3.0 milestone Apr 6, 2022
@Thomas-Gelf Thomas-Gelf self-assigned this Apr 6, 2022
@Thomas-Gelf
Contributor

I investigated some real-world large installations, and while I still haven't been able to figure out WHY disk usage information tends to be so old (and why the API even provides refresh methods), at least the systems I investigated have been able to refresh their information at least once every two hours.

For now I'll relax the threshold to 6 hours, which is IMHO terribly old - but better than nothing.
