ups.status: WAIT roughly every 24 hours with CyberPower rack UPS #1689

Closed
adams-family opened this issue Oct 28, 2022 · 18 comments · Fixed by #2722

Labels: CyberPower (CPS), impacts-release-2.7.4, Incorrect or missing readings, Linux, need testing, raspberry, USB

@adams-family

I'm running nut on Raspbian with a CyberPower OR600ERM1U rack UPS connected via USB.

After each reboot, nut works properly for roughly 24 hours: reporting of battery level, runtime, etc. is fine, and I tested it while charging, discharging, and floating. However, after some hours (usually about a day) nut reports that the UPS status changed from OL to WAIT, and from then on I am unable to read any information from nut until I completely restart my Raspberry Pi.

Environment:

$ uname -a
Linux raspberrypi 5.15.32+ #1538 Thu Mar 31 19:37:58 BST 2022 armv6l GNU/Linux

$ upsd -V
Network UPS Tools upsd 2.7.4

$ upsc -V
Network UPS Tools upscmd 2.7.4

This is what it looks like after I receive a notification from my NodeRED that ups.status has changed to WAIT:

$ upsc cyberpower-rack@localhost
Init SSL without certificate database
Error: Data stale

$ /etc/init.d/nut-client restart
Restarting nut-client (via systemctl): nut-client.serviceJob for nut-monitor.service failed because the control process exited with error code.
See "systemctl status nut-monitor.service" and "journalctl -xe" for details.
 failed!

$ /etc/init.d/nut-server restart
Restarting nut-server (via systemctl): nut-server.service.

$ upsc cyberpower-rack@localhost
Init SSL without certificate database
ups.status: WAIT

After this I have to reboot my Raspberry Pi and after a full reboot everything starts working:

$ reboot

[...]

$ upsc cyberpower-rack@localhost
Init SSL without certificate database
battery.charge: 100
battery.charge.low: 10
battery.charge.warning: 20
battery.mfr.date: CPS
battery.runtime: 2150
battery.runtime.low: 300
battery.type: PbAcid
battery.voltage: 13.5
battery.voltage.nominal: 12
device.mfr: CPS
device.model: OR600ERM1U
device.serial: GA6MQ******** (stripped)
device.type: ups
driver.name: usbhid-ups
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.parameter.synchronous: no
driver.version: 2.7.4
driver.version.data: CyberPower HID 0.4
driver.version.internal: 0.41
input.voltage: 231.0
input.voltage.nominal: 230
output.voltage: 263.0
ups.beeper.status: disabled
ups.delay.shutdown: 20
ups.delay.start: 30
ups.load: 27
ups.mfr: CPS
ups.model: OR600ERM1U
ups.productid: 0601
ups.realpower.nominal: 360
ups.serial: GA6MQ******** (stripped)
ups.status: OL
ups.test.result: No test initiated
ups.timer.shutdown: -60
ups.timer.start: -60
ups.vendorid: 0764

This happens every day, not just on a single occasion. Any ideas what could be wrong?

@jimklimov
Member

Output voltage looks odd too, at 263V vs. 231V input, though it may well be a wrong mapping or the reality of your location.

CPS devices were mentioned in NUT issues quite a few times with various quirks, as well as Raspberries. Not sure if that just reflects their popularity or HW/FW problems, though. Still, makes sense to peruse tagged issues to see if anything rings a bell.

I think there was a discussion about occasional USB port resets (possibly firmware reboots) that might be seen in dmesg: the OS finds the USB HID device again and the default udev handling grabs it, so the NUT driver cannot reconnect when it tries softly. There were changes in current master (maybe after the 2.8.0 release) to let it try harder.

FWIW, besides a Pi reboot, does restarting the nut-driver service help unwedge it? Did you try? If not, the Pi USB stack might have gotten stuck...
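
For reference, a minimal sketch of what "restarting the driver" means here, as opposed to restarting nut-server (upsd) or nut-client (upsmon). It assumes `upsdrvctl` is on the PATH and that the UPS section in ups.conf is named `cyberpower-rack`; the `restart_driver` helper and the `RUN` dry-run variable are made up for illustration and are not part of NUT.

```shell
#!/bin/sh
# Sketch: restart only the NUT driver for one UPS, leaving upsd and upsmon alone.
# RUN is a hypothetical dry-run hook: with RUN=echo the commands are printed
# instead of executed.
restart_driver() {
    ups_name="$1"
    # upsdrvctl looks up the driver configured for this ups.conf section.
    ${RUN:-} upsdrvctl stop "$ups_name"
    ${RUN:-} upsdrvctl start "$ups_name"
}
```

Calling `restart_driver cyberpower-rack` with `RUN=echo` set only prints the two commands; without `RUN` it typically needs root to actually stop and start the driver. On packaged installs the equivalent may be a service restart instead, depending on how the distribution wraps `upsdrvctl`.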

@adams-family
Author

@jimklimov Thanks for your insights.

I did try to restart nut-server and nut-client, but it did not help (one of them threw the error message above). Is this what you meant by "does restarting nut-driver service help unwedge it"?

There is nothing in dmesg for that time.

There are, however, error messages in /var/syslog starting at the time when the NUT driver failed, although they are not really useful:

Oct 31 04:41:15 raspberrypi upsd[366]: Data for UPS [cyberpower-rack] is stale - check driver
Oct 31 05:08:30 raspberrypi upsd[366]: Send ping to UPS [cyberpower-rack] failed: Resource temporarily unavailable
Oct 31 05:08:32 raspberrypi upsd[366]: Connected to UPS [cyberpower-rack]: usbhid-ups-cyberpower-rack
Oct 31 05:08:49 raspberrypi upsd[366]: Data for UPS [cyberpower-rack] is stale - check driver
Oct 31 05:35:58 raspberrypi upsd[366]: Send ping to UPS [cyberpower-rack] failed: Resource temporarily unavailable
Oct 31 05:36:00 raspberrypi upsd[366]: Connected to UPS [cyberpower-rack]: usbhid-ups-cyberpower-rack
Oct 31 05:36:17 raspberrypi upsd[366]: Data for UPS [cyberpower-rack] is stale - check driver
Oct 31 06:03:26 raspberrypi upsd[366]: Send ping to UPS [cyberpower-rack] failed: Resource temporarily unavailable
Oct 31 06:03:28 raspberrypi upsd[366]: Connected to UPS [cyberpower-rack]: usbhid-ups-cyberpower-rack
Oct 31 06:03:45 raspberrypi upsd[366]: Data for UPS [cyberpower-rack] is stale - check driver

I'm trying this workaround right now, although it's really just a shot in the dark:
https://raspberrypi.stackexchange.com/questions/66611/nut-cyberpower-data-stale

@jimklimov added the CyberPower (CPS), raspberry, Linux, and USB labels on Nov 1, 2022
@jimklimov
Member

I suppose "resource unavailable" and a few seconds later "connected to" mean that the driver tried to restart. Not sure however why it took roughly half an hour after "stale" status...
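
To put a number on that delay, here is a rough sketch that measures the time between each "is stale" line and the next "Connected to UPS" line. It assumes the syslog line format quoted above (timestamp in the third field) and compares clock times only, so it ignores the date apart from a single midnight wrap; the function name `stale_gaps` is made up for illustration.

```shell
#!/bin/sh
# Sketch: read upsd syslog lines on stdin and print, in seconds, the gap between
# each "is stale" message and the following "Connected to UPS" message.
stale_gaps() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        /is stale/ { stale = secs($3); have = 1 }
        /Connected to UPS/ && have {
            d = secs($3) - stale
            if (d < 0) d += 86400   # the pair straddled midnight
            print d
            have = 0
        }
    '
}
```

Something like `grep 'upsd\[' /var/log/syslog | stale_gaps` would feed it (the log path is an assumption). For the lines quoted above the gaps come out at about 27 minutes each, which fits the "roughly half an hour" impression.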

@jimklimov
Member

jimklimov commented Nov 1, 2022

@adams-family
Author

Probably off-topic here, but out of curiosity I checked the output voltage 1) on the UPS device and 2) with a multimeter, and both show a correct 234V - 235V reading. So the 263.0V reported by nut is incorrect.

I think I found another GitHub issue about that: #439


@jimklimov
Member

With a 2.7.4 build running, yes that issue may be it.

Can you try building the current NUT master branch to run the driver? Might suffice to:

:; mkdir -p nut-tmp && cd nut-tmp \
  && git clone https://github.com/networkupstools/nut . \
  && ./ci_build.sh \
  && NUT_CONFPATH=/etc/nut ./drivers/usbhid-ups -a cyberpower-rack -d1
### (assuming your configs are in `/etc/nut/ups.conf`)

...to get a single data collection walk and a dump afterwards, similar to the upsc report. Stop the original nut-driver service before this so the USB device node is available in the OS :)

You may have to run the test-mode driver via sudo and with the -x user=... option, to run it either as the packaged nut account or to keep root; either way, the udev-assigned permissions must allow the program to attach to the device node.

@adams-family
Author

adams-family commented Nov 5, 2022

It seems that the suggestion from https://raspberrypi.stackexchange.com/questions/66611/nut-cyberpower-data-stale might have helped. I'm pretty sure that I haven't changed anything else, and the system has been up for the past 3 days straight.

To clarify, this is what I did:

$ vi /etc/nut/ups.conf

[cyberpower-rack]
    driver = usbhid-ups
    port = auto
    desc = "CyberPower UPS"
    pollinterval = 15                      # ADDED this

$ vi /etc/nut/upsmon.conf

DEADTIME 25                                # ADDED this
MAXAGE 25                                  # and this, as well

$ reboot

I'll report back if it goes wrong. @jimklimov, do you think it's worth reverting my changes and trying a build from the master branch?

@jimklimov
Member

I suppose current NUT master does not address this, but a PR -- to detect CPS (maybe others too, maybe just by ids) and default to lower pollrate if unspecified by user -- might help for better out of the box experience :D

@adams-family
Author

@jimklimov Ok, I agree. I will give it a few more days just to make sure it's stable. Then I will revert my changes to check that the WAIT/stale issue happens again. This way we'll be sure that exactly these changes address the issue. I will report back here. Would you agree?

@jimklimov
Member

Yes, sounds right - thanks!

@adams-family
Author

@jimklimov Everything has been holding up well so far. I am now reverting my changes to try to reproduce the issue.

@adams-family
Author

@jimklimov Strange as it seems, the problem doesn't seem to come back after reverting my changes. Maybe what I should do is reinstall the Raspberry Pi and follow the exact steps again. That sounds more like a weekend project; I will keep it running in the meantime. I don't like the "it works, but I don't know why" effect 😂

@adams-family
Author

@jimklimov All right, reproduced, got the "WAIT" state today. What would you suggest next? Will you create a branch/PR that I download and install instead of the out-of-box version?

@jimklimov
Member

Reviewing old notes, I remembered the suggestion to use pollonly for CPS devices.

@MilhouseVH

Hopefully only suggest pollonly in the documentation! It's probably not necessary to make any code changes...

I've been running a CyberPower 1500EPFCLCD-UK for the last 2 years with 2.7.4 on RPi with the following settings:

[ups]
    driver = usbhid-ups
    vendorid = 0764
    productid = 0501
    port = auto
    desc = "CyberPower 1500EPFCLCD-UK in closet"

    ignorelb
    offdelay = 120

and the only issue is that the UPS will randomly stop responding. It could take weeks or months, but a restart of the driver after a COMMBAD restores communication for another few weeks/months.
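
A crude way to automate that kind of recovery is to look at what `upsc` prints and restart the driver when it looks wedged. The sketch below only does the detection part, matching the two strings that appear earlier in this thread ("Data stale" and "ups.status: WAIT"); other failure messages are not covered. The function name `looks_wedged` and the commented-out usage are illustrative assumptions, and the service name depends on packaging.

```shell
#!/bin/sh
# Sketch: return success (0) if the given upsc output suggests that the driver
# has stopped delivering data, based on the messages quoted in this issue.
looks_wedged() {
    case "$1" in
        *"Data stale"*)       return 0 ;;
        *"ups.status: WAIT"*) return 0 ;;
        *)                    return 1 ;;
    esac
}

# Possible use from cron or a timer (as root, names are assumptions):
#   out="$(upsc ups@localhost 2>&1)"
#   looks_wedged "$out" && systemctl restart nut-driver
```

Keeping the check separate from the restart makes it easy to log or dry-run before letting it touch the service.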

While investigating this issue with 2.7.4 I tried a self-built 2.8.0, and that didn't appear to have the same issue, so it may already be fixed.

As it happens, a week ago I switched to the latest master build (2.8.2+), so if there are any issues I'll report back. So far, so good.

@jimklimov
Member

Thanks for the update!

FWIW, in a brewing commit, the driver would loudly "only suggest" pollonly for CPS HID: since it is a flag, we would need to introduce an opposite flag (or handle a zero value as an option, or something) to disable it where it breaks things :)
It would, however, speed up the default pollfreq (full vs. small updates) from 30 to 12 sec for CPS devices, if not specified in the config.
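
For anyone on an older build who wants to try the same values by hand, here is a small sketch that prints a candidate ups.conf section with the `pollonly` flag and `pollfreq = 12` set explicitly. The helper name `cps_section` is made up for illustration, the section name is whatever your UPS is called, and the output is meant to be reviewed and merged into ups.conf manually rather than written blindly.

```shell
#!/bin/sh
# Sketch: print a candidate ups.conf section for a CPS device driven by usbhid-ups.
# $1 is the section (UPS) name; nothing is written to disk.
cps_section() {
    cat <<EOF
[$1]
    driver = usbhid-ups
    port = auto
    pollonly
    pollfreq = 12
EOF
}
```

For example, `cps_section cyberpower-rack` prints a section for the UPS from the original report. Whether `pollonly` helps or hurts on a particular model is exactly what still needs testing, so treat both settings as things to experiment with.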

jimklimov added a commit to jimklimov/nut that referenced this issue Dec 16, 2024
…NG.adoc: introduce DEFAULT_POLLFREQ_CPS and suggest pollonly on CPS devices [networkupstools#1689]

Signed-off-by: Jim Klimov <jimklimov+nut@gmail.com>
@jimklimov added this to the 2.8.3 milestone on Dec 16, 2024
@jimklimov
Member

jimklimov commented Dec 18, 2024

PR #2718 is presumed to have fixed the broken CPS input/output voltage reports mentioned in this issue (although it did not focus on cases of "voltage too high" like here).

PR #2722 addressed default poll frequency for CPS HID.

@jimklimov added the need testing and Incorrect or missing readings labels on Dec 18, 2024
@MilhouseVH

All good so far with a CP1500EPFCLCD (0501/0764) using the new default polling frequency. 👍

With this UPS and 2.7.4 it could take anywhere from a few days to a couple of months for the UPS to stop communicating (but a driver restart, systemctl restart nut-driver etc., in response to a COMMBAD would always fix that).

Settings:

[ups]
    driver = usbhid-ups
    vendorid = 0764
    productid = 0501
    port = auto
    desc = "CyberPower 1500EPFCLCD-UK in closet"

    ignorelb

    override.ups.mfr.date = 2023-03-28
    override.battery.mfr.date = 2023-03-28
    override.battery.charge.low = 3
    override.battery.runtime.low = 180

System log:

Dec 18 12:36:16 raspberrypi systemd[1]: Starting Network UPS Tools - device driver for NUT device 'ups'...
Dec 18 12:36:16 raspberrypi nut-driver@ups[26837]: Network UPS Tools upsdrvctl - UPS driver controller 2.8.2.1751-1751-g6ae5fead6 (development iteration after 2.8.2)
Dec 18 12:36:16 raspberrypi nut-driver@ups[26837]: Network UPS Tools 2.8.2.1751-1751-g6ae5fead6 (development iteration after 2.8.2) - Generic HID driver 0.60
Dec 18 12:36:16 raspberrypi nut-driver@ups[26837]: USB communication driver (libusb 1.0) 0.50
Dec 18 12:36:16 raspberrypi nut-driver@ups[26837]: Using subdriver: CyberPower HID 0.82
Dec 18 12:36:16 raspberrypi nut-driver@ups[26837]: Defaulting 'pollfreq' to 12 for CPS devices
Dec 18 12:36:16 raspberrypi nut-driver@ups[26837]: You may want to set 'pollonly' flag on CPS devices
Dec 18 12:36:16 raspberrypi nut-driver@ups[26837]: using 'battery.charge' to set battery low state
Dec 18 12:36:16 raspberrypi nut-driver@ups[26837]: using 'battery.runtime' to set battery low state
Dec 18 12:36:16 raspberrypi nut-driver@ups[26837]: Startup successful
Dec 18 12:36:16 raspberrypi systemd[1]: Started Network UPS Tools - device driver for NUT device 'ups'.
Dec 18 12:36:18 raspberrypi nut-driver@ups[26837]: sock_connect: enabling asynchronous mode (auto)
