Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix JK BMS connection restart when bluetooth fails. #941

Merged
merged 1 commit into from
Jan 28, 2024

Conversation

cupertinomiranda
Copy link
Contributor

This fix installs a new thread to monitor the state of the original scraping thread.
If scraping thread dies, it verifies that it did not because the scraping was intentionally stopped by calling stop_scrapping. When restarting the scrapper, it first calls the bluetooth reset lambda function that was passed in the class contructor, such that bluetooth is ready to make a proper connection.

This problem was reported by me in issue #915 which got closed since it was a duplicate of #720.

Looking forward to your comments !

This fix installs a new thread to monitor the state of the original
scraping thread.
If scraping thread dies, it verifies that it did not because the
scraping was intentionally stopped by calling stop_scrapping.
When restarting the scrapper, it first calls the bluetooth
reset lambda function that was passed in the class contructor, such that
bluetooth is ready to make a proper connection.
@mr-manuel
Copy link
Collaborator

Thank you very much for contributing! Did you already test it? A warning message when the treat crashed would be useful :-)

@cupertinomiranda
Copy link
Contributor Author

Thank you very much for contributing!

Thank you for contributing the project ! 😄

Did you already test it?

I have force tested it by causing an explicit exit from the thread. It did reboot the thread properly in that case.
I am currently using it to make sure that it restarts the thread in case the problem reported in #915 happens.
It would be nice to have the confirmation from issue #720 creators as well, to check if it solves that problem too.

A warning message when the treat crashed would be useful :-)

The original code already prints a message for both normal and exception thread exits.

@mr-manuel mr-manuel merged commit 3c2401a into Louisvdw:dev Jan 28, 2024
@mr-manuel
Copy link
Collaborator

You think it would be possible to integrate your fix globally that it works for all BLE drivers?

@cupertinomiranda
Copy link
Contributor Author

Very likely, I did only debug the problem in the context of the JK BMS. However, if other BLE implementations have similar designs it is likely they somehow suffer from the same problem.
I will attempt to give it a look and if I find some more generic solution, I will make another pull request.

@mr-manuel
Copy link
Collaborator

Best would be, if we add a thread here

# try using active callback on this battery
if not battery.use_callback(lambda: poll_battery(mainloop)):
# if not possible, poll the battery every poll_interval milliseconds
gobject.timeout_add(battery.poll_interval, lambda: poll_battery(mainloop))

that works similar as this function, if no data is fetched

def publish_battery(self, loop):
# This is called every battery.poll_interval milli second as set up per battery type to read and update the data
try:
# Call the battery's refresh_data function
result = self.battery.refresh_data()
if result:
# reset error variables
self.error["count"] = 0
self.battery.online = True
# unblock charge/discharge, if it was blocked when battery went offline
if utils.BLOCK_ON_DISCONNECT:
self.block_because_disconnect = False
else:
# update error variables
if self.error["count"] == 0:
self.error["timestamp_first"] = int(time())
self.error["timestamp_last"] = int(time())
self.error["count"] += 1
time_since_first_error = (
self.error["timestamp_last"] - self.error["timestamp_first"]
)
# if the battery did not update in 10 second, it's assumed to be offline
if time_since_first_error >= 10 and self.battery.online:
self.battery.online = False
# check if the cell voltages are good to go for some minutes
self.cell_voltages_good = (
True
if self.battery.get_min_cell_voltage() > 3.25
and self.battery.get_max_cell_voltage() < 3.35
else False
)
# reset the battery values
self.battery.init_values()
# block charge/discharge
if utils.BLOCK_ON_DISCONNECT:
self.block_because_disconnect = True
# if the battery did not update in 60 second, it's assumed to be completely failed
if time_since_first_error >= 60 and (
utils.BLOCK_ON_DISCONNECT or not self.cell_voltages_good
):
loop.quit()
# if the cells are between 3.2 and 3.3 volt we can continue for some time
if time_since_first_error >= 60 * 20 and not utils.BLOCK_ON_DISCONNECT:
loop.quit()
# This is to mannage CVCL
self.battery.manage_charge_voltage()
# This is to mannage CCL\DCL
self.battery.manage_charge_current()
# publish all the data from the battery object to dbus
self.publish_dbus()
except Exception:
traceback.print_exc()
loop.quit()

@cupertinomiranda
Copy link
Contributor Author

cupertinomiranda commented Jan 28, 2024

I am working on a general approach to better control driver monitoring for the drivers that contains threads etc.
I will define a is_active method in Battery to default to true and move the monitoring to the Battery itself.
If something goes wrong with the driver, is_active should return false and the main process should reinitialize the driver.

What do you think ?

@mr-manuel
Copy link
Collaborator

What if the driver if the bms is stuck and therefore can't update Is_active to false? This is what the current issue is. Therefore we need only a monitoring for the non polling drivers, since for the polling drivers there is already a check.

@cupertinomiranda
Copy link
Contributor Author

cupertinomiranda commented Jan 29, 2024

Actually I was referring to the thread to be created within the Battery code, and any subclass of Battery (such as Jkbms_Ble) would implement the specifics of this monitoring. The is_active idea was already dropped. The solution would be something similar to what was done in my patch but generic, by extending Battery class instead. I almost have it done. It takes me a while since I do not have a testing environment, would need to test it in production. 😞

I would appreciate if you can give me some pointers on how to implement it on some sort of simulated environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants