
Add thread locking to TEDAPI #148

Open · Nexarian wants to merge 3 commits into main from add_api_lock_to_tedapi

Conversation

@Nexarian (Contributor) commented Feb 23, 2025

Summary

  • Using a dictionary with an API lock is not thread safe. Given that this is frequently called from the proxy, a more thread-safe option is appropriate, especially because I am in the process of modifying the proxy to call multiple gateways.
  • Most of the "changes" are simply existing lines re-indented under the `with` statement that the API lock uses (see the sketch below for the general pattern).
  • Update some typing.
  • Switch to enumerate instead of manually incrementing indices.
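For context, a minimal sketch of the `with`-based locking pattern (illustrative names only, not the exact code in this PR):

```python
import threading

class TEDAPI:
    def __init__(self):
        # One lock guarding calls to the gateway API.
        self.api_lock = threading.Lock()

    def get_status(self):
        # The body of each API call is simply re-indented under the
        # "with" block, so the lock is always released, even on errors.
        with self.api_lock:
            return self._send_request("status")

    def _send_request(self, endpoint):
        # Placeholder for the real request logic.
        raise NotImplementedError
```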

Testing

  • Using my Raspberry Pi. This has been working well for about a week.

@Nexarian force-pushed the add_api_lock_to_tedapi branch 2 times, most recently from 24dd7f5 to f925a6c on February 24, 2025 at 00:50
@jasonacox (Owner) commented

Thanks @Nexarian - I'll start testing. Out of curiosity, did you see any thread locking issues as a result of the current design? It would be interesting to correlate this to issues people see (if they do).

@Nexarian (Contributor, Author) commented

The origin of this fix was the race condition we were seeing with the animation. That now looks very clearly like something to do with JavaScript contention and browsers, but I tried this as part of debugging it.

Dictionaries are not thread-safe, and any thread safety you see is incidental to the implementation and Python version, so this is a correct fix, though not the only one. You could argue that exponential backoff and jitter aren't necessary here, and my use of a contextmanager that forces a with-indent might be over-engineered. I'll let you all think about it.

But as best I can tell, this works with no ill side effects and makes the code cleaner. The `with` block guarantees the lock is always released, so we don't have to think about it manually anymore.
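For reference, the contextmanager-with-backoff idea being described might look roughly like this; a sketch only, with made-up names, not the PR's actual implementation:

```python
import random
import threading
import time
from contextlib import contextmanager

@contextmanager
def api_lock_with_backoff(lock: threading.Lock, attempts: int = 5,
                          base_delay: float = 0.1):
    """Acquire the lock with exponential backoff and jitter, then release it."""
    for attempt in range(attempts):
        if lock.acquire(blocking=False):
            try:
                yield
            finally:
                lock.release()
            return
        # Back off exponentially, with jitter to avoid thundering herds.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise TimeoutError("could not acquire API lock")
```

Callers then wrap the request body in `with api_lock_with_backoff(self.api_lock): ...`, which is the forced with-indent mentioned above.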

@Nexarian (Contributor, Author) commented

I just realized this has a problem: I said dictionaries aren't thread safe and shouldn't be used for this, yet I'm still doing exactly that here. Using defaultdict doesn't make it thread-safe. I think we need another way to store the locks... I'll think about it.
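To make the concern concrete, this is the classic check-then-act race that an unguarded dictionary of locks can hit (illustrative code, not from this PR); defaultdict hides the check-and-insert but doesn't document it as atomic:

```python
import threading

locks = {}  # endpoint name -> threading.Lock

def get_lock(endpoint):
    # Two threads can both see the key as missing, both create a lock,
    # and end up using *different* lock objects for the same endpoint.
    if endpoint not in locks:
        locks[endpoint] = threading.Lock()
    return locks[endpoint]
```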

@mccahan (Contributor) commented Feb 24, 2025

> The origin of this fix was the race condition we were seeing with the animation. That now looks very clearly like something to do with JavaScript contention and browsers, but I tried this as part of debugging it.

This is relevant to my interests. I was having difficulty with the animation JS absolutely hammering the endpoint, because it fires the HTTP request every X seconds regardless of whether one is already running, and the minified code was tedious to edit.

I've started rebuilding my own version of it here, without depending on the existing (minified) code: https://github.com/mccahan/power-flow . It uses TanStack Query to make sure we're only running one HTTP request to pypowerwall at a time, and it is a bit more tolerant of some error situations.

@Nexarian (Contributor, Author) commented

@mccahan 😮 That's awesome! Did you run the code through some sort of de-minifier or are you rebuilding from scratch?

@mccahan (Contributor) commented Feb 24, 2025

Started from scratch. The existing code seems to be a whole technician setup app and everything, so it didn't seem worth sorting through.

I'm torn on which way to go with it API-wise. Because of the issues I've had with timeouts and stalls, I have a local copy of pypowerwall modified to add a websocket endpoint so it can broadcast whenever it gets new load/battery numbers, to try to get around some of that. This PR might change that plan a bit, though.

@Nexarian (Contributor, Author) commented

@mccahan Feel free to test it and let me know! I still need to fix the dictionary thing, but overall it should be an improvement in locking capability.

@jasonacox (Owner) commented

> I just realized this has a problem: I said dictionaries aren't thread safe and shouldn't be used for this, yet I'm still doing exactly that here. Using defaultdict doesn't make it thread-safe. I think we need another way to store the locks... I'll think about it.

Ok, no rush. Let me know when it is ready to test. 🙏

@Nexarian (Contributor, Author) commented

@jasonacox Should be ready now. There's perhaps more cleanup/hygiene to do here, but the functional aspect should be fine.

I was able to add a "thread safe" dictionary with minimal additional refactoring.
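The ApiLockPool in this branch may differ in detail, but the usual way to make such a pool thread safe is to guard the dictionary with its own lock; a minimal sketch:

```python
import threading

class ApiLockPool:
    """Hand out one lock per key, creating new locks under a guard lock."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def get(self, key):
        with self._guard:
            # Check-and-create is now atomic with respect to other callers.
            if key not in self._locks:
                self._locks[key] = threading.Lock()
            return self._locks[key]
```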

- Using a dictionary with an API lock is not thread safe. Given that this is frequently called from the proxy, a more thread-safe option is appropriate, especially because I am in the process of modifying the proxy to call multiple gateways.
- Use a dictionary (ApiLockPool) that IS thread safe.
- Update some typing.
- Switch to enumerate instead of manually incrementing indices.
@Nexarian force-pushed the add_api_lock_to_tedapi branch from f43a5d7 to c1f639a on February 27, 2025 at 04:34
@mccahan (Contributor) commented Feb 27, 2025

@Nexarian I haven't tested the version from 2 hours ago, but with the previous version I was still having an issue with request spamming. I didn't do a lot of investigation, though I do wonder if having a separate lock for every kind of request is an issue.

This is a work-in-progress, but I do have part of my suggestion on the matter here: https://github.com/jasonacox/pypowerwall/compare/main...mccahan:pypowerwall:feature/queue-tedapi-requests?expand=1

  • Uses a queue and a worker thread to make sure we're running one query at a time (see the sketch after this list for the general shape)
  • Moves a lot of the duplicated lock/cache code into that worker thread (which should be thread safe, because the cache belongs to that thread)
  • Since it's a queue, if multiple requests are waiting on the same query, they are all served the same result when the first one resolves (within cache lifetimes)
  • Should make it a bit easier to handle multiple Powerwalls if that's your goal, since you could have a separate queue thread per PW
  • Refactors a lot of the protobuf handling into separate "Message" classes to clean up some of the mess in that massive file
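A rough sketch of that queue/worker shape (the names and the simplified result hand-off are illustrative, not the code in the linked branch):

```python
import queue
import threading

class RequestWorker:
    """Serialize gateway queries onto one thread and hand results back."""

    def __init__(self, fetch):
        self._fetch = fetch                  # callable that talks to the gateway
        self._queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def request(self, key):
        done = threading.Event()
        box = {}
        self._queue.put((key, box, done))
        done.wait()
        return box["result"]

    def _run(self):
        while True:
            key, box, done = self._queue.get()
            # One query at a time; duplicate keys waiting in the queue could
            # be detected here and handed the same cached response.
            box["result"] = self._fetch(key)
            done.set()
```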

@Nexarian (Contributor, Author) commented

@mccahan The scope of this PR is very narrow: the way API locks were implemented was not correct or thread safe, and that's all I intend to fix here. I had mentioned in passing that I found this while investigating the behavior of the Tesla animation, but this PR wasn't directly intended to address anything there.

I don't fully understand what "issues with request spamming" means, and this PR wouldn't be intended to fix that anyway. I recommend opening a separate issue describing the exact symptoms you're seeing.

I'm also not entirely convinced that making pypowerwall/proxies query the inverter strictly serially is necessary. The code already has provisions to back off when we're getting timeouts from the gateways, and without more concrete data about how they scale, I believe this to be premature optimization.

Nexarian added 2 commits March 5, 2025 00:23
Rather than store the lock in a dictionary, which at best relies on the Global Interpreter Lock to fetch a value (and dictionary reads are not guaranteed to be thread safe), store the lock as an attribute on the function, which should never have access issues. This also removes a "global" parameter from the object.
@Nexarian (Contributor, Author) commented Mar 5, 2025

@jasonacox I updated the way the threading.Lock is stored. I think this is better... It uses a decorator on the functions that need API locking. I originally wanted to use an array, but this is more "automatic." After a lot of thought, dictionaries aren't great for storing these: they rely on the Global Interpreter Lock for reads and aren't guaranteed thread safe anyway.

This way, the lock is defined once, has thread-safe access, is defined generically, and is easy to reach from the wrapped function.
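Roughly, the decorator approach reads like this (a sketch of the idea; the actual decorator in the PR may differ):

```python
import threading
from functools import wraps

def uses_api_lock(func):
    """Create one lock per decorated function and hold it for each call."""
    func_lock = threading.Lock()

    @wraps(func)
    def wrapper(*args, **kwargs):
        # No shared dictionary to look the lock up in: it lives here,
        # created exactly once when the function is decorated.
        with func_lock:
            return func(*args, **kwargs)

    wrapper.api_lock = func_lock  # also exposed as a function attribute
    return wrapper

class TEDAPI:
    @uses_api_lock
    def get_config(self):
        ...  # real request logic goes here
```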
