-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
I2C/Wire library: Make Wire library non-blocking #42
Comments
I have just integrated my proposal from arduino/Arduino/issues/1476. |
Possible implementations:
Looking at these, I suspect that the version by @wmarkow and my own might be the best starting points, but neither completely solve the problem AFAICS. One particular challenge I've found (but I'm not sure if it is really properly documented yet), is that the AVR TWI hardware is particularly sensitive to noise that makes it look like there is a second master on the bus. Since I²C is a multi-master bus, when the hardware detects an external start condition, or an arbitration error, it will assume there is another master active and hold off any bus activity until it sees a stop condition. However, when there is no other master but just noise on the bus, this will probably never happen. When the hardware ends up in such a state, there is no way to actually detect this state (i.e. no status bit or anything), other than detecting a timeout (i.e. detecting that the hardware hasn't finished in reasonable time, so probably never started in the first place). There is a arbitration lost interrupt that can detect the start of this state in some cases (but not the end). The only way I've found to recover from this situation is to disable and re-enable the TWI hardware. So adding timeouts is probably the only way to fix this. However, just having hardcoded timeouts as most of these implementations (including my own) have is problematic, because:
I suspect the only really correct way to handle this is to let the sketch specify custom timeout values (or perhaps specify the max clock stretching time, whether multi-master should be supported and if so, the max transaction time of other masters). However, this requires API changes that made implementing this a lot more tricky, which is why I didn't submit my fixes for inclusion (and instead also ended up with hardcoded timeouts tailored to my specific case without multi-master and with limited clock stretching). |
Custom timeout values are perfectly fine, and I believe API change is justified. Current situation leads to permanent lock-up of the microcontroller, and is completely unacceptable. |
These lockups are murdering me right now! |
I've just tried https://github.com/3devo/ArduinoLibraryWire on my I2C lockups and it doesn't solve them either :-( |
https://github.com/wmarkow/Arduino/tree/issue_%231476 seems to prevent my firmware from locking up! 🎉
|
I've been looking at these lockups in my scope very closely for a few days now. See 3devo/ArduinoLibraryWire#1 for my details with scope traces if you're interested. I have some ideas for changing the Wire library to prevent them in the first place, but I haven't been able to figure out how to actually implement those fix ideas in the code yet. |
Hello @greyltc,
I just wanted to propose you to check out my code. Good that it works for you. However not everything seems to be covered there. It is nice that you give a few more cases to take a look into:
Indeed, when a timeout condition is met, then I restart the TWI hardware (
Yes, my code sets the timeout in millis but it seems to be no problems to rework it into microseconds. |
@wmarkow, I took your https://github.com/wmarkow/Arduino/tree/issue_%231476 branch and changed the timeout argument to microseconds and made any changes the user might have made to slave address or bitrate (the only two register values exposed by the Wire library) re-applied after the reset and put it in PR arduino/Arduino#107 |
I'm using a DS3231 external clock and noticed it 'freezes' when I disconnect power to the component. I want the program to continue looping even if the clock is disconnected. I narrowed it down to a Wire.endTransmission() line in the module code, and then found this page. Forgive my possible ignorant novice question, but which implementation listed by @matthijskooijman works best for this issue? Or is this a situation where there's some line of code I can add that will check whether the clock device is connected/powered before reaching the Wire.endTransmission() line? A lot of this discussion is over my head so I wasn't sure if this has a simpler solution than the issues discussed above? |
Use |
|
It probably depends on your circuitry, mine works when the other device is off. The reference manual says it sends a stop signal and releases the line, but it apparently doesn't. Or it could be that something else is wrong and we are not adept enough to understand it (electronics can be very complicated) |
It's been 9 FUCKING YEARS since that bug was first discussed (2011), countless people pulled their hair trying to understand why their Arduino was freezing, why would it work normally, then suddenly stop, a dozen of times, it's been raised in here, dozens of times people had been told to get lost, use something else, etc ... If the fix was difficult, if it compromised compatibility or added other problem, it would be understandable, but it's not the case, all that shit is caused by those 2 damn lines of code that obviously can create an infinite loop if for some reason the read operation does not complete : // wait for read operation to complete Yes, I know, this state is not supposed to happen according to the I2C protocol, but guess what ? electrical glitches didn't get the memo. Countless people, after losing hours or days kind of solved the issue either by making a modified version for themselves, or switching to another library, so there are several implementations of a timeout that are simple and would easily solve the issue. But arduino developers stubbornly refuse to fix it ! You think I'm rude ? After 9 years of giving the finger on this issue to the whole community for absolutely no reason, I couldn't care less. How long before you close it again ? |
Basically, @ermtl is right, this is ridiculous, WTF? |
Nah I don't think it... you are, period. Also posting the same message 5 times is not nice, especially for the 99.9% of people subscribed to this repo that are not interested in this discussion. Anyway, now you did it, so let's just skip over this.
We haven't closed anything... the real problem is that there isn't a one-fit-all solution. If you read the previous discussions all the proposed fix were always barely tested and tailored to the OP specific use case. I personally tested two PR in the past that, in their promises, will add timeout "without breaking anything" but you know what? at the times when I tested in my setup (an I2C display with a bunch of env sensors) they just broke my previously perfectly working sketch or made it unreasonably slow. Don't underestimate the complexity of this issue, it's not just "2 damn lines of code". Said that, I ask you, did you read the previous discussions? |
@cmaglie sure the @ermtl is rude, but his assessment is broadly accurate. I am willing to bet my house that someone is going to use Arduino to build a ventilator to deal with CORVID-19, and it will hang and kill someone. I tried to follow the discussion, but it's spread over many threads. Let's consider a strategy here:
I think the frustration is not just from lack of a solution - that's understandable if the problem is complex. I think it stems from lack of clarity and leadership, there does not appear to be a plan on how to get it fixed. If someone wanted to contribute, is it not clear what they should do, at least to me. |
@VladimirAkopyan: You would win that bet. I was amazed and mortified when one of the authors of this (https://github.com/ermtl/Open-Source-Ventilator) asked me a question about one of my libraries. Fortunately, I just noticed this in their README.md: |
bxparks : Well, it's a small world out there and I'm the main author of this Open Source Ventilator controller as part of an international team trying to design and assemble complete open source, easy to replicate emergency ventilators. A few month ago I had the problem on a commercial product, and I went all the way, from adding capacitors to better filter supply electrical noise, to lowering the value of the pullup resistors to 1Ko, adding little serie resistors just in case, to re designing the wiring, to thinking that I might have had a bad lot of ICs, ordering new ones, waiting for them to arrive, seeing they were no better, adding a watchdog (but it was just an ugly fix), searching for a new similar sensor, ordering the new sensor, trying it, understanding that it could not possibly be 2 different kind of sensors that are both bad the same way, finally suspecting something could be wrong with the library and finally finding this utterly stupid bug ! This is just what happened to me, and the bug was caught fast as the device was a noisy environment and it would lockup every few hours / minutes, but some are designing products that are not in an electrically noisy environment, and they might never see the bug, start selling their stuff and people place them close to motors, or old blinking fluo tubes, and suddenly, the products start to fail ! People use Arduinos as the glue between sensors and actuators, and a large proportion of them are I2C circuits. Having such a vicious bug lurking, ready to lockup the entire board undermines the whole platform and confines it to "hobby" status. Even then, when newcomers assemble their first gizmos, seeing it die on them for apparently no reason is really frustrating and might drive them away. In the Open-Source-Ventilator project, many people must cooperate, we're in a hurry, people are trying several sensors, so telling them they can't use any of the most usual libraries without rewriting them first is totally unrealistic. After spending a day on the problem (again), I found the jm_wire library (available in the library manager) that's compatible with Wire and the author simply made minimal changes to implement a fucking timeout, and, call that rocket science if you will, but yes, the timeout length can be changed, no less ! Crazy ... just what everyone asked for 9 years. However, it does not entirely solve the issue as I2C sensor libraries will reference the wire library within their code with When also including jm_Wire, all the library's function will be defined twice, and you can bet the compiler will yell and fail. So the less ugly way I found is to look at the source files of each and every I2C related library searching for #include <Wire.h> and manually replacing it with: #if __has_include("jm_Wire.h") It does the trick ... until you update the library ! Since I'm working on a device that needs utmost reliability, not just a gizmo that blinks, it's ok for me to warn people about the stupid problem, and tell them how to overcome it. Here's my explanation: There is just a thing I would change in the library, and it's that you have to set jm_Wire.twi_readFrom_wait and jm_Wire.twi_writeTo_wait to true manually. If that was by default, there would be zero impact on the API, the only change would be that in case of electrical error / glitch, the automagic random self hanging of the controller would be gone. And if people want multi master (given the flaws in the current library, I bet not many successfully ever did that, it can barely work with a single sensor) they can add delays by setting twi_readFrom_timeout and twi_writeTo_timeout ! If, after 9 FUCKING YEARS, Arduino developers still refuse to understand the problem and it's impact, I don't know what will wake them up. People tried asking, alerting, complaining, begging, and every time, the pull requests have been ignored, the bug have been closed, replaced with a new one so that newcomers have no idea this is such an old problem, and that countless others before them also complained, here and elsewhere about it. So yes, I'm rude and if cmaglie and others don't like it, they know where they can shove it up ! |
Guys, From a larger picture, all these fast and furious incremental code updates and suggestion for updates are, IMO, hacks. i.e. coding on the fly while sitting at a keyboard, with no overall design and not enough thought behind them.
I have no idea what you describing. In terms of inlining the i/o register updates vs using digitalWrite() In this case, using digitalWrite() is better than what you have proposed. |
@greyltc :
|
You're right. It will be correct.
I2C-bus specification and user manual I suggest solving the problem in a similar way. |
Obviously, this (TWI_READY != twi_state) check is done to make sure the second device is ready to receive / transmit. |
It seems that not everyone understands what. |
I think further convoluting this issue with discussion of the throughput performance of the I2C timeout implementation we've used is not such a great idea. I understand it's only still open because there's still some documentation left to do (which of course is the boring part, so that lingers). If anyone has a specific I2C performance problem that they'd like to raise, it seems best to file a new issue to cover that. Feel free to tag me there, especially if it's related to the newish timeout stuff and I'll do my best to chime in. If anyone has a specific idea for making an improvement to the library, you could submit that as a PR and we can discuss that specific change there. Though in making your PR, consider that API changes need to present some pretty compelling evidence to have a chance at being merged, especially backwards incompatible ones (and rightly so). Maybe less formal ideas for improvement could be hashed out on the forums or something before they're ready for a PR? I definitely think some of the specific performance improvement ideas that we've thrown around here show promise and I'll try to find some time to do some apples to apples oscilloscope informed benchmarking on them myself and submit them as a PR if I think the numbers I see warrant that. I'd tag a few of you on such a PR so we could review that together. |
@bperrybap |
@Aleev2007 The way to work around the potential optimization issues, would be to declare the 8 bit counters volatile, so that consistent code is always generated. Also there is still a 32 bit comparison in the runtime loop of when timeouts are enabled. To speed this up an additional 8 bit flag could be used to indicate that timeouts are enabled (set to true if twi_timeout_us is non zero. |
@bperrybap |
@bperrybap timeout switch now uses only 10 MSU ticks ;-)
|
Coming back to the orignal topic of eliminating the potential of a loop lockup in the Wire library, Having looked at the 1.8.13 "timeout" code now in a bit more detail, I think it would be useful for everyone to step back and think about the loop lockup issue and look at the lockup/timeout problem with a fresh set of eyes. IMO, the current implementation is a bit too heavy so it impacts i/o performance a bit too much. Would a simple hard coded non configurable timeout, on the order of say 1/2 or 1 second, be good enough? Since the WIRE library and WIRE API is somewhat a universal API implemented across nearly all platforms, running on processors at various speeds with various resources, the timeout functionality and any new WIRE API functions should carefully consider their impact on other platforms or slower processors. In other words would it be better in the grand scheme of things to have a simpler implementation that could be adopted more widely? vs trying to make something that has such granular configurability? I bring this up because if it becomes desirable to make some significant changes to how loop timeouts are handled to simplify things or offer better performance, now would be the time to do it, before the existing 1.8.13 "timeout" code and API starts to get documented and gets too many dependencies. Anybody that is working on this type of code and making these types of changes should be capable of testing their stuff themselves first before they make it available to others. @Aleev2007 You have made significant changes to the low level twi.c code in places that are not related to this loop timeout. Also, nearly all the comments have been stripped out. There is no need to be making these types of changes, particularly when 1) they are in places that will not affect the runtime i/o performance, and 2) breaks things. The twi.c code also has several warnings in it and I believe that one is indicating a calculation error due to a math/expression precedence order. |
@bperrybap Even if you have no use for a timeout faster than 500ms or 1 second, it's still useful for others. |
For me personally, none of this really matters as I don't have anything that requires and demands maximum i/o throughput or the ability to set very short state change timeouts. I do think of utmost critical importance is that there needs to be some kind of macro that user code can test at compile time so the user code call tell if this new loop abort/timeout capability exists, especially if it is not enabled by default which means the user code do something to enable the timeouts. Whenever you guys have something that works, I'll be happy to try it. In terms of performance, as of right now, from what I've seen, the approach that @Aleev2007 has gone down or something similar to it (i.e. not using micros()), can reduce state change latency and/or state change loop timeouts, especially when loop timeouts are enabled, more than an approach that uses micros() |
I probably got tired yesterday, from many attempts, to force the compiler to make code in which I can calculate the delay in the loop. I decided to go over from the beginning and have now made minimal changes to the old version 1.8.12 of the library. And I haven't looked at ISR yet, I also have ideas about it. |
@Aleev2007 There is too much keyboard hacking going on for this project by multiple parties. |
In commit deea929 (Introduce non compulsory Wire timeout), some timeout-related functions were added. To allow writing portable sketches, it is important for those sketch to know whether they are using a Wire library version that is new enough to offer these functions without having to rely on version number checking (since other Arduino cores have their own versioning schemes). This adds a WIRE_HAS_TIMEOUT macro, similar to the existing WIRE_HAS_END macro, to facilitate that. This relates to arduino#42.
I had the same error, I'm a newb too and haven't used cpp for a while. I'm the kinda newb that hits every bug and had a similar error when I used interrupts a few weeks ago. I found out somewhere that if you place a lot of code in the vector code blocks that it can cause the code to hang. I don't remember where I came across the advice but the recommended approach was to set a flag only in the vector and to use the loop to poll for the flag and carry out the necessary changes. Reading through the datasheet for the UNO, it says that the TWI is an interrupt based system. I used this approach and it overcame my issue. Thought it might be useful for someone here! 🤷♂️ |
How about int16 for the timeout, default +25ms (and enabled, use millis()), use < 0 for advanced use i.e. micros() resolution? |
I just need a way to skip/abandon a line of code hanging after a short period of time... I honestly don't have this fully spec-ed out yet, but maybe we could provide the community with an idempotent interrupt implementation pattern to externally manage a hanging The execution surrounding the interrupt(s) may include:
EDIT: The following also seems to work using https://github.com/rambo/I2C #define DEV_ADDR 0x50
#define DEV_TIMEOUT_MS 3000
void WireCheck() {
Serial.print("Checking wire status...");
I2c.timeOut(DEV_TIMEOUT_MS);
I2c._start();
I2c._sendAddress(DEV_ADDR);
byte wireStatus = I2c._stop();
switch (wireStatus) {
case 0:
Serial.println("WIRE_OK");
return;
case 1:
Serial.println("WIRE_TIMEOUT");
return;
case 0xff:
Serial.println("WIRE_ERR");
return;
}
} Might have a bug in here, but it is at least timing something out... |
@mattborja, this timeout feature has actually been implemented (two years ago) in arduino/Arduino#107, but it is not enabled by default. Did you try turning it on using https://www.arduino.cc/reference/en/language/functions/communication/wire/setwiretimeout/ ? Since we seem to have forgotten closing this issue, I'll do that now. For future reference, arduino/Arduino#107 implemented the feature, arduino/Arduino#362 is still pending to add feature detection macros like |
@matthijskooijman I hadn't come across that yet, but will give it a shot. I do prefer native libraries whenever possible. Thanks! |
@matthijskooijman Yeah, that seems to be working great. Aside: Extra "s" in heading Parameters here: https://www.arduino.cc/reference/en/language/functions/communication/wire/endtransmission/#_parameterss Thanks again! |
Thanks for reporting that @mattborja. I see the "syntaxx" section heading suffers from the same error. Please submit a correction via a pull request here: |
This is a placeholder issue to cover an arduino/Arduino/issues/1476 improvement.
It looks like arduino/ArduinoCore-avr repository is the correct one to cover that case, doesn't it?
The text was updated successfully, but these errors were encountered: