-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulk ADC Sensors #6555
Bulk ADC Sensors #6555
Conversation
Looks like the build failed due to the increased binary size on small platforms. I'll have to do an update to exclude the HX71x on those. Edit 1: looks like only the AR100 checks the size? |
1be4db5
to
de8d3e8
Compare
Not sure I follow how that is relevant to this PR? |
Not at all, this was supposed to go into my Pr. Sorry :-( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! In general it looks good to me. I have some comments.
Procedurally it would be preferable if the hx71x and ads1220 code were in separate commits. Breaking up the commits helps with tools like git blame
and git bisect
. It seems like the two are mostly separated anyway, so hopefully it is not hard to create two atomic commits. I can also help with this.
I noticed a few, mostly minor, things in the code. See my comments below.
-Kevin
klippy/extras/ads1220.py
Outdated
# TODO: is a manual 50us delay needed here? Testing says it's not. | ||
# read startup register state and validate | ||
val = self.read_reg(0x0, 4) | ||
if hexify(val) != hexify(RESET_STATE): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, it's certainly seems odd to encode two numeric objects into a string to compare them..
klippy/extras/bulk_adc_sensor.py
Outdated
# Helper for ClockSyncRegression that handles generating timestamps | ||
# while processing a batch of samples | ||
class TimestampHelper: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it would be good to use shared code for the intricate timestamp manipulation that the bulk sensors implement. However, I think this should be done for all bulk sensors and implemented in bulk_sensor.py . I'll put together a separate PR for discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll wait to refactor on top of that PR
|
||
// hx71x ADC query | ||
void | ||
hx71x_read_adc(struct hx71x_adc *hx71x, uint8_t oid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment on using 4-bytes when 3-bytes is sufficient. Same comment on not invoking shutdown().
src/sensor_hx71x.c
Outdated
int32_t counts = 0; | ||
hx71x_time_t start_time = timer_read_time(); | ||
for (uint_fast8_t sample_idx = 0; sample_idx < 24; sample_idx++) { | ||
hx71x_pulse_clocks(hx71x); | ||
// read 2's compliment int bits | ||
counts = (counts << 1) | gpio_in_read(hx71x->dout); | ||
} | ||
|
||
// bit bang 1 to 4 more bits to configure gain & channel for the next sample | ||
for (uint8_t gain_idx = 0; gain_idx < hx71x->gain_channel; gain_idx++) { | ||
hx71x_pulse_clocks(hx71x); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a minor note, this code could be optimized a bit. Something to look at after merging though.
I have a busy few weeks coming up and I'm not sure when I'm going to get to all the comments. If I can grab a few minutes here and there ill knock one out and just tack on a commit. When its all done we can squash git history into 2 commits as requested. I'm not sure how I'm going to prove the sensor reset stuff works. I cant force an error on command without adding extra code. I might put in a counter that forces a reset from the MCU side every 30 seconds or something and then take that our for the submission. Will probably save this for last. |
Thanks. I'm not sure what the "sensor reset" stuff is. There were a bunch of stuff discussed above, but putting aside various code commentary, I'd say only the following are "blockers":
-Kevin |
Looking at the accelerometer code, it seems to ignore the errors. I don't see where Here is a primer on the problem: These chips fail at runtime due to ESD. The big metal load cell acts like a ground path and they are very susceptible to static. This causes the chip to reboot, erasing any settings it had from klippy startup. Continuing to sample after an error wont recover the sensor. Even if it does produce data it is going to be the wrong channel/sample rate/gain etc. So once the chip resets the data stream is dead. Some chips in this class send a reboot flag with their measurements but these chips are too cheap for that. So all of these checks are heuristics to detect a reset. Even taking too long to collect a sample can change the settings on the HX71x so the next sample is garbage. When an error is detected we have to stop sampling, re-program the chip and restart sampling. I avoided doing this because its complex. and when installed in a properly grounded printer I don't see these errors. But I'm sure at a large scale they will happen at some non zero rate. I agree with you that sending the errors to the Host is the right thing to do. But if all the Host does it shutdown on error its a marginal improvement. Ideally I'd like to try and recover so that a stray ESD doesn't ruin a print. |
Thanks. I can only give high-level comments as I'm not familiar with the details of these sensors - so take my comments with "a grain of salt".
I was referring to how
For what it is worth, as a user, I'd not want any automatic error recovery like that. To me, restarting the sensor sounds like it is very likely to lead to overall poor measurement results. It may be possible to restart the sensor, but there is a very high chance the resulting values will not be of high quality - the nozzle/bed contact may have been missed or distorted, the timing correlation is likely off, etc. So, it may be possible to restart the sensor, but there is a very good chance I'll get a terrible first level layer height. As a users, I'd prefer the action to be stopped and to receive an error that says "your hardware is malfunctioning and needs to be fixed", so that I can then fix the hardware and go on to get great first layer quality. As a developer, I would also prefer not to maintain complex code that, to wit, "makes it easy for users to get bad results". So, my concern is not with reporting of errors, it is with using Again, I can't give definitive advice on how to handle errors for these sensors. It seems to me though, that if it's a questionable sensor reading we would want to mark it as a questionable reading and let the high-level code determine what action to take. If the sensor gets in a state where it can't produce any readings, then we can signal that as well (so that the high-level code knows it wont get further readings and can raise an appropriate error message). For this initial PR of sensor support, with a goal of getting sensor data to the api server, a Thanks again. Hope that helps. |
Thanks, I think I can work with that guidance.
This is going to happen via the LoadCellEndstop. When the sensor stops reporting data during a homing event, its a critical failure. The LoadCellEndstop has a watchdog timer and if no data is received for 2 sample windows it will issue a I think an axis moving without a functioning endstop counts as a We don't have to shutdown in the sensor code to achieve that, so they will be removed from this PR.
I also don't want to implement this due to the complexity. (Users should ground their printers!!) The next best thing is to just stop measurements when the Host gets an error from the sensor. So inside the sensor it will not re-schedule the sampling timer on error. On the Host, I can flush the error to clients and then run the finish measurements logic which will disconnect all of the consumers from For something non-critical like a Filament scale it should be possible to restart measurements without halting the print. Or just let the filament scale be broken until the Host restarts. Ideally a non-critical use shouldn't ruin a print. |
A bit off topic here, but the best way to rapidly stop a homing/probing operation is to issue a -Kevin |
ahh good to know. Will get that fixed up. |
2179b87
to
45fc1be
Compare
I got the C part in, shutdowns are gone. When an error happens the sensor just turns off its pending flag and flushes the error. That will be the last packet it sends before its restarted (if its ever restarted). I used the I have to do the re-base to mainline before I can tackle the Python side |
0d3bcb3
to
8d7acd7
Compare
I rebased and now the ROM for stm32f042 is too large. How are we deciding what to cut from these space limited chips? |
There's no documented strategy - generally we disable whatever new feature caused the build to go over in size. It's really useful to do build all the targets that we have config files for - it catches lots of inadvertent errors. There isn't a need for these smaller chips to support every possible option though. -Kevin |
rn both revisions fail on imports :( should I start worrying and trying to finish this pr, or am I missing something obvious?) |
|
Yes, and I mean both commits in the branch |
I just need to get some time to work on it. Looks like I have to exclude the hx71x sensor from all the places that would support it but don't have enough space left in their image to do so. I don't have the local tooling on my Mac setup to test this so I'm doing commits and having github do the testing. |
I don't want to rush you, sorry if that sounded like it — I just wanted to ask if any help is needed. Thank you for working on this! |
Okay, thanks. I had a couple of additional ideas you may want to consider. I added a few commits to the branch at https://github.com/KevinOConnor/klipper-dev/tree/work-hx71x-20240611 . Briefly:
Just ideas to consider. Cheers, |
#1: I'm scared this is getting complicated The place this would help is when homing, it would make the trigger delay as short as possible, reducing overshoot. 1/10th the sample period is ~300us at 320SPS and 1250us at 80SPS. But the pin checks for both sample rates take the same amount of time. What may be optimal is to set a fixed interval in microseconds for pin checks for all sample rates, (e.g. 100us). Then, say you want to do 3 pin checks, on average, before seeing a signal.
On the ADS1220 it can do 2000 SPS. 1/10th of its interval is 50us, so maybe the PIN_CHECK_INTERVAL needs to be shorter than I put here. I also though about making the PIN_CHECK_WINDOW adaptive. If the sample is ready on check 1 then the window should have been larger. If its ready on sample 5 then it should get smaller... but that is getting complicated. Lower MCU load and faster sample rates are much better answers to these problems. #2 OK with me! In my head I though that I had implemented only sending back 24 bits per sample. But given that its 32bits (and you are OK with that) we can totally use the high bits for error codes. #3 Would you be OK with just sending both?
The floating point value doesn't properly convey the saturation point of the sensor :
So checking for Secondly rounding discards data, e.g. 9 digits isn't enough to represent the precision of a 32 bit sensor:
|
1 - I was thinking of this: static uint_fast8_t
hx71x_event(struct timer *timer)
{
struct hx71x_adc *hx71x = container_of(timer, struct hx71x_adc, timer);
uint32_t rest_ticks = hx71x->rest_ticks;
if (hx71x->pending_flag) {
hx71x->sb.possible_overflows++;
rest_ticks *= 4;
} else if (hx71x_is_data_ready(hx71x)) {
// New sample pending
hx71x->pending_flag = 1;
sched_wake_task(&wake_hx71x);
rest_ticks *= 8;
}
hx71x->timer.waketime += rest_ticks;
return SF_RESCHEDULE;
} Which I don't think is overly complex. I don't think we need to worry about responsiveness as we'll already be delayed by several milliseconds due to filtering (and worst case multi-mcu homing situations). So, no point in optimizing for microseconds. I also think it is fine to hardcode the x4 and x8 in the mcu - the host and mcu code are tightly coupled anyway. If you think the above is too obscure or too complex then I think the simple unchanging 2 - okay. 3 - I'm fine with it any way - counts, float, or both. It's not a big concern. FWIW, I don't know of any adc sensor that's more accurate than about 15bits in practice, so I don't think precision really matters. (Sensors claiming more accuracy are just using creative marketing as far as I'm aware.) Cheers, |
I got a change stack built up today. I got the machine to home but not probe. I need to work through all the refactoring that have happened in |
e23d022
to
6aec417
Compare
I've had a chance to try artificially breaking the sensor and forcing the host to restart it and this worked just fine. It goes into the error state on the MCU and just keeps broadcasting errors until it gets reset. This dovetails with the later endstop implementation where it wont pass any data to the endstop once it enters the error state. I still need to do this testing on the ADS1220. Once the HX71x sensors encounter a detectable error I don't think we should trust its results until its reset and thats what is implemented. If there is a desync the sensor can produce results that are from a different channel than configured and that could get passed to the endstop as valid data to prolong a homing move and damage the machine. On the ADS1220 I've seen it just produce garbage bits when reading the settings registers so I don't trust its output either. I have a series of branches that build up the upcoming load cell PR sequence (pr-load-cell-gram-scale, pr-load-cell-endstop, pr-probe-load-cell)) and I have test machine printing again on top of the last one. |
6aec417
to
bd722b9
Compare
The latest update has a small change to not return an empty report when an overflow is encountered and instead return whatever data was reported. I've created a test branch that can probe with the sensor set to fail every n seconds: test-sensor-errors. I have the load cell endstop set up to trigger the trsync with an error code now instead of a shutdown. There are several failure modes and the errors they generates are not always clearly of the form "your sensor had an error". Probably the most insidious one happens when the sensor resets and then homing starts. This causes the tearing code to get no data for the first bit of the tear, resulting in not enough samples being collected before it times out. But since the reset already happened the tear process doesn't get any errors to explain the root cause. I need to consider the sensor "offline" if it sends out an error packet and only consider it back "online" when the next non-error packet is sent. |
Thanks. What's the status of this PR? Is it ready for merging? I have a few questions/comments about the latest implementation. These can be adjusted post-merge if changes are needed.
Thanks again, |
Also, just "thinking out loud" - have you considered halting any ongoing homing on a sensor error, while still allowing data reports to continue? (That is, continue to populate the bulk sensor reports, but immediately notify the homing code that it should abort.) -Kevin |
My idea is that this works like a temperature sensor. They all start when klipper starts. The |
This PR only exports the loadcell information via the API Server. If a client connects to Klipper and requests loadcell information then sampling is automatically started at that point. Thus, there is no reason to start sampling at Klipper startup. I expected sampling to occur when there are users of the data, and sampling to be disabled when there are no users. See klippy/extras/ldc1612.py as an example. Cheers, |
I'll look at just having the sensor stop sampling and having Does that sound like a good path forward? |
Thanks. My main question is - do you feel this PR is ready for merging? If so, I think we'd be better off merging as is and then look to make any remaining changes on top of it.
Well, I'd prefer to merge this PR before going on to the main loadcell work. I'd also prefer for PRs to provide some user measurable benefit, and providing the ads1220/hx71x measurements via the api server (as the current stubbed out loadcell.py code does) would meet that minimum goal.
I'm not sure I understand this, but FWIW it sounds complex. Cheers, |
Great! Lets do that.
For this PR I mean to just have the sensors stop sampling on error and terminate all clients. Easy enough. I'll get your other concerns (logging and the |
Thanks. I did not fully understand your last response. Would you prefer that I merge what is here now, or would you prefer that I wait for additional changes? -Kevin |
I am working on this today. Will post changes by end of day. |
62939ce
to
f97365e
Compare
* Create the load_cell host module skeleton to create the sensors and start taking samples. * Add support for the HX717 and HX711 ADC sensors. Signed-off-by: Gareth Farrington <gareth@waves.ky>
Add support for the ADS1220 as an alternative to HX71x that supports SPI and higher sample rates. Signed-off-by: Gareth Farrington <gareth@waves.ky>
f97365e
to
bb6647a
Compare
I'm good to merge. I looked at raising an error in the hx71x code but I left it as-is. When an exception is raised the Batch Bulk Helper drops all the clients but doesn't notify them they were dropped. I think its better to notify clients with errors and make them responsible for deciding if the sensor is stable enough to use.
|
Thanks. I modified the commit messages of the two commits here and committed them. (I changed the first line of each commit to be in the standard I also committed two additional changes on top. Commit 13c75ea fixes the Config_Reference.md chapter hierarchy. Commit cb15d0f alters load_cell.py so that it does not automatically start sensors at startup and does not export a dummy Thanks again. |
Just wanna say @garethky - Thank you! I've been following your work for absolutely ages. Hardware to follow 😄 |
Add Bulk ADC Sensors and Load Cell skeleton
load_cell
skeleton to create the sensors and start taking samples.Is this Bug Free / Reliable?
Versions of the HX711/HX717 code have been running for weeks at a time on my test system. Others have tested this code as well and we are pretty confident this version doesn't suffer form any timing defects.
The ADS1220 code is newer but has been running for at least a week non-stop without an error. Since its SPI based it should be more reliable. The only quirk with this chip is that a reset is required before configuring it.
The only issue I'm aware of is with
_finish_measurements()
running when the printer experiences a SHUTDOWN. This causes the communication to the MCU to fail and logs an exception message. This is confusing for users as its at the end of the log but not the root cause of the shutdown. All of the other Bulk Sensors share this behavior. It would be nice if we could detect a shutdown in progress and skip the serial command in_finish_measurements()
.Build Changes
Bulk Sensor assumes that a sensor will be SPI or I2C. Currently it would be automatically excluded on small platforms that don't have these interfaces. But the HX71x chip family uses bit-banging and can run on anything with GPIO. So I altered the build to allow the HX71x chips to be turned off. If they are included they also bring in Bulk Sensor. I'm not sure which platforms we might want to exclude this on or if it was worth bothering about.
What doesn't this do?
The implementation of a digital scale is cut from this first PR to keep the size small. The plan is to put that code in the next PR.
These sensors are all "multiplex" ADCs, meaning they have a single measurement core that is connected to a multiplexer that allows the input channel to be switched at runtime. But they can only produce 1 measurement at a time and some delay is required to switch channels (often a couple of measurement cycles). This code does not make this feature usable, in fact for the ADS1220 I didn't even include an option to select an input channel. This feature has been used by Prusa in their MK4 and XL products. They use the 'A' channel on the HX717 to real the load cell and the 'B' channel to read the the analog filament sensor. They switch inputs at runtime at a regular duty cycle and rout the measurements to separate components. I decided this is out of scope due to the complexity it would create.