Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk ADC Sensors #6555

Closed
wants to merge 2 commits into from
Closed

Conversation

garethky
Copy link
Contributor

@garethky garethky commented Apr 8, 2024

Add Bulk ADC Sensors and Load Cell skeleton

  • Add support for the ADS1220, HX717 and HX711 ADC sensors.
  • Add a load_cell skeleton to create the sensors and start taking samples.

Is this Bug Free / Reliable?

Versions of the HX711/HX717 code have been running for weeks at a time on my test system. Others have tested this code as well and we are pretty confident this version doesn't suffer form any timing defects.

The ADS1220 code is newer but has been running for at least a week non-stop without an error. Since its SPI based it should be more reliable. The only quirk with this chip is that a reset is required before configuring it.

The only issue I'm aware of is with _finish_measurements() running when the printer experiences a SHUTDOWN. This causes the communication to the MCU to fail and logs an exception message. This is confusing for users as its at the end of the log but not the root cause of the shutdown. All of the other Bulk Sensors share this behavior. It would be nice if we could detect a shutdown in progress and skip the serial command in _finish_measurements().

Build Changes

Bulk Sensor assumes that a sensor will be SPI or I2C. Currently it would be automatically excluded on small platforms that don't have these interfaces. But the HX71x chip family uses bit-banging and can run on anything with GPIO. So I altered the build to allow the HX71x chips to be turned off. If they are included they also bring in Bulk Sensor. I'm not sure which platforms we might want to exclude this on or if it was worth bothering about.

What doesn't this do?

  • The implementation of a digital scale is cut from this first PR to keep the size small. The plan is to put that code in the next PR.

  • These sensors are all "multiplex" ADCs, meaning they have a single measurement core that is connected to a multiplexer that allows the input channel to be switched at runtime. But they can only produce 1 measurement at a time and some delay is required to switch channels (often a couple of measurement cycles). This code does not make this feature usable, in fact for the ADS1220 I didn't even include an option to select an input channel. This feature has been used by Prusa in their MK4 and XL products. They use the 'A' channel on the HX717 to real the load cell and the 'B' channel to read the the analog filament sensor. They switch inputs at runtime at a regular duty cycle and rout the measurements to separate components. I decided this is out of scope due to the complexity it would create.

@garethky
Copy link
Contributor Author

garethky commented Apr 8, 2024

Looks like the build failed due to the increased binary size on small platforms. I'll have to do an update to exclude the HX71x on those.

Edit 1: looks like only the AR100 checks the size?
Edit 2: looks like it should also be excluded on platforms that don't WANT_GPIO_BITBANGING

@garethky garethky force-pushed the bulk-adc-sensors branch 3 times, most recently from 1be4db5 to de8d3e8 Compare April 8, 2024 17:29
@garethky
Copy link
Contributor Author

Not sure I follow how that is relevant to this PR?

@eazrael
Copy link

eazrael commented Apr 10, 2024

Not at all, this was supposed to go into my Pr. Sorry :-(

Copy link
Collaborator

@KevinOConnor KevinOConnor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! In general it looks good to me. I have some comments.

Procedurally it would be preferable if the hx71x and ads1220 code were in separate commits. Breaking up the commits helps with tools like git blame and git bisect. It seems like the two are mostly separated anyway, so hopefully it is not hard to create two atomic commits. I can also help with this.

I noticed a few, mostly minor, things in the code. See my comments below.

-Kevin

docs/Config_Reference.md Outdated Show resolved Hide resolved
klippy/extras/ads1220.py Outdated Show resolved Hide resolved
klippy/extras/ads1220.py Outdated Show resolved Hide resolved
# TODO: is a manual 50us delay needed here? Testing says it's not.
# read startup register state and validate
val = self.read_reg(0x0, 4)
if hexify(val) != hexify(RESET_STATE):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, it's certainly seems odd to encode two numeric objects into a string to compare them..

Comment on lines 27 to 9
# Helper for ClockSyncRegression that handles generating timestamps
# while processing a batch of samples
class TimestampHelper:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be good to use shared code for the intricate timestamp manipulation that the bulk sensors implement. However, I think this should be done for all bulk sensors and implemented in bulk_sensor.py . I'll put together a separate PR for discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll wait to refactor on top of that PR


// hx71x ADC query
void
hx71x_read_adc(struct hx71x_adc *hx71x, uint8_t oid)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on using 4-bytes when 3-bytes is sufficient. Same comment on not invoking shutdown().

Comment on lines 164 to 199
int32_t counts = 0;
hx71x_time_t start_time = timer_read_time();
for (uint_fast8_t sample_idx = 0; sample_idx < 24; sample_idx++) {
hx71x_pulse_clocks(hx71x);
// read 2's compliment int bits
counts = (counts << 1) | gpio_in_read(hx71x->dout);
}

// bit bang 1 to 4 more bits to configure gain & channel for the next sample
for (uint8_t gain_idx = 0; gain_idx < hx71x->gain_channel; gain_idx++) {
hx71x_pulse_clocks(hx71x);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a minor note, this code could be optimized a bit. Something to look at after merging though.

src/sensor_hx71x.c Outdated Show resolved Hide resolved
src/sensor_hx71x.c Outdated Show resolved Hide resolved
src/sensor_hx71x.c Outdated Show resolved Hide resolved
@garethky
Copy link
Contributor Author

I have a busy few weeks coming up and I'm not sure when I'm going to get to all the comments. If I can grab a few minutes here and there ill knock one out and just tack on a commit. When its all done we can squash git history into 2 commits as requested.

I'm not sure how I'm going to prove the sensor reset stuff works. I cant force an error on command without adding extra code. I might put in a counter that forces a reset from the MCU side every 30 seconds or something and then take that our for the submission. Will probably save this for last.

@KevinOConnor
Copy link
Collaborator

Thanks. I'm not sure what the "sensor reset" stuff is.

There were a bunch of stuff discussed above, but putting aside various code commentary, I'd say only the following are "blockers":

  1. This PR should be rebased onto the latest Klipper master (which contains the bulksensor.py FixedFreqReader refactor).
  2. The mcu bulk sensor code should not call shutdown(). It should check for errors and propagate those errors to the host code.
  3. Don't modify setup_query_command() in bulk_sensor.py.
  4. The mcu "pending flag" code didn't look correct to me. Either I'm missing something or the code should be corrected. (This may already be done.)
  5. The code should be split into 2 commits, but I can do that if it's a pain for you to do that.

-Kevin

@garethky
Copy link
Contributor Author

I'm not sure what the "sensor reset" stuff is.
Well we need to agree on this so I know what to do.

Looking at the accelerometer code, it seems to ignore the errors. I don't see where AccelCommandHelper or ResonanceTester actually check if the data contains errors. This "carry on" approach can't work for load cells.

Here is a primer on the problem:

These chips fail at runtime due to ESD. The big metal load cell acts like a ground path and they are very susceptible to static. This causes the chip to reboot, erasing any settings it had from klippy startup. Continuing to sample after an error wont recover the sensor. Even if it does produce data it is going to be the wrong channel/sample rate/gain etc. So once the chip resets the data stream is dead.

Some chips in this class send a reboot flag with their measurements but these chips are too cheap for that. So all of these checks are heuristics to detect a reset. Even taking too long to collect a sample can change the settings on the HX71x so the next sample is garbage.

When an error is detected we have to stop sampling, re-program the chip and restart sampling. I avoided doing this because its complex. and when installed in a properly grounded printer I don't see these errors. But I'm sure at a large scale they will happen at some non zero rate.

I agree with you that sending the errors to the Host is the right thing to do. But if all the Host does it shutdown on error its a marginal improvement. Ideally I'd like to try and recover so that a stray ESD doesn't ruin a print.

@KevinOConnor
Copy link
Collaborator

Thanks. I can only give high-level comments as I'm not familiar with the details of these sensors - so take my comments with "a grain of salt".

I don't see where AccelCommandHelper or ResonanceTester actually check if the data contains errors

I was referring to how src/sensor_adxl345.c:adxl_query() checks for a read error and populates a response of 0xff 0xff 0xff 0xff 0xff, and how klippy/extras/adxl345.py:_convert_samples() looks for that code and drops the sample (and increments the _process_batch() error counter). Also see how klippy/extras/ldc1612.py:_convert_samples() checks for warning flags in the top 4 bits of the sample, increments the "errors" counter, but continues to process the data.

When an error is detected we have to stop sampling, re-program the chip and restart sampling. I avoided doing this because its complex. and when installed in a properly grounded printer I don't see these errors.

For what it is worth, as a user, I'd not want any automatic error recovery like that. To me, restarting the sensor sounds like it is very likely to lead to overall poor measurement results. It may be possible to restart the sensor, but there is a very high chance the resulting values will not be of high quality - the nozzle/bed contact may have been missed or distorted, the timing correlation is likely off, etc. So, it may be possible to restart the sensor, but there is a very good chance I'll get a terrible first level layer height. As a users, I'd prefer the action to be stopped and to receive an error that says "your hardware is malfunctioning and needs to be fixed", so that I can then fix the hardware and go on to get great first layer quality.

As a developer, I would also prefer not to maintain complex code that, to wit, "makes it easy for users to get bad results".

So, my concern is not with reporting of errors, it is with using shutdown() to convey all errors. Shutdown is useful for internal errors (to alert developers of a coding error), for safety issues (heater outside temperature range), and for situations where the mcu is hopelessly unable to perform the requested actions (eg, very far behind on processing of scheduled timed actions). For a malfunctioning sensor, we ideally want to stop the action (eg, "sorry we couldn't perform that probe because the load-cell sensor is reporting a ground fault"), but ideally we don't want to completely turn off the motors/heaters with an arcane diagnostic message (eg, "ADS1220: Possible bad read").

Again, I can't give definitive advice on how to handle errors for these sensors. It seems to me though, that if it's a questionable sensor reading we would want to mark it as a questionable reading and let the high-level code determine what action to take. If the sensor gets in a state where it can't produce any readings, then we can signal that as well (so that the high-level code knows it wont get further readings and can raise an appropriate error message).

For this initial PR of sensor support, with a goal of getting sensor data to the api server, a shutdown() on questionable data seems a drastic response.

Thanks again. Hope that helps.
-Kevin

@garethky
Copy link
Contributor Author

Thanks, I think I can work with that guidance.

As a users, I'd prefer the action to be stopped and to receive an error that says "your hardware is malfunctioning and needs to be fixed", so that I can then fix the hardware and go on to get great first layer quality.

This is going to happen via the LoadCellEndstop. When the sensor stops reporting data during a homing event, its a critical failure. The LoadCellEndstop has a watchdog timer and if no data is received for 2 sample windows it will issue a SHUTDOWN. (not in this PR)

I think an axis moving without a functioning endstop counts as a SHUTDOWN "for safety issues", this is just like a thermistor becoming unplugged.

We don't have to shutdown in the sensor code to achieve that, so they will be removed from this PR.

As a developer, I would also prefer not to maintain complex code that, to wit, "makes it easy for users to get bad results".

I also don't want to implement this due to the complexity. (Users should ground their printers!!)

The next best thing is to just stop measurements when the Host gets an error from the sensor. So inside the sensor it will not re-schedule the sampling timer on error. On the Host, I can flush the error to clients and then run the finish measurements logic which will disconnect all of the consumers from bulk_sensor and run the shutdown logic on the MCU sensor. (I don't think bulk_sensor informs consumers that they were disconnected, but in any case they will get the error first)

For something non-critical like a Filament scale it should be possible to restart measurements without halting the print. Or just let the filament scale be broken until the Host restarts. Ideally a non-critical use shouldn't ruin a print.

@KevinOConnor
Copy link
Collaborator

I think an axis moving without a functioning endstop counts as a SHUTDOWN "for safety issues", this is just like a thermistor becoming unplugged.

A bit off topic here, but the best way to rapidly stop a homing/probing operation is to issue a trsync_do_trigger() with an error code as the "reason". It's faster than shutdown() for multi-mcu homing (as it is propagated between mcus without entering the Python code), and it also allows for better error reporting.

-Kevin

@garethky
Copy link
Contributor Author

It's faster than shutdown() for multi-mcu homing (as it is propagated between mcus without entering the Python code), and it also allows for better error reporting.

ahh good to know. Will get that fixed up.

@garethky garethky force-pushed the bulk-adc-sensors branch 2 times, most recently from 2179b87 to 45fc1be Compare May 1, 2024 03:36
@garethky
Copy link
Contributor Author

garethky commented May 1, 2024

I got the C part in, shutdowns are gone. When an error happens the sensor just turns off its pending flag and flushes the error. That will be the last packet it sends before its restarted (if its ever restarted).

I used the possible_overflows fields to send back an error code, since that free payload that these sensors don't use.

I have to do the re-base to mainline before I can tackle the Python side

@garethky garethky force-pushed the bulk-adc-sensors branch 3 times, most recently from 0d3bcb3 to 8d7acd7 Compare May 7, 2024 00:11
@garethky
Copy link
Contributor Author

garethky commented May 9, 2024

I rebased and now the ROM for stm32f042 is too large. How are we deciding what to cut from these space limited chips?

@KevinOConnor
Copy link
Collaborator

How are we deciding what to cut from these space limited chips?

There's no documented strategy - generally we disable whatever new feature caused the build to go over in size. It's really useful to do build all the targets that we have config files for - it catches lots of inadvertent errors. There isn't a need for these smaller chips to support every possible option though.

-Kevin

@cab404
Copy link
Contributor

cab404 commented May 19, 2024

rn both revisions fail on imports :(

should I start worrying and trying to finish this pr, or am I missing something obvious?)

@garethky
Copy link
Contributor Author

rn both revisions fail on imports :(

should I start worrying and trying to finish this pr, or am I missing something obvious?)
Sorry, is this comment related to this PR (Bulk ADC Sensors)?

@cab404
Copy link
Contributor

cab404 commented May 19, 2024

rn both revisions fail on imports :(

should I start worrying and trying to finish this pr, or am I missing something obvious?)
Sorry, is this comment related to this PR (Bulk ADC Sensors)?

Yes, and I mean both commits in the branch

@garethky
Copy link
Contributor Author

Yes, and I mean both commits in the branch

I just need to get some time to work on it. Looks like I have to exclude the hx71x sensor from all the places that would support it but don't have enough space left in their image to do so. I don't have the local tooling on my Mac setup to test this so I'm doing commits and having github do the testing.

@cab404
Copy link
Contributor

cab404 commented May 20, 2024

I don't want to rush you, sorry if that sounded like it — I just wanted to ask if any help is needed.

Thank you for working on this!

@KevinOConnor
Copy link
Collaborator

Okay, thanks.

I had a couple of additional ideas you may want to consider. I added a few commits to the branch at https://github.com/KevinOConnor/klipper-dev/tree/work-hx71x-20240611 . Briefly:

  1. If we do want to use dynamic timing, then I think we could probably do 1/10th of a nominal period for standard retries, 8/10ths of a period after sample detection, and 4/10ths of a period if an overflow may have occurred.
  2. We could use 0x40000000 as an error indicator (instead of 0x80000000) as that should still be unique and it avoids confusing issues with the sign bit.
  3. It would be nice if we exported the adc values over the API Server as floating point numbers (instead of counts). I understand you need counts in other parts of the code, but it should be trivial to convert back to a count from a float (int(adc * (1<<23) + .5)).

Just ideas to consider.

Cheers,
-Kevin

@garethky
Copy link
Contributor Author

#1: I'm scared this is getting complicated

The place this would help is when homing, it would make the trigger delay as short as possible, reducing overshoot. 1/10th the sample period is ~300us at 320SPS and 1250us at 80SPS. But the pin checks for both sample rates take the same amount of time. What may be optimal is to set a fixed interval in microseconds for pin checks for all sample rates, (e.g. 100us). Then, say you want to do 3 pin checks, on average, before seeing a signal.

#define PIN_CHECK_INTERVAL timer_from_us(100)
#define PIN_CHECK_WINDOW PIN_CHECK_INTERVAL * 3
rest_ticks_nominal = (hx71x->rest_ticks * 10) - PIN_CHECK_WINDOW;
rest_ticks_retry = PIN_CHECK_INTERVAL;
rest_ticks_overflow = hx71x->rest_ticks * 4;

On the ADS1220 it can do 2000 SPS. 1/10th of its interval is 50us, so maybe the PIN_CHECK_INTERVAL needs to be shorter than I put here.

I also though about making the PIN_CHECK_WINDOW adaptive. If the sample is ready on check 1 then the window should have been larger. If its ready on sample 5 then it should get smaller... but that is getting complicated. Lower MCU load and faster sample rates are much better answers to these problems.

#2 OK with me!

In my head I though that I had implemented only sending back 24 bits per sample. But given that its 32bits (and you are OK with that) we can totally use the high bits for error codes.

#3 Would you be OK with just sending both?

samples[count] = (round(ptime, 6), round(val * adc_factor, 9), val)
The conversion is sensor dependent (the next sensor doesn't have to be 24 bits, could be 16, 32 etc.). I didn't want sensor specific code in the load_cell class. And my concerns with the float value are:

The floating point value doesn't properly convey the saturation point of the sensor :

>>> adc_factor = 1. / (1<<23)
>>> max = 0x7FFFFF # max positive value
>>> round(max * adc_factor, 9)
0.999999881

So checking for val >=1.0 or val <= -1.0 doesn't work. Deciding how much "wiggle room" to use to detect saturation would be sensor dependent.

Secondly rounding discards data, e.g. 9 digits isn't enough to represent the precision of a 32 bit sensor:

>>> adc_factor = 1. / (1<<31)
>>> round(1 * adc_factor, 9)
0.0

@KevinOConnor
Copy link
Collaborator

1 - I was thinking of this:

static uint_fast8_t
hx71x_event(struct timer *timer)
{
    struct hx71x_adc *hx71x = container_of(timer, struct hx71x_adc, timer);
    uint32_t rest_ticks = hx71x->rest_ticks;
    if (hx71x->pending_flag) {
        hx71x->sb.possible_overflows++;
        rest_ticks *= 4;
    } else if (hx71x_is_data_ready(hx71x)) {
        // New sample pending
        hx71x->pending_flag = 1;
        sched_wake_task(&wake_hx71x);
        rest_ticks *= 8;
    }
    hx71x->timer.waketime += rest_ticks;
    return SF_RESCHEDULE;
}

Which I don't think is overly complex. I don't think we need to worry about responsiveness as we'll already be delayed by several milliseconds due to filtering (and worst case multi-mcu homing situations). So, no point in optimizing for microseconds. I also think it is fine to hardcode the x4 and x8 in the mcu - the host and mcu code are tightly coupled anyway.

If you think the above is too obscure or too complex then I think the simple unchanging rest_ticks=.5 * nominal_period is fine.

2 - okay.

3 - I'm fine with it any way - counts, float, or both. It's not a big concern. FWIW, I don't know of any adc sensor that's more accurate than about 15bits in practice, so I don't think precision really matters. (Sensors claiming more accuracy are just using creative marketing as far as I'm aware.)

Cheers,
-Kevin

@garethky
Copy link
Contributor Author

I got a change stack built up today. I got the machine to home but not probe. I need to work through all the refactoring that have happened in probe.py, looking at probe_eddy_current.py for guidance. Experimenting with jj to make the commit stacking easier, so far its better than Git.

@garethky garethky force-pushed the bulk-adc-sensors branch 4 times, most recently from e23d022 to 6aec417 Compare June 17, 2024 07:16
@garethky
Copy link
Contributor Author

garethky commented Jun 26, 2024

I've had a chance to try artificially breaking the sensor and forcing the host to restart it and this worked just fine. It goes into the error state on the MCU and just keeps broadcasting errors until it gets reset. This dovetails with the later endstop implementation where it wont pass any data to the endstop once it enters the error state. I still need to do this testing on the ADS1220.

Once the HX71x sensors encounter a detectable error I don't think we should trust its results until its reset and thats what is implemented. If there is a desync the sensor can produce results that are from a different channel than configured and that could get passed to the endstop as valid data to prolong a homing move and damage the machine. On the ADS1220 I've seen it just produce garbage bits when reading the settings registers so I don't trust its output either.

I have a series of branches that build up the upcoming load cell PR sequence (pr-load-cell-gram-scale, pr-load-cell-endstop, pr-probe-load-cell)) and I have test machine printing again on top of the last one.

@garethky garethky force-pushed the bulk-adc-sensors branch from 6aec417 to bd722b9 Compare July 8, 2024 03:34
@garethky
Copy link
Contributor Author

garethky commented Jul 8, 2024

The latest update has a small change to not return an empty report when an overflow is encountered and instead return whatever data was reported.

I've created a test branch that can probe with the sensor set to fail every n seconds: test-sensor-errors. I have the load cell endstop set up to trigger the trsync with an error code now instead of a shutdown.

There are several failure modes and the errors they generates are not always clearly of the form "your sensor had an error". Probably the most insidious one happens when the sensor resets and then homing starts. This causes the tearing code to get no data for the first bit of the tear, resulting in not enough samples being collected before it times out. But since the reset already happened the tear process doesn't get any errors to explain the root cause. I need to consider the sensor "offline" if it sends out an error packet and only consider it back "online" when the next non-error packet is sent.

@KevinOConnor
Copy link
Collaborator

Thanks. What's the status of this PR? Is it ready for merging?

I have a few questions/comments about the latest implementation. These can be adjusted post-merge if changes are needed.

  • It's not a good idea to call logging.error() in _convert_samples() as doing so could flood the log (possibly escalating a minor error into a severe error).
  • Is it necessary for the host to "restart" measurements on an error? Seems like things would be simpler if the host raised an error and stopped measurements on an error (the data is suspect after that point anyway).
  • Why does the ads1220 check for ERROR_READ_TOO_LONG? I understand why hx71x does (a bad read could mess up the transfer protocol), but I don't see what would go wrong on the ads1220. Seems it would just make a minor error (a lost sample) into a notable error (extended sensor outage).
  • I don't think sampling should be automatically started on klipper startup.

Thanks again,
-Kevin

@KevinOConnor
Copy link
Collaborator

Also, just "thinking out loud" - have you considered halting any ongoing homing on a sensor error, while still allowing data reports to continue? (That is, continue to populate the bulk sensor reports, but immediately notify the homing code that it should abort.)

-Kevin

@garethky
Copy link
Contributor Author

  • have you considered halting any ongoing homing on a sensor error, while still allowing data reports to continue? (That is, continue to populate the bulk sensor reports, but immediately notify the homing code that it should abort.)
    That's the way the first version worked. I decided this was more complex, requiring an extra call to report errors. And if there is some bug that doesn't report errors (as might happen in development) then it would just crash the toolhead. So I ended up with just the watchdog timer on the MCU.

  • I don't think sampling should be automatically started on klipper startup.
    Please explain what you have in mind.

My idea is that this works like a temperature sensor. They all start when klipper starts. The load_cell object reports its recent gram weight as part of its status. For the folks that want to use this to measure the weight of something during a print (like a spool) this is the behavior they expect.

@KevinOConnor
Copy link
Collaborator

I don't think sampling should be automatically started on klipper startup.

Please explain what you have in mind.

This PR only exports the loadcell information via the API Server. If a client connects to Klipper and requests loadcell information then sampling is automatically started at that point. Thus, there is no reason to start sampling at Klipper startup.

I expected sampling to occur when there are users of the data, and sampling to be disabled when there are no users. See klippy/extras/ldc1612.py as an example.

Cheers,
-Kevin

@garethky
Copy link
Contributor Author

garethky commented Jul 13, 2024

I expected sampling to occur when there are users of the data, and sampling to be disabled when there are no users. See klippy/extras/ldc1612.py as an example.

hx71x and ads1220 fulfill that contract as you desire, they are comparable to ldc1612. load_cell is a different kind of object, its comparable to temperature_sensor or heaters. If this isn't obvious I suggest that we remove load_cell from this PR and just focus on the sensor code. load_cell is the whole focus of the next PR in the series anyway.

I'll look at just having the sensor stop sampling and having load_cell detect that and trigger a sensor restart.

Does that sound like a good path forward?

@KevinOConnor
Copy link
Collaborator

Thanks. My main question is - do you feel this PR is ready for merging? If so, I think we'd be better off merging as is and then look to make any remaining changes on top of it.

If this isn't obvious I suggest that we remove load_cell from this PR and just focus on the sensor code.

Well, I'd prefer to merge this PR before going on to the main loadcell work. I'd also prefer for PRs to provide some user measurable benefit, and providing the ads1220/hx71x measurements via the api server (as the current stubbed out loadcell.py code does) would meet that minimum goal.

I'll look at just having the sensor stop sampling and having load_cell detect that and trigger a sensor restart.

I'm not sure I understand this, but FWIW it sounds complex.

Cheers,
-Kevin

@garethky
Copy link
Contributor Author

Well, I'd prefer to merge this PR before going on to the main loadcell work. I'd also prefer for PRs to provide some user measurable benefit, and providing the ads1220/hx71x measurements via the api server (as the current stubbed out loadcell.py code does) would meet that minimum goal.

Great! Lets do that.

Is it necessary for the host to "restart" measurements on an error? Seems like things would be simpler if the host raised an error and stopped measurements on an error (the data is suspect after that point anyway).

I'll look at just having the sensor stop sampling and having load_cell detect that and trigger a sensor restart.

For this PR I mean to just have the sensors stop sampling on error and terminate all clients. Easy enough.

I'll get your other concerns (logging and the ERROR_READ_TOO_LONG check) addressed as well.

@KevinOConnor
Copy link
Collaborator

Thanks. I did not fully understand your last response. Would you prefer that I merge what is here now, or would you prefer that I wait for additional changes?

-Kevin

@garethky
Copy link
Contributor Author

Thanks. I did not fully understand your last response. Would you prefer that I merge what is here now, or would you prefer that I wait for additional changes?

-Kevin

I am working on this today. Will post changes by end of day.

@garethky garethky force-pushed the bulk-adc-sensors branch 2 times, most recently from 62939ce to f97365e Compare July 21, 2024 03:35
garethky added 2 commits July 20, 2024 20:37
* Create the load_cell host module skeleton to create the sensors and start taking samples.
* Add support for the HX717 and HX711 ADC sensors.

Signed-off-by: Gareth Farrington <gareth@waves.ky>
Add support for the ADS1220 as an alternative to HX71x that supports SPI and higher sample rates.

Signed-off-by: Gareth Farrington <gareth@waves.ky>
@garethky
Copy link
Contributor Author

I'm good to merge.

I looked at raising an error in the hx71x code but I left it as-is. When an exception is raised the Batch Bulk Helper drops all the clients but doesn't notify them they were dropped. I think its better to notify clients with errors and make them responsible for deciding if the sensor is stable enough to use.

  • Cleaned up the error logging in hx71x.
  • consecutive_fails is reset to 0 when a payload contains no errors
  • Payload with errors from hx71x only has the first error reported, ignoring the duplicates.
  • Removed the time check around the SPI call in the ADS1220 and deleted the associated error handling code.
  • Cumulative errors and overflows are reported, to match what other sensors are doing.

@KevinOConnor
Copy link
Collaborator

Thanks. I modified the commit messages of the two commits here and committed them. (I changed the first line of each commit to be in the standard module: description format.)

I also committed two additional changes on top.

Commit 13c75ea fixes the Config_Reference.md chapter hierarchy.

Commit cb15d0f alters load_cell.py so that it does not automatically start sensors at startup and does not export a dummy get_status() method. Both can be added back during future commits as future functionality requires them. I'm just a little concerned about subtle regressions, so I'd prefer to not take these actions prior to them being utilized.

Thanks again.
-Kevin

@EllaFoxo
Copy link

EllaFoxo commented Sep 4, 2024

Just wanna say @garethky - Thank you! I've been following your work for absolutely ages. Hardware to follow 😄

@bwnance bwnance mentioned this pull request Sep 18, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants