
[FEATURE] Klipper support #263

Closed
leptun opened this issue Oct 8, 2020 · 105 comments · Fixed by #274 or #282
Labels: enhancement (New feature or request), help wanted (Extra attention is needed), upstream (Something that should be fixed or submitted upstream)

@leptun
Collaborator

leptun commented Oct 8, 2020

@vintagepc I'd like to get Klipper working in the SIM, as I need to do some comparisons with PrusaFirmware. The only problem is that it currently doesn't quite work: the connection to the host software times out. I checked uart_pty.cpp, and data is received and sometimes even read by the code, but at some point the XOFF hook is engaged and the firmware gives up entirely. I'm quite unsure where to start looking. I suspect there might be some unimplemented function that Klipper uses and Marlin/Prusa-Firmware doesn't, but I would expect some kind of warning for that. There might also be a deadlock in the firmware, but I don't know how to test for that. Previously we had the "D" key for getting the PC, but that option is gone and now you have to use GDB (no idea how to do this) to find the position in the code where it locks up.

To get klipper running I followed the following tutorial: https://github.com/KevinOConnor/klipper/blob/master/docs/Installation.md
I also had to do some python2 trickery since I was using Ubuntu 20.04 and had to use some renamed packages.

MK404 run command: ./MK404 Prusa_MK3S -m -s -f ~/klipper/out/klipper.elf.hex (adding -n didn't help).
Configuration file that I use on my MK3S: printer.cfg.zip

I hope that you will find some free time to help me with this issue/request

@leptun added the enhancement and help wanted labels Oct 8, 2020
@vintagepc
Owner

I can take a look at this - can you post the ELF you built? That will potentially save me some initial setup time figuring out how to build it.

I'll trade you - if you have some time to look at #253 and verify the Allegro behaviour against hardware behaviour for me it would be much appreciated and I can close out that too - I seem to remember you do have a MK2.

@vintagepc
Owner

Also, I drafted this quick how-to on debugging AVR firmware with gdb:
https://github.com/vintagepc/MK404/wiki/Debugging-Printer-Firmware-with-MK404

@leptun
Collaborator Author

leptun commented Oct 8, 2020

Here you go. Not sure how helpful this will be, since you also need the host software for it to work properly (and maybe OctoPrint as well if you actually want to test anything). Building it is quite easy, especially since the host software and AVR firmware are bundled together. For the AVR firmware, the default configuration in make menuconfig is OK.
klipper.elf.zip

Eh... I do have a mostly stock MK2.5S on which I can do some testing, but it will take a bit till I have access to it since my high school got closed for 2 weeks because of corona. I can look at what went wrong though and maybe I'll find the issue.

I'll try to follow the Debugging instructions and maybe I'll find something interesting/useful

@vintagepc added this to the Wishlist milestone Oct 8, 2020
@vintagepc removed the Wishlist label Oct 8, 2020
@vintagepc
Owner

Much appreciated. On that other issue it essentially looks like the MS inputs might just need to be inverted but that doesn't seem right as it would cause it to step only at full-step resolution (but my inspection of the firmware suggests it should be running at 1/16th.)

@vintagepc
Owner

Looks like the Klipper FW is stuck in a WDR loop for me. If you run MK404 with -vvv you get an endless stream of:

WATCHDOG: timer fired.
WATCHDOG: timer fired.
WATCHDOG: timer fired.
WATCHDOG: timer fired.
WATCHDOG: timer fired.
WATCHDOG: timer fired.

That might explain why it won't start... :)

@vintagepc
Owner

I did some more digging... looks like klipper isn't responding properly. I don't see any "uart_pty_in_hook" log messages that indicate the AVR is sending serial data. Only incoming data from the host.

@leptun
Collaborator Author

leptun commented Oct 9, 2020

It’s not sending data back, but at least it’s reading it up to some point.
Doesn’t klipper actually use the wdt? That would be interesting.

@leptun
Collaborator Author

leptun commented Oct 9, 2020

@vintagepc Interesting detail that might help you: Klipper actually also uses the ReadyToSend interrupt, not only the DataReceived interrupt. Maybe that one is broken.

@vintagepc
Owner

Unfortunately, that may doom this from the start.

I'm hoping it might be possible to just bypass that (since the PTY is essentially always able to receive data) but that won't work if klipper expects to be able to use that signal as a means of controlling the device.

RTS/CTS are hardware control lines and, much like DTR, cannot be simulated with a PTY. We'd have to write a fake serial port device driver with those IOCTL calls, and that would immediately break our compatibility with anything but Linux.

Another possibility would be to find a way to bind the sim to a real serial port and set up some sort of serial loopback connection (which would require two ports and physical hardware)

Option 3 is hacking up both projects to implement some sort of out-of-band flow control to mimic RTS/CTS.

@leptun
Collaborator Author

leptun commented Oct 9, 2020

Wait... I think I sent you on a wrong path. I accidentally used the Serial equivalent names instead of the actual interrupt vector names.
USARTx_RX_vect is used for receiving data.
USARTx_UDRE_vect is used for transmitting data.
Sorry for causing confusion. The atmega2560 doesn't even have RTS/CTS support. The only chip I know that supports this is the atmega32u4 and similar.

@vintagepc
Owner

hmmm.. I may need to look at that in depth then. I know for the MK3 I had to implement a workaround because the UDRE bit was not cleared after changing the serial port config - and it would hang on boot. I wonder if there is a similar occurrence here and it gets stuck high too.

The other thing I noticed is it expects the baud rate to be much higher than default. It shouldn't cause any issues with PTYs, but could be that it's not handled correctly somewhere.

@leptun
Collaborator Author

leptun commented Oct 9, 2020

Klipper has a workaround for this UDRE issue:

// Tx interrupt - data can be written to serial.
ISR(USARTx_UDRE_vect)
{
    uint8_t data;
    int ret = serial_get_tx_byte(&data);
    if (ret)
        UCSRxB &= ~(1<<UDRIEx);
    else
        UDRx = data;
}

Btw, the MK3 firmware doesn't actually use the UDRE interrupt. Instead it just does the polling method and blocks the main loop.
The baud rate should not be a limiting factor, especially since we pipe data with no delay.

@vintagepc
Owner

Makes sense - though whether we are polling or using the interrupt to wait for it to go low... the problem is the same if it never does 🤣

@leptun
Collaborator Author

leptun commented Oct 9, 2020

printer.cfg.zip
Updated config that adds the ambient sensor on the Einsy (according to the latest push to master).

@leptun
Collaborator Author

leptun commented Oct 9, 2020

Just tried writing some basic arduino code with which I could test the serial.c from klipper.
I am disappointed to report back that it works as it should (both TX and RX), so there might be something else that breaks the firmware.

@vintagepc
Owner

Yes, that's both good news and bad news.

When I inspected it yesterday, the place the firmware was looping was in sched.c:

                do {
                    irq_wait();
                } while (tasks_status != TS_REQUESTED);

@leptun
Collaborator Author

leptun commented Oct 9, 2020

That is the main sleep loop. It runs continuously until an interrupt changes tasks_status to TS_REQUESTED via sched_wake_tasks(). One interrupt that does this is the RX vector (USARTx_RX_vect), once a full command has been received.
I have some suspicions regarding the irq_wait() function. It executes asm("sei\n nop\n cli" : : : "memory");. I don't know about you, but maybe simavr fails to trigger the pending interrupts in that tiny window. It's also interesting that the sleep function keeps all interrupts disabled and only opens the small nop window, instead of having interrupts always enabled.

Aaaaand I just tried cli()-ing the whole Arduino sketch and using irq_wait(), and now it won't send data back. If I extend the nop section (by a lot), it works, and the connection is established in Klipper as well with the patch... and then MK404 crashes :(
modified irq_wait function:

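#include <avr/interrupt.h> // sei(), cli()
#include <avr/cpufunc.h>   // _NOP()

// Workaround: keep interrupts enabled for 13 NOPs instead of 1 so that
// simavr gets a chance to service pending interrupts inside the window.
static inline void irq_wait(void) {
//    asm("sei\n    nop\n    cli" : : : "memory");  // original version
  sei();
  _NOP();
  _NOP();
  _NOP();
  _NOP();
  _NOP();
  _NOP();
  _NOP();
  _NOP();
  _NOP();
  _NOP();
  _NOP();
  _NOP();
  _NOP();
  cli();
}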

To me this really looks like a simavr bug since the real hardware should handle this correctly, but simavr doesn't and needs a hack. This many NOPs are excessive, but at least it's a workaround.
MK404 crash error:

terminate called without an active exception
Aborted (core dumped)

Updated elf files:
klipper.elf.zip

@vintagepc
Owner

I recall reading something about there being a built-in 2 instruction delay surrounding interrupt enable/disable.... The other thing that comes to mind is the way SimAVR terminates; there's a special case where if you sleep with interrupts disabled, it will exit the AVR run routine. So perhaps this is in some way falling afoul of that.
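The AVR instruction set documentation states the rule for SEI: the instruction immediately after SEI always executes before any pending interrupt is serviced. That is what gives sei; nop; cli its one-instruction window. The toy model below is a hypothetical sketch, not simavr's code, and all names in it are invented for illustration. sei_delay = 1 models the documented behaviour. sei_delay = 2 models a simulator that holds interrupts off one instruction too long.

```c
#include <stdbool.h>

/* Toy interrupt model (hypothetical; not simavr's implementation). */
typedef enum { OP_SEI, OP_NOP, OP_CLI } op_t;

typedef struct {
    bool i_flag;      /* SREG.I, global interrupt enable */
    bool irq_pending; /* e.g. a UART RX-complete flag that is already set */
    int  sei_shadow;  /* instructions left before SEI takes effect */
    int  sei_delay;   /* 1 = documented AVR behaviour, 2 = off-by-one sim */
    int  serviced;    /* number of interrupts taken */
} cpu_t;

/* Executes one instruction, first checking whether an interrupt is taken. */
void cpu_step(cpu_t *c, op_t op) {
    if (c->i_flag && c->sei_shadow == 0 && c->irq_pending) {
        c->irq_pending = false; /* the ISR runs and consumes the event */
        c->serviced++;
    }
    if (c->sei_shadow > 0)
        c->sei_shadow--;
    switch (op) {
    case OP_SEI: c->i_flag = true; c->sei_shadow = c->sei_delay; break;
    case OP_CLI: c->i_flag = false; break;
    case OP_NOP: break;
    }
}

/* Runs irq_wait() as SEI, `nops` x NOP, CLI with one interrupt already
   pending; returns how many interrupts were serviced inside the window. */
int run_irq_wait(int sei_delay, int nops) {
    cpu_t c = { false, true, 0, sei_delay, 0 };
    cpu_step(&c, OP_SEI);
    for (int i = 0; i < nops; i++)
        cpu_step(&c, OP_NOP);
    cpu_step(&c, OP_CLI);
    return c.serviced;
}
```

With one interrupt pending, the model behaves as follows:

- run_irq_wait(1, 1) services the interrupt.
- run_irq_wait(2, 1) misses it.
- run_irq_wait(2, 13) services it again.

This mirrors why padding irq_wait() with NOPs made the firmware respond. The model does not establish that simavr's actual bug is this exact off-by-one.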

The crash sounds like we hit an assertion failure and/or an over/underflow somewhere. But definitely a starting point for figuring out what is up.

Sounds like there might be an opportunity for an upstream simavr patch out of this too.

@vintagepc added the upstream label Oct 9, 2020
@vintagepc
Owner

vintagepc commented Oct 9, 2020

found it
buserror/simavr#376

The example code snippet looks pretty darn similar to what you posted above...

@leptun
Collaborator Author

leptun commented Oct 9, 2020

Btw, simavr crashes after the connection is established. The LCD is initialized, the steppers are disabled and the fans spin a bit. I suspect that the simulator crashes while enabling some functionality of the MCU. The ELF file provided is more or less generic; the firmware only knows its pinout once the connection is established.

@vintagepc
Owner

👍

the -vvvv argument may prove useful there; that will print some internal logging channels that can indicate whether it's on our end or a SimAVR thing.

@leptun
Collaborator Author

leptun commented Oct 9, 2020

klippy.log
Just for reference

@leptun
Collaborator Author

leptun commented Oct 9, 2020

@vintagepc How can I make the GL information stop being reported when using -vvvv? Even if I ignore the GL stuff, nothing extra is printed when simavr crashes

@vintagepc
Owner

vintagepc commented Oct 9, 2020

There's an if() in MK404.cpp that checks for the verbosity and calls GLDebugcallback to get notified on GL logging events. You can just comment that out temporarily.

Sounds like the issue is probably on our end then; just running with GDB/debug mode should break at the offending line. I'm betting it's the GSL library throwing on an out-of-bounds access or a narrowing conversion somewhere.

@vintagepc added this to the v1.1 milestone Oct 17, 2020
@3d-gussner
Collaborator

3d-gussner commented Oct 26, 2020

I have been able to print ONE simple print with MK404 and Klipper, the rest just fails.

Here are my MK404 Klipper configs:

  • gcodes
  • finished print ply file
  • failed print ply file
  • Klipper "printer.cfg" from @leptun
  • PrusaSlicer 2.2 Klipper config
  • Klipper hex file
  • Klippy.log files

I got one print finished after:

  • Deleting the Prusa_MK3S_eeprom/flash/xflash.bin files in MK404 (not sure if that helped)
  • Restarting the Klipper/OctoPrint service before every print
  • Minimizing both MK404 windows
  • Doing nothing else on the computer

Klipper version commit 2bcf06a295e5c9696b1bf556a0520b49957a369f
MK404 version commit 208ef25

@KevinOConnor Do you have any idea what I can do to get it working?
@vintagepc As my PC isn't the most performant one and a 2h real time 3D Benchy takes 18+h to finish in MK404, do you have any idea how we could improve the performance?

MK404-Klipper.zip

@3d-gussner
Collaborator

Short update: even without -g lite, Klipper crashes after a short time on the 3D Benchy print.

@vintagepc
Owner

Looks like your files did not attach, the link is invalid.

Was MK404 built in release mode (-DCMAKE_BUILD_TYPE=Release)? It is much more performant with that.

@3d-gussner
Collaborator

Upload fixed.
I could print a 3D Benchy up to Z2.80mm and it crashed again.

@3d-gussner reopened this Oct 26, 2020
@vintagepc
Owner

Hmm... "Move queue empty" is indeed a new one. I'm not sure what would cause that, since there isn't a serial bottleneck like there is on real hardware... clock frequency looks good and should be stable enough for klipper to not hit timer-related errors.

I'm not an expert in Klipper logs but it almost looks like it's having issues reading the source .gcode file on the klipper host.

@KevinOConnor

A "Move queue empty" error indicates the host was not able to make accurate predictions of the mcu clock. (Specifically, the host is tasked with queuing movement only when the mcu has space for it - if the host can't predict when the queue entries will become free it will result in a "move queue empty" error.)

-Kevin

@vintagepc
Owner

OK, still a clock issue then. Perhaps our clock skew correction is not granular enough and there are still local excursions.
I'll play with the skew correction period and see if I can narrow the variance without incurring too much additional overhead from the clock checking.

@vintagepc
Owner

vintagepc commented Nov 1, 2020

@KevinOConnor Feels like we are close.

I managed to get lucky and get 30% of a benchy before it failed with "lost communication". However, I suspect that was due to hitting the preallocated vector size and the simulator needing to allocate more space. (The fact we can get that far, yet sometimes it fails right on the first layer leads me to think we may be riding right on the edge of sufficient timing synchronization.)

Forgive me if this is a stupid question in the context of how Klipper operates - but is it possible to have it queue up a few more entries in advance with the assumption they will happen at exactly the predicted time and reduce the worry about divergence from the expected timing?

That combined with some improvements to the skew correction I am working on may be sufficient to help it smooth over the bumps. Or - if you have other ideas for liberties we could take since it is a simulated environment then I'm all ears. An out-of-band feedback loop of some kind might also work.

Ugly as it is I've stopped trying to sleep to match simulation time to wall time; I'm getting a more stable clock by spinlocking and rechecking the clock until we meet or exceed the wall time.
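A minimal sketch of spin-lock pacing, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC). It computes the wall-clock time at which the current simulated cycle count is due, then busy-waits until then instead of calling a sleep function. The function names and the pace_hz parameter are hypothetical, not MK404's code. pace_hz also allows pacing the core below its nominal 16 MHz.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds. */
uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Wall-clock time at which `cycles` simulated cycles are due when the core
   is paced at pace_hz. Split into whole seconds plus remainder so that the
   multiplication does not overflow 64 bits on multi-hour runs. */
uint64_t due_ns(uint64_t start_ns, uint64_t cycles, uint64_t pace_hz) {
    return start_ns + (cycles / pace_hz) * 1000000000ull
                    + (cycles % pace_hz) * 1000000000ull / pace_hz;
}

/* Busy-waits (no sleep call) until wall time reaches the due time.
   Returns how many ns the simulation was already behind, or 0 if it was
   on time or ahead and therefore had to wait. */
uint64_t spin_until_due(uint64_t start_ns, uint64_t cycles, uint64_t pace_hz) {
    uint64_t due = due_ns(start_ns, cycles, pace_hz);
    uint64_t t = now_ns();
    if (t >= due)
        return t - due;
    while (now_ns() < due)
        ; /* spin: keeps a core busy, but cannot oversleep */
    return 0;
}
```

For example, spin_until_due(start, 16000, 16000000) returns no earlier than 1 ms after start. The trade-off of this design is that the pacing thread occupies a core even while it is only waiting.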

For @3d-gussner since you were trying to get a Benchy in HRQ mode - here's 30% of one...

Export.ply.zip

@KevinOConnor

KevinOConnor commented Nov 1, 2020

> is it possible to have it queue up a few more entries in advance with the assumption they will happen at exactly the predicted time and reduce the worry about divergence from the expected timing?

The "move queue empty" error is a bit misleading - it probably should report "move queue full". The error indicates that the host tried to queue another move on the micro-controller, but there was no available memory to store it. The host is tasked with only queuing moves when there is space for them (by tracking when old moves complete).

As to your question though, could the host not use as much mcu memory so it doesn't hit the limit? The host code does try to give sufficient extra time to avoid that. The code is at https://github.com/KevinOConnor/klipper/blob/cef9cc29b8f150681cd24ca6f051fc9f03fc924b/klippy/clocksync.py#L115 - it adds TRANSMIT_EXTRA (.001s) plus three times the measured standard deviation of the frequency.

FWIW, in my previous tests with simulavr, I didn't find tweaking that extra time helped with stability.

> That combined with some improvements to the skew correction I am working on may be sufficient to help it smooth over the bumps. Or - if you have other ideas for liberties we could take since it is a simulated environment then I'm all ears. An out-of-band feedback loop of some kind might also work.

Can you describe the algorithm you are using to pace the clock?

> Ugly as it is I've stopped trying to sleep to match simulation time to wall time; I'm getting a more stable clock by spinlocking and rechecking the clock until we meet or exceed the wall time.

FWIW, I got around that in my simulavr code by using an offset in the usleep time - that is, usleep() will always sleep a little more than requested, but it's harmless to always sleep slightly less to account for that: https://github.com/KevinOConnor/klipper/blob/cef9cc29b8f150681cd24ca6f051fc9f03fc924b/scripts/avrsim.py#L102
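A sketch of the sleep-with-offset idea, assuming POSIX nanosleep(). It requests slightly less than the remaining time, because the call tends to return late rather than early. The slack value is chosen by the caller and is an assumption here; the actual offset used in avrsim.py is not reproduced, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds. */
uint64_t mono_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Sleeps toward the absolute deadline_ns, but requests slack_ns less than
   the remaining time because nanosleep() tends to return late, not early.
   Does not sleep at all if the deadline is within slack_ns. */
void sleep_short_of(uint64_t deadline_ns, uint64_t slack_ns) {
    uint64_t now = mono_ns();
    if (deadline_ns <= now + slack_ns)
        return;
    uint64_t d = deadline_ns - now - slack_ns;
    struct timespec req;
    req.tv_sec = (time_t)(d / 1000000000ull);
    req.tv_nsec = (long)(d % 1000000000ull);
    nanosleep(&req, NULL);
}
```

sleep_short_of(mono_ns() + 5000000, 1000000) asks for roughly 4 ms. Because the deadline is absolute, any leftover error is picked up by the next pacing check instead of accumulating.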

Separately, does simavr use accurate timing for serial port reads? That is, on a 16MHz AVR with 250000 baud, will it take at least 640 cycles for each byte to be read and transmitted? If not, that could also be messing up Klipper's timing.

-Kevin

@vintagepc
Owner

vintagepc commented Nov 1, 2020

OK, so that means we're not finishing things as fast as we should be - as opposed to running out of "things to do"

> Can you describe the algorithm you are using to pace the clock?

All I've been experimenting with are variants of checking the "simulation time" in ns against the "wall time" and then doing something to make up the difference at regular intervals. The size of those intervals and specifically what I am doing to idle are the knobs I've been twiddling. I'm using the graph script to plot Klippy's calculated MCU frequency; this is one of the better ones I've managed to get but clearly it's still too unpredictable. I'm guessing with a proper micro this line is very flat.

[Graph: Klippy's calculated MCU frequency over time]

Many look more like this, where it dips and recovers after the virtual print stops.

[Graph: Klippy's calculated MCU frequency over time]

> Separately, does simavr use accurate timing for serial port reads? That is, on a 16MHz AVR with 250000 baud, will it take at least 640 cycles for each byte to be read and transmitted? If not, that could also be messing up Klipper's timing.

Good thought - I know it will clock in data bytes at a time but I will have to look into that. We've definitely encountered some other peripheral I/F issues like that before (with I2C and SPI) given the nature of the simulation we're doing.

Edit: Yes, it appears to throttle at the appropriate rate. There is code for it and enabling verbose logs reports:
UART: 0 configured to 0007 = 250000.0000 bps (x2), 8 data 1 stop
UART: Roughly 44 usec per byte
Reversing 44 usec into cycle counts gives ~702 cycles
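A small sketch of the arithmetic behind both numbers. An AVR USART in asynchronous double-speed mode (U2X) runs at f_cpu / (8 * (UBRR + 1)) baud, and an 8N1 frame is 10 bits (start, 8 data, stop). At 16 MHz and UBRR = 7 that gives:

- 250000 baud
- 64 cycles per bit
- 640 cycles per byte

So the ~702 cycles derived from simavr's 44 usec per byte is at least as slow as the hardware minimum. The function names are illustrative.

```c
#include <stdint.h>

/* Asynchronous-mode AVR USART baud rate: f_cpu / (16 * (UBRR + 1)) in normal
   mode, f_cpu / (8 * (UBRR + 1)) with the U2X double-speed bit set. */
uint32_t avr_baud(uint32_t f_cpu, uint16_t ubrr, int u2x) {
    return f_cpu / ((u2x ? 8u : 16u) * ((uint32_t)ubrr + 1u));
}

/* CPU cycles needed to shift one frame: 1 start bit + data bits + optional
   parity bit + stop bits, each lasting f_cpu / baud cycles. */
uint32_t cycles_per_frame(uint32_t f_cpu, uint32_t baud,
                          unsigned data_bits, int parity, unsigned stop_bits) {
    uint32_t bits = 1u + data_bits + (parity ? 1u : 0u) + stop_bits;
    return bits * (f_cpu / baud);
}
```

avr_baud(16000000, 7, 1) matches the "0007 = 250000 bps (x2)" line in the log, and cycles_per_frame(16000000, 250000, 8, 0, 1) gives the 640 cycles Kevin mentions.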

@vintagepc
Owner

Hmm... I wonder if we are chasing the wrong underlying cause. I just managed to get a frequency graph that looks like this and it still failed with the queue error.

[Graph: Klippy's calculated MCU frequency over time]

@KevinOConnor

KevinOConnor commented Nov 2, 2020

FYI, I don't think the charts (or the data in the log in general) will help, as the frequency is only logged once per second. What's going to cause problems is sub-second bursts and sub-second slow downs. So, for example, if the simulation runs 20ms of simulation in 35ms, but then runs the next 20ms of simulation in 5ms it will show up fine in the graphs, but there's a very good chance it would cause problems internally in Klipper.

> All I've been experimenting with are variants of checking the "simulation time" in ns against the "wall time" and then doing something to make up the difference at regular intervals. The size of those intervals and specifically what I am doing to idle are the knobs I've been twiddling.

FWIW, I didn't have luck with that approach in simulavr - the problem I had was that after a "slow down" the pacing would try to "catch up" and that would ultimately cause more timing havoc. The scheme I finally came up with ( https://github.com/KevinOConnor/klipper/blob/cef9cc29b8f150681cd24ca6f051fc9f03fc924b/scripts/avrsim.py#L102 ) was to check pacing every 100us and to reset timing every 5ms. That is, I allowed the simulation to "slow down" and "catch up" every 100us, but never allowed it to "catch up" more than a couple of milliseconds. If the simulation fell behind by a couple of milliseconds then that time is lost (the host can usually adapt).
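A sketch of the "catch up, but only so far" idea, under stated assumptions:

- The pacer is called periodically with the current cycle count and wall time.
- It tells the caller how long to wait when the simulation is ahead.
- When the simulation is more than max_lag_ns behind, it re-anchors its baseline. The backlog is then written off as lost time instead of being caught up in a burst.

This covers only the lag cap. The exact check-interval and 5 ms reset logic of avrsim.py is not reproduced, and all names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t base_wall_ns; /* wall time at the current baseline */
    uint64_t base_cycles;  /* simulated cycle count at the baseline */
    uint64_t hz;           /* paced clock rate, e.g. 16000000 */
    uint64_t max_lag_ns;   /* largest backlog we are willing to catch up */
    uint64_t lost_ns;      /* total time written off so far */
} pacer_t;

/* Called at every pacing check. Returns how long (ns) the caller should wait
   before continuing; 0 means continue immediately (catching up if behind). */
uint64_t pacer_check(pacer_t *p, uint64_t cycles, uint64_t now_ns) {
    uint64_t d = cycles - p->base_cycles;
    uint64_t due = p->base_wall_ns + (d / p->hz) * 1000000000ull
                 + (d % p->hz) * 1000000000ull / p->hz;
    if (now_ns < due)
        return due - now_ns;            /* ahead of the wall clock: wait */
    if (now_ns - due > p->max_lag_ns) { /* too far behind: drop the backlog */
        p->lost_ns += now_ns - due;
        p->base_wall_ns = now_ns;
        p->base_cycles = cycles;
    }
    return 0;
}
```

A small backlog (under max_lag_ns) is still caught up gradually. A large stall is logged in lost_ns and forgotten, which keeps the clock the host observes free of post-stall bursts.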

I did also have to run the simulation very slowly (at 30% normal rate). I suspect simavr is much faster than simulavr so that may not be an issue. I would verify that the simulation thread isn't using any more than about 75% of a cpu though.

Also, you may want to wait 30 seconds after startup before starting a print - just so the host has a good sync on the clock. (I suspect that won't help though.)

-Kevin

EDIT: Actually the code checks pacing every 10us instead of every 100us, but that does seem a bit extreme.

@vintagepc
Owner

Hmm, that might explain why I am having more luck with my current approach, shy of checking the clock every simulated cycle. Still variable but I did get 80% of the way through a benchy once.

Rather than adding delay time every x cycles, this approach takes advantage of the fact that checking the nanosecond clock is relatively expensive (on the order of a few hundred ns). Then, depending on whether we are running ahead or behind, it increases or decreases the frequency of these checks one cycle count at a time. I think this should work as it smears the correction over time and does not result in significant jumps but a smoothed correction.
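A minimal sketch of that adaptive check interval. The direction of adjustment is an assumption: each clock read costs a few hundred ns, so checking more often slows the simulated core slightly and checking less often lets it run faster. The interval is therefore nudged by one cycle per check. The struct, bounds and function name are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t interval;     /* simulated cycles between wall-clock reads */
    uint32_t min_interval; /* lower bound, at least 1 */
    uint32_t max_interval; /* upper bound */
} adaptive_t;

/* ahead != 0 means simulated time is ahead of wall time.
   Moves the interval by one cycle and returns the new value. */
uint32_t adapt_interval(adaptive_t *a, int ahead) {
    if (ahead && a->interval > a->min_interval)
        a->interval--; /* read the clock more often: more overhead, slower */
    else if (!ahead && a->interval < a->max_interval)
        a->interval++; /* read the clock less often: less overhead, faster */
    return a->interval;
}
```

Because the correction moves one cycle at a time, it smears adjustments over many checks instead of producing the step changes that a sleep-based catch-up would.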

I haven't checked where it ends up settling, but with prior attempts I was certainly attempting to correct for the time on roughly a 500 cycle count scale (about 31 microseconds), and in some cases even more often.

One thing I keep forgetting to look at is the SimAVR cycle ticking; I don't think it "visits" every cycle count and if the CPU has small idle periods it will jump over those. I need to get a good picture of how big those jumps are.

@vintagepc
Owner

It also occurs to me that SimAVR supports external clocking of AVR timers. Perhaps we can make use of that to clock the simulation at a more regular rate.

@vintagepc
Owner

Not finished yet, but I pushed up the branch that let me get 80% of the way through a benchy, in case it helps with stability for others. If attempting large prints, you'll likely want to adjust the preallocation in GLPrint.h, because when we hit that limit Klipper will likely choke as the MCU does not respond in time.

https://github.com/vintagepc/MK404/commits/263-Klipper-take-2

@vintagepc
Owner

OK - I took a look at the timing on a microscale level - this is a plot of the "wall time (ns)" per cycle as time progresses with the current scaling algorithm.

This sets me up to get some more useful information; now I can visually see how we are doing over time. In a perfect situation this chart should be a flat line at 62.5.

[Plot: wall time (ns) per simulated cycle over time]

So we can see some major excursions that are sporadic, and also that my current skew correction is complete crap and way too slow to respond 🤣 So the 80% benchy is a complete fluke.

@KevinOConnor

FWIW, calling clock_gettime(CLOCK_MONOTONIC_RAW) is higher overhead than a regular gettimeofday() or clock_gettime(CLOCK_MONOTONIC) call (because on most machines the latter use the Linux vDSO). You shouldn't need microsecond clock accuracy (but will need millisecond accuracy). So, the normal Linux clocks should work okay.

-Kevin
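A rough microbenchmark for checking that point on a given machine, assuming Linux with glibc (CLOCK_MONOTONIC_RAW is Linux-specific). It times many clock_gettime() calls on one clock and divides by the call count. Results depend on kernel and hardware, and newer kernels may also serve CLOCK_MONOTONIC_RAW through the vDSO, so the gap can be smaller than described. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <time.h>

/* Average cost in ns of one clock_gettime() call on `clk`, measured over
   n calls with CLOCK_MONOTONIC as the reference clock. */
double clock_read_cost_ns(clockid_t clk, int n) {
    struct timespec ts, a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (int i = 0; i < n; i++)
        clock_gettime(clk, &ts);
    clock_gettime(CLOCK_MONOTONIC, &b);
    double elapsed = (double)(b.tv_sec - a.tv_sec) * 1e9
                   + (double)(b.tv_nsec - a.tv_nsec);
    return elapsed / n;
}
```

Comparing clock_read_cost_ns(CLOCK_MONOTONIC_RAW, 1000000) with clock_read_cost_ns(CLOCK_MONOTONIC, 1000000) shows whether the RAW clock is noticeably more expensive on that system.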

@vintagepc
Owner

@KevinOConnor So you're saying that it should be able to handle small excursions so long as we're tracking accurately somewhere around the millisecond scale? I am currently working at about a microsecond scale to hopefully get a better picture of what's going on - but once I understand the issue in depth I should be able to pull back a bit and operate on a larger timescale.

I think part of it is that occasional excursions are what's biting us, so I am currently trying to identify what is causing them; the bulk of the time the simulation seems to be pretty good at matching pace with the wall clock.

@KevinOConnor

> So you're saying that it should be able to handle small excursions so long as we're tracking accurately somewhere around the millisecond scale?

Yes.

The current Klipper code measures the clock frequency by querying the current clock value once per second. (The resulting offset/frequency could be skewed by comms jitter and the frequency can change over time, so Klipper performs a bunch of "linear regression" math to obtain better estimates from the raw measurements.) Klipper can eventually get to an estimate with micro-second accuracy, but it isn't necessary.
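To illustrate the kind of estimate involved, here is a plain least-squares fit of MCU clock against host time: the slope is the clock frequency and the intercept is the offset. This is a generic sketch, not Klipper's algorithm. clocksync.py uses its own estimator, and the names here are hypothetical.

```c
#include <stddef.h>

/* Result of fitting mcu_ticks ~ offset + freq * host_seconds. */
typedef struct {
    double offset; /* MCU clock value at host time 0, in ticks */
    double freq;   /* estimated MCU clock frequency, ticks per second */
} clock_fit_t;

/* Ordinary least-squares fit over n (host time, MCU clock) samples.
   Returns 0 on success, -1 if there are too few or degenerate samples. */
int fit_clock(const double *host_s, const double *mcu_ticks, size_t n,
              clock_fit_t *out) {
    if (n < 2)
        return -1;
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; i++) {
        mx += host_s[i];
        my += mcu_ticks[i];
    }
    mx /= (double)n;
    my /= (double)n;
    double sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dx = host_s[i] - mx;
        sxx += dx * dx;
        sxy += dx * (mcu_ticks[i] - my);
    }
    if (sxx == 0.0)
        return -1;
    out->freq = sxy / sxx;
    out->offset = my - out->freq * mx;
    return 0;
}
```

Fed once-per-second samples from a steady 16 MHz clock, it recovers freq = 16000000. With real samples, the regression is what smooths out the comms jitter Kevin mentions.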

> I think part of it is there are occasional excursions that are what are biting us

I agree. There are two main failure cases: the clock estimate is too low, or the clock estimate is too high.

For the too low estimate, the mcu may "race ahead" and not have necessary data buffered by the time it needs it. This problem is often seen as a "Timer too close" error. It's unlikely to strike here though, because Klipper tries to buffer a minimum of 100ms (and up to 2s) in the micro-controller. So, this problem should really only occur if the host scheduling falls behind and the mcu races ahead for a total of 100ms.

The other error is due to the micro-controller going too slow. In this case, Klipper may try to buffer content in the mcu that can't fit because the mcu was too slow to process the previous content (often seen as a "move queue empty" error). To avoid this, Klipper adds TRANSMIT_EXTRA (in klippy/clocksync.py) along with 3*measured_clock_jitter.

I suspect that Klipper is observing a pretty steady clock rate over one second intervals (and thus measured_clock_jitter is pretty low). However, if a 7ms micro-slow-down happens to occur when Klipper is buffering some content then it'll trip up and throw an error.

You could always try increasing TRANSMIT_EXTRA from 1ms to 10ms if you think your micro-slow-downs are only in the ~7ms range. There shouldn't be any harm in doing that.

Cheers,
-Kevin
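A deliberately simplified model of that margin. It assumes the host's slack is just TRANSMIT_EXTRA plus three times the measured jitter (both in seconds), and that a stall is survivable when it fits inside that slack. The real logic in klippy/clocksync.py is more involved. The function names and the 0.2 ms jitter used below are made up for illustration.

```c
/* Host-side slack in seconds: TRANSMIT_EXTRA + 3 * measured clock jitter. */
double queue_margin_s(double transmit_extra_s, double clock_jitter_s) {
    return transmit_extra_s + 3.0 * clock_jitter_s;
}

/* Nonzero if a simulator stall of stall_s seconds fits inside the margin. */
int stall_absorbed(double stall_s, double transmit_extra_s,
                   double clock_jitter_s) {
    return stall_s <= queue_margin_s(transmit_extra_s, clock_jitter_s);
}
```

With 0.2 ms of jitter, a 7 ms stall is not absorbed with TRANSMIT_EXTRA at 1 ms (1.6 ms margin) or 5 ms (5.6 ms margin). It is absorbed at 10 ms (10.6 ms margin).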

@vintagepc
Owner

Status update: I've added more logging code and can now see that we are indeed losing handfuls of ms at random intervals. It's not consistent, and there are a few culprits. usleep/nanosleep is a particularly bad offender since it will sporadically sleep 10x or more what was asked. Mutexes from the various cross-thread operations are another. I also suspect disk I/O is hurting things too as it might be blocking the files used as PTYs.
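A small sketch for observing the nanosleep() overshoot described above, assuming POSIX. It requests a short sleep repeatedly and records the worst amount by which the measured duration exceeded the request. Names are hypothetical. Running it while the machine is under CPU or disk load is likely to make the excursions easier to see.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds. */
static uint64_t mono_now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Sleeps `iters` times for req_ns each and returns the worst overshoot in ns,
   i.e. the largest (measured duration - requested duration). */
uint64_t worst_sleep_overshoot_ns(uint64_t req_ns, int iters) {
    struct timespec req;
    req.tv_sec = (time_t)(req_ns / 1000000000ull);
    req.tv_nsec = (long)(req_ns % 1000000000ull);
    uint64_t worst = 0;
    for (int i = 0; i < iters; i++) {
        uint64_t t0 = mono_now_ns();
        nanosleep(&req, NULL);
        uint64_t dt = mono_now_ns() - t0;
        if (dt > req_ns && dt - req_ns > worst)
            worst = dt - req_ns;
    }
    return worst;
}
```

Comparing the worst overshoot against the request (for example 100 us) gives a direct measure of how much time a sleep-based pacing loop can lose in one call.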

In any case, I'm making adjustments. Increasing TRANSMIT_EXTRA, combined with removing some sleep calls and display mutexes, has so far gotten one complete benchy (a second one is currently at 64% after cleanup). However, this is without the full 3D visualizations. Those definitely have some mutexes and bottlenecks, so I will look at whether I can address those next. If not, it might be possible to make a lightweight version that tracks the model being printed but does not actually render anything to screen, so that it does not have to worry about cross-thread memory access.

@vintagepc
Owner

Victory! I think things are in good shape now. I may still find a few lingering issues, but I've just completed two benchys without issue.

@KevinOConnor

Nice. Out of curiosity, did you also end up changing TRANSMIT_EXTRA in Klipper? And if so, what did you ultimately set it to?

-Kevin

@vintagepc
Owner

Yes - I did end up changing it to 10ms as suggested and that seems to have done the trick to help with smaller irregularities (after I got rid of the larger ones caused by the sleeps and cross thread mutexes)

It still chokes on underpowered computers (we know that >4 cores/threads are required - since MK404 takes at least 4 threads by itself, not accounting for the Klipper process and whatever you are using to actually run the print). We had some issues on @3d-gussner's machine initially but observed it would print just fine if the 3d graphics were disabled.

In the end I successfully completed three benchy prints, one of which was even at 200% feedrate.

@KevinOConnor

> I did end up changing it to 10ms as suggested and that seems to have done the trick to help with smaller irregularities

Ah, makes sense. I was thinking of changing the default in Klipper from 1ms to 5ms. I'm a little leery of going to 10ms for the default, just because I'm afraid of introducing a regression (and I've not seen the problem outside of simulators). I'm not sure if 5ms would help here though.

> It still chokes on underpowered computers

FWIW, it's fine to simulate a slower speed (eg, pace to 8MHz on a nominal 16MHz chip). Klipper should produce identical results (other than the simulation taking longer) as Klipper times actions based on the main micro-controller clock. Klipper doesn't really care about the actual clock speed - it just wants a clock it can accurately predict.

Cheers,
-Kevin

@vintagepc
Owner

It might be sufficient for the simulation to use 5ms. I added logging as a feature and it reports "lost" time; usually I only see <5ms with any regularity, but I can definitely make it have significant losses by doing things that load the CPU cores or disk.

I think the losses aren't necessarily the simulation speed but the context swapping and concurrency of other processes if your system doesn't have a dedicated core for each thread that isn't idle. At least in this case, we are spinlocking in order to get accurate time correction - which means that even if you run the simulated AVR slower it will still use the same amount of CPU time.

Certainly it's already very impressive it works as well as it does at full speed.

BTW - if there are features I can add that would make the simulator more useful to you don't hesitate to let me know. I appreciate your involvement in helping to get it running for our own purposes so I'm happy to reciprocate that if I can make it useful for you too.

@KevinOConnor

Thanks. It does seem like a useful tool. Unfortunately, I haven't gotten a chance to spend time working with MK404, and I probably wont get much time in the short-term.

I do appreciate the offer and your efforts!
-Kevin
