Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrupt data using SPort soft serial for external modules #3285

Closed
1 task done
mha1 opened this issue Mar 1, 2023 · 19 comments · Fixed by #3329
Closed
1 task done

Corrupt data using SPort soft serial for external modules #3285

mha1 opened this issue Mar 1, 2023 · 19 comments · Fixed by #3329
Labels
bug 🪲 Something isn't working

Comments

@mha1
Copy link
Contributor

mha1 commented Mar 1, 2023

Is there an existing issue for this problem?

  • I have searched the existing issues

What part of EdgeTX is the focus of this bug?

Transmitter firmware

Current Behavior

Working with an external module using SPort serial input for telemetry I noticed unexpected glitches.

The glitches are caused by soft serial sampling. To prove this claim I have a simple test setup which uses an existing external module protocol using SPort telemetry (LemonRX DSMP - physical module not required) and feeding arbitrary data to the SPort Pin. The data is read back using telemetry mirror and compared to the fed in data.

Note: 2.8.1 works fine, no data corruption

Test setup:

TX16s
set External Module to mode LemonRX DSMP
set AUX1 to Telem Mirror
connect FTDI GND to module bay GND
connect FTDI TX to module bay SPort pin
connect FTDI RX to AUX1 TX

Feed a 22 byte data frame at 115200 every 100ms, e.g. using HTerm (ASend 0,1) to the module bay SPort pin:
02 01 1B 22 1B 23 04 05 06 46 3A 5B 00 00 6E 01 14 15 16 17 18 03

Log of Telemetry Mirror output:

02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506469D2B006E01141516171803       <-- Data corrupt, frame length not ok, byte missing
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516178CE0     <-- Data corrupt, frame length ok
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803
02011B221B2304829094DA01006E01141516171803        <-- Data corrupt, frame length not ok, byte missing
02011B221B23040506463A5B00006E01141516171803
02011B221B23040506463A5B00006E01141516171803

Expected Behavior

No data corruption using module bay Sport pin.

Steps To Reproduce

see above

Version

2.9 latest nightly, been in there since the major driver and ports overhaul

Transmitter

Radiomaster TX16S / TX16SMK2

Anything else?

No response

@mha1 mha1 added bug 🪲 Something isn't working triage Bug report awaiting review / sorting labels Mar 1, 2023
@mha1
Copy link
Contributor Author

mha1 commented Mar 1, 2023

First of all I was wrong tagging this as a 2.8.1 problem. It's a main problem tested with the latest nightly and I think got introduced with the module ports renovation. 2.8.1 is working fine.

Back to the problem I suspected the sampling might be off but actually looks pretty good. The start bit is detected after 500ns (first pulse generated at entry into the falling edge ISR static void _softserial_exti()) and the following 8 samples (pulses generated in the timer ISR at the point where the GPIO is read - void stm32_softserial_rx_timer_isr()) are positioned right at half the bit time. So the soft serial driver is working fine. But even with this perfect sampling example sporadic data corruption occurs. Most likely some higher priority interrupt hitting in between bit samples throwing the timing off.

@pfeerick @raphaelcoeffic any ideas?

image

PS: With a little patience I caught one

image

The first UART line is the feed in at the SPort pin, the second line is telemetry mirror output. Note the first 4 bytes. In: 02 55 AA 22, Out: 02 55 AA 91. The 22 was not sampled correctly. Check out the 9,75us delay instead of the usual 500ns delay detecting the falling edge of the start bit. We missed an entire bit and everything else after that goes wrong because we are one bit out of sequence. It's gotta be a high priority interrupt that delayed firing the the falling edge detection ISR.

image

If you want to see for yourself please use the attached DSLogic data file. Can be opened in DSView which is available free of charge.
DSLogic-la-230301-211300.zip
https://www.dreamsourcelab.com/download/

@mha1 mha1 mentioned this issue Mar 5, 2023
2 tasks
@richardclli
Copy link
Collaborator

I think it is the change of IRQ priority, old code uses 0 for both EXTI and TIM irq priority, but new code did not.

@mha1
Copy link
Contributor Author

mha1 commented Mar 8, 2023

Yes, I changed the falling edge detection IRQ prio from 5 to 0 for target Horus a few days ago after talking with @raphaelcoeffic. This solved the problem for Horus type radios. Tested on TX16s and X10express ok.

Not sure about the other targets Taranis and NV14. Need feedback from @raphaelcoeffic

@richardclli
Copy link
Collaborator

Yes, I changed the falling edge detection IRQ prio from 5 to 0 for target Horus a few days ago after talking with @raphaelcoeffic. This solved the problem for Horus type radios. Tested on TX16s and X10express ok.

Not sure about the other targets Taranis and NV14. Need feedback from @raphaelcoeffic

Already replied you in your PR, please go ahead to fix taranis target as well.

@mha1
Copy link
Contributor Author

mha1 commented Mar 10, 2023

I'd like to do some testing on this. Do you have a Taranis type radio and a FTDI?

@richardclli
Copy link
Collaborator

richardclli commented Mar 10, 2023

I do not have a Taranis, need somebody have it to test anyway, however it didn't blocks you from changing the taranis as well in your PR. The old code uses 0 priority as well.

@mha1
Copy link
Contributor Author

mha1 commented Mar 10, 2023

@pfeerick I'm sure you have a Taranis type radio and a FTIDI, haven't you?

@pfeerick
Copy link
Member

I plead the fifth.

@mha1
Copy link
Contributor Author

mha1 commented Mar 10, 2023

Overruled, there is no 5th in Australia

@mha1
Copy link
Contributor Author

mha1 commented Mar 10, 2023

Let me know the Cmake for your radio and I'll send you 2.9 firmware to test soft serial and a corresponding Windows command line tool to stimulate soft serial via the SPort pin.

@pfeerick
Copy link
Member

Welp... tough crowd! 😮

The X9D+2019 will be more reliable atm, so -DPCB=X9D+ -DPCBREV=2019 . If you want to risk the X9D+ as well, just -DPCB=X9D+ ... I'll pencil you in for Sunday 🤣

@mha1
Copy link
Contributor Author

mha1 commented Mar 10, 2023

Thanks for your precious time. Attached is a .zip with test firmware for X9D+2019 and X9D+ with changed IRQ priority for soft serial, the Windows command line tool and notes.txt. Please read notes.txt on how to set this up.
SoftSTest.zip

As a reference the same build for TX16s which tested ok on my TX16s with the setup described in notes.txt
fw-tx16s_softS-1.zip

@pfeerick
Copy link
Member

pfeerick commented Mar 13, 2023

For the X9D+2019, when switching between 2.8.1 and your build, this seemed to have no change in behaviour - all sensors, but rolling values for some.
For the X9D+, 2.8.1 => your build, this went from not discovering any sensors, to discovering sensors, but as with the X9D+2019, rolling values for some. So, interestingly something seems to have been wrong in 2.8.1 for the X9D+

So in my view this is still an improvement in it's own right, but not 100% resolved? It certainly isn't a step backwards, so there is no need for you to invest in a Etch A Sketch just yet 🤭 Assuming, of course, the main sport simulator code was behaving properly? I switched between two computers and a couple different USB/serial adapters with no change in behaviour... and then tried the TX16S build and it is behaving exactly the same as the other radios... so at least they are all consistent! 😇 (edit: tester error, need to retry)

X9D+2019
https://user-images.githubusercontent.com/5500713/224583372-7875d742-8416-4eba-b166-d05a8e58aeba.mp4

X9D+
https://user-images.githubusercontent.com/5500713/224583834-a727f280-97e9-4105-b62f-5ba42932ce59.mp4

X9D+ (including discovery)
https://drive.google.com/file/d/1fXN3DTSKVQiXEXMEi10cixNTw0iRVr3-/view?usp=share_link

TX16S
https://drive.google.com/file/d/1DRuc0JB5cAbF5AsQhPfcaUsN9Vdm4CVj/view?usp=share_link

@pfeerick pfeerick removed the triage Bug report awaiting review / sorting label Mar 13, 2023
@mha1
Copy link
Contributor Author

mha1 commented Mar 13, 2023

Thank you very much for testing.

Can you please verify the fw versions for your comparisons 2.8.1 -> my self build. There is no way you'd be able to set the external module to subtype MLink (PPM MLINK) in 2.8.1 for any radio. This option of selecting a subtype only exists in my self build. And please make sure after my self build is flashed you go into settings/external module to select PPM MLINK. If that's the case you should see only the sensors discovered with main.exe running I listed in notes.txt

Flashing 2.8.1 and selecting PPM in settings/ external module (no subtype selection possible) might show some sensors after discovery with main.exe running but sensors should be different to the 10 as listed in notes.txt

@mha1
Copy link
Contributor Author

mha1 commented Mar 13, 2023

I should have explained this better: my self build implements two major differences to 2.8.1 and 2.9. In 2.8.1/2.9 you can only select PPM as external module type without further specifying the telemetry protocol that is fed to the SPort pin. 2.8.1 is special here as by default it interprets any serial data coming in through the SPort pin as FrSky SPort telemetry. The switch to the new serial/module API in 2.9 disabled this feature. So if you'd try a recent 2.9 nightly you wouldn't be able to discover any telemetry sensors with a PPM module. I think it was never deliberately chosen to decode SPort telemetry with PPM. The port was just open. Fact is 2.9 after the serial/module API has disabled decoding SPort telemetry which I think is correct.

My change uses 2.9 and starts fresh with allowing deliberate telemetry protocol selection for module type PPM. I started with implementing a new protocol for a Multiplex PPM JR type module that is able to deliver serial telemetry too. Main.exe simulates this protocol, not SPort telemetry. Feeding main.exe's data to a 2.8.1 radio will produce random results as 2.8.1 tries to decode this as SPort telemetry (this is what your videos show). Having my self build flashed and having set the external module to PPM NOTLM should show no discovered sensors, setting it to PPM MLINK should shown exactly the 10 sensors I listed in notes.txt. Make sure you delete all sensors and rediscover sensors after switching firmware versions.

While implementing based on a recent 2.9 versions I found this serial problem which has it's roots in the IRQ priority change that came with the big serial/module API renovation. This is also corrected in my self build and is the test focus.

The major changes in my self build vs 2.9:
UI: it is now possible to select a sub type for external modules (NOTLM and MLINK) which allows to select a specific telemetry protocol for a PPM type module.
Telemetry: implemented a new protocol to decode Multiplex JR module telemetry (subtype MLINK)

Expected results:
2.8.1:

  • with external module set to PPM (no sub type selection possible) --> garbage as shown in the video as main.exe's output is decoded as SPort telemetry

2.9 current nightly (any after the serial/module API renovation)

  • with external module set to PPM (no sub type selection possible) --> no telemetry sensors detected

2.9. Michael's self build:

  • with external module set to PPM NOTLM --> no sensors detected with main.exe running
  • with external module set to PPM MLINK --> 10 sensors (RSSI, VFAS (ID 1), Curr, Capa, VFAS (ID 4), Alt VSpd, Acc (ID 8), Acc (ID 9), RxBt) correctly detected. If that is the case it proves the IRQ change worked!

So if you really wanted to do a comparison you should compare a 2.9 nightly build after Raphael's big change to my self build.

@pfeerick
Copy link
Member

pfeerick commented Mar 13, 2023

Ok, this monkey didn't read the instructions properly... I still had Lemon DSMP in my head when I was trying this, so had the module type set to that... regardless... on this X9D+ something changed between 2.8.1 and your build for the better (even with the wrong setting 😆) ... which is actually no surprise given it was playing up before but another user couldn't repo it... so will try again and will put my glasses on this time 😇

@mha1
Copy link
Contributor Author

mha1 commented Mar 13, 2023

No problem. Knowing I am good at confusing people, I should have explained the context much better. Here's my new attempt to confuse you but with good intentions: to save your time 😇

  1. Why did I choose such a strange setup:
    a. Because I wanted to make sure soft serial is actually used to verify the solution to this issue. Using SPort telemetry doesn't use soft serial as it serves inverted serial data and therefore can use the HW UART together with the hardware inverters on the PCB (damn you FrSky). The thing I implemented in my branch feeds a new normal polarity telemetry protocol to the SPort pin and uses (has to) soft serial to process the data. Main.exe simulates the new protocol and hence only provides meaningful data for my new protocol decoder. Feeding main.exe to any other protocol generates garbage at best.
    b. Main reason: because I detected and analyzed the soft serial problem with the exact same setup I then used for verification too on my TX16s.

  2. Why does comparing to 2.8.1 not make a lot of sense:
    a. 2.8.1 has a special quirk that if you select PPM as external module, the SPort pin is active by default and will decode everything you feed it as SPort protocol. So feeding 2.8.1 produces garbage because main.exe doesn't deliver SPort data
    b. The problem with soft serial is a 2.9 problem. 2.9 soft serial was working fine until the big serial/module API do-over. New soft serial driver plus changes to IRQ priorities made soft serial unusable due to the start bit detection interrupt having a to low priority, see above analysis
    c. Btw, the quirk described in 1a. also vanished after the re-do. Which I think is good, because if an external module type has the capability to process more than one protocol it should be user selectable which one to choose. Do you know of any external PPM module that delivers SPort telemetry?

  3. Does make comparing a recent nightly build to this setup make sense:
    a. Only in so far as to verify there is no more SPort data decoded in 2.9 for PPM type external modules, see 2.c. But this setup doesn't provide the means for that as main.exe doesn't stimulate SPort telemetry
    b. Not as far as this issue is concerned due to section 1. If you want I can send versions of this setup with the current main branch IRQ priority. I omitted this because I went through the exercise for the Horus targets (on my TX16s).

  4. What's the real test case here?
    a. The test case is to check if the change of the start bit detection interrupt stops soft serial from providing corrupt data.
    b. A check with X9D+softS.bin and X9D+2019softS.bin is sufficient enough to prove the IRQ change is working for Taranis targets

  5. Whats the test procedure and the expected result:
    a. See the following video that was done with my TX16s, using fw-tx16s-sofS.bin and main.exe. If the result is the same using the same procedure as in the video on both X9D+ radios I'm happy and know the IRQ priority change works for Taranis targets too (I can confirm this for TX16s, see video). If the IRQ priority wouldn't work you'll see ghost sensor, sensors going on/off and so on.

https://www.youtube.com/watch?v=Yx7RB6kY2KA

@pfeerick
Copy link
Member

Here's my new attempt to confuse you but with good intentions

You'll have to try harder - that made sense 🤭 TX16, X9D+2019 and X9D+ with the correct protocol setting worked perfectly this time...

https://drive.google.com/file/d/1DmCd00J95WnSCzcN8PxjrbEeEyJ3IrcZ/view?usp=sharing, https://drive.google.com/file/d/1DomngIV4dqANCyqNpFkiBLNvCyORBmwJ/view?usp=sharing, https://drive.google.com/file/d/1DpgTSfdR9y2mtgSqhYaa_ANM_i9Hg-hz/view?usp=sharing

@mha1
Copy link
Contributor Author

mha1 commented Mar 14, 2023

Hi Peter, thanks again for your time and patience. I have updated PR #3285 accordingly and think it's ready to go.

You actually tested quite a bit more than just the fix for this issue. The test setup and firmware files you used for testing fully reflect PR #3352 (Support for external Multiplex MLink JR type module) and PR #3328 (fix: UI - not all detected sensors displayed at detection time). I'd appreciate you checking out both PRs and hopefully green light them too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🪲 Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants