Pixhawk i2c bus lost (after missing ACK/NAK ?) #7968

Closed
bartslinger opened this issue Sep 15, 2017 · 14 comments

@bartslinger
Contributor

@davids5

This is a new bug which appeared after the fix from: #7957. Before this fix, the drone would drop out of the sky. After this fix, it does not do that anymore. However, if the packet is interrupted in a very specific way, all communication on the i2c bus is lost.

My test setup to reproduce the problem is similar to the one used in issue #7951. On the i2c bus, I have now attached an HMC5883 compass and an RGB LED.

The logic analyzer screenshots show the last transmission before the bus was lost (no data or clock signals were seen after this). It appears that the missing ACK/NAK bit triggers the failure. These are some captures of failure cases. The marker indicates when the SDA line is pulled low.

[screenshot: missing nak]

[screenshot: missing nak 2]

This is another case where the bus broke down:
[screenshot: logicanal]

And another one:
[screenshot: ackmissing2]

@davids5
Member

davids5 commented Sep 15, 2017

@bartslinger are you on skype?

@davids5
Member

davids5 commented Sep 22, 2017

@bartslinger - I tested PX4 on upstream NuttX and found one more issue, but with master I cannot get it to fail. I am using the AUAV air sensor and a 3DR compass.

I have code that slides a 10 us pulse down all the bits of the messages. It uses random delays of 1-12 from a clock edge, and there are no hangs. (I also ran a purely random test.)
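
The sweep described above can be pictured as a list of injection points, one per clock of a transfer. The following is only a minimal Python sketch, not the actual test code: the function names, the framing of 9 clocks per byte, and treating the 1-12 delay as an abstract offset from a clock edge are all assumptions made for illustration.

```python
import random

CLOCKS_PER_BYTE = 9  # 8 data bits plus the ACK/NAK bit


def sweep_schedule(num_bytes, min_delay=1, max_delay=12, seed=None):
    """Return one (clock_index, delay) injection point per clock of a transfer.

    clock_index counts SCL pulses from the start of the transfer; delay is a
    random offset from that clock edge, in the same (unspecified) units as
    the 1-12 range mentioned in the comment. Hypothetical helper.
    """
    rng = random.Random(seed)
    total_clocks = num_bytes * CLOCKS_PER_BYTE
    return [(clock, rng.randint(min_delay, max_delay))
            for clock in range(total_clocks)]


def ack_slots(num_bytes):
    """Clock indices that carry the ACK/NAK bit (every 9th clock)."""
    return [byte * CLOCKS_PER_BYTE + 8 for byte in range(num_bytes)]
```

Sweeping every clock index, rather than picking positions purely at random, guarantees that each ACK/NAK slot is hit at least once per pass.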

I am injecting errors and can see that the bus reset code is called and working. That leaves me wondering whether this is related to the LED not coming out of a bad state on the bus reset.
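
For readers unfamiliar with bus resets: the generic I2C bus-clear procedure has the master pulse SCL up to nine times and check whether the device holding SDA low lets go. The sketch below models only that idea in Python. It is not the NuttX or PX4 reset code, and `StuckSlave` and `bus_clear` are hypothetical names.

```python
class StuckSlave:
    """Toy slave that holds SDA low until it has seen enough SCL pulses."""

    def __init__(self, clocks_needed):
        self.clocks_needed = clocks_needed  # pulses required before release
        self.clocks_seen = 0

    def clock(self):
        self.clocks_seen += 1

    def sda_is_low(self):
        return self.clocks_seen < self.clocks_needed


def bus_clear(slave, max_clocks=9):
    """Pulse SCL up to max_clocks times; stop as soon as SDA is released.

    Returns (recovered, clocks_used). A real driver would follow a
    successful recovery with a STOP condition.
    """
    clocks_used = 0
    while slave.sda_is_low() and clocks_used < max_clocks:
        slave.clock()
        clocks_used += 1
    return (not slave.sda_is_low(), clocks_used)
```

A device that needs more than nine clocks, or that ignores SCL entirely, stays stuck in this model, which is the kind of behaviour suspected of the LED here.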

@bartslinger
Contributor Author

bartslinger commented Sep 22, 2017

I made a piece of hacky Arduino code that can break it more consistently. https://gist.github.com/bartslinger/d68aff3b867483c20c57678e60fb10e5

The hacky part is in the timings; I'm not sure whether the timers are configured correctly on different types of Arduino. I'm using an Arduino Nano at the moment. You also need an HMC5883 for this, because the code actively monitors that address.

However the idea is simple: Just break the NAK bit by pulling the data line LOW in the middle of the clock pulse.

Here you can see I had to do it only once (channel 2 shows the MOSFET driver gate):

Closeup:
[screenshot: closeup]

@davids5
Member

davids5 commented Sep 22, 2017

@bartslinger - how long is the gate pulse?

@davids5
Member

davids5 commented Sep 22, 2017

The reason I ask is that if it is longer than the total duration of the retries, the sensor may be taken offline. Can you do a run with a 30 us clobber pulse?

@bartslinger
Contributor Author

bartslinger commented Sep 22, 2017

If I understand you correctly, you suspect that the compass is disabled by PX4 after a maximum number of retries? I've thought of this too, and that's why I also added the LED in a previous test. Communication to the LED was also lost.

The gate pulse time I used now is 1 ms. This time was chosen because the time in between two packets was always more than 1ms. That means the SDA line is released before the next packet is expected. I would also expect a clock signal for a retry, but that was not observed.

I will try with 30us on Monday, although I don't think it's going to make a difference.

I'm quite sure that the ACK/NACK stuff is broken. I've been reading a little bit about i2c, and it seems that the NACK condition is defined by a high data line throughout the entire clock pulse. However, what I'm doing here, is pulling the data line low during the clock pulse. That basically renders the NACK invalid. The handling of this unusual case must be broken somewhere.
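
The distinction drawn above can be written down as a small classifier. This is an illustrative Python model only, assuming SDA is sampled several times while SCL is high during the ninth clock; `classify_ack_slot` is a hypothetical helper and not part of any driver.

```python
def classify_ack_slot(sda_samples):
    """Classify the ninth clock from SDA samples taken while SCL is high.

    sda_samples is a sequence of 0/1 levels, oldest first.
    """
    if not sda_samples:
        raise ValueError("need at least one SDA sample")
    if all(level == 0 for level in sda_samples):
        return "ACK"
    if all(level == 1 for level in sda_samples):
        return "NACK"
    # SDA changed while SCL was high: not a valid data bit. On a real bus a
    # high-to-low change here has the shape of a START condition and a
    # low-to-high change has the shape of a STOP condition.
    return "INVALID"
```

For example, `classify_ack_slot([1, 1, 0, 0])` models the case in this test, where the line is pulled low in the middle of the clock pulse, and falls into the third category.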

@bartslinger
Contributor Author

I tried with an 18 us pulse. Same behavior.

@davids5
Member

davids5 commented Sep 25, 2017

@bartslinger - Thank you for testing it. I have ordered a Nano, so I can replicate your setup 100%.

@davids5
Member

davids5 commented Oct 7, 2017

Hi @bartslinger
My Arduino nano came. yippy!

Tested on a build of master (NuttX 7.22), post upgrade:

HW arch: PX4FMU_V2
HW type: V2
HW version: 0x00090008
HW revision: 0x00000000
FW git-hash: 32424c96b76c10c6194cdfa59f047d58c5c3e996
FW version: 1.6.5 0 (17171712)
OS: NuttX
OS version: Release 7.22.0 (118882559)
OS git-hash: 85a88c073130c0f9dd83af0a67e61c445ab0b3fa
Build datetime: Oct  6 2017 12:08:37
Build uri: localhost
Toolchain: GNU GCC, 5.4.1 20160609 (release) [ARM/embedded-5-branch revision 237715]
MFGUID: 303533333036510400410026
MCU: STM32F42x, rev. 3
UID: 410026:30365104:30353333
nsh>

I wired up the same circuit.

Top black trace is SCL.
Brown trace is SDA.
Red trace is the gate of the N-channel FET whose drain pulls SDA low to kill it (with a switch to disconnect the drain).

I have only a 3DR GPS/Compass with HMC5883 connected. 1000 uS gate signal.

Zoomed out, running, drain disconnected (no kill):
image

Zoomed in, running, drain disconnected (no kill):
image

Zoomed in more, running, drain disconnected (no kill):
image

Zoomed in more, running, drain connected (kill):
image

Zoomed out, drain connected (kill):
image

There is no hang.

@davids5
Member

davids5 commented Oct 7, 2017

@bartslinger - would you please retest with current master, and get one close up like "zoomed in more Running Drain connected (kill)" with a marker on the gate edge.

@bartslinger
Contributor Author

Did something change on master in the meantime? Anyway, as long as it works I'm happy!
I'll try to replicate your results on Monday.

@davids5
Member

davids5 commented Oct 9, 2017

Yes. I fixed an issue in upstream NuttX's I2C driver, then we upgraded master to use NuttX 7.22+.

@bartslinger
Contributor Author

When I flash master using QGC, I still see NuttX 7.21

HW arch: PX4FMU_V2
FW git-hash: ef07c3be200c84ef6bbb9ee2e92f3dcd1e557ec0
FW version: 1.6.5 0 (17171712)
OS: NuttX
OS version: Release 7.21.0 (118817023)
OS git-hash: 85a88c073130c0f9dd83af0a67e61c445ab0b3fa
Build datetime: Oct 10 2017 07:13:36
Build uri: localhost
Toolchain: GNU GCC, 5.4.1 20160919 (release) [ARM/embedded-5-branch revision 240496]
MFGUID: 323833363335510c003d002e
MCU: STM32F42x, rev. 3
UID: 3D002E:3335510C:32383336 

Anyway, I don't see the issue anymore with this firmware. Does that make sense?

@davids5
Member

davids5 commented Oct 10, 2017

Yes it does. There is an open issue for a missing tag.

The fact that it works also makes sense. I fixed it in upstream NuttX and tested it with both my version of the I2C killer, which knocked down every bit, and your version of the I2C killer, and it was working here.

So I think the problem is resolved.
