Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix message reading causing OOM on a busy bus #28

Merged
merged 2 commits into from
Jan 23, 2025

Conversation

SamantazFox
Copy link
Contributor

@SamantazFox SamantazFox commented Jan 20, 2025

There is a logic error in the unread_message_count property getter that causes the library to buffer messages twice as fast as it empties the queue when receive() is called.

To give numbers, I simulated a busy bus (2* full length 8 bytes messages every ms) and it caused an OOM error in about 20 seconds. [1]

Assuming a fresh start (empty queue) and a moderately busy bus (the two Rx buffers of the MCP2515 are refilled between calls to receive()), here is what the code did:

  • user calls listener.in_waiting()
  • library reads the two RX buffers (queue size = 2)
  • user calls can.receive() to read the first message
  • library reads the two RX buffers and provides one message to the user app (queue size = 3)
  • user calls can.receive() to read the second message
  • library reads the two RX buffers and provides one message to the user app (queue size = 4)
  • repeat...
  • program runs out of memory

After this PR, here is what the code does:

  • user calls listener.in_waiting()
  • library reads the two RX buffers (queue size = 2)
  • user calls can.receive() to read the first message
  • library provides one message to the user app (queue size = 1)
  • user calls can.receive() to read the second message
  • library provides one message to the user app (queue size = 0)
  • user calls listener.in_waiting()
  • library reads the two RX buffers (queue size = 2)
  • repeat...

Here is the code used on either sides. Please note that the Canbus class is just boilerplate for initializing the CAN hardware. Switching between sender/receiver code requires (un)commenting the relevant lines in main().

Details

import asyncio
import board
import digitalio

# CAN modules
try:
	from canio import BusState, CAN, Match, Message
except ImportError:
	from adafruit_mcp2515 import MCP2515 as CAN
	from adafruit_mcp2515.canio import BusState, Match, Message


###
### Canbus class
###

class Canbus:
	def __init__(self, baudrate=250_000):
		# If the CAN transceiver has a standby pin, bring it out of standby mode
		if hasattr(board, 'CAN_STANDBY'):
			standby = digitalio.DigitalInOut(board.CAN_STANDBY)
			standby.switch_to_output(False)

		# If the CAN transceiver is powered by a boost converter, turn on its supply
		if hasattr(board, 'BOOST_ENABLE'):
			boost_enable = digitalio.DigitalInOut(board.BOOST_ENABLE)
			boost_enable.switch_to_output(True)


		# Init the actual CAN hardware (either embedded or external)
		if hasattr(board, 'CAN_RX') and hasattr(board, 'CAN_TX'):
			self.bus = CAN(
				rx=board.CAN_RX, tx=board.CAN_TX,
				baudrate=baudrate, auto_restart=True
			)
		elif hasattr(board, 'CAN_CS'):
			# When have an SPI CAN chip, set the CS pin appropriately
			cs_pin = digitalio.DigitalInOut(board.CAN_CS)
			cs_pin.switch_to_output(True)

			self.bus = CAN(
				spi_bus=board.SPI(), cs_pin=cs_pin,
				baudrate=baudrate
			)
		else:
			raise Exception("Error: no supported CAN hardware found")


		self.listener = None
		self.stop_request = False

		self.callback = None
		self.cb_args = []
		self.cb_kwargs = {}

		print("CAN init done!")


	def send(self, msg):
		self.bus.send(msg)


	def start(self, matches = None):
		self.stop_request = False
		self.listener = self.bus.listen(matches=matches, timeout=2)

	def stop(self):
		self.stop_request = True


	def register_callback(self, callback, *args, **kwargs):
		self.callback = callback
		self.cb_args = args
		self.cb_kwargs = kwargs


	async def listen(self):
		if self.listener is None:
			raise Exception("Use Canbus.start() to initialize a listener first!")

		if self.callback is None:
			raise Exception("Use Canbus.register_callback() first!")

		while True:
			if self.stop_request:
				self.listener.deinit()
				self.listener = None
				return

			if self.bus.state != BusState.ERROR_ACTIVE:
				print("Bus error!")
				await asyncio.sleep(0.5)
				continue

			waiting = self.listener.in_waiting()
			if waiting > 0:
				for i in range(waiting):
					message = self.listener.receive()
					self.callback(message, *self.cb_args, **self.cb_kwargs)

			await asyncio.sleep(0)


###
### CAN initialization
###

def can_listener_callback(message):
	print("0x{:08X} --".format(message.id), end='')
	for b in message.data: print(" {:02X}".format(b), end='')
	print("")


can = Canbus(500_000)
can.register_callback(can_listener_callback)
can.start(matches=[])



###
### Main code loop
###

async def bit_shift():
	shift_msg = Message(0x18FF1069, b'', extended=True)
	debug_msg = Message(0x420, b'')

	while True:
		for shift in range(64):
			shift_msg.data = struct.pack("!Q", 1 << shift)
			debug_msg.data = b"bit = {:02d}".format(shift)

			can.send(shift_msg)
			can.send(debug_msg)

			await asyncio.sleep_ms(1)


async def main():
	# Receiver side
	await asyncio.gather(can.listen())

	# Sender side
	#await asyncio.gather(bit_shift())


print("Init done!")
asyncio.run(main())


[1]: Fun fact, I initially wanted to simulate a moderately busy bus (about 40 to 60% load), but I left the baudrate to 250 kbit/s (instead of 500) from previous experiments, for which 2 fames/ms is almost a 100% bus load. My feather RP2040 CAN board handled that Tx rate nicely, though. The Rx board not so well ^^

@tannewt
Copy link
Member

tannewt commented Jan 21, 2025

Why doesn't _read_status() prevent the buffers from being added a second time? Shouldn't the bits be unset during the second read_rx_buffers?

@SamantazFox
Copy link
Contributor Author

The code uses the READ RX BUFFER SPI command, which will automatically clear the appropriate RXnIF bit in the CANINTF register once the CS pin goes high.

If the code used the regular READ command, then yes, the RXnIF bit would need to be cleared to prevent re-reading the same buffer over and over again.

Source: Table 12-1 of the MCP2515 datasheet
https://ww1.microchip.com/downloads/aemDocuments/documents/APID/ProductDocuments/DataSheets/MCP2515-Family-Data-Sheet-DS20001801K.pdf

@tannewt
Copy link
Member

tannewt commented Jan 22, 2025

Ok, so this isn't a problem that we're loading the same rx buffers again. It is a control flow problem where we handle one message and load two over and over.

Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation!

adafruit_mcp2515/__init__.py Show resolved Hide resolved
Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

adafruit_mcp2515/__init__.py Show resolved Hide resolved
@tannewt tannewt merged commit ff2aa62 into adafruit:main Jan 23, 2025
1 check passed
@SamantazFox SamantazFox deleted the patch-1 branch January 24, 2025 07:55
adafruit-adabot added a commit to adafruit/Adafruit_CircuitPython_Bundle that referenced this pull request Jan 28, 2025
Updating https://github.com/adafruit/Adafruit_CircuitPython_APDS9960 to 3.1.14 from 3.1.13:
  > Merge pull request adafruit/Adafruit_CircuitPython_APDS9960#48 from FoamyGuy/fix_docs_build

Updating https://github.com/adafruit/Adafruit_CircuitPython_CCS811 to 1.3.19 from 1.3.18:
  > Merge pull request adafruit/Adafruit_CircuitPython_CCS811#53 from FoamyGuy/remove_imp_usage

Updating https://github.com/adafruit/Adafruit_CircuitPython_MCP2515 to 1.1.8 from 1.1.7:
  > Merge pull request adafruit/Adafruit_CircuitPython_MCP2515#28 from SamantazFox/patch-1

Updating https://github.com/adafruit/Adafruit_CircuitPython_PIOASM to 1.3.2 from 1.3.1:
  > Merge pull request adafruit/Adafruit_CircuitPython_PIOASM#76 from adafruit/fix-print-c-program
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants