Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Memory leak in addApbChangeCallback() #3560

Merged
merged 4 commits into from
Dec 30, 2019
Merged

Conversation

stickbreaker
Copy link
Contributor

@stickbreaker stickbreaker commented Dec 10, 2019

  • ledcWriteTone() added a apbcallback() evertime the tone value was non zero.
  • addApbChangeCallback() did not detect duplicate callbacks.
  • changed the apbcallback list to a double link to support roll forward, roll back execution. This made the sequences of clock change callback start with the newest registered -> to oldest on the before then oldest -> newest after the clock change. This made the UART debug log output have minimal gibberish during the clock change.
  • change how the UART callback handled the MUTEX because if any apbchangeCallback() executed a log_x() a deadlock would occur.

This fixes #3555

code was allocating a semaphore handle on every write access.
stickbreaker and others added 3 commits December 10, 2019 15:34
Every time frequency was changed, an additional cpu frequency change callback was added. Code now adds ONE callback and uses a static variable to denote active channels requiring timer reconfig on base frequency change.
Change apbChangeCallback to double link list, roll cpu freq change newest->oldest then Oldest->Newest to minimize UART disruption.
Fix duplicate addApbChangeCallback detection.
Fix UART MUTEX deadlock when log_x calls occuring from apbCallbacks
Fix incorrect error message. If timer clock source was REF_CLK, (slow speed 8mhz) ApbClockChange callback was incorrectly reporting that timer channel was inactive.
@stickbreaker stickbreaker changed the title Fix Semaphore Handle leak Fix Memory leak in addApbChangeCallback() Dec 16, 2019
@me-no-dev me-no-dev merged commit 9ad8607 into master Dec 30, 2019
@me-no-dev me-no-dev deleted the stickbreaker-patch-2 branch December 30, 2019 19:35
@cyberman54
Copy link
Contributor

@stickbreaker if i run my code on current head of this repository, i suddenly get this error:
[E][esp32-hal-cpu.c:115] removeApbChangeCallback(): not found func=400EEB14 arg=3FFC00C0
Not sure if this may be related to this issue.

@stickbreaker
Copy link
Contributor Author

@cyberman54 add this debug to cores\esp32\esp32-hal-cpu.c in the
bool addApbChangeCallback(void * arg, apb_change_cb_t cb) function

About line 99:

bool addApbChangeCallback(void * arg, apb_change_cb_t cb){
// snipped
        else {
           log_d("adding apb callback func=%08X arg=%08X",c->cb,c->arg);  // << this line
            c->next = apb_change_callbacks;
            apb_change_callbacks-> prev = c;
            apb_change_callbacks = c;
        }
    }
    xSemaphoreGive(apb_change_lock);
    return true;
}

and rerun your code with CORE_DEBUG set to DEBUG.

The error message is reporting that the specified callback with the specified argument did not exist in the list of clock change callbacks. Either delete is being called twice for the same callback, or the callback function address is different or the arg value does not match the callback function address.

Grab the serial log, lets see what the problem is.

Chuck.

@cyberman54
Copy link
Contributor

@stickbreaker Thanks, i will prepare this soon. My code doesn't do anything to CPU clock cycle. Thus i need to check which library causes this error. I suspect it's the LMIC lorawanstack.

@cyberman54
Copy link
Contributor

@stickbreaker I now see this in the serial log:

[E][esp32-hal-cpu.c:116] removeApbChangeCallback(): not found func=400EF2CC arg=3FFC012C
[D][esp32-hal-cpu.c:99] addApbChangeCallback(): adding apb callback func=400EF2CC arg=3FFC012C

Looks like the remove happenes before the add... ?

@stickbreaker
Copy link
Contributor Author

add a call to abort() just after the the log_e() error report.

[E][esp32-hal-cpu.c:116] removeApbChangeCallback(): not found func=400EF2CC arg=3FFC012C

That will give you a stack dump and you can decode it to find where the out of sequence remove() is called before the add().

The error is outside of apbChangeCallback() code. This fix is now detecting a pre-existing problem.

Chuck.

@stickbreaker
Copy link
Contributor Author

@cyberman54 you could add a condition compile:

log_e("something bad happend");
#if (ARDUHAL_LOG_LEVEL >= ARDUHAL_LOG_LEVEL_DEBUG)
abort();
#endif

So it would only abort when compiled for debugging.

Chuck.

@cyberman54
Copy link
Contributor

cyberman54 commented Jan 26, 2020

@stickbreaker thanks for you help. I inserted an abort() and wrote a stack trace. See below. I am not sure how to properly read this: does the problem happen at the top or at the bottom of the strack trace?

Some Context: I am using the library MCCI LMIC which provides functions to communicate with a LORA RF chip via SPI. Stack trace looks like there goes something wrong with the SPI communication.

abort() was called at PC 0x400f0367 on core 1

Backtrace: 0x40091ad8:0x3ffb96a0 0x40091d55:0x3ffb96c0 0x400f0367:0x3ffb96e0 0x400eece9:0x3ffb9700 0x400eedae:0x3ffb9720 0x400db12a:0x3ffb9750 0x400d5995:0x3ffb9770 0x400e1093:0x3ffb9790 0x400e10a9:0x3ffb97b0 0x400df857:0x3ffb97d0 0x400d5922:0x3ffb97f0 0x40096e91:0x3ffb9810

stack:

0x40091ad8: invoke_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:715
0x40091d55: abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:715

0x400f0367: removeApbChangeCallback at C:\Users.platformio\packages\framework-arduinoespressif32\cores\esp32/esp32-hal-cpu.c:231

0x400eece9: spiStopBus at C:\Users.platformio\packages\framework-arduinoespressif32\cores\esp32/esp32-hal-spi.c:283
0x400eedae: spiStartBus at C:\Users.platformio\packages\framework-arduinoespressif32\cores\esp32/esp32-hal-spi.c:283

0x400db12a: SPIClass::begin(signed char, signed char, signed char, signed char) at C:\Users.platformio\packages\framework-arduinoespressif32\libraries\SPI\src/SPI.cpp:270

0x400d5995: MyHalConfig_t::begin() at c:\Users\ESP32-Paxcounter/src/lorawan.cpp:626

0x400e1093: Arduino_LMIC::hal_init_with_pinmap(Arduino_LMIC::HalPinmap_t const*) at c:\Users\ESP32-Paxcounter/.pio\libdeps\dev\MCCI LoRaWAN LMIC library\src\hal/hal.cpp:312
0x400e10a9: hal_init_ex at c:\Users\ESP32-Paxcounter/.pio\libdeps\dev\MCCI LoRaWAN LMIC library\src\hal/hal.cpp:312
0x400df857: os_init_ex at c:\Users\ESP32-Paxcounter/.pio\libdeps\dev\MCCI LoRaWAN LMIC library\src\lmic/oslmic.c:136 (discriminator 1)
0x400d5922: lmictask(void*) at c:\Users\ESP32-Paxcounter/src/lorawan.cpp:626
0x40096e91: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)

@cyberman54
Copy link
Contributor

cyberman54 commented Jan 26, 2020

Peeked into this some deeper: Looks like the problem is caused by spiStopBus() in file esp32/esp32-hal-spi.c which includes a removeApbChangeCallback(spi, _on_apb_change), which seems to fail.

@terrillmoore
Copy link

@cyberman54 the LMIC code doesn't use any thing special in the ESP32 SPI; it doesn't use callbacks at all. There was a change (mcci-catena/arduino-lmic#190) specifically done for ESP32 by @manuelbl, but it just uses standard higher-level SPI functions. I don't think this is an LMIC problem per se.

Possibilities to consider:

  1. memory corruption by LMIC or app. Not likely in LMIC as it doesn't use malloc() at all. Not really likely in your app because this obviously pretty early. But -- it appears you're using a separate task for the LMIC -- this is something I've not tested.
  2. LMIC use pattern of SPI uncovers latent bug. Again, this is during initialization -- not likely.
  3. Error in MyConfig_t::begin() -- this is code from your app, please review carefully. Looks like you're overriding the default.
  4. Incompatibilty between your code and the call to SPIclass::begin().
  5. The SPI logic might be tearing down some existing (default) setup because of your input parameters, possibly even for another kind of I/O -- and that is incorrectly setup.

Based on the stack backtrace, I'd investigate (5) further.

@stickbreaker
Copy link
Contributor Author

@cyberman54 the problem is that spiStartBus() calls spiStopBus() to initialize the hardware, then configures it, then registers the apbClock change callback. The last operation of spiStopBus() to remove the callback, but, since spiStartBus() has not registered the call back, boom! error.

It is a problem in esp32-hal-spi.c Not in your code. I'm working through it.

Chuck.

@stickbreaker
Copy link
Contributor Author

as a temp fix you can either ignore the error, and/or remove the log_e() in 'removeApbChangeCallback()`

@cyberman54
Copy link
Contributor

cyberman54 commented Jan 26, 2020

@terrillmoore @stickbreaker thanks for your efforts to analyze this. So it's a bug in arduino-esp32 which is now revealed.

@stickbreaker
Copy link
Contributor Author

@cyberman54 the fix is in #3675

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using ledcWriteTone cause I2C memory fail
4 participants