Skip to content

Debugging

Jon Szymaniak edited this page Nov 26, 2013 · 17 revisions

This page provides tips on debugging various portions of the bladeRF code base.

If you suspect you've encountered a bug that hasn't been reported yet, it is generally very helpful to include results of some preliminary debugging to an issue tracker item, forum post, or question on IRC.

When reporting bugs, developers and community members may ask you to provide additional information, using some of the techniques described here.

Please remember to use a website such as http://pastebin.com/ or http://pastie.org/ when sharing debug logs on IRC.

Table of Contents

Host-side tools and libraries

Enable Additional Information

Configuring a Debug Build

In most cases, one will want to ensure debug symbols are enabled while debugging. This can be enabled by setting the CMake CMAKE_BUILD_TYPE variable to Debug:

$ cmake -DCMAKE_BUILD_TYPE=Debug ../

GCC/GDB users will likely want to include as much debug information as possible. Use ENABLE_GDB_EXTENSIONS to compile with -ggdb3:

$ cmake -DCMAKE_BUILD_TYPE=Debug -DENABLE_GDB_EXTENSIONS=Yes ../

Increasing libbladeRF's verbosity

To increase the verbosity of libbladeRF's output, add bladerf_log_set_verbosity(BLADERF_LOG_LEVEL_VERBOSE) early in your program. See the bladerf_log_level enumeration for other possible log levels.

Crashes and Memory-related Failures

When dealing with a crashing application, one generally wants to identify:

  1. The state of the application at the time the failure occurred, including what the program was currently executing, and the state of relevant variables.
  2. The origin of any incorrect values, such as when a pointer was assigned a NULL value.

A note on debugging with GNU Radio

When possible, try to reproduce failures with the bladeRF-cli or a simple test applications to remove some degree of complexity from the situation, as well as to narrow the scope of where the defects may reside.

If this is not possible, or the failure appears to be associated with the gr-osmosdr bladeRF support, it is recommended that you review the GNU Radio documentation on debugging with gdb first.

Obtaining a Backtrace in GDB

To quote the GDB documentation, "[a] backtrace is a summary of how your program got to where it is." Backtraces in GDB show the current stack frame followed by the calling stack frames.

The below examples shows a snippet from a simple debug session of the bladeRF-cli, where an invalid loop bound defect was introduced to result in a segfault. In this example, we collect a backtrace and inspect the state of a function 2 stack frames prior to location where the segfault occurred.

We first start by running the program (bladeRF-cli, in this case) in GDB. Note that we tell GDB to interpret the arguments following the program name as arguments to that program, rather than to gdb, via --args.

$ gdb --args bladeRF-cli -p   

... GDB copyright text ...
Reading symbols from /usr/local/bin/bladeRF-cli...done.
(gdb) run
Starting program: /usr/local/bin/bladeRF-cli -p
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6d8c700 (LWP 4774)]
[New Thread 0x7ffff658b700 (LWP 4775)]
[New Thread 0x7ffff5d8a700 (LWP 4776)]
[INFO] Found FX3 bootloader device on bus=2 addr=4. This may be a bladeRF.
[INFO] Use bladeRF-cli command "recover 2 4 <FX3 firmware>" to boot the bladeRF firmware.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7574b28 in libusb_get_device_descriptor () from /lib/x86_64-linux-gnu/libusb-1.0.so.0

Upon running the program, we see it terminate (crash) with a segmentation fault. Note that the location and name of the function that the crash occurred in is shown here -- this is an important piece of information to include an your findings.

If the function that the program has crashed in is called from many places in the code, one many not be able to glean much from this information.

The backtrace or bt command may be used to display the currently executing function and its callers. Here we can see the path of function calls taken, as well as the arguments to each function call, with line numbers.

(gdb) bt
#0  0x00007ffff7574b28 in libusb_get_device_descriptor ()
   from /lib/x86_64-linux-gnu/libusb-1.0.so.0
#1  0x00007ffff7bcfdd9 in lusb_device_is_bladerf (dev=0x0)
    at /home/jon/projects/bladeRF/host/libraries/libbladeRF/src/backend/libusb.c:265
#2  0x00007ffff7bd349c in lusb_probe (info_list=0x7fffffffe210)
    at /home/jon/projects/bladeRF/host/libraries/libbladeRF/src/backend/libusb.c:2003
#3  0x00007ffff7bc8026 in backend_probe (devinfo_items=0x7fffffffe268, num_items=0x7fffffffe260)
    at /home/jon/projects/bladeRF/host/libraries/libbladeRF/src/backend.c:97
#4  0x00007ffff7bc80bf in bladerf_get_device_list (devices=0x7fffffffe2a8)
    at /home/jon/projects/bladeRF/host/libraries/libbladeRF/src/bladerf.c:49
#5  0x0000000000409170 in cmd_probe (s=0x618010, argc=1, argv=0x6186c0)
    at /home/jon/projects/bladeRF/host/utilities/bladeRF-cli/src/cmd/probe.c:41
#6  0x00000000004066a9 in cmd_handle (s=0x618010, line=0x40f83e "probe")
    at /home/jon/projects/bladeRF/host/utilities/bladeRF-cli/src/cmd/cmd.c:640
#7  0x0000000000405404 in main (argc=2, argv=0x7fffffffe458)
    at /home/jon/projects/bladeRF/host/utilities/bladeRF-cli/src/main.c:360
(gdb) 

From the above backtrace, we can see how we got to the libusb_get_device_descriptor(). Note that in frame #1, lusb_device_is_bladerf() was called with a NULL argument, which is suspicious. To figure out why a NULL pointer was passed to this function, we'll need to inspect the state of the program in stack frame #2, inside lusb_probe()

The frame or f command may be used to select and print a stack frame:

(gdb) frame 2
#2  0x00007ffff7bd349c in lusb_probe (info_list=0x7fffffffe210)
    at /home/jon/projects/bladeRF/host/libraries/libbladeRF/src/backend/libusb.c:2003
2003            if( lusb_device_is_bladerf(list[i]) ) {

Now we can take a look at the state of the local variables in this function, and print out some additional information:

(gdb) info local
status = 0
i = 13
n = 0
count = 13
list = 0x618ef0
info = {backend = BLADERF_BACKEND_LIBUSB, 
  serial = "0000000004BE", '\000' <repeats 16 times>, "\320\341\377\377\377", 
  usb_bus = 2 '\002', usb_addr = 4 '\004', instance = 4156353024}
context = 0x618840
(gdb) print list[13]
$2 = (libusb_device *) 0x0
(gdb) print list[12]
$3 = (libusb_device *) 0x61a1c0
(gdb) 

In this code, we see that a count local variable is 13. This is supposed to be used as an upper bound of a loop. However, we see that the variable i is 13, causing us to access an invalid list element and attempt to dereference a NULL pointer. (The valid elements are in list[0] through list[12]).

Using Valgrind to Identify Invalid Accesses

This section builds upon segfault example from the previous section.

Sometimes an invalid access may not cause a program to immediately (and conveniently) crash and burn. Instead, the problem many manifest itself as an intermittent issue where data is occasionally corrupted.

Valgrind tends to come in handy in this type of situation. Dr. Memory is a similar tool, which also supports 32-bit Windows applications.

Running the bladeRF-cli through valgrind on yields similar results to what we obtained with gdb in the previous section. However, it appears that list[13] didn't contain a value of 0x0 (it is likely that gdb had zeroed out memory for us), but rather a value of 0x68.

The --track-origins=yes argument is often handy for determining where an uninitialized value came from.

$ valgrind --track-origins=yes bladeRF-cli 
--4920-- Memcheck, a memory error detector
--4920-- Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
--4920-- Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
--4920-- Command: bladeRF-cli -p
--4920-- 
[INFO] Found FX3 bootloader device on bus=2 addr=4. This may be a bladeRF.
[INFO] Use bladeRF-cli command "recover 2 4 <FX3 firmware>" to boot the bladeRF firmware.
--4920-- Invalid read of size 8
--4920--    at 0x5490B28: libusb_get_device_descriptor (in /lib/x86_64-linux-gnu/libusb-1.0.so.0.1.0)
--4920--    by 0x4E3EDD8: lusb_device_is_bladerf (libusb.c:265)
--4920--    by 0x4E4249B: lusb_probe (libusb.c:2003)
--4920--    by 0x4E37025: backend_probe (backend.c:97)
--4920--    by 0x4E370BE: bladerf_get_device_list (bladerf.c:49)
--4920--    by 0x40916F: cmd_probe (probe.c:41)
--4920--    by 0x4066A8: cmd_handle (cmd.c:640)
--4920--    by 0x405403: main (main.c:360)
--4920--  Address 0x68 is not stack'd, malloc'd or (recently) free'd
--4920-- 
--4920-- 
--4920-- Process terminating with default action of signal 11 (SIGSEGV)
--4920--  Access not within mapped region at address 0x68
--4920--    at 0x5490B28: libusb_get_device_descriptor (in /lib/x86_64-linux-gnu/libusb-1.0.so.0.1.0)
--4920--    by 0x4E3EDD8: lusb_device_is_bladerf (libusb.c:265)
--4920--    by 0x4E4249B: lusb_probe (libusb.c:2003)
--4920--    by 0x4E37025: backend_probe (backend.c:97)
--4920--    by 0x4E370BE: bladerf_get_device_list (bladerf.c:49)
--4920--    by 0x40916F: cmd_probe (probe.c:41)
--4920--    by 0x4066A8: cmd_handle (cmd.c:640)
--4920--    by 0x405403: main (main.c:360)
--4920--  If you believe this happened as a result of a stack
--4920--  overflow in your program's main thread (unlikely but
--4920--  possible), you can try to increase the size of the
--4920--  main thread stack using the --main-stacksize= flag.
--4920--  The main thread stack size used in this run was 8388608.
--4920-- 
--4920-- HEAP SUMMARY:
--4920--     in use at exit: 18,927 bytes in 59 blocks
--4920--   total heap usage: 1,268 allocs, 1,209 frees, 286,783 bytes allocated
--4920-- 
--4920-- LEAK SUMMARY:
--4920--    definitely lost: 0 bytes in 0 blocks
--4920--    indirectly lost: 0 bytes in 0 blocks
--4920--      possibly lost: 888 bytes in 6 blocks
--4920--    still reachable: 18,039 bytes in 53 blocks
--4920--         suppressed: 0 bytes in 0 blocks
--4920-- Rerun with --leak-check=full to see details of leaked memory
--4920-- 
--4920-- For counts of detected and suppressed errors, rerun with: -v
--4920-- ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

Saving the Output of Valgrind

When trouble is aplenty, valgrind output can become quite voluminous, especially when running valgrind on python (e.g., in attempt to debug issues occurring with GNU Radio support).

Valgrind output is printed to stderr. To save stderr output to a file:

valgrind osmocom_fft -a bladerf=0 -s 2M -f 446M 2> valgrind_log.txt

To save both stderr and stdout to a log:

valgrind osmocom_fft -a bladerf=0 -s 2M -f 446M 1> valgrind_log.txt 2>&1

To save both stderr and stdout to a log, and continue to show the output on the terminal, use tee:

valgrind osmocom_fft -a bladerf=0 -s 2M -f 446M 2>&1 | tee valgrind.log

In the above examples, you'll see quite a bit of "noise" from python. Grep for "blade" to track down issues directly pertaining to bladeRF code.

Using Valgrind and GDB simultaneously

It is possible to break in gdb at the first sign of trouble, using Valgrind's vgdb argument, which enables an embedded gdbserver.

To do... Finish this section

FX3 Firmware

To do...

FPGA

To do...

Clone this wiki locally