Using gdb to inspect a crashing app

January 26, 2022 #linux #debugging

This is more or less a story about how one can attempt to debug an application crash by attaching to it with gdb and poking around, while resisting the urge to build the application manually. This approach is useful when the application takes a long time to compile or has a complicated build system. It's easy to run into these situations when the system is a relatively underpowered phone running Linux.

I recently came across a strange issue on my phone when running Phosh, where it would crash if you (or the system package manager) ran dconf update. In postmarketOS this is done by a UI package in the distro, for "applying" some configuration settings for UI scaling that are useful for phones. The crash, however, is really not useful. If the system is performing an upgrade using a shell or app started in Phosh, the upgrade goes down with Phosh.

Well, we can't have that! So let's see if we can at least figure out why this is happening by debugging directly on the phone (via an SSH session)!

The first hint something is truly going sideways is this single line from the desktop manager (tinydm) log:

gnome-session-binary[32555]: WARNING: Application 'sm.puri.Phosh.desktop' killed by signal 11
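If a signal number in a log line like this one is unfamiliar, the shell can translate it. The following is a minimal sketch; the signal_name wrapper is a hypothetical helper added here for illustration, while kill -l is the standard shell built-in it relies on.

```shell
# Hypothetical helper: translate a signal number (as seen in
# "killed by signal N") into its name using the shell's kill built-in.
signal_name() {
    kill -l "$1"
}

# On Linux, signal 11 is SIGSEGV, so this prints "SEGV".
signal_name 11
```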

Signal 11 is a segmentation fault. Since it's easy to trigger the crash manually, and not (as) easy to run Phosh manually, we can use gdb to attach to the running Phosh process. Before doing that, it's helpful to install debug symbols for some things we'll likely encounter in any backtrace in gdb. Phosh is a GTK/GLib app, and I'm running on Alpine Linux, which uses musl for libc. So let's start out by installing symbols for these components:

librem5:~/src/phosh $ doas apk add musl-dbg glib-dbg gtk+3.0-dbg phosh-dbg

Big shout out to the kind soul who added the debug symbols package for Phosh in Alpine's aports!!! Manual local build of phosh averted!

With symbols installed, let's fire up gdb:

librem5:~/src/phosh $ gdb --pid $(pidof phosh)
GNU gdb (GDB) 11.2
....
28      src/thread/aarch64/syscall_cp.s: No such file or directory.
(gdb) c
Continuing.

And trigger the crash:

librem5:~/src/phosh $ doas dconf update

Boom!

Thread 1 "phosh" received signal SIGSEGV, Segmentation fault.
get_meta (p=p@entry=0xffffbb8528a0 "\005") at src/malloc/mallocng/meta.h:135
135     src/malloc/mallocng/meta.h: No such file or directory.
(gdb)
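
The attach, continue, and crash sequence above can also be captured without an interactive session, which is handy when the SSH connection is flaky. This is only a sketch: the write_crash_script helper, the /tmp/phosh-crash.gdb path, and the phosh-bt.log file name are assumptions made up for this example, while the gdb commands and options it uses (set pagination, continue, thread apply all bt, --batch, -x, --pid) are standard.

```shell
# Hypothetical helper: write a gdb command file that resumes the attached
# process, waits for it to stop (e.g. on SIGSEGV), then dumps a backtrace
# of every thread.
write_crash_script() {
    cat > "$1" <<'EOF'
set pagination off
continue
thread apply all bt
EOF
}

write_crash_script /tmp/phosh-crash.gdb

# Then, on the phone (not executed by this snippet):
#   gdb --batch -x /tmp/phosh-crash.gdb --pid "$(pidof phosh)" > phosh-bt.log 2>&1
# and trigger the crash from another shell.
```

In batch mode gdb exits once the command file has been processed, so the log is complete as soon as the command returns.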

When running the full backtrace (with the bt command in gdb), there are several messages similar to: glib/gmain.c: No such file or directory. Having the source files is really helpful if you need to jump to different frames in the backtrace to poke around. I usually just clone the source code and check out the tag relevant for the version I have installed, then inform gdb of the new search directory. Something like:

librem5:~/src/phosh $ cd ../
librem5:~/src $ git clone https://github.com/GNOME/glib.git
librem5:~/src $ cd glib
librem5:~/src/glib $ apk info glib
glib-2.70.1-r0 description:
...
librem5:~/src/glib $ git checkout refs/tags/2.70.1
librem5:~/src/glib $ cd -
# back in gdb session:
(gdb) directory ../glib
Source directories searched: /home/clayton/src/phosh/../glib:$cdir:$cwd
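
Working out which tag to check out from the installed package string can be scripted. Here is a minimal sketch using POSIX parameter expansion; pkg_to_version is a hypothetical helper, and it assumes the package string has the name-version-rN shape seen in the apk info output above. Upstream projects name their tags differently, so the result is only a starting point for git checkout.

```shell
# Hypothetical helper: turn an Alpine package string such as
# "glib-2.70.1-r0" into the bare upstream version "2.70.1".
pkg_to_version() {
    v="${1%-r[0-9]*}"          # drop the Alpine release suffix (-r0, -r1, ...)
    printf '%s\n' "${v##*-}"   # drop the package name up to the last dash
}

pkg_to_version "glib-2.70.1-r0"   # prints 2.70.1
```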

With that out of the way, we stand a chance of having a somewhat useful backtrace. Let's see!

(gdb) bt
#0  get_meta (p=p@entry=0xffffbb8528a0 "\005") at src/malloc/mallocng/meta.h:135
#1  0x0000ffffbed37294 in __libc_free (p=0xffffbb8528a0) at src/malloc/mallocng/free.c:105
#2  0x0000ffffbed36974 in free (p=<optimized out>) at src/malloc/free.c:5
#3  0x0000ffffbdebcdb8 in g_free (mem=<optimized out>) at ../glib/gmem.c:199
#4  0x0000ffffbded3498 in g_strfreev (str_array=<optimized out>) at ../glib/gstrfuncs.c:2560
#5  g_strfreev (str_array=0xffffbaa55d90) at ../glib/gstrfuncs.c:2553
#6  0x0000aaaac52ebe7c in on_keybindings_changed (self=self@entry=0xffffbaa718f0 [PhoshRunCommandManager])
    at ../src/run-command-manager.c:134
#7  0x0000ffffbdfaa990 in g_cclosure_marshal_VOID__STRINGv
   Python Exception <class 'gdb.MemoryError'>: Cannot access memory at address 0x1f
 (closure=0xffffb9f6b960, return_value=<optimized out>, instance=<optimized out>, args=#8  0x0000ffffbdfa80b0 in _g_closure_invoke_va
    (closure=closure@entry=0xffffb9f6b960, return_value=return_value@entry=0x0, instance=instance@entry=0xffffb9f65b00, args=..., n_params=1, param_types=0xffffbd0bfd10) at ../gobject/gclosure.c:893
#9  0x0000ffffbdfbc914 in g_signal_emit_valist
    (instance=instance@entry=0xffffb9f65b00, signal_id=<optimized out>, detail=<optimized out>, var_args=...)
    at ../gobject/gsignal.c:3406
#10 0x0000ffffbdfbd224 in g_signal_emit
    (instance=instance@entry=0xffffb9f65b00, signal_id=<optimized out>, detail=<optimized out>)
    at ../gobject/gsignal.c:3553
#11 0x0000ffffbe0ebf74 in g_settings_real_change_event
    (settings=0xffffb9f65b00 [GSettings], keys=0xffffbc253e40, n_keys=<optimized out>) at ../gio/gsettings.c:392
#12 0x0000ffffbe076d78 in _g_cclosure_marshal_BOOLEAN__POINTER_INTv
   Python Exception <class 'gdb.MemoryError'>: Cannot access memory at address 0xb9fb4280
 (closure=<optimized out>, return_value=0xffffc69a41d8, instance=<optimized out>, args=#13 0x0000ffffbdfa65e0 in g_type_class_meta_marshalv
    (closure=<optimized out>, return_value=<optimized out>, instance=<optimized out>, args=..., marshal_data=<optimized out>, n_params=<optimized out>, param_types=<optimized out>) at ../gobject/gclosure.c:1058
#14 0x0000ffffbdfa80b0 in _g_closure_invoke_va (closure=closure@entry=0xffffbc6dba90, return_value=0xffffc69a41d8,
    return_value@entry=0x0, instance=instance@entry=0xffffb9f65b00, args=..., n_params=2, param_types=0xffffbc7c2b70)
    at ../gobject/gclosure.c:893
#15 0x0000ffffbdfbc914 in g_signal_emit_valist
    (instance=instance@entry=0xffffb9f65b00, signal_id=<optimized out>, detail=detail@entry=0, var_args=...)
    at ../gobject/gsignal.c:3406
#16 0x0000ffffbdfbd224 in g_signal_emit
    (instance=instance@entry=0xffffb9f65b00, signal_id=<optimized out>, detail=detail@entry=0)
    at ../gobject/gsignal.c:3553
#17 0x0000ffffbe0ed498 in settings_backend_path_changed
    (target=<optimized out>, backend=<optimized out>, path=0xffffbb562d40 "/", origin_tag=<optimized out>)
    at ../gio/gsettings.c:467
#18 0x0000ffffbe0e7620 in g_settings_backend_invoke_closure (user_data=0xffffb9867800) at ../gio/gsettingsbackend.c:273
#19 0x0000ffffbdeb7dd0 in g_main_dispatch (context=0xffffbdb49ec0) at ../glib/gmain.c:3381
#20 g_main_context_dispatch (context=context@entry=0xffffbdb49ec0) at ../glib/gmain.c:4099
#21 0x0000ffffbdeb8030 in g_main_context_iterate
    (context=0xffffbdb49ec0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at ../glib/gmain.c:4175
#22 0x0000ffffbdeb8480 in g_main_loop_run (loop=loop@entry=0xffffbac235b0) at ../glib/gmain.c:4373
#23 0x0000ffffbe643180 in gtk_main () at ../gtk/gtkmain.c:1329
#24 0x0000aaaac5286814 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:142

That's a lot to take in at once... but...

Ah ha! At the top of the stack (frames 0-5), the calls to g_strfreev, g_free, free, and so on suggest that an invalid pointer is probably being freed, either because of a double free or some memory corruption. The numbers on the far left (e.g. #0, #1, ...) are the frame numbers.

Frame #6 looks interesting: it's the top-most function from Phosh itself (on_keybindings_changed) in the stack, right before GLib went boom trying to free some stuff.
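
Picking the first application frame out of a long backtrace by eye gets tedious, so here is a rough sketch of doing it with awk on a saved backtrace. The first_app_frame function is a hypothetical helper, and it assumes the application's frames are the ones whose source location starts with ../src/, which happens to hold for the Phosh frames in this backtrace but is not a general rule.

```shell
# Hypothetical helper: read a gdb backtrace on stdin and print the number
# of the first frame whose source location is under ../src/ (assumed here
# to mean "the application's own code").
first_app_frame() {
    awk '
        /^#[0-9]+/          { frame = $1 }        # remember the current frame number
        / at \.\.\/src\//   { print frame; exit } # location may be on a continuation line
    '
}

# Usage, assuming the backtrace was saved to a file named phosh-bt.log:
#   first_app_frame < phosh-bt.log
```

With the frame number in hand, frame 6 followed by info args, info locals, and list in gdb is a reasonable way to start poking around.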

I spent a lot of time inspecting the call to on_keybindings_changed, and the subsequent frames in the stack to try and figure out what might be going wrong, but wasn't able to see anything obvious. Oh well. On the bright side, we now have a useful backtrace that can be shared with the Phosh developers, since all of the symbols are there.

So, this is where the post ends abruptly. After chatting with the Phosh developers, I learned that this particular area received a few changes/fixes in the last development cycle and, indeed, I can no longer reproduce the crash when using the main branch. At the very least, hopefully this is a helpful "how to" for getting set up to debug similar issues, and shows the importance of having debug symbols available!