Improving C++ backtraces with the gdb Python API
Though the use of modern, templatized C++ libraries is encouraged by experts, many users with a C background find the transition unpleasant. One common speed bump is the debugging experience - and in particular, the bewildering stack traces.
Scenario
Imagine a junior developer has been assigned to sort a sequence of vectors of strings by the first two strings in each vector. A sample input might look like this:
using Strings = std::vector<std::string>;
std::vector<Strings> data{
{"Frodo", "Sam", "Smeagol"},
{"Foo", "Bar", "Baz"},
{"Monoid", "Endofunctor", "Monad"}};
The senior developer on the project suggests the use of a standard library function, std::sort
, with a custom comparison predicate. The junior developer (after struggling with some extremely verbose compiler errors) comes up with this code:
std::sort(data.begin(), data.end(),
[](Strings const & a, Strings const & b) {
if (a[0] < b[0]) {
return true;
} else {
return a[1] < b[1];
}
});
Sadly her unit tests (which she wrote at the same time, of course) are failing, so she brings up gdb and steps into std::sort
to understand what’s happening:
43 std::sort(data.begin(), data.end(),
(gdb) s
std::vector<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > > >,
std::allocator<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >::end (this=0x7fffffffdb10)
at /usr/include/c++/7/bits/stl_vector.h:581
581 end() _GLIBCXX_NOEXCEPT
(gdb)
Uh-oh. That won’t do at all. She runs the list
command to get the line number where her predicate starts, and sets a breakpoint there:
(gdb) b 45
Breakpoint 2 at 0x555555554ecb: file ..., line 45.
(gdb) c
Continuing.
Breakpoint 2, <lambda(const Strings&, const Strings&)>::operator()(...
45 if (a[0] < b[0]) {
(gdb) backtrace
<gibberish>
The output now is even worse… an entire page of standard library internal templatese. She decides to ignore it and debug through the lens of her predicate, where the bug turns out to be.
Making It Better
Developers familiar with the standard library are inured to seeing this kind of output scrolling by on their terminal, and just scan for the bits that are their code. Wouldn’t it be nice to have this done for us automatically?
The GDB Python API has a feature we can use: frame filters. We can use this to control the backtrace results, including omitting frames and changing how they are printed.
Frame Filters
gdb’s Python API allows you to define and register stack frame filters that will be applied during execution of the backtrace
command. Filters are classes that wrap an iterator over stack frames, producing a new iterator, in a process that will seem familiar to users of the Boost.Iterator or range-v3 libraries. In fact, building code this way is fairly mainstream for Python!
I need a new iterator that squashes subsequences of library stack frames into one (the original call). The building blocks I need are:
- An iterator filter that removes all but the last element of subsequences for which a predicate is true
- A predicate that tests whether a stack frame is from a library
Iterator Filter
First I write a Python Generator accepting an underlying iterator and a squash predicate:
def __cond_squash(iterable, squashfn):
"""compress subsequences for which a predicate is true"""
last = None # we have to buffer 1 item
for item in iterable:
if squashfn(item):
last = item
else:
if last is not None:
yield last # empty buffer this time
last = None
yield item # resume un-squashed iteration
if last is not None:
yield last # if we end in "squashed" mode
I buffer the previous element in order to output only the last in each squashable subsequence, and directly output any other element. Notice I get to write this in straightforward imperative style, ignoring the transfer of control implied by yield
- someday we will have this in C++, too.
Library Test Predicate
The frame filter API provides the name of the function being executed in the current frame. We can test this against, e.g., a regex of library namespaces:
(gdb) set backtrace-strip-regex ^(std|boost)::
We can create a one-line function to supply as the squashfn
parameter:
fn = lambda x : re.match(gdb.parameter('backtrace-strip-regex'), x.function())
A Frame Decorator
gdb also lets you control how each frame is displayed, using the Frame Decorator API. This is another frame iterator wrapper, one that modifies the values supplied by the iterator. I override the function()
method to replace some common sequences with shorter aliases, like std::__cxx11::basic_string<char>
with std::string
:
class CommonAliasDecorator(gdb.FrameDecorator.FrameDecorator):
def __init__(self, fobj):
super(CommonAliasDecorator, self).__init__(fobj)
def function(self):
name = self.inferior_frame().name()
# rename std::string
orig = name
name = re.sub('std::__cxx11::basic_string<char>', 'std::string', name)
# and many more
return name
Putting It Together
We wrap the original frame iterator with the squashing iterator (using the regex predicate), wrap that with the decorator, and register the result:
class UserFilter:
"""Scrub library functions in stack trace"""
def __init__(self):
# set required attributes
self.name = 'UserFilter'
self.enabled = True
self.priority = 0
# register with current program space
gdb.current_progspace().frame_filters[self.name] = self
def filter(self, frame_iter):
# create predicate
fn = lambda x : re.match(gdb.parameter('backtrace-strip-regex'),
x.function())
# wrap iterator in squasher
ufi = self.__cond_squash(frame_iter, fn)
# wrap in the decorator and return
return imap(CommonAliasDecorator, ufi) # decorate each generated frame
The full system looks like:
Results
With the internal library functions filtered out and aliases applied, our stack trace looks a lot cleaner:
(gdb) bt
#0 0x0000555555554ecb in <lambda(const Strings&, const Strings&)>::operator()(const Strings &, const Strings &) const (__closure=0x7fffffffd9a0, a=std::vector of length 3, capacity 3 = {...}, b=std::vector of length 3, capacity 3 = {...}) at ...
#5 0x00005555555557d3 in std::sort<std::vector<std::vector<std::string > >::iterator, main()::<lambda(const Strings&, const Strings&)> >(std::vector<std::vector<std::string > >::iterator, std::vector<std::vector<std::string > >::iterator, <lambda(const Strings&, const Strings&)>) (__first=std::vector of length 3, capacity 3 = {"Frodo", "Sam", "Smeagol"}...
at /usr/include/c++/7/bits/stl_algo.h:4868
#6 0x00005555555553f1 in main () at ...
The user sees that main
called std::sort
, which in turn called their predicate. Four stack frames were omitted, and the remaining ones have readable type names.
Observations
We can overcome barriers to adoption of modern C++ techniques, in some cases, via technical means. Investing in tooling, like custom gdb plugins, is one such way. Python has its own rich ecosystem, and combining it with the power of gdb opens a new world of possibilities. I plan to explore more of them in future posts.
You can find my Python/gdb code on Github.