Though the use of modern, templatized C++ libraries is encouraged by experts, many users with a C background find the transition unpleasant. One common speed bump is the debugging experience - and in particular, the bewildering stack traces.

Scenario

Imagine a junior developer has been assigned to sort a sequence of vectors of strings by the first two strings in each vector. A sample input might look like this:

using Strings = std::vector<std::string>;
std::vector<Strings> data{
    {"Frodo", "Sam", "Smeagol"},
    {"Foo", "Bar", "Baz"},
    {"Monoid", "Endofunctor", "Monad"}};

The senior developer on the project suggests the use of a standard library function, std::sort, with a custom comparison predicate. The junior developer (after struggling with some extremely verbose compiler errors) comes up with this code:

std::sort(data.begin(), data.end(),
          [](Strings const & a, Strings const & b) {
              if (a[0] < b[0]) {
                  return true;
              } else {
                  return a[1] < b[1];
              }
          });

Sadly her unit tests (which she wrote at the same time, of course) are failing, so she brings up gdb and steps into std::sort to understand what’s happening:

43          std::sort(data.begin(), data.end(),
(gdb) s
std::vector<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > > >,
std::allocator<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >::end (this=0x7fffffffdb10)
    at /usr/include/c++/7/bits/stl_vector.h:581
581           end() _GLIBCXX_NOEXCEPT
(gdb) 

Uh-oh. That won’t do at all. She runs the list command to get the line number where her predicate starts, and sets a breakpoint there:

(gdb) b 45
Breakpoint 2 at 0x555555554ecb: file ..., line 45.
(gdb) c
Continuing.

Breakpoint 2, <lambda(const Strings&, const Strings&)>::operator()(...
45                        if (a[0] < b[0]) {
(gdb) backtrace
<gibberish>

The output now is even worse… an entire page of standard library internal templatese. She decides to ignore it and debug through the lens of her predicate, where the bug turns out to be.

Making It Better

Developers familiar with the standard library are inured to seeing this kind of output scrolling by on their terminal, and just scan for the bits that are their code. Wouldn’t it be nice to have this done for us automatically?

The GDB Python API has a feature we can use: frame filters. We can use this to control the backtrace results, including omitting frames and changing how they are printed.

Frame Filters

gdb’s Python API allows you to define and register stack frame filters that will be applied during execution of the backtrace command. Filters are classes that wrap an iterator over stack frames, producing a new iterator, in a process that will seem familiar to users of the Boost.Iterator or range-v3 libraries. In fact, building code this way is fairly mainstream for Python!

I need a new iterator that squashes subsequences of library stack frames into one (the original call). The building blocks I need are:

  1. An iterator filter that removes all but the last element of subsequences for which a predicate is true
  2. A predicate that tests whether a stack frame is from a library

Iterator Filter

First I write a Python Generator accepting an underlying iterator and a squash predicate:

def __cond_squash(iterable, squashfn):
    """compress subsequences for which a predicate is true"""
    last = None              # we have to buffer 1 item
    for item in iterable:
        if squashfn(item):
            last = item
        else:
            if last is not None:
                yield last   # empty buffer this time
                last = None
            yield item       # resume un-squashed iteration
    if last is not None:
        yield last           # if we end in "squashed" mode

I buffer the previous element in order to output only the last in each squashable subsequence, and directly output any other element. Notice I get to write this in straightforward imperative style, ignoring the transfer of control implied by yield - someday we will have this in C++, too.

Library Test Predicate

The frame filter API provides the name of the function being executed in the current frame. We can test this against, e.g., a regex of library namespaces:

(gdb) set backtrace-strip-regex ^(std|boost)::

We can create a one-line function to supply as the squashfn parameter:

fn = lambda x : re.match(gdb.parameter('backtrace-strip-regex'), x.function())

A Frame Decorator

gdb also lets you control how each frame is displayed, using the Frame Decorator API. This is another frame iterator wrapper, one that modifies the values supplied by the iterator. I override the function() method to replace some common sequences with shorter aliases, like std::__cxx11::basic_string<char> with std::string:

class CommonAliasDecorator(gdb.FrameDecorator.FrameDecorator):
    def __init__(self, fobj):
        super(CommonAliasDecorator, self).__init__(fobj)

    def function(self):
        name = self.inferior_frame().name()
        # rename std::string
        orig = name
        name = re.sub('std::__cxx11::basic_string<char>', 'std::string', name)
        # and many more

        return name

Putting It Together

We wrap the original frame iterator with the squashing iterator (using the regex predicate), wrap that with the decorator, and register the result:

class UserFilter:
    """Scrub library functions in stack trace"""

    def __init__(self):
        # set required attributes
        self.name = 'UserFilter'
        self.enabled = True
        self.priority = 0

        # register with current program space
        gdb.current_progspace().frame_filters[self.name] = self

    def filter(self, frame_iter):
        # create predicate
        fn = lambda x : re.match(gdb.parameter('backtrace-strip-regex'),
                                 x.function())
        # wrap iterator in squasher
        ufi = self.__cond_squash(frame_iter, fn)
        # wrap in the decorator and return
        return imap(CommonAliasDecorator, ufi)  # decorate each generated frame

The full system looks like:

Block Diagram

Results

With the internal library functions filtered out and aliases applied, our stack trace looks a lot cleaner:

(gdb) bt
#0  0x0000555555554ecb in <lambda(const Strings&, const Strings&)>::operator()(const Strings &, const Strings &) const (__closure=0x7fffffffd9a0, a=std::vector of length 3, capacity 3 = {...}, b=std::vector of length 3, capacity 3 = {...}) at ...
#5  0x00005555555557d3 in std::sort<std::vector<std::vector<std::string > >::iterator, main()::<lambda(const Strings&, const Strings&)> >(std::vector<std::vector<std::string > >::iterator, std::vector<std::vector<std::string > >::iterator, <lambda(const Strings&, const Strings&)>) (__first=std::vector of length 3, capacity 3 = {"Frodo", "Sam", "Smeagol"}...
    at /usr/include/c++/7/bits/stl_algo.h:4868
#6  0x00005555555553f1 in main () at ...

The user sees that main called std::sort, which in turn called their predicate. Four stack frames were omitted, and the remaining ones have readable type names.

Observations

We can overcome barriers to adoption of modern C++ techniques, in some cases, via technical means. Investing in tooling, like custom gdb plugins, is one such way. Python has its own rich ecosystem, and combining it with the power of gdb opens a new world of possibilities. I plan to explore more of them in future posts.

You can find my Python/gdb code on Github.