Support for dense matrix calculations in C++ is in pretty good shape. There are a lot of libraries out there that can perform both lower level manipulations (row permutations, transposition, multiplication) and higher level algorithms (decompositions, solving), largely thanks to the simple memory layout and the long history of dense matrix algorithm optimization dating back to Fortran.

The memory layout of a dense matrix is very simple: basically a single block of memory storing rows*cols*sizeof(Value) bytes of data and indexed directly as, e.g., A[i+rows*j] for a column-major layout. For example, this matrix:

\[\begin{bmatrix} 1 & -1 & 0 & 4 \\ 0 & 1 & 0 & 3 \\ 6 & 0 & 0 & 1 \\ 3 & 2 & 0 & 2 \end{bmatrix}\]

can be laid out in memory as:

Dense Matrix

Years of research have gone into making the processing of these structures as fast as possible, with hardware and software tuned to exploit long latency (but high bandwidth) memory systems, and multiple parallel arithmetic units. However, some problems are by nature sparse, i.e., most of their entries would be zero if represented as a dense matrix. In such cases a different representation can be more efficient:

Sparse Matrix

To look up, for example, the value at \([1,1]\), you use the column, 1, as an index into the outer pointers. Entries 1 and 2 of the outer pointers give you the half-open range to search within the inner pointers - the values between entries 3 and 6. You find the row, 1, at offset 4 and its associated value, 1.0, is given by Values[4]. If you were to look up entry \([2,1]\) instead, you would not find row 2 in the inner pointer range, which means the associated value is 0. Similarly, looking up any value from column 2 would produce an empty inner pointer range, implying that every entry in that column is 0.

Using this structure gives up some regularity and increases the cost of a lookup from constant time to logarithmic (if the inner pointers are sorted), but reduces memory consumption from quadratic to linear in the number of nonzero entries. As memory latency continues to increase relative to cycle time, this becomes an attractive tradeoff.

Sparse Matrix Libraries in C++

Dense matrix support in C++ has been relatively good for some years thanks to

  • The availability of optimized libraries with C bindings, like LAPACK and BLAS, which can be wrapped to implement the necessary algorithms
  • The (relative) ease of working with directly indexed storage, which makes writing algorithms simpler
  • The compatibility of dense matrices with common mutating algorithms - many assume that in-place modification of the data happens.

Imagine, for example, how to write a transposition function for a column major sparse matrix. For a dense matrix this is just a pair of nested loops. For the sparse case, the resulting structure can be quite different - and even larger, if the original matrix has more rows than columns. So this means allocations and some complicated pointer reshuffling.

These differences are reflected in the availability of sparse matrix features in C++ libraries. It is generally the case that dense matrices are supported for both basic and higher level algorithms (such as decompositions) while sparse matrices, when available at all, lack the same level of support. Here are a few I’m familiar with:

Library Basic Algorithms Decompositions Acceleration Modern C++
Armadillo yes some some (via SuperLU) yes
Blaze yes no no yes
Boost.uBlas yes no no yes
CSparse yes yes no no
Eigen yes yes some (via SuiteSparse bindings) yes
SuiteSparse yes yes yes sort of

My dream library for sparse matrices would have:

  • Accelerated decompositions (via multiple back ends)
  • Consistent interfaces with the dense versions
  • A permissive license
  • Modern C++ style and optimizations
    • expression templates
    • automatic memory management via constructors/destructors
    • move semantics where helpful for efficiency
    • API alternatives via templates, not macros
  • Use of the Parallelism TS

Unfortunately such a library does not yet exist. In the next post I will demonstrate the use of the existing libraries Eigen, CSparse, and SuiteSparse, on a problem of interest to me, and compare them to my ideal.