I’m following up from my previous post because I got some feedback on the PR’s performance from the Eigen maintainers. You may recall it had two parts:

  1. Taking advantage of the identity right hand side (RHS) when producing a Q matrix from \(Q \times I\)
  2. Switching the inner (result column \(j\)) and outer (Householder vector \(h_k\)) loops and precalculating \({\beta}_k h_k\)
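To make the second change concrete, here is a minimal NumPy sketch of the idea - not Eigen's actual C++ code, and `apply_reflectors` is a hypothetical name. It forms \(Q \times A\) from a sequence of Householder reflectors with the reflector loop on the outside, so \({\beta}_k h_k\) is computed once per reflector rather than once per (reflector, column) pair:

```python
import numpy as np

def apply_reflectors(A, hs, betas):
    # Hypothetical sketch of the loop switch, not Eigen's actual code.
    # Computes Q @ A in place, where Q = H_0 H_1 ... H_{m-1} and each
    # H_k = I - beta_k * h_k h_k^T is a Householder reflector.
    # The reflector loop is OUTER and the result-column loop is INNER,
    # so beta_k * h_k is computed once per reflector instead of once
    # per (reflector, column) pair.
    for k in reversed(range(len(hs))):   # apply H_{m-1} first, H_0 last
        bh = betas[k] * hs[k]            # precalculated beta_k h_k
        for j in range(A.shape[1]):      # inner loop over result columns
            A[:, j] -= bh * (hs[k] @ A[:, j])
    return A
```

Passing the identity as `A` produces \(Q\) explicitly; the more result columns there are, the more the one-time \({\beta}_k h_k\) computation is amortized.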

The first of those was acknowledged and quickly taken into Eigen in a somewhat different form. The second did not do so well.

Reality Strikes

One maintainer ran my changes on a selection of “real world” matrices taken from the SuiteSparse Matrix Collection using his own benchmark code and found a moderate to significant slowdown, in contrast to my measured 2-3X speedup. So what went wrong?

The real world matrices differed from mine in at least two significant ways:

  1. They were considerably sparser: my test matrices were 5%, 8%, and 20% dense, while theirs ranged from 0.5% to 1.5%.
  2. Mine were uniformly random - that is, every entry had an equal chance of being nonzero. Real-world matrices are often “lumpy”, with their nonzeros clustered unevenly.

In addition, solving against a single right-hand-side vector (which involves multiplying by \(Q^*\)) turned out to be an important metric. In that case there is only one result column, so precalculating \({\beta}_k h_k\) saves nothing. My tests had always used square RHS matrices the same size as \(Q\).
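The single-vector case makes the problem easy to see. In this sketch (again hypothetical, with an assumed `apply_q_star` name and real-valued data), each \({\beta}_k h_k\) is used exactly once, so there is no inner column loop to hoist it out of:

```python
import numpy as np

def apply_q_star(b, hs, betas):
    # Hypothetical sketch: apply Q^T (Q^* for real data) to a single
    # right-hand-side vector b, where Q = H_0 H_1 ... H_{m-1} and
    # H_k = I - beta_k * h_k h_k^T.  Since Q^T = H_{m-1} ... H_0, the
    # reflectors are applied in forward order.  With only one result
    # column, beta_k * h_k is computed and used exactly once per
    # reflector, so precalculating it buys nothing.
    for h, beta in zip(hs, betas):
        b = b - (beta * h) * (h @ b)
    return b
```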

Conclusion

I’m going to take another look at this later - I still have some ideas that might help - but the lesson learned for now is to be sure you have the right metrics when you optimize.