Introduction

In the process of learning about linear systems I encountered the concept of moments as a way of characterizing system behavior. These values are derived from the Maclaurin series expansion of the system function:

\[\begin{align*} \boldsymbol{H}(s)& = \boldsymbol{m}_0 + \boldsymbol{m}_1s + \boldsymbol{m}_2s^2 + \cdots \end{align*}\]

These \(m_i\) values describe the system’s behavior so well that the degree to which two systems’ moments match is used as a measure of equivalence between systems, e.g. in Model Order Reduction research. They also have some direct usefulness: for circuits, the 0th moment gives the DC gain and thus an indication of connectivity between input and output; the 1st moment gives (up to a sign) the famous Elmore delay metric.

For circuits represented as matrices using Modified Nodal Analysis (MNA), the matrix-valued moments can be calculated directly. Starting with the time-domain description:

\[\begin{align*} \boldsymbol{C}\frac{d\boldsymbol{X}(t)}{dt}& = -\boldsymbol{G} \boldsymbol{X}(t) + \boldsymbol{B} \boldsymbol{u}(t) \\ \boldsymbol{Y}& = \boldsymbol{L} \boldsymbol{X}(t) \end{align*}\]

Taking the Laplace transform and rearranging:

\[\begin{align*} s\boldsymbol{C}\boldsymbol{X}& = -\boldsymbol{G} \boldsymbol{X} + \boldsymbol{B} \boldsymbol{u}(s) \\ (\boldsymbol{I} + s\boldsymbol{G}^{-1}\boldsymbol{C})\boldsymbol{X}& = \boldsymbol{G}^{-1} \boldsymbol{B} \boldsymbol{u}(s) \\ \boldsymbol{Y}(s)& = \boldsymbol{H}(s)\boldsymbol{u}(s) \\ \boldsymbol{H}(s)& = \boldsymbol{L}(\boldsymbol{I}+s\boldsymbol{G}^{-1}\boldsymbol{C})^{-1}\boldsymbol{G}^{-1}\boldsymbol{B} \end{align*}\]

Finally we use the definition of the Maclaurin series (equivalently, expand \((\boldsymbol{I}+s\boldsymbol{G}^{-1}\boldsymbol{C})^{-1}\) as the Neumann series \(\boldsymbol{I} - s\boldsymbol{G}^{-1}\boldsymbol{C} + (s\boldsymbol{G}^{-1}\boldsymbol{C})^2 - \cdots\) and collect powers of \(s\)):

\[\begin{align*} \boldsymbol{H}(s)& = \boldsymbol{H}(0) + \boldsymbol{H}'(0)s + \frac{1}{2!} \boldsymbol{H}''(0)s^2 + \cdots \\ \therefore \boldsymbol{m}_i& = \boldsymbol{L} (-\boldsymbol{G}^{-1}\boldsymbol{C})^i (\boldsymbol{G}^{-1}\boldsymbol{B}) \end{align*}\]

where \(L\) is the state-to-output translation matrix, \(B\) is the input-to-state matrix, and \(G\) and \(C\) are the non-time-dependent and time-dependent component matrices, respectively.
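
To make this concrete, here is a small numerical sketch (using NumPy) that builds the MNA matrices for a made-up two-node RC ladder and evaluates \(\boldsymbol{m}_i = \boldsymbol{L}(-\boldsymbol{G}^{-1}\boldsymbol{C})^i(\boldsymbol{G}^{-1}\boldsymbol{B})\). The topology, element values, and observed node are all assumptions chosen for illustration:

```python
import numpy as np

# Hypothetical two-node RC ladder: R1 from the source to node 1, R2 from
# node 1 to node 2, C1 and C2 from those nodes to ground. Values are made up.
R1, R2, C1, C2 = 1e3, 2e3, 1e-12, 2e-12
G1, G2 = 1.0 / R1, 1.0 / R2

G = np.array([[G1 + G2, -G2],   # non-time-dependent (conductance) stamps
              [-G2,      G2]])
C = np.diag([C1, C2])           # time-dependent (capacitance) stamps
B = np.array([[G1], [0.0]])     # unit input drives node 1 through R1
L = np.array([[0.0, 1.0]])      # observe the voltage at node 2

def moments(G, C, B, L, count):
    """m_i = L (-G^-1 C)^i (G^-1 B), computed with repeated linear solves."""
    r = np.linalg.solve(G, B)           # G^-1 B
    out = []
    for _ in range(count):
        out.append((L @ r).item())
        r = -np.linalg.solve(G, C @ r)  # multiply by -G^-1 C
    return out

m0, m1, m2 = moments(G, C, B, L, 3)
print("m0 (DC gain):", m0)   # 1.0 for this ladder
print("m1:", m1)             # -7e-9; -m1 = R1*(C1+C2) + R2*C2, the Elmore delay
```

Note that each moment comes from a linear solve against \(\boldsymbol{G}\) rather than forming \(\boldsymbol{G}^{-1}\) explicitly, which is the practical choice when the matrices are large and sparse.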

Hints of Something More

As I’ve experimented with model order reduction algorithms I’ve occasionally Googled some aspect of moments and found, to my surprise, hits that were almost entirely in the field of statistics. Most of the associated information seemed to have nothing to do with what I was studying. Here’s what I learned:

In statistics, the moments of a probability density function \(f(X)\) are defined as follows:

\[\begin{align*} m_i& = \int_{-\infty}^\infty X^if(X)dX \end{align*}\]

which is just the expected value of \(X^i\). In particular, the first (\(i=1\)) moment is the mean of the distribution.
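
As a quick sanity check, here is a small sketch (SciPy, with an assumed exponential PDF) that evaluates the first few moments by direct integration:

```python
import numpy as np
from scipy import integrate

lam = 2.0                                # rate of an assumed exponential distribution
f = lambda x: lam * np.exp(-lam * x)     # PDF, supported on [0, inf)

for i in range(3):
    m_i, _ = integrate.quad(lambda x, i=i: x**i * f(x), 0.0, np.inf)
    print(f"m_{i} = {m_i:.4f}")          # m_0 = 1 (normalization), m_1 = 0.5 = mean
```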

Enlightenment

Most of my references for linear systems use the term moment without ever commenting on the terminology overlap. However, one day I came across a hint to the connection: one author said (paraphrasing) that Elmore “approximated the median with the mean” in his delay metric. On the face of it this makes no sense, because a linear system response is not a random process! But it was a tantalizing clue…

Eventually I found a more detailed summary of Elmore’s work (I could not find a non-paywalled version of his 1948 paper) that explained this connection. Consider the example of a single-pole low-pass filter with time constant \(\tau\) (such as a simple RC circuit with \(\tau=RC\)). Its unit step response in the time domain is:

\[\begin{align*} y(t)& = 1 - e^{\frac{-t}{\tau}} \end{align*}\]

Its impulse response is just the derivative of the step response, or:

\[\begin{align*} h(t)& = \frac{1}{\tau}e^{\frac{-t}{\tau}} \end{align*}\]

These two waveforms are identical (can I say “isomorphic”?) to the cumulative distribution and probability density functions of an exponential distribution with rate \(\lambda=\frac{1}{\tau}\). So we may regard the CDF as an analog of the step response, and the PDF as an analog of the impulse response.
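
A quick numerical check of that correspondence (SciPy’s `expon` is parameterized by scale \(=\tau=1/\lambda\); the time constant here is an arbitrary assumption):

```python
import numpy as np
from scipy.stats import expon

tau = 1e-9                                  # assumed RC time constant
t = np.linspace(0.0, 5 * tau, 50)

step    = 1.0 - np.exp(-t / tau)            # unit step response y(t)
impulse = np.exp(-t / tau) / tau            # impulse response h(t)

print(np.allclose(step,    expon.cdf(t, scale=tau)))   # True: CDF matches step response
print(np.allclose(impulse, expon.pdf(t, scale=tau)))   # True: PDF matches impulse response
```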

For a general LTI system, the 50% delay of the step response (that is, the time until the output reaches 50% of its final value) can be calculated as:

\[\begin{align*} y(T_D)& = \frac{1}{2} \\ \int_0^{T_D} h(t) dt& = \frac{1}{2} \tag*{(no analytic form, generally)} \end{align*}\]

If \(h(t)\) were a PDF, \(T_D\) would be the median of the distribution. Elmore observed that the median is expensive to calculate, but the mean, which is just the negative of the first moment defined above:

\[\begin{align*} -m_1 = \int_0^\infty th(t)dt \end{align*}\]

is a cheaper approximation (via matrix multiplication as above, or two passes through an RC tree), with known error bounds.
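
For the single-pole example above the gap between the two is easy to see numerically: the 50% delay (the median of \(h(t)\)) is \(\tau\ln 2 \approx 0.69\tau\), while the mean is \(\tau\). A minimal sketch, with an assumed \(\tau\):

```python
import numpy as np
from scipy import integrate, optimize

tau = 1e-9                                   # assumed time constant
h = lambda t: np.exp(-t / tau) / tau         # single-pole impulse response

# "Median": solve integral_0^TD h(t) dt = 1/2 for the 50% delay
T_50 = optimize.brentq(lambda T: integrate.quad(h, 0.0, T)[0] - 0.5, 0.0, 10 * tau)

# "Mean": the first moment of h(t), i.e. the Elmore delay (-m1 above)
T_elmore = integrate.quad(lambda t: t * h(t), 0.0, np.inf)[0]

print(f"50% delay:    {T_50:.3e}")      # ~6.93e-10 = tau * ln(2)
print(f"Elmore delay: {T_elmore:.3e}")  # ~1.00e-09 = tau, overestimating the median
```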

So this is the source of the “moment” terminology: a similarity between calculations in two distantly related fields.

Moving Forward

It turns out that other researchers have taken Elmore’s insight to the next level by trying to fit circuit responses to other types of probability distributions (Gamma, Weibull, lognormal); some have been demonstrated to be generally more accurate, though none (as of 2008) have the guaranteed upper bound property of Elmore’s metric.

Other Connections between Fields

Statistics has a concept of a “Moment Generating Function”, defined for a PDF \(f(X)\) as:

\[\begin{align*} M_X(t)& = \int_{-\infty}^\infty e^{tX}f(X)dX \end{align*}\]

This function has the nice property of supplying the various moments through repeated differentiation at zero:

\[\begin{align*} m_n& = \frac{d^nM_X}{dt^n}(0) \end{align*}\]
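
For instance, with SymPy and the same assumed exponential PDF as before, the moments fall out of repeated differentiation of \(M_X\) at zero (an illustrative sketch, ignoring SymPy’s convergence conditions via `conds='none'`):

```python
import sympy as sp

t, x = sp.symbols('t x', real=True)
lam = sp.Rational(2)                      # rate of an assumed exponential distribution
f = lam * sp.exp(-lam * x)                # PDF, supported on [0, oo)

# M_X(t) = E[e^{tX}]; the integral converges for t < lam
M = sp.integrate(sp.exp(t * x) * f, (x, 0, sp.oo), conds='none')

for n in range(1, 4):
    print(n, sp.diff(M, t, n).subs(t, 0))   # 1/2, 1/2, 3/4  (= n!/lam^n)
```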

A closely related function replaces the real exponential with a complex one:

\[\begin{align*} \Phi_X(t)& = \int_{-\infty}^\infty e^{-j2\pi tX}f(X)dX \\ m_n& = \frac{1}{(-2\pi j)^n}\Phi_X^{(n)}(0) \end{align*}\]

and this variant, which statisticians call the characteristic function (up to sign and \(2\pi\) conventions), is exactly the Fourier transform of the PDF! With this transformation, tools from linear systems theory become available for analyzing a random process.
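
The same moments can be recovered under the Fourier-transform form. Here is a short SymPy check against the same assumed exponential distribution (the closed-form transform is written down directly rather than integrated symbolically):

```python
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.Rational(2)                        # same assumed exponential distribution as above

# Fourier transform of lam*exp(-lam*X) on [0, oo): lam / (lam + j*2*pi*t)
Phi = lam / (lam + sp.I * 2 * sp.pi * t)

for n in range(1, 3):
    m_n = sp.simplify(sp.diff(Phi, t, n).subs(t, 0) / (-2 * sp.pi * sp.I) ** n)
    print(n, m_n)                           # 1/2 and 1/2, matching the MGF result
```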