(Continuous) State Space Models (SSM)

The continuous state space model maps a continuous 1-D input signal \(u(t)\) (a scalar) to a continuous N-D latent state \(x(t)\), before projecting back to a 1-D output signal \(y(t)\) (a scalar). It is defined by the two equations:

\[\newcommand{\bm}[1]{\textbf{#1}} \begin{equation} \label{eq:1} \begin{aligned} x'(t)_{N,1} &= \bm{A}_{N,N} \: x(t)_{N,1} + \bm{B}_{N,1} \: u(t) \\ y(t) &= \bm{C}_{1,N} \: x(t)_{N,1} + \bm{D}_{1,1} \: u(t) \end{aligned} \end{equation}\]

The state parameters A (matrix), B (vector), C (vector), and D (scalar) are learned. I wrote the shapes of the parameters as subscripts to make the equations more readable. In general, A, B, C, D can be time dependent; here we assume they are learned but do not depend on time, in which case the model is called a time-invariant state space model.
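To make the shapes concrete, here is a minimal sketch (assuming \(N=4\) and random, untrained parameters, chosen only for illustration) that evaluates both equations once:

```python
import numpy as np

# Hypothetical sizes: N-dimensional latent state, scalar input/output.
N = 4
rng = np.random.default_rng(0)
A = rng.normal(size=(N, N))  # state matrix, (N, N)
B = rng.normal(size=(N, 1))  # input projection, (N, 1)
C = rng.normal(size=(1, N))  # output projection, (1, N)
D = rng.normal(size=(1, 1))  # skip term, scalar

x = rng.normal(size=(N, 1))  # latent state x(t)
u = 0.5                      # scalar input u(t)

x_dot = A @ x + B * u        # x'(t), shape (N, 1)
y = C @ x + D * u            # y(t), shape (1, 1)
print(x_dot.shape, y.shape)  # (4, 1) (1, 1)
```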

Exact solution to the differential equation of \(x'\)

Let’s guess a solution to the differential equation of \(x'\):

\[\begin{equation} \label{eq:2} \begin{aligned} x(t) = e^{\bm{A}t}x(0) + \int_{0}^{t} e^{\bm{A}(t-\tau)}\bm{B}u(\tau) \, d\tau \end{aligned} \end{equation}\]

Let's differentiate with respect to \(t\), using the Leibniz integral rule, and check that the guess works. Here is the Leibniz rule:

\[\begin{equation} \frac{d}{dx} \left (\int_{a(x)}^{b(x)}f(x,t)\,dt \right) = f\big(x,b(x)\big)\cdot \frac{d}{dx} b(x) - f\big(x,a(x)\big)\cdot \frac{d}{dx} a(x) + \int_{a(x)}^{b(x)}\frac{\partial}{\partial x} f(x,t) \,dt \end{equation}\]
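As a quick sanity check (a sketch with hypothetical limits \(a(x)=x^2\), \(b(x)=\sin x\) and integrand \(f(x,t)=e^{xt}\), chosen only for illustration), we can verify the rule symbolically:

```python
import sympy as sp

# positive=True avoids Piecewise branches in the symbolic integrals
x, t = sp.symbols('x t', positive=True)
a = x**2           # hypothetical lower limit a(x)
b = sp.sin(x)      # hypothetical upper limit b(x)
f = sp.exp(x * t)  # hypothetical integrand f(x, t)

lhs = sp.diff(sp.integrate(f, (t, a, b)), x)
rhs = (f.subs(t, b) * sp.diff(b, x)
       - f.subs(t, a) * sp.diff(a, x)
       + sp.integrate(sp.diff(f, x), (t, a, b)))
print(sp.simplify(lhs - rhs))  # 0
```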

Now let's apply it. The lower-limit term vanishes because the lower limit \(0\) is constant:

\[\begin{equation} \begin{aligned} x'(t) = \bm{A}e^{\bm{A}t}x(0) + e^{\bm{A}(t-t)}\bm{B}u(t) + \bm{A} \int_{0}^{t} e^{\bm{A}(t-\tau)}\bm{B}u(\tau) \, d\tau \end{aligned} \end{equation}\]

Since \(e^{\bm{A}(t-t)} = I\), the middle term is just \(\bm{B}u(t)\), and the remaining two terms combine into \(\bm{A}x(t)\):

\[\begin{equation} \begin{aligned} x'(t) = \bm{A}x(t) + \bm{B}u(t) \end{aligned} \end{equation}\]

And we’re back to the original equation, so our guessed solution is right. \(\square\)
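We can also check this numerically (a sketch; the specific \(\bm{A}\), \(\bm{B}\), and \(u\) below are arbitrary): integrate \(x' = \bm{A}x + \bm{B}u\) with an ODE solver and compare against the closed form, evaluating the convolution integral by quadrature.

```python
import numpy as np
from scipy.integrate import quad, solve_ivp
from scipy.linalg import expm

rng = np.random.default_rng(0)
N = 3
A = rng.normal(size=(N, N)) - 2 * np.eye(N)  # shifted toward stability
B = rng.normal(size=(N, 1))
x0 = rng.normal(size=N)
u = np.sin                                   # arbitrary smooth input u(t)

def x_closed_form(t):
    # x(t) = e^{At} x(0) + int_0^t e^{A(t - tau)} B u(tau) dtau, via quadrature
    integral = np.array([
        quad(lambda tau: (expm(A * (t - tau)) @ B)[i, 0] * u(tau), 0.0, t)[0]
        for i in range(N)
    ])
    return expm(A * t) @ x0 + integral

# Integrate x' = A x + B u numerically and compare at t = 1.0
sol = solve_ivp(lambda t, x: A @ x + (B * u(t)).ravel(),
                (0.0, 1.0), x0, rtol=1e-10, atol=1e-12)
print(np.allclose(sol.y[:, -1], x_closed_form(1.0), atol=1e-6))  # True
```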

Discrete Time Invariant State Space Models

We now want to convert the continuous explicit equation for \(x(t)\) into a discretized equation, since we usually receive the input in sampled form, with step size \(\Delta\).

\[\begin{equation} \begin{aligned} x(t) = e^{\bm{A}t}x(0) + \int_{0}^{t} e^{\bm{A}(t-\tau)}\bm{B}u(\tau) \, d\tau \end{aligned} \end{equation}\]

\[\begin{equation} \begin{aligned} x(t+\Delta) = e^{\bm{A}(t+\Delta)}x(0) + \int_{0}^{t+\Delta} e^{\bm{A}(t+\Delta-\tau)}\bm{B}u(\tau) \, d\tau \end{aligned} \end{equation}\]

\[\begin{equation} \begin{aligned} x(t+\Delta) = e^{\bm{A}\Delta} e^{\bm{A}t}x(0) + e^{\bm{A}\Delta} \int_{0}^{t+\Delta} e^{\bm{A}(t-\tau)}\bm{B}u(\tau) \, d\tau \end{aligned} \end{equation}\]

\[\begin{equation} \begin{aligned} x(t+\Delta) = e^{\bm{A}\Delta}x(t) + e^{\bm{A}\Delta} \int_{t}^{t+\Delta} e^{\bm{A}(t-\tau)}\bm{B}u(\tau) \, d\tau \end{aligned} \end{equation}\]

Now approximate the remaining integral with the trapezoid rule:

\[\begin{equation} \begin{aligned} x(t+\Delta) = e^{\bm{A}\Delta}x(t) + e^{\bm{A} \Delta} \frac{\Delta}{2} \left( e^{-\bm{A}\Delta}\bm{B}u(t+\Delta) + e^{\bm{A}(t-t)}\bm{B}u(t) \right) \end{aligned} \end{equation}\]

\[\begin{equation} \begin{aligned} x(t+\Delta) = e^{\bm{A}\Delta}x(t) + e^{\bm{A} \Delta} \frac{\Delta}{2} \left( e^{-\bm{A}\Delta}\bm{B}u(t+\Delta) + \bm{B}u(t) \right) \end{aligned} \end{equation}\]

Now assume \(u(t+\Delta) \approx u(t)\), because the \(\Delta\)-dependence in the exponent is much more significant:

\[\begin{equation} \begin{aligned} x(t+\Delta) = e^{\bm{A}\Delta}x(t) + e^{\bm{A} \Delta} \frac{\Delta}{2} \left( e^{-\bm{A}\Delta} + I \right)\bm{B}u(t) \end{aligned} \end{equation}\]

\[\begin{equation} \begin{aligned} x(t+\Delta) = e^{\bm{A}\Delta}x(t) + \frac{\Delta}{2} \left( I + e^{\bm{A}\Delta} \right)\bm{B}u(t) \end{aligned} \end{equation}\]

Define \(\bar{A}= e^{\bm{A}\Delta}\) and \(\bar{B} =\frac{\Delta}{2} ( I + e^{\bm{A}\Delta} )\bm{B}\), and we get a nice discrete formula:

\[\begin{equation} \begin{aligned} x(t+\Delta) = \bar{A}x(t) + \bar{B} u(t) \end{aligned} \end{equation}\]
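As a sanity check of this update rule (a sketch; \(\bm{A}\), \(\bm{B}\), \(u\), and \(\Delta\) are arbitrary, and SciPy's `expm` stands in for the exact matrix exponential), compare one discrete step against the exact step evaluated by quadrature:

```python
import numpy as np
from scipy.integrate import quad
from scipy.linalg import expm

rng = np.random.default_rng(0)
N, dt, t = 3, 0.01, 0.5
A = rng.normal(size=(N, N))
B = rng.normal(size=(N, 1))
x_t = rng.normal(size=(N, 1))  # pretend this is the true x(t)
u = np.cos                     # arbitrary smooth input

# Exact step: x(t+dt) = e^{A dt} (x(t) + int_t^{t+dt} e^{A(t - tau)} B u(tau) dtau)
integral = np.array([
    [quad(lambda tau: (expm(A * (t - tau)) @ B)[i, 0] * u(tau), t, t + dt)[0]]
    for i in range(N)
])
x_exact = expm(A * dt) @ (x_t + integral)

# Discrete formula: x(t+dt) = A_bar x(t) + B_bar u(t)
A_bar = expm(A * dt)
B_bar = dt / 2 * (np.eye(N) + A_bar) @ B
x_approx = A_bar @ x_t + B_bar * u(t)
print(np.linalg.norm(x_exact - x_approx))  # small, O(dt^2)
```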

In order to compute \(\bar{A} = e^{\bm{A}\Delta}\), we use its Taylor expansion. Define \(Z= \bm{A}\Delta\):

\[\begin{equation} \begin{aligned} \bar{A} = e^{Z} = \frac{e^{Z/2}}{e^{-Z/2}} \approx \frac{I+Z/2 + ...}{I-Z/2 +...} = (I+Z/2)(I-Z/2)^{-1} = (I+\Delta/2 \cdot \bm{A})(I-\Delta/2 \cdot \bm{A})^{-1} \end{aligned} \end{equation}\]

Let's compute \(\bar{B}\):

\[\begin{equation} \begin{aligned} \bar{B} = \frac{\Delta}{2} \left( I + \frac{e^{Z/2}}{e^{-Z/2}} \right)\bm{B} = \frac{\Delta}{2} \left( \frac{e^{-Z/2} + e^{Z/2}}{e^{-Z/2}} \right)\bm{B} \approx \frac{\Delta}{2} \left( \frac{I - Z/2 + ... + I + Z/2 + ...}{I - Z/2 + ...} \right)\bm{B} \end{aligned} \end{equation}\]

\[\begin{equation} \begin{aligned} \bar{B} = \frac{\Delta}{2} \left( \frac{2}{I - Z/2} \right)\bm{B} = (I-\Delta/2\cdot\bm{A})^{-1}\Delta\bm{B} \end{aligned} \end{equation}\]
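Putting the two formulas into code (a sketch with arbitrary \(A\), \(B\), \(\Delta\)); as a cross-check, this should agree with SciPy's bilinear (Tustin) discretization via `scipy.signal.cont2discrete`:

```python
import numpy as np
from scipy.signal import cont2discrete

def discretize_bilinear(A, B, dt):
    """A_bar = (I + dt/2 A)(I - dt/2 A)^{-1},  B_bar = (I - dt/2 A)^{-1} dt B."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - dt / 2 * A)
    return (I + dt / 2 * A) @ inv, inv @ (dt * B)

rng = np.random.default_rng(0)
N, dt = 3, 0.1
A = rng.normal(size=(N, N))
B = rng.normal(size=(N, 1))
A_bar, B_bar = discretize_bilinear(A, B, dt)

# (I + X)(I - X)^{-1} = (I - X)^{-1}(I + X): both are rational functions of A
C = np.zeros((1, N)); D = np.zeros((1, 1))
Ad, Bd, *_ = cont2discrete((A, B, C, D), dt, method='bilinear')
print(np.allclose(A_bar, Ad), np.allclose(B_bar, Bd))  # True True
```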

So now we know how to compute \(\bar{A},\bar{B}\) directly from \(A,B\), and we can run the model as a discrete recurrence:

\[\begin{equation} \begin{aligned} x_k &= \bar{A}x_{k-1} + \bar{B}u_k \\ y_k &= C x_k + D u_k \end{aligned} \end{equation}\]
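A minimal sketch of this recurrent representation (the parameters are arbitrary stand-ins, not trained values): step through a 1-D input sequence one sample at a time.

```python
import numpy as np

def ssm_recurrent(A_bar, B_bar, C, D, u_seq):
    """Run x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k + D u_k over a 1-D input."""
    x = np.zeros((A_bar.shape[0], 1))  # zero initial state
    ys = []
    for u in u_seq:
        x = A_bar @ x + B_bar * u
        ys.append((C @ x + D * u).item())
    return np.array(ys)

rng = np.random.default_rng(0)
N = 3
A_bar = 0.3 * rng.normal(size=(N, N))  # scaled down so the recurrence is stable
B_bar = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N)); D = np.zeros((1, 1))
print(ssm_recurrent(A_bar, B_bar, C, D, np.sin(0.1 * np.arange(10))))
```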

And this is the recurrent representation. The bilinear transform we used is a Padé approximant (a ratio of truncated series), which often approximates a function better than truncating its Taylor series at the same order. There are alternative ways to compute \(\bar{A}\), though, such as truncating the Taylor expansion of the matrix exponential directly (keeping only \(I + \bm{A}\Delta\) is Euler's method):

\[\begin{equation} \begin{aligned} \bar{A}=e^{A\Delta} = I + A\Delta + (A\Delta)^2/2 + ... \end{aligned} \end{equation}\]
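A quick comparison of the two approximations against SciPy's `expm` (a sketch; \(A\) and \(\Delta\) are arbitrary):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
N, dt = 3, 0.1
Z = dt * rng.normal(size=(N, N))  # Z = A dt
I = np.eye(N)

exact = expm(Z)                                    # reference e^{A dt}
euler = I + Z                                      # first-order Taylor truncation
bilinear = (I + Z / 2) @ np.linalg.inv(I - Z / 2)  # Pade / bilinear

print(np.linalg.norm(euler - exact))     # error O(dt^2)
print(np.linalg.norm(bilinear - exact))  # error O(dt^3), typically smaller
```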

The Convolutional Representation

If we assume the state is initialized to zero (before the first input) and \(D=0\), then we can unroll \(y_k\):

\[\begin{equation} \begin{aligned} y_k = C_{1,N}\bar{A}_{N,N}^k \bar{B}_{N,1} u_0 + C\bar{A}^{k-1}\bar{B}u_1 + ... + C\bar{B}u_k \end{aligned} \end{equation}\]

I added the shapes for better readability. Recall the general formula for convolving a signal with a kernel:

\[\begin{equation} \begin{aligned} (K*u)(k) := \int_{-\infty}^{+\infty} K(\tau)u(k-\tau)d\tau \end{aligned} \end{equation}\]

In the discrete case the integral becomes a sum, \((\bar{K} * u)_k = \sum_{j=0}^{k} \bar{K}_j \, u_{k-j}\). So we can express the unrolled output as a convolution of \(u\) with the kernel \(\bar{K}\):

\[\begin{equation} \begin{aligned} y_k = (C\bar{B},C\bar{A}\bar{B},...,C\bar{A}^{k}\bar{B}) * u = (\bar{K} * u)_k \end{aligned} \end{equation}\]
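To close the loop, here is a sketch (arbitrary untrained parameters) that builds the kernel \(\bar{K}\) explicitly and checks that the convolutional and recurrent representations produce the same output:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 3, 16
A_bar = 0.3 * rng.normal(size=(N, N))
B_bar = rng.normal(size=(N, 1))
C = rng.normal(size=(1, N))
u = rng.normal(size=L)

# Kernel K_bar = (C B_bar, C A_bar B_bar, ..., C A_bar^{L-1} B_bar)
K = np.array([(C @ np.linalg.matrix_power(A_bar, j) @ B_bar).item()
              for j in range(L)])

# Convolutional form: y_k = sum_j K_j u_{k-j}
y_conv = np.convolve(K, u)[:L]

# Recurrent form: x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k (D = 0)
x = np.zeros((N, 1)); y_rec = []
for uk in u:
    x = A_bar @ x + B_bar * uk
    y_rec.append((C @ x).item())
print(np.allclose(y_conv, y_rec))  # True
```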