# Discretization Process for the State Space Model of Mamba

## Definition

In Mamba, the continuous-time state space model is:

$$
h'(t)=Ah(t) + Bx(t)
$$

Here, $A$ and $B$ are parameter matrices, $h$ is the hidden state, and $x$ is the input.

This formulation comes from control theory, where the state equation is written as:

$$
x'(t) = Ax(t) + Bu(t)
$$

In the derivation below we keep this original control-theory notation: $x$ is the hidden state and $u$ is the input at time $t$.
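As a brute-force reference for checking the discrete formulas derived later, the following minimal sketch integrates $x'(t) = Ax(t) + Bu(t)$ with forward Euler and a very small step. It assumes, for simplicity, a diagonal $A$ passed as the list `a_diag` of its diagonal entries, a vector $B$ given as `b`, and a scalar input function `u`; the name `simulate_continuous` is hypothetical.

```python
from typing import Callable, List

# Hypothetical reference simulator; assumes A is diagonal (given by its diagonal entries).
def simulate_continuous(
    a_diag: List[float],
    b: List[float],
    u: Callable[[float], float],
    x0: List[float],
    t_end: float,
    dt: float = 1e-5,
) -> List[float]:
    """Approximate x(t_end) for x'(t) = A x(t) + B u(t) using forward Euler with a tiny step."""
    x = list(x0)
    for n in range(int(round(t_end / dt))):
        u_t = u(n * dt)
        # With a diagonal A, each state component evolves independently.
        x = [xi + dt * (ai * xi + bi * u_t) for xi, ai, bi in zip(x, a_diag, b)]
    return x
```

For example, `simulate_continuous([-1.0], [1.0], lambda t: 0.0, [1.0], 1.0)` gives a value close to $e^{-1} \approx 0.3679$: with no input, the state simply decays.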

## Motivation

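Ignoring the input term, we get the homogeneous equation:

$x'(t) = Ax(t)$

which reminds us of $(e^x)' = e^x$: its solution is $x(t) = e^{At}x(0)$, where $e^{At}$ is the matrix exponential.

This observation is where the discretization process starts.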
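To make the analogy with $(e^x)' = e^x$ concrete for matrices, the following sketch computes $e^{At}$ with a truncated Taylor series and checks numerically that $\frac{\mathrm{d}}{\mathrm{d}t}e^{At} = Ae^{At}$. The helper names and the example matrix are hypothetical; a truncated series is only reliable when the entries of $At$ are modest, and a library matrix-exponential routine would normally be used instead. The example $A$ is upper triangular, so its exponential is known in closed form: the diagonal entries are $e^{-t}$ and $e^{-3t}$, and the off-diagonal entry is $e^{-t} - e^{-3t}$.

```python
from typing import List

Matrix = List[List[float]]

def mat_mul(p: Matrix, q: Matrix) -> Matrix:
    """Plain matrix product p @ q."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

def scale(s: float, m: Matrix) -> Matrix:
    return [[s * v for v in row] for row in m]

def expm_taylor(m: Matrix, terms: int = 30) -> Matrix:
    """Truncated series e^M = I + M + M^2/2! + ... + M^(terms-1)/(terms-1)!."""
    n = len(m)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # M^0 / 0!
    result = [row[:] for row in term]
    for k in range(1, terms):
        term = scale(1.0 / k, mat_mul(term, m))  # M^k / k! from M^(k-1) / (k-1)!
        result = [[r + s for r, s in zip(rr, sr)] for rr, sr in zip(result, term)]
    return result

# Hypothetical example matrix (upper triangular, so e^{At} has a closed form).
A = [[-1.0, 2.0], [0.0, -3.0]]
t, h = 0.7, 1e-5

# Central finite difference of t -> e^{At}, compared with A e^{At}.
e_plus, e_minus = expm_taylor(scale(t + h, A)), expm_taylor(scale(t - h, A))
derivative = [[(p - q) / (2 * h) for p, q in zip(rp, rq)] for rp, rq in zip(e_plus, e_minus)]
a_times_exp = mat_mul(A, expm_taylor(scale(t, A)))
max_error = max(abs(d - r) for dr, rr in zip(derivative, a_times_exp) for d, r in zip(dr, rr))
```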

Move $Ax(t)$ to the left-hand side:

$$
x'(t) - Ax(t) = Bu(t)
$$

Multiply both sides on the left by $e^{-At}$:

$e^{-At} x'(t) - e^{-At}Ax(t) = e^{-At} Bu(t)$

Define $F(t) = e^{-At}x(t)$. By the product rule, $F'(t) = -Ae^{-At}x(t) + e^{-At} x'(t)$, which equals the left-hand side above because $A$ commutes with $e^{-At}$.

Therefore, $F'(t) = e^{-At}Bu(t)$.
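The only nontrivial step above is the product rule. The following sketch checks it numerically in the scalar case: for a hypothetical scalar `a` standing in for $A$ and an arbitrary differentiable trajectory `x(t)`, a finite-difference estimate of $F'(t)$ with $F(t) = e^{-at}x(t)$ matches $e^{-at}\left(x'(t) - ax(t)\right)$. When $x$ solves the state equation, $x'(t) - ax(t) = bu(t)$, which gives $F'(t) = e^{-at}bu(t)$.

```python
import math

# Hypothetical scalar stand-in for A; the identity checked below holds for any value.
a = -2.0

def x(t: float) -> float:
    """An arbitrary differentiable trajectory (not necessarily a solution of the SSM)."""
    return math.sin(t) + t * t

def x_prime(t: float) -> float:
    return math.cos(t) + 2.0 * t

def F(t: float) -> float:
    """Integrating-factor substitution F(t) = e^{-at} x(t)."""
    return math.exp(-a * t) * x(t)

t, h = 0.4, 1e-5
numeric = (F(t + h) - F(t - h)) / (2.0 * h)                # central-difference estimate of F'(t)
product_rule = math.exp(-a * t) * (x_prime(t) - a * x(t))  # -a e^{-at} x(t) + e^{-at} x'(t)
```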

By the fundamental theorem of calculus, we obtain a formula that contains both $u$ and $x$:

$$
\forall \lambda \in (-\infty, +\infty), F(t) = F(\lambda) + \int_{\lambda}^{t} F'(\tau) \mathrm{d}\tau
$$

For convenience, let $\lambda = 0$ (so $F(0) = x(0)$), and substitute $F(t) = e^{-At}x(t)$ and the product-rule form of $F'(\tau)$ into the above equation:

$$
e^{-At}x(t) = x(0) + \int_{0}^{t} \left(-Ae^{-A\tau}x(\tau) + e^{-A\tau}x'(\tau)\right) \mathrm{d} \tau
$$

Since the integrand equals $F'(\tau) = e^{-A\tau}Bu(\tau)$, this is the same as:

$$
e^{-At}x(t) = x(0) + \int_{0}^{t} e^{-A\tau} Bu(\tau) \mathrm{d} \tau
$$

Multiply both sides on the left by $e^{At}$, the inverse of $e^{-At}$:

$$
x(t) = e^{At}x(0) + e^{At}\int_{0}^{t} e^{-A\tau} Bu(\tau) \mathrm{d} \tau
$$

In particular, at a sampling time $t_k$:

$$
x(t_k)  = e^{At_k} x(0) + e^{At_k} \int_{0}^{t_k} e^{-A\tau} Bu(\tau) \mathrm{d} \tau
$$
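The closed-form solution can be sanity-checked in the scalar case. The following sketch, with hypothetical scalars `a` and `b` in place of $A$ and $B$ and the input $u(t) = \sin t$, evaluates $e^{at}x(0) + e^{at}\int_0^t e^{-a\tau}bu(\tau)\,\mathrm{d}\tau$ with the trapezoidal rule and compares it against a brute-force forward-Euler solution of $x'(t) = ax(t) + bu(t)$. The function names are illustrative only.

```python
import math
from typing import Callable

def solve_by_formula(a: float, b: float, u: Callable[[float], float],
                     x0: float, t: float, n: int = 20_000) -> float:
    """x(t) = e^{at} x(0) + e^{at} * int_0^t e^{-a tau} b u(tau) dtau (trapezoidal rule)."""
    h = t / n

    def g(tau: float) -> float:
        return math.exp(-a * tau) * b * u(tau)

    integral = h * (0.5 * g(0.0) + sum(g(i * h) for i in range(1, n)) + 0.5 * g(t))
    return math.exp(a * t) * (x0 + integral)

def solve_by_euler(a: float, b: float, u: Callable[[float], float],
                   x0: float, t: float, n: int = 200_000) -> float:
    """Brute-force forward Euler on x' = a x + b u(t)."""
    h = t / n
    x = x0
    for i in range(n):
        x += h * (a * x + b * u(i * h))
    return x

# Hypothetical values: a = -1.5, b = 2.0, u(t) = sin(t), x(0) = 1.0, evaluated at t = 2.0.
formula = solve_by_formula(-1.5, 2.0, math.sin, 1.0, 2.0)
euler = solve_by_euler(-1.5, 2.0, math.sin, 1.0, 2.0)
```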

To obtain a recurrence that expresses $x(t_{k+1})$ in terms of $x(t_k)$, write $t_{k+1} = t_k + (t_{k+1} - t_k)$:

$$
x(t_{k+1}) = e^{A(t_k + (t_{k+1}-t_k))}x(0) + e^{A(t_k + (t_{k+1} - t_k))} \int_{0}^{t_{k+1}} e^{-A\tau} Bu(\tau) \mathrm{d} \tau
$$

Factoring out $e^{A(t_{k+1} - t_k)}$ and splitting the integral at $t_k$ gives:

$$
x(t_{k+1}) = e^{A(t_{k+1} - t_k)} \left[e^{At_k}x(0) + e^{At_k} \int_{0}^{t_k} e^{-A\tau} Bu(\tau) \mathrm{d}\tau\right] + e^{At_{k+1}} \int_{t_k}^{t_{k+1}} e^{-A\tau} Bu(\tau) \mathrm{d} \tau
$$

The term in brackets is exactly $x(t_k)$. Moving $e^{At_{k+1}}$ inside the last integral, we obtain:

$$
x(t_{k+1}) = e^{A(t_{k+1} - t_k)} x(t_k) + \int_{t_k}^{t_{k+1}} e^{A(t_{k+1} - \tau)} Bu(\tau) \mathrm{d} \tau
$$
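The exact recurrence holds for any sampling times, not only evenly spaced ones. The following scalar sketch (hypothetical values, trapezoidal rule for each integral) advances the state interval by interval over an uneven grid and compares the result with the one-shot formula for $x(t)$ evaluated directly at the final time.

```python
import math
from typing import Callable

def trapezoid(f: Callable[[float], float], lo: float, hi: float, n: int) -> float:
    h = (hi - lo) / n
    return h * (0.5 * f(lo) + sum(f(lo + i * h) for i in range(1, n)) + 0.5 * f(hi))

def exact_step(a: float, b: float, u: Callable[[float], float],
               x_k: float, t_k: float, t_next: float, n: int = 2_000) -> float:
    """x(t_{k+1}) = e^{a(t_{k+1}-t_k)} x(t_k) + int_{t_k}^{t_{k+1}} e^{a(t_{k+1}-tau)} b u(tau) dtau."""
    forced = trapezoid(lambda tau: math.exp(a * (t_next - tau)) * b * u(tau), t_k, t_next, n)
    return math.exp(a * (t_next - t_k)) * x_k + forced

def one_shot(a: float, b: float, u: Callable[[float], float],
             x0: float, t: float, n: int = 20_000) -> float:
    """x(t) = e^{at} x(0) + e^{at} * int_0^t e^{-a tau} b u(tau) dtau."""
    integral = trapezoid(lambda tau: math.exp(-a * tau) * b * u(tau), 0.0, t, n)
    return math.exp(a * t) * (x0 + integral)

# Hypothetical scalar system and deliberately uneven sampling times.
a, b, u, x0 = -1.5, 2.0, math.sin, 1.0
times = [0.0, 0.3, 0.5, 1.2, 2.0]
x = x0
for t_k, t_next in zip(times, times[1:]):
    x = exact_step(a, b, u, x, t_k, t_next)
```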

## Zero-Order Hold

Here we introduce the zero-order hold (ZOH). We only need its key assumption, rather than the full theory behind it.

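Denote the step size by $T = t_{k+1} - t_k$, which is assumed to be small.

Under the zero-order hold, the input is held constant over each step: $u(\tau) = u(t_k)$ for $\tau \in [t_k, t_{k+1})$. This is exact when the input is piecewise constant, and a good approximation when $T$ is small. Therefore,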

$$
x(t_{k+1}) = e^{AT}x(t_k) + \int_{t_k}^{t_{k+1}} e^{A(t_{k+1} - \tau)} Bu(t_k) \mathrm{d} \tau
$$

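Since $u(t_k)$ does not depend on $\tau$, it can be taken out of the integral, and the second term on the right-hand side becomes:

$$
e^{At_{k+1}} \left(\int_{t_k}^{t_{k+1}} e^{-A\tau} \mathrm{d} \tau\right) Bu(t_k) = e^{At_{k+1}} \left[ -A^{-1} \left(e^{-At_{k+1}} - e^{-At_k}\right)\right] Bu(t_k)
$$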

Simplifying:

$$
= A^{-1} \left(e^{A(t_{k+1} - t_k)} - I\right) Bu(t_k)
$$
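In the scalar case, $A^{-1}\left(e^{AT} - I\right)$ reduces to $(e^{aT} - 1)/a$. The following sketch compares this closed form with a midpoint-rule estimate of $\int_{t_k}^{t_{k+1}} e^{a(t_{k+1}-\tau)}\,\mathrm{d}\tau$. The function names and values are hypothetical; `math.expm1` is used because it stays accurate when $aT$ is tiny, and $a = 0$ falls back to the limiting value $T$.

```python
import math

def zoh_integral_closed_form(a: float, T: float) -> float:
    """(e^{aT} - 1) / a, the scalar version of A^{-1}(e^{AT} - I)."""
    if a == 0.0:
        return T  # limit of (e^{aT} - 1) / a as a -> 0
    return math.expm1(a * T) / a  # expm1 avoids cancellation when a*T is tiny

def zoh_integral_numeric(a: float, t_k: float, t_next: float, n: int = 10_000) -> float:
    """Midpoint rule for int_{t_k}^{t_next} e^{a (t_next - tau)} dtau."""
    h = (t_next - t_k) / n
    return h * sum(math.exp(a * (t_next - (t_k + (i + 0.5) * h))) for i in range(n))
```

The value depends only on $T = t_{k+1} - t_k$ and not on $t_k$ itself, so with a fixed step size the same coefficient can be reused at every step.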

Substituting back into the equation above:

$$
x(t_{k+1}) = e^{AT}x(t_k) + A^{-1}\left(e^{AT} - I\right) Bu(t_k)
$$

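Here $I$ is the identity matrix and $A^{-1}$ is the inverse of $A$ (assumed to exist). Because all of the computations here are matrix operations, the scalar $1$ becomes $I$, $\frac{1}{A}$ becomes $A^{-1}$, and the matrix factors must stay to the left of the vector $Bu(t_k)$.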
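Under the ZOH assumption the update above is exact whenever the input really is piecewise constant. The following sketch assumes a diagonal $A$ (given by `a_diag`), so $e^{AT}$ and $A^{-1}$ act entry by entry; it applies the ZOH update over a few steps and compares the result with a fine forward-Euler integration that holds $u$ constant on each interval. All names and values are hypothetical.

```python
import math
from typing import List

def zoh_run(x: List[float], inputs: List[float], a_diag: List[float],
            b: List[float], T: float) -> List[float]:
    """Apply x <- e^{AT} x + A^{-1}(e^{AT} - I) B u_k for each held input u_k (assumes diagonal A)."""
    for u_k in inputs:
        x = [math.exp(ai * T) * xi + (T if ai == 0.0 else math.expm1(ai * T) / ai) * bi * u_k
             for xi, ai, bi in zip(x, a_diag, b)]
    return x

def euler_run(x: List[float], inputs: List[float], a_diag: List[float],
              b: List[float], T: float, substeps: int = 20_000) -> List[float]:
    """Fine forward Euler with u held constant on each interval of length T."""
    h = T / substeps
    for u_k in inputs:
        for _ in range(substeps):
            x = [xi + h * (ai * xi + bi * u_k) for xi, ai, bi in zip(x, a_diag, b)]
    return x

# Hypothetical two-state system with a piecewise-constant input.
a_diag, b, T = [-1.0, -0.2], [1.0, 0.5], 0.1
inputs, x0 = [1.0, -2.0, 0.5, 3.0], [0.0, 1.0]
zoh = zoh_run(x0, inputs, a_diag, b, T)
ref = euler_run(x0, inputs, a_diag, b, T)
```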

In Mamba, the step size $T$ is denoted by $\Delta$. Substituting $\Delta$ for $T$, and then using $A^{-1} = (\Delta A)^{-1}\Delta$ (valid because $\Delta$ is a scalar):

$$
x(t_{k+1}) = e^{\Delta A}x(t_k) + A^{-1}\left(e^{\Delta A} - I\right) Bu(t_k)
$$

$$
x(t_{k+1}) = e^{\Delta A} x(t_k) + (\Delta A)^{-1} \left(e^{\Delta A} - I\right) \Delta B u(t_k)
$$
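For a general, non-diagonal $A$, the order of the matrix factors matters. The following sketch uses a hypothetical invertible $2 \times 2$ matrix $A$, a $2 \times 1$ matrix $B$ and a scalar $\Delta$, and compares $(\Delta A)^{-1}\left(e^{\Delta A} - I\right)\Delta B$ with a midpoint-rule estimate of $\left(\int_0^{\Delta} e^{As}\,\mathrm{d}s\right)B$, which is the ZOH integral after the substitution $s = t_{k+1} - \tau$. The matrix exponential is a truncated Taylor series, adequate here only because $\Delta A$ is small.

```python
from typing import List

Matrix = List[List[float]]

def mat_mul(p: Matrix, q: Matrix) -> Matrix:
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

def scale(s: float, m: Matrix) -> Matrix:
    return [[s * v for v in row] for row in m]

def add(p: Matrix, q: Matrix) -> Matrix:
    return [[x + y for x, y in zip(rp, rq)] for rp, rq in zip(p, q)]

def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm_taylor(m: Matrix, terms: int = 25) -> Matrix:
    term, result = identity(len(m)), identity(len(m))
    for k in range(1, terms):
        term = scale(1.0 / k, mat_mul(term, m))  # M^k / k!
        result = add(result, term)
    return result

def inv2(m: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def b_bar_closed_form(A: Matrix, B: Matrix, delta: float) -> Matrix:
    """(Delta A)^{-1} (e^{Delta A} - I) Delta B, with matrix factors kept to the left of B."""
    dA = scale(delta, A)
    e_minus_i = add(expm_taylor(dA), scale(-1.0, identity(2)))
    return mat_mul(mat_mul(inv2(dA), e_minus_i), scale(delta, B))

def b_bar_quadrature(A: Matrix, B: Matrix, delta: float, n: int = 1000) -> Matrix:
    """Midpoint rule for (int_0^Delta e^{As} ds) B."""
    h = delta / n
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(n):
        acc = add(acc, scale(h, expm_taylor(scale((i + 0.5) * h, A))))
    return mat_mul(acc, B)

# Hypothetical example values.
A = [[-1.0, 0.5], [-0.3, -2.0]]
B = [[1.0], [0.5]]
closed = b_bar_closed_form(A, B, 0.4)
numeric = b_bar_quadrature(A, B, 0.4)
```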

## Result

Recall that our target is a recurrence $x(t_{k+1}) = \overline{A} x(t_k) + \overline{B} u(t_k)$, where $\overline{A}$ and $\overline{B}$ are the discretized parameter matrices.

Matching terms, we get:

$$
\overline{A} = e^{\Delta A}
$$

$$
\overline{B} = (\Delta A)^{-1} (e^{\Delta A} - I) \Delta B
$$
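These two formulas translate directly into code. The following sketch assumes a diagonal $A$, the structure Mamba uses, stored as the list `a_diag`, so the exponential and the inverse act entry by entry; `discretize` and its argument names are hypothetical. When $\Delta a_i = 0$ it uses the limiting value $\Delta b_i$.

```python
import math
from typing import List, Tuple

def discretize(delta: float, a_diag: List[float], b: List[float]) -> Tuple[List[float], List[float]]:
    """ZOH discretization, assuming a diagonal A:
    A_bar = exp(delta * A),  B_bar = (delta * A)^{-1} (exp(delta * A) - I) * delta * B.
    """
    a_bar, b_bar = [], []
    for ai, bi in zip(a_diag, b):
        da = delta * ai
        a_bar.append(math.exp(da))
        # expm1(da) / da -> 1 as da -> 0, so B_bar -> delta * B in that limit.
        factor = 1.0 if da == 0.0 else math.expm1(da) / da
        b_bar.append(factor * delta * bi)
    return a_bar, b_bar

a_bar, b_bar = discretize(0.5, [-1.0, -2.0], [1.0, 3.0])
```

With $\Delta = 0.5$, $A = \mathrm{diag}(-1, -2)$ and $B = (1, 3)^\top$, this gives $\overline{A} = \mathrm{diag}(e^{-0.5}, e^{-1})$ and $\overline{B} = \left(1 - e^{-0.5},\ 1.5(1 - e^{-1})\right)^\top$.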

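However, Mamba writes the recurrence as $h(t_{k+1}) = \overline{A} h(t_k) + \overline{B} x(t_{k+1})$, i.e. the input enters at the same step as the state it updates. This follows from the definition of $h$, where the first hidden state is $h_0 = \overline{B} x_0$.

In control theory, $x_1$ is computed from $x_0$ and $u_0$.

In the SSM, $h_1$ is computed from the hidden state $h_0$ and the input $x_1$. The SSM input $x_0$ corresponds to $u_0$ in control theory, so $h_0 = \overline{B}x_0$ corresponds to the control-theory state $x_1$ (starting from a zero state $x_0$). Hence $h_1$ takes the next input $x_1$.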
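The index shift can be checked with scalars. The following sketch (hypothetical names and values) runs Mamba's form $h_t = \overline{A}h_{t-1} + \overline{B}x_t$ with $h_{-1} = 0$, and the control-theory form $s_{k+1} = \overline{A}s_k + \overline{B}u_k$ from $s_0 = 0$ with $u_k = x_k$; the control-theory state is renamed $s$ here to avoid a clash with Mamba's input $x$. The Mamba states coincide with the control-theory states shifted by one step.

```python
from typing import List

def mamba_scan(a_bar: float, b_bar: float, xs: List[float]) -> List[float]:
    """h_t = a_bar * h_{t-1} + b_bar * x_t with h_{-1} = 0, so h_0 = b_bar * x_0."""
    h, out = 0.0, []
    for x_t in xs:
        h = a_bar * h + b_bar * x_t
        out.append(h)
    return out

def control_scan(a_bar: float, b_bar: float, us: List[float], s0: float = 0.0) -> List[float]:
    """s_{k+1} = a_bar * s_k + b_bar * u_k, returning [s_0, s_1, ..., s_K]."""
    states = [s0]
    for u_k in us:
        states.append(a_bar * states[-1] + b_bar * u_k)
    return states

# Hypothetical discretized parameters and inputs.
xs = [1.0, 2.0, -1.0, 0.5]
h = mamba_scan(0.9, 0.1, xs)
s = control_scan(0.9, 0.1, xs)
# h[k] equals s[k + 1] for every k: Mamba's h_k plays the role of the control state s_{k+1}.
```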