Minimalist's Kalman Filter Derivation, Part I

Break Yang August 03, 2020 #robotics #control #statistics

Motivation

State estimation has many applications in robotics, for example localization and environment prediction in autonomous driving. The Kalman filter is a classical yet powerful algorithm that tackles such problems beautifully. Although there are already many articles, textbooks and papers on how to derive the algorithm, I found most of them too heavy on the theoretical side and hard for a first-time learner coming from an engineering background to follow. Therefore, I will shamelessly attempt to fill this hole with a series of posts.

This is going to be the first post of the series, and it only focuses on the one-dimensional case. Future posts will cover the multivariate version of the Kalman filter.

Spoiler: there will be a lot of math equations. But rest assured, nothing will exceed the level of basic calculus.

Single Step Prediction

Let's say we have a one-dimensional linear Markovian system whose transition function is known. This means

  1. The state of the system can be represented as a single scalar. Let's denote it as $x$.

  2. The transition function is a linear function, where the next state only depends on the current state. Therefore we can write the transition function as

$$x_{t+1} = a \cdot x_t$$

Suppose you have an estimation of $x_t$ in the form of a Gaussian distribution

$$x_t \sim N(\hat{x}_t, \sigma_t^2)$$

Based on that, what is your best-effort guess about the next state $x_{t+1}$? First of all, the reason we can make such a prediction of the next state is that the transition function reveals the relationship between $x_t$ and $x_{t+1}$, which happens to be linear in this case. I know that intuitively, you would guess the answer immediately:

$$x_{t+1} \sim N(a\hat{x}_t, a^2\sigma_t^2)$$

And that is the correct answer. But how do you prove it? Or more generally, if we have a random variable $X \sim N(\mu, \sigma^2)$, can we prove that another random variable that satisfies $Y = aX$ actually follows the distribution $Y \sim N(a\mu, a^2\sigma^2)$?

Let

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \cdot e^{-\frac{1}{2}x^2}$$

be the pdf (probability density function) of the standard Gaussian distribution $N(0, 1)$. It is easy to derive that the pdf of the general Gaussian distribution $N(\mu, \sigma^2)$ that $X$ follows would be

$$f_X(x) = \frac{1}{\sqrt{2\pi}\sigma} \cdot e^{-\frac{1}{2\sigma^2}(x-\mu)^2} = \frac{1}{\sigma}\phi \left( \frac{x - \mu}{\sigma} \right)$$
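As a quick numeric sanity check of this formula (not part of the derivation itself), the snippet below compares the scaled-and-shifted standard pdf against `scipy.stats.norm`; the values of `mu`, `sigma` and `xs` are arbitrary choices of mine:

```python
import numpy as np
from scipy.stats import norm

def phi(x):
    """pdf of the standard Gaussian N(0, 1)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

mu, sigma = 1.3, 0.7
xs = np.linspace(-3, 5, 9)

lhs = norm.pdf(xs, loc=mu, scale=sigma)   # reference implementation
rhs = phi((xs - mu) / sigma) / sigma      # the formula derived above
print(np.allclose(lhs, rhs))              # True
```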

Using the trick called differential form, the probability of $X$ taking a specific value $x$ is[^1]

$$\mathbb{P} [X = x] = f_X(x) \mathrm{d}x = \frac{1}{\sigma}\phi \left( \frac{x - \mu}{\sigma} \right) \mathrm{d} x$$

Okay, so what does the pdf of $Y$ (i.e. $\mathbb{P}[Y = y]$) look like? It turns out that we can easily derive it with a bit of transformation:

$$\begin{darray}{rcl} \mathbb{P} \left[ Y = y \right] &=& \mathbb{P} \left[ X = \frac{y}{a} \right] \\ &=& f_X \left(\frac{y}{a}\right) \mathrm{d}x \\ &=& f_X \left(\frac{y}{a}\right) \frac{\mathrm{d}y}{a} \\ &=& \frac{1}{\sigma}\phi \left( \frac{\frac{y}{a} - \mu}{\sigma} \right) \frac{\mathrm{d}y}{a} \\ &=& \frac{1}{a\sigma}\phi \left( \frac{y - a\mu}{a\sigma} \right) \mathrm{d}y \end{darray}$$

This basically shows that $Y$'s pdf is nothing but the pdf of $N(a\mu, a^2\sigma^2)$, hence concluding the proof. (Strictly speaking, the steps above assume $a > 0$; for negative $a$, replace $a$ with $|a|$ and the conclusion still holds.)
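If you prefer to convince yourself numerically as well, here is a small Monte Carlo sketch of this scaling property; the values of `mu`, `sigma` and `a` are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, a = 2.0, 0.5, -3.0

x_samples = rng.normal(mu, sigma, size=1_000_000)
y_samples = a * x_samples                     # Y = a * X

print(y_samples.mean(), a * mu)               # both close to -6.0
print(y_samples.var(), (a * sigma) ** 2)      # both close to 2.25
```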

[^1]: An intuitive perspective that helps in understanding this: $f_X(x) \mathrm{d}x$ is actually an area, whose fundamental unit is probability!

The conclusion proved above enables us to solve the prediction problem posed at the very beginning of this section:

$$\textrm{based on } x_{t+1} = a \cdot x_t \textrm{ and } x_t \sim N(\hat{x}_t, \sigma_t^2)$$

$$\textrm{we can predict } x_{t+1} \sim N(a\hat{x}_t, a^2\sigma_t^2)$$

Equivalently, this means we can obtain an estimation of $x_{t+1}$ (without observing it) as

$$\begin{cases} \hat{x}_{t+1} &=& a\hat{x}_t \\ \sigma_{t+1}^2 &=& a^2 \sigma_t^2 \end{cases}$$

Uncertainty in Transition Function

Now it is time to introduce one variation on top of the simplest case we discussed in the previous section. In reality the transition is usually not perfect, which means that there is an error associated with it. Mathematically, it means

$$x_{t+1} = a \cdot x_t + e_t$$

As usual, for simplicity we assume the error is a zero-mean random variable that follows a Gaussian distribution, i.e.

$$e_t \sim N \left(0, \sigma_{e_t}^2 \right)$$

How should we revise our prediction under such a condition? Remember we are still solving the following question: if we already have an estimation of $x_t$ as

$$x_t \sim N(\hat{x}_t, \sigma_t^2)$$

what is a good estimation of $x_{t+1}$, given that we know (although not precisely in this case) the transition function?

Here we are going to introduce another useful idea: the generative model. The generative model basically describes the procedure to get a sample value of a random variable. In this particular case, the generative model of $x_{t+1}$ consists of the following steps (a small sampling sketch follows the list):

  1. Sample $a \cdot x_t$ out of the distribution $N(a\hat{x}_t, a^2\sigma_t^2)$ (note that this is the conclusion from the previous section)
  2. Sample $e_t$ out of the distribution $N(0, \sigma_{e_t}^2)$
  3. Construct $x_{t+1}$ by adding the two sampled values up
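Here is a minimal Monte Carlo sketch of those three steps; the numbers `a`, `x_hat_t`, `sigma_t` and `sigma_e` are arbitrary illustrative choices of mine. It compares the empirical mean and variance of the sampled $x_{t+1}$ with the closed-form result derived just below:

```python
import numpy as np

rng = np.random.default_rng(42)
a, x_hat_t, sigma_t, sigma_e = 1.2, 5.0, 0.8, 0.3
n = 1_000_000

ax_t = rng.normal(a * x_hat_t, abs(a) * sigma_t, size=n)  # step 1: sample a * x_t
e_t = rng.normal(0.0, sigma_e, size=n)                    # step 2: sample e_t
x_next = ax_t + e_t                                       # step 3: add them up

print(x_next.mean(), a * x_hat_t)                         # ~6.0 vs 6.0
print(x_next.var(), a**2 * sigma_t**2 + sigma_e**2)       # ~1.0116 vs 1.0116
```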

So you can see that the generative model is basically an interpretation of the problem formulation and provides no new knowledge at all. However, with such an interpretation it is clear that $x_{t+1}$ as a random variable is simply the sum of two independent Gaussian random variables!

I will cheat here by referring to the generating-function-based proof from Wikipedia. As you have probably guessed, the conclusion is that

$$\textrm{if independent } \begin{cases} X &\sim N(\mu_X, \sigma_X^2) \\ Y &\sim N(\mu_Y, \sigma_Y^2) \end{cases} \textrm{ then } Z = X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$$

By plugging in our generative model, we obtain

$$x_{t+1} \sim N(a\hat{x}_t, a^2\sigma_t^2 + \sigma_{e_t}^2)$$

which means a good prediction would be

$$\begin{cases} \hat{x}_{t+1} &=& a\hat{x}_t \\ \sigma_{t+1}^2 &=& a^2 \sigma_t^2 + \sigma_{e_t}^2 \end{cases}$$

Yep, just add the variance of the error to the estimated variance. Pretty simple, right?
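In code, the prediction step is a direct transcription of the two formulas above. This is just a sketch under my own naming (`predict` is not any standard API):

```python
def predict(x_hat: float, var: float, a: float, var_e: float) -> tuple[float, float]:
    """One prediction step for x_{t+1} = a * x_t + e_t with e_t ~ N(0, var_e)."""
    x_hat_next = a * x_hat            # mean:     a * x_hat_t
    var_next = a**2 * var + var_e     # variance: a^2 * sigma_t^2 + sigma_e^2
    return x_hat_next, var_next

print(predict(x_hat=5.0, var=0.64, a=1.2, var_e=0.09))  # ≈ (6.0, 1.0116)
```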

Let There Be Observations

So we know how to predict $x_{t+1}$ given $x_t$, which is great. This means that if we happen to know the initial state of the system, $x_0$, we can start to predict $x_1$, and then $x_2$, ..., till any $x_t$, which will be

$$\begin{cases} \hat{x}_t &=& a^t\hat{x}_0 \\ \sigma_t^2 &=& a^2 \cdot (a^2 \cdot ( \cdots a^2\sigma_0^2 + \sigma_{e_0}^2 \cdots ) + \sigma_{e_{t-2}}^2) + \sigma_{e_{t-1}}^2 \end{cases}$$

As we can see, there is one fatal problem in the above prediction. As $t$ grows, our estimation becomes less and less precise because the variance grows very quickly. This is because each time we make a prediction for one more step, the variance of the error is added on top. To see it more clearly, when $a=1$, we will have

$$\sigma_t^2 = \sigma_0^2 + \sigma_{e_0}^2 + \sigma_{e_1}^2 + \cdots + \sigma_{e_{t-1}}^2$$

It is easy to understand this error accumulation intuitively. As we make more predictions, we are not getting any new information about the system. Think about it: if you only know what a cat looks like when it is one week old, how can you precisely predict what it looks like when it is three years old? If you only know your weight before COVID-19 kept us at home, how do you precisely estimate your current weight? The key here is that you need constant feedback to guide your estimation when it drifts away from reality.
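A tiny sketch makes the accumulation concrete (the initial variance and the constant process-noise variance below are made-up numbers):

```python
# Repeatedly apply the prediction step with a = 1 and constant process noise:
# the predicted variance grows without bound.
var = 0.1      # initial variance sigma_0^2
var_e = 0.04   # process-noise variance sigma_e^2, assumed constant
a = 1.0

for t in range(1, 11):
    var = a**2 * var + var_e
    print(f"t = {t:2d}, predicted variance = {var:.2f}")
# prints 0.14, 0.18, 0.22, ... growing linearly with t when a = 1
```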

Okay, let's take one more step in the problem formulation so that it is more realistic. Now we are allowed to take measurements of the system via an observation function. In the weight example, this translates to being allowed to weigh yourself with an electronic scale every now and then. It seems that with a way to take measurements, we do not need to estimate the state anymore; we can simply read off the precise value! Except that in reality the measurement is usually not accurate either. Therefore, the observation function has an associated error as well. Assuming a linear observation function, we can write the readings mathematically as:

$$y_t = h_t \cdot x_t + r_t, \textrm{ where } r_t \sim N \left(0, \sigma_{r_t}^2 \right)$$

Note that when you take a measurement at time $t + 1$, you can directly observe $y_{t+1}$ (though $y_{t+1}$ is not $x_{t+1}$, and the latter is what we want to estimate). The question now becomes: how do we make a good estimation of $x_{t+1}$, given our estimation of $x_t$, the (imperfect) transition function, and the observation $y_{t+1}$?

Let's first find the generative model interpretation of this. We can see that $y_{t+1}$ is generated in the following 3 steps:

  1. Sample $x_{t+1} = a \cdot x_t + e_t$ out of the distribution $N(a\hat{x}_t, a^2\sigma_t^2 + \sigma_{e_t}^2)$ (note that this is the conclusion from the previous section)
  2. Sample $r_{t+1}$ out of the distribution $N(0, \sigma_{r_{t+1}}^2)$
  3. Directly compute $y_{t+1} = h_{t+1} \cdot x_{t+1} + r_{t+1}$

Let's stop for a while and take a closer look at the above generative model. The distribution $N(a\hat{x}_t, a^2\sigma_t^2 + \sigma_{e_t}^2)$ comes from the conclusion of the previous section, and represents the best[^2] estimation of $x_{t+1}$ we can get based on pure prediction. The mean and variance of this distribution will be used quite a lot in the derivation below, so it is good to give them names. Let

$$\begin{cases} x'_{t+1} &=& a\hat{x}_t \\ \sigma'^2_{t+1} &=& a^2 \sigma_t^2 + \sigma_{e_t}^2 \end{cases}$$

Note that both $x'_{t+1}$ and $\sigma'_{t+1}$ are deterministic values, i.e. neither of them is a random variable.

[^2]: I am being informal here, as we haven't formally defined what the best estimation means. We will likely get to this topic in the future, so bear with me for now.

Although the above generative model is about generating $y_{t+1}$, it is $x_{t+1}$ that we actually want to estimate. We can do this by deriving the pdf of $x_{t+1}$. The following derivation may seem tedious, but I will try to be clear on each step, and trust me, this will be the last challenge in this post!

Given that we have observed $y_{t+1} = y$, what is the probability of $x_{t+1} = x$? Such probability can be written as

$$\forall x, \mathbb{P} [ x_{t+1} = x \mid y_{t+1} = y] = f_{x_{t+1}}(x) \mathrm{d}x$$

where $f_{x_{t+1}}(x)$ is the (yet unknown) pdf of $x_{t+1}$ that we want to derive. Also note the $\forall x$ in the statement, which is very important: it means that the equation holds for every single $x$.

By applying Bayes' rule, the left hand side can also be transformed as

$$\begin{darray}{rcl} \forall x, \mathbb{P} [ x_{t+1} = x \mid y_{t+1} = y] &=& \frac{\mathbb{P}[y_{t+1}=y \mid x_{t+1} = x] \mathbb{P}[x_{t+1} = x]}{\mathbb{P}[y_{t+1} = y]} \\ &=& \frac{\mathbb{P}[r_{t+1} = y - h_{t+1}x] \mathbb{P}[x_{t+1} = x]}{\mathbb{P}[y_{t+1} = y]} \end{darray}$$

So there are 3 items on the right hand side. Let's crack them one by one.

The simplest one here is $\mathbb{P}[y_{t+1} = y]$. Since it does not depend on $x$, this can be just written as

$$\mathbb{P}[y_{t+1} = y] = \mathrm{Const} \cdot \mathrm{d}y$$

Next comes $\mathbb{P}[x_{t+1} = x]$, without conditioning on the value of $y_{t+1}$. This is the pure prediction we discussed above, which follows $N(x'_{t+1}, \sigma'^2_{t+1})$. Therefore it is simply

$$\mathbb{P}[x_{t+1} = x] = \mathrm{Const} \cdot \exp\left(-\frac{(x-x'_{t+1})^2}{2\sigma'^2_{t+1}}\right) \mathrm{d} x$$

The last one, $\mathbb{P}[r_{t+1} = y - h_{t+1}x]$, is about $r_{t+1}$, which happens to follow a Gaussian distribution as well (even better, its mean is zero)! This means

$$\begin{darray}{rcl} \mathbb{P}[r_{t+1} = y - h_{t+1}x] &=& \mathrm{Const} \cdot \exp \left( -\frac{(y - h_{t+1}x)^2}{2\sigma^2_{r_{t+1}}} \right) \mathrm{d}r \\ &=& \mathrm{Const} \cdot \exp \left( -\frac{(y - h_{t+1}x)^2}{2\sigma^2_{r_{t+1}}} \right) (\mathrm{d}y - h_{t+1}\mathrm{d}x) \end{darray}$$

Note that $\mathrm{d}r$ can be written in the form above because of differential form arithmetic. It is good to understand the rules behind it, but once you get familiar with them, they are no stranger than the rules you use to take derivatives.

Therefore, take the above 3 expanded components and plug them back in, and keep in mind that by the differential form rule $\mathrm{d}x \wedge \mathrm{d}x = 0$, we have

$$\begin{darray}{rcl} \forall x, && f_{x_{t+1}}(x) \mathrm{d}x \\ &=& \mathbb{P} [ x_{t+1} = x \mid y_{t+1} = y] \\ &=& \frac{\mathrm{Const} \cdot \exp \left( -\frac{(x-x'_{t+1})^2}{2\sigma'^2_{t+1}} -\frac{(y - h_{t+1}x)^2}{2\sigma^2_{r_{t+1}}} \right) (\mathrm{d}x \wedge \mathrm{d} y - h_{t+1} \mathrm{d}x \wedge \mathrm{d}x)} {\mathrm{Const} \cdot \mathrm{d}y} \\ &=& \mathrm{Const} \cdot \exp \left( -\frac{(x-x'_{t+1})^2}{2\sigma'^2_{t+1}} -\frac{(y - h_{t+1}x)^2}{2\sigma^2_{r_{t+1}}} \right) \mathrm{d} x \end{darray}$$

Let's then take a closer look at the terms inside the $\exp(\cdot)$:

$$\begin{darray}{rcl} &&-\frac{(x-x'_{t+1})^2}{2\sigma'^2_{t+1}} -\frac{(y - h_{t+1}x)^2}{2\sigma^2_{r_{t+1}}} \\ &=& -\frac{(\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1})x^2 - 2(\sigma^2_{r_{t+1}} x'_{t+1} + \sigma'^2_{t+1}h_{t+1}y)x + \mathrm{Const}} {2\sigma'^2_{t+1}\sigma^2_{r_{t+1}}} \\ &=& -\frac{1}{2} \frac{x^2 - 2\frac{\sigma^2_{r_{t+1}} x'_{t+1} + \sigma'^2_{t+1}h_{t+1}y}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}}x} {\frac{\sigma'^2_{t+1}\sigma^2_{r_{t+1}}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}}} + \mathrm{Const} \\ &=& - \frac{1}{2}\frac{\left(x - \frac{\sigma^2_{r_{t+1}} x'_{t+1} + \sigma'^2_{t+1}h_{t+1}y}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}} \right)^2} {\frac{\sigma'^2_{t+1}\sigma^2_{r_{t+1}}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}}} + \mathrm{Const} \end{darray}$$

Plugging this back into the above equation, we have

$$\begin{darray}{rcl} \forall x, f_{x_{t+1}}(x) \mathrm{d}x &=& \mathbb{P} [ x_{t+1} = x \mid y_{t+1} = y] \\ &=& \mathrm{Const} \cdot \exp \left( - \frac{1}{2}\frac{\left(x - \frac{\sigma^2_{r_{t+1}} x'_{t+1} + \sigma'^2_{t+1}h_{t+1}y}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}} \right)^2} {\frac{\sigma'^2_{t+1}\sigma^2_{r_{t+1}}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}}} \right) \mathrm{d} x \end{darray}$$

Let's remove $\mathrm{d}x$ from both sides, and we have

$$\forall x, f_{x_{t+1}}(x) = \mathrm{Const} \cdot \exp \left( - \frac{1}{2} \frac{\left(x - \frac{\sigma^2_{r_{t+1}} x'_{t+1} + \sigma'^2_{t+1}h_{t+1}y_{t+1}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}} \right)^2} {\frac{\sigma'^2_{t+1}\sigma^2_{r_{t+1}}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}}} \right)$$

Note that since $y$ is simply the observed value of $y_{t+1}$, it has been replaced with $y_{t+1}$.

This means that $x_{t+1}$ follows a Gaussian distribution! We can even directly read off the mean and the variance of the estimation from the above formula, i.e.

$$\begin{cases} \hat{x}_{t+1} &=& \frac{\sigma^2_{r_{t+1}} x'_{t+1} + \sigma'^2_{t+1}h_{t+1}y_{t+1}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}} \\ \sigma^2_{t+1} &=& \frac{\sigma'^2_{t+1}\sigma^2_{r_{t+1}}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}} \end{cases}$$
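If you do not feel like re-checking the completing-the-square algebra by hand, here is a small symbolic sketch (using sympy, with my own symbol names) that verifies the exponent differs from $-(x - \hat{x}_{t+1})^2 / (2\sigma^2_{t+1})$ only by terms independent of $x$:

```python
import sympy as sym

# Symbols: x, observation y, prior mean x', observation factor h,
# prior variance sigma'^2 and measurement-noise variance sigma_r^2.
x, y, x_prime, h = sym.symbols('x y x_prime h', real=True)
var_prime, var_r = sym.symbols('var_prime var_r', positive=True)

exponent = -(x - x_prime)**2 / (2 * var_prime) - (y - h * x)**2 / (2 * var_r)

mean = (var_r * x_prime + var_prime * h * y) / (var_r + h**2 * var_prime)
variance = var_prime * var_r / (var_r + h**2 * var_prime)

residual = exponent - (-(x - mean)**2 / (2 * variance))
print(sym.simplify(sym.diff(residual, x)))  # 0  => residual is constant in x
```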

Making Sense of the Result

The above answer still looks very complicated, so let me try to interpret it in a more intuitive way in this section.

Let's start with the mean

$$\begin{darray}{rcl} \hat{x}_{t+1} &=& \frac{\sigma^2_{r_{t+1}} x'_{t+1} + \sigma'^2_{t+1}h_{t+1}y_{t+1}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}} \\ &=& \frac{\sigma^2_{r_{t+1}}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}} \cdot x'_{t+1} + \frac{\sigma'^2_{t+1}h_{t+1}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}} \cdot y_{t+1} \\ &=& \frac{\sigma^2_{r_{t+1}}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}} \cdot x'_{t+1} + \frac{\sigma'^2_{t+1}h^2_{t+1}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}} \cdot \frac{y_{t+1}}{h_{t+1}} \\ &=& K \cdot x'_{t+1} + (1 - K) \cdot \frac{y_{t+1}}{h_{t+1}} \end{darray}$$

Note that in the above formula, we let

$$K = \frac{\sigma^2_{r_{t+1}}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}}$$

$K$ is clearly a number between $0$ and $1$. This means that the mean of the estimation of $x_{t+1}$ is actually a weighted combination of $x'_{t+1}$ and $y_{t+1} / h_{t+1}$. It is worth noting that

  1. $x'_{t+1}$ is the best guess you can have based on pure prediction
  2. $y_{t+1} / h_{t+1}$ is the best guess you can have based on pure observation

So this is basically about trusting both pieces of evidence, each with a grain of salt. How much you trust each of them depends on the variance of each guess: the bigger the variance, the less trustworthy. Very reasonable, right?

What about the estimated variance of $x_{t+1}$? Using the $K$ we just defined, it can be written as

$$\sigma^2_{t+1} = K \cdot \sigma'^2_{t+1}$$

Intuitively, this is just updating the pure-prediction-based variance estimation now that we have an observation. Note that because $K < 1$, the final estimated variance is going to be smaller than the pure-prediction-based one!
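To get a feel for how $K$ behaves, here is a tiny numeric sketch (with $h = 1$ and made-up variances): large measurement noise pushes $K$ toward $1$ (trust the prediction), small measurement noise pushes $K$ toward $0$ (trust the observation), and the updated variance never exceeds the predicted one.

```python
def kalman_gain(var_pred: float, var_r: float, h: float = 1.0) -> float:
    """Combination weight K = sigma_r^2 / (sigma_r^2 + h^2 * sigma'^2)."""
    return var_r / (var_r + h**2 * var_pred)

var_pred = 1.0
for var_r in (100.0, 1.0, 0.01):
    K = kalman_gain(var_pred, var_r)
    print(f"var_r = {var_r:6.2f}  K = {K:.3f}  updated variance = {K * var_pred:.3f}")
# K ≈ 0.990, 0.500, 0.010 — and the updated variance is always <= var_pred
```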

So at this moment we can summarize the procedure of the Kalman filter update, i.e. how to obtain the $(t+1)$-step estimation based on the $t$-step estimation and a new observation.

Step I: Compute the pure prediction based estimation.

$$\begin{cases} x'_{t+1} &=& a\hat{x}_t \\ \sigma'^2_{t+1} &=& a^2 \sigma_t^2 + \sigma_{e_t}^2 \end{cases}$$

Step II: Compute the combination weight $K$, which is often called the Kalman gain.

$$K = \frac{\sigma^2_{r_{t+1}}}{\sigma^2_{r_{t+1}} + h^2_{t+1}\sigma'^2_{t+1}}$$

Step III: Use the Kalman gain $K$ and the observation $y_{t+1}$ to update the pure prediction based estimation.

$$\begin{cases} \hat{x}_{t+1} &=& K \cdot x'_{t+1} + (1 - K) \cdot \frac{y_{t+1}}{h_{t+1}} \\ \sigma^2_{t+1} &=& K \cdot \sigma'^2_{t+1} \end{cases}$$
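Putting the three steps together, here is a minimal end-to-end sketch of the 1D filter. Everything below (the function name, the simulated system, the parameter values) is my own illustrative choice, not a reference implementation:

```python
import numpy as np

def kf_step(x_hat, var, y, a, var_e, h, var_r):
    """One Kalman filter update: prediction, gain, correction."""
    # Step I: pure-prediction-based estimation
    x_pred = a * x_hat
    var_pred = a**2 * var + var_e
    # Step II: Kalman gain
    K = var_r / (var_r + h**2 * var_pred)
    # Step III: combine prediction and observation
    x_hat_new = K * x_pred + (1 - K) * y / h
    var_new = K * var_pred
    return x_hat_new, var_new

# Simulate a system x_{t+1} = a x_t + e_t observed through y_t = h x_t + r_t.
rng = np.random.default_rng(7)
a, h = 0.95, 2.0
var_e, var_r = 0.1, 0.5

x_true, x_hat, var = 10.0, 0.0, 100.0   # start with a very uncertain estimate
for t in range(20):
    x_true = a * x_true + rng.normal(0, np.sqrt(var_e))
    y = h * x_true + rng.normal(0, np.sqrt(var_r))
    x_hat, var = kf_step(x_hat, var, y, a, var_e, h, var_r)

print(f"true state: {x_true:.2f}, estimate: {x_hat:.2f} +/- {np.sqrt(var):.2f}")
```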

Summary

This post demonstrated the derivation of the 1D Kalman filter, and also briefly touched on its intuitive interpretation. I think many of the techniques used here, such as generative models and differential forms, can find applications in many other situations.

However, in reality, the 1D Kalman filter alone is rarely enough. This post should have prepared you for the next journey: the multivariate Kalman filter. Stay tuned!