Moving from Discrete-Time Diffusion to Continuous Time

The forward DDPM update for discrete steps $t\in \{1,\dots,T\}$ is:

\begin{equation} x(t)=\sqrt{1-\beta(t)}x(t-1)+\sqrt{\beta(t)}\epsilon_t \end{equation}

where $0<\beta(t)<1$.

We now change variables to obtain a continuous-time process on $\tau\in[0,1]$ instead of $t\in\{1,\dots,T\}$. Define $\tau:=\frac{t}{T}$ (so $d\tau=\frac{1}{T}dt$), $x'(\tau):=x(\tau T)$, $\epsilon'(\tau)\sim \mathcal{N}(0,I)$, and $\beta'(\tau)d\tau:=\beta(t)$. This scaling is important: as the number of steps increases, the per-step drift and noise must decrease, so $\beta$ is scaled by $d\tau$. With these definitions, the forward DDPM update becomes:

\begin{equation} x'(\tau)=\sqrt{1-\beta'(\tau)d\tau}\,x'(\tau-1/T)+\sqrt{\beta'(\tau)d\tau}\,\epsilon'(\tau) \end{equation}

Using the first-order Taylor approximation $\sqrt{1-u}\approx 1-\frac{u}{2}$:

\begin{equation} x'(\tau)=(1-0.5\beta'(\tau)d\tau)\,x'(\tau-1/T)+\sqrt{\beta'(\tau)d\tau}\,\epsilon'(\tau) \end{equation}

\begin{equation} x'(\tau)-x'(\tau-1/T)=-0.5\beta'(\tau)d\tau\, x'(\tau-1/T)+\sqrt{\beta'(\tau)d\tau}\,\epsilon'(\tau) \end{equation}

Define $\Delta \tau:=1/T$ and take the limit $T\to\infty$, in which $x'(\tau-\Delta\tau)\to x'(\tau)$ in the drift term:

\begin{equation} dx'(\tau)=-0.5\beta'(\tau)\, x'(\tau)\, d\tau +\sqrt{\beta'(\tau)d \tau}\,\epsilon'(\tau) \end{equation}

If we define $dW(\tau):=\sqrt{d\tau}\,\epsilon'(\tau)$, drop the primes, and relabel $\tau$ as $t$, we obtain the Wiener/It\^o/Brownian form of the forward DDPM process: \begin{equation} dx(t)=-0.5\beta(t)\, x(t)\, dt +\sqrt{\beta(t)}\,dW(t) \end{equation}

where $t\in[0,1]$ and $\beta(t)$ is a time-dependent noise schedule, typically increasing from a value near 0 at $t=0$ according to the chosen schedule.
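As a quick sanity check, the forward SDE can be simulated directly with Euler-Maruyama steps. A minimal NumPy sketch (the linear schedule $\beta(t)=0.1+19.9t$, the starting point, and all sizes are illustrative assumptions, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)

def beta(t):
    # illustrative linear schedule (an assumption, not fixed by the derivation)
    return 0.1 + 19.9 * t

# Euler-Maruyama discretization of dx = -0.5*beta(t)*x dt + sqrt(beta(t)) dW
n_steps, n_particles = 1_000, 100_000
dt = 1.0 / n_steps
x = np.full(n_particles, 1.0)  # start every particle at x(0) = 1
for i in range(n_steps):
    t = i * dt
    x += -0.5 * beta(t) * x * dt + np.sqrt(beta(t) * dt) * rng.standard_normal(n_particles)

# the terminal samples are close to N(0, 1): the data is fully noised
print(x.mean(), x.var())
```

Whatever the starting point, the terminal samples are statistically indistinguishable from standard normal noise, which is exactly the property the reverse process relies on.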

Reverse Process

Define $f(x,t):=-0.5\beta(t)x(t)$ and $g(t):=\sqrt{\beta(t)}$. The forward process can then be written as: \begin{equation}dx(t)=f(x,t)dt + g(t)dW(t) \end{equation}
This form describes how to sample the next infinitesimal change in $x$, conditioned on the current state and time. Rewriting:

\begin{equation} x(t+dt)=x(t) + f(x,t)dt + g(t)dW(t) \end{equation} which implies:

\begin{equation} p(x(t+dt)| x(t)) = \mathcal{N}(x(t)+f(x,t)dt, g(t)^2 dt I) \end{equation}

Define the later (more noised) step as $y:=x(t+dt)$ and the earlier step as $x:=x(t)$. Using Bayes' rule, $p(x|y)\propto p(y|x)p(x)$, together with $f(x,t)\approx f(y,t)$ (valid to first order in $dt$), we obtain:

\begin{equation} p(y|x)\propto \exp\left(-\frac{\|y-x-f(y,t)dt\|^2}{2g(t)^2 dt}\right) \end{equation}

Next, expand $\log p(x)$ around $y$:

\begin{equation} \log p(x) = \log p(y)+ (x-y)^T \nabla_x \log p(y) \end{equation}

\begin{equation} \log p(x|y) = \log p(y|x) +\log p(x) - \log p(y) \end{equation} Since $\log p(y)$ does not depend on $x$, it can be absorbed into a constant.

Therefore:

\begin{equation} \log p(x|y) = -\frac{\|y-x-f(y,t)dt\|^2}{2g(t)^2 dt} + (x-y)^T \nabla_x \log p(y) + C \end{equation}

Define the score function $s:=\nabla_x \log p(y)$, and abbreviate $f:=f(y,t)$ and $g:=g(t)$:

\begin{equation} \log p(x|y) = -\frac{\|y-x-f\,dt\|^2}{2g^2 dt} + (x-y)^T s + C \end{equation}

Expanding the square and ignoring higher-order small terms:

$\|y-x-f\,dt\|^2=\|x-y+f\,dt\|^2=\|x-y\|^2+2(x-y)^Tf\,dt+O(dt^2)$

\begin{equation} \log p(x|y) = -\frac{\|x-y\|^2+2(x-y)^Tf\,dt}{2g^2 dt} + (x-y)^T s + C \end{equation}

We now rewrite this in Gaussian form by completing the square in $x$.

\begin{equation} \log p(x|y) = -\frac{\|x-y\|^2+2(x-y)^Tf\,dt - 2g^2dt\,(x-y)^T s}{2g^2 dt} + C \end{equation}

\begin{equation} \log p(x|y) = -\frac{\|x-y\|^2+2(x-y)^T(f\,dt-sg^2dt)}{2g^2 dt} + C \end{equation}

\begin{equation} \log p(x|y) = -\frac{\|x-y\|^2+2(x-y)^T(f\,dt-sg^2dt)+\|f\,dt-sg^2dt\|^2}{2g^2 dt} + C' \end{equation} where the compensating term $\|f\,dt-sg^2dt\|^2/(2g^2 dt)=O(dt)$ vanishes in the limit and is absorbed into $C'$.

\begin{equation} \log p(x|y) = -\frac{\|x-(y-f\,dt+sg^2dt)\|^2}{2g^2 dt} + C' \end{equation}

So we obtain the reverse-time transition:

\begin{equation} p(x|y) = \mathcal{N}(y-f\,dt+sg^2dt,\; g^2dt\,I) \end{equation}

This enables backward sampling if the score function $s:=\nabla_x \log p(y)$ is known. In practice, we train a neural network to estimate this score at each time step, and then sample backward using the expression above.
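The transition above is precisely one Euler-Maruyama step of the reverse-time SDE (Anderson, 1982), written here in the same notation:

\begin{equation} dx(t) = \left[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\right]dt + g(t)\,d\bar{W}(t) \end{equation}

where $d\bar{W}(t)$ is a Wiener process running backward in time and $p_t$ denotes the marginal density at time $t$.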

Learning the Score Function

From the standard DDPM parameterization, with $\epsilon\sim\mathcal{N}(0,I)$:

\begin{equation} x_t=\alpha_t x_0 + \sigma_t \epsilon \rightarrow p(x_t|x_0) = \mathcal{N}(\alpha_t x_0, \sigma_t^2 I) \end{equation}

\begin{equation} \log p(x_t|x_0) = -\frac{\|x_t-\alpha_t x_0\|^2}{2\sigma_t^2} + C \rightarrow \nabla_{x_t} \log p(x_t|x_0) = -\frac{x_t-\alpha_t x_0}{\sigma_t^2}=-\frac{\sigma_t \epsilon}{\sigma_t^2} \end{equation}

Therefore:

\begin{equation} \nabla_{x_t} \log p(x_t|x_0) =-\frac{\epsilon}{\sigma_t} \end{equation}

A network $\epsilon_\theta(x_t,t)$ trained to predict the noise $\epsilon$ can therefore be rescaled by $-1/\sigma_t$ to estimate the score function.
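As a numerical check of this relation (all scalar values below are arbitrary illustrative choices), a central finite difference of $\log p(x_t|x_0)$ reproduces $-\epsilon/\sigma_t$:

```python
import numpy as np

alpha_t, sigma_t, x0, eps = 0.8, 0.6, 1.5, 0.37  # arbitrary illustrative values
x_t = alpha_t * x0 + sigma_t * eps               # forward parameterization

def log_p(x):
    # log N(x; alpha_t * x0, sigma_t^2), additive constant dropped
    return -(x - alpha_t * x0) ** 2 / (2 * sigma_t ** 2)

h = 1e-6
score_fd = (log_p(x_t + h) - log_p(x_t - h)) / (2 * h)  # numerical gradient
print(score_fd, -eps / sigma_t)  # the two agree
```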

Equivalence Between Conditional and Unconditional Scores

We start from: \begin{equation} p(x)=\int_{x_0} p(x|x_0)p(x_0)dx_0 \rightarrow \nabla_x p(x) = \int_{x_0} \nabla_x p(x|x_0)\, p(x_0)\, dx_0 \end{equation}

Dividing by $p(x)$, and applying the identity $\nabla \log p = \frac{\nabla p}{p}$ both to $p(x)$ on the left and to $p(x|x_0)$ inside the integral:

\begin{equation} \nabla_x \log p(x) = \frac{\nabla_x p(x)}{p(x)} = \frac{1}{p(x)} \int_{x_0} p(x|x_0) \nabla_x \log p(x|x_0)\, p(x_0)\, dx_0 \end{equation}

Hence:

\begin{equation} \nabla_x \log p(x) = \int_{x_0} \frac{p(x|x_0) p(x_0)}{p(x)} \nabla_x \log p(x|x_0)\, dx_0= \int_{x_0} p(x_0|x) \nabla_x \log p(x|x_0)\, dx_0 \end{equation}

\begin{equation} \nabla_x \log p(x) = E_{x_0|x}\left[\nabla_x \log p(x|x_0)\right] \end{equation} This establishes the relationship between the unconditional score and the conditional score.
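This identity can be checked numerically in one dimension. The sketch below assumes a two-point prior over $x_0$ with Gaussian noise (purely illustrative choices): the posterior-weighted average of conditional scores matches a finite-difference estimate of the unconditional score.

```python
import numpy as np

sigma = 0.7                          # noise level of p(x | x0) = N(x0, sigma^2)
centers = np.array([-1.0, 1.0])      # two-point prior over x0, equal weights

def log_p(x):
    # log of the mixture marginal p(x); constants in x cancel in the gradient
    return np.log(np.mean(np.exp(-(x - centers) ** 2 / (2 * sigma ** 2))))

x = 0.3
# posterior weights p(x0 | x) proportional to p(x | x0) p(x0)
w = np.exp(-(x - centers) ** 2 / (2 * sigma ** 2))
w /= w.sum()
cond_scores = -(x - centers) / sigma ** 2        # grad_x log p(x | x0)
score_identity = float(np.sum(w * cond_scores))  # E_{x0|x}[grad_x log p(x | x0)]

h = 1e-5
score_fd = (log_p(x + h) - log_p(x - h)) / (2 * h)
print(score_identity, score_fd)  # the two agree
```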

Solvers

Once we have the continuous-time reverse transition:

\begin{equation} p(x|y) = \mathcal{N}(y-f\,dt+sg^2dt,\; g^2dt\,I) \end{equation}

We start at $t=1$ from a fully noised image and integrate backward to $t=0$ to recover a sample. Any suitable SDE solver can be used, such as Euler-Maruyama or Heun.
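A minimal end-to-end sketch of such a solver, assuming a toy one-dimensional Gaussian data distribution so that the score of every marginal is available in closed form (the schedule and all constants are illustrative assumptions): Euler-Maruyama steps of the reverse transition recover the data mean and variance.

```python
import numpy as np

rng = np.random.default_rng(1)

def beta(t):
    return 0.1 + 19.9 * t                 # illustrative linear schedule

def int_beta(t):
    return 0.1 * t + 0.5 * 19.9 * t ** 2  # integral of beta from 0 to t

m, s2 = 2.0, 0.25                         # toy data distribution N(m, s2)

def score(x, t):
    # analytic score of the marginal p_t(x) when the data is Gaussian
    a = np.exp(-0.5 * int_beta(t))        # signal scale alpha_t
    var = a ** 2 * s2 + (1.0 - a ** 2)    # marginal variance
    return -(x - a * m) / var

n_steps, n = 1_000, 50_000
dt = 1.0 / n_steps
x = rng.standard_normal(n)                # start at t = 1 from pure noise
for i in range(n_steps, 0, -1):
    t = i * dt
    f = -0.5 * beta(t) * x                # drift f(x, t)
    g2 = beta(t)                          # g(t)^2
    # one step of p(x | y) = N(y - f dt + s g^2 dt, g^2 dt)
    x = x - f * dt + g2 * score(x, t) * dt + np.sqrt(g2 * dt) * rng.standard_normal(n)

print(x.mean(), x.var())  # close to m = 2 and s2 = 0.25
```

In practice the analytic score is replaced by the learned network described above; the solver loop is unchanged.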