Continuous Diffusion Process
Moving from Discrete-Time Diffusion to Continuous Time
The forward DDPM update for discrete steps $t\in\{1,\dots,T\}$ is:
\begin{equation} x(t)=\sqrt{1-\beta(t)}x(t-1)+\sqrt{\beta(t)}\epsilon_t \end{equation}
where $0<\beta(t)<1$ and $\epsilon_t\sim\mathcal{N}(0,I)$.
We now change variables to obtain a continuous-time process on $\tau\in[0,1]$ instead of $t\in\{1,\dots,T\}$. Define $\tau:=\frac{t}{T}$, so $d\tau=\frac{1}{T}dt$; let $x'(\tau):=x(\tau T)$, $\epsilon'(\tau)\sim\mathcal{N}(0,I)$, and $\beta'(\tau)d\tau:=\beta(t)$. This scaling is important: as the number of steps increases, the per-step drift and noise must shrink, so $\beta$ is scaled by $d\tau$. With these definitions, the forward DDPM update becomes:
\begin{equation} x'(\tau)=\sqrt{1-\beta'(\tau)d\tau}\,x'(\tau-1/T)+\sqrt{\beta'(\tau)d\tau}\,\epsilon'(\tau) \end{equation}
Using the first-order Taylor approximation $\sqrt{1-u}\approx 1-\tfrac{1}{2}u$ for small $u$:
\begin{equation} x'(\tau)=(1-0.5\beta'(\tau)d\tau)x'(\tau-1/T)+\sqrt{\beta'(\tau)d\tau}\,\epsilon'(\tau) \end{equation}
\begin{equation} x'(\tau)-x'(\tau-1/T)=-0.5\beta'(\tau)d\tau\, x'(\tau-1/T)+\sqrt{\beta'(\tau)d\tau}\,\epsilon'(\tau) \end{equation}
Define $\Delta \tau:=1/T$ and take $\lim_{T\to\infty}$:
\begin{equation} dx'(\tau)=-0.5\beta'(\tau) x'(\tau)\, d\tau +\sqrt{\beta'(\tau)d\tau}\,\epsilon'(\tau) \end{equation}
If we define $dW(\tau):=\sqrt{d\tau}\,\epsilon'(\tau)$, drop the primes, and relabel $\tau$ as $t$, we obtain the It\^o (Brownian-motion) form of the forward DDPM process: \begin{equation} dx(t)=-0.5\beta(t) x(t)\, dt +\sqrt{\beta(t)}\,dW(t) \end{equation}
where $t\in[0,1]$ and $\beta(t)$ is a time-dependent noise schedule that increases with $t$ according to the chosen schedule.
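As a sanity check, the forward SDE above can be simulated directly with Euler--Maruyama steps. The linear schedule $\beta(t)=\beta_{\min}+(\beta_{\max}-\beta_{\min})t$ with $\beta_{\min}=0.1$, $\beta_{\max}=20$ is an assumption here (a common VP-SDE choice); the derivation only requires that $\beta(t)$ increases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed linear schedule (not fixed by the text, which only says beta increases)
beta_min, beta_max = 0.1, 20.0
def beta(t):
    return beta_min + (beta_max - beta_min) * t

# Euler-Maruyama simulation of  dx = -0.5 beta(t) x dt + sqrt(beta(t)) dW
n_paths, n_steps = 20000, 1000
dt = 1.0 / n_steps
x = np.full(n_paths, 2.0)          # start every path at the "data" point x0 = 2
for i in range(n_steps):
    t = i * dt
    drift = -0.5 * beta(t) * x * dt
    noise = np.sqrt(beta(t) * dt) * rng.standard_normal(n_paths)
    x = x + drift + noise

# With int_0^1 beta = 10.05, alpha = exp(-5.025) ~ 0.0066, so the marginal at
# t = 1 is essentially N(0, 1): the data point has been fully noised away.
print(x.mean(), x.var())
```

The empirical mean and variance at $t=1$ land near $0$ and $1$, matching the closed-form marginal $\mathcal{N}(\alpha x_0,\, 1-\alpha^2)$ with $\alpha=e^{-\frac12\int_0^1\beta}$.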
Reverse Process
Define $f(x,t):=-0.5\beta(t)x$ and $g(t):=\sqrt{\beta(t)}$. The forward process can then be written as:
\begin{equation}dx(t)=f(x,t)dt + g(t)dW(t)
\end{equation}
This form describes how to sample the next infinitesimal change in $x$, conditioned on the current state and time. Rewriting:
\begin{equation} x(t+dt)=x(t) + f(x,t)dt + g(t)dW(t) \end{equation} which implies:
\begin{equation} p(x(t+dt)| x(t)) = \mathcal{N}(x(t)+f(x,t)dt, g(t)^2 dt I) \end{equation}
In reverse-time sampling the known state is the later one (in forward time), $y:=x(t+dt)$, and we want the earlier state $x:=x(t)$. Using Bayes' rule $p(x|y)\propto p(y|x)p(x)$ and the approximation $f(x,t)\approx f(y,t)$, which holds to first order in $dt$, we obtain:
\begin{equation} p(y|x)\propto \exp\!\left(-\frac{\|y-x-f(y,t)dt\|^2}{2g(t)^2 dt}\right) \end{equation}
Next, Taylor-expand $\log p(x)$ around $y$ to first order:
\begin{equation} \log p(x) = \log p(y)+ (x-y)^T \nabla_x \log p(y) \end{equation}
Taking logs of Bayes' rule, up to an additive constant in $x$:
\begin{equation} \log p(x|y) = \log p(y|x) +\log p(x) + C \end{equation}
Therefore:
\begin{equation} \log p(x|y) = -\frac{\|y-x-f(y,t)dt\|^2}{2g(t)^2 dt} + (x-y)^T \nabla_x \log p(y) + C \end{equation}
Define the score function $s:=\nabla_x \log p(y)$, and abbreviate $f:=f(y,t)$ and $g:=g(t)$:
\begin{equation} \log p(x|y) = -\frac{\|y-x-f\,dt\|^2}{2g^2 dt} + (x-y)^T s + C \end{equation}
Expanding the square and dropping the $O(dt^2)$ term $\|f\,dt\|^2$:
\begin{equation} \|y-x-f\,dt\|^2=\|x-y\|^2+2(x-y)^T f\,dt + O(dt^2) \end{equation}
\begin{equation} \log p(x|y) = -\frac{\|x-y\|^2+2(x-y)^T f\,dt}{2g^2 dt} + (x-y)^T s + C \end{equation}
We now rewrite this in Gaussian form by completing the square; the term $\|f\,dt-sg^2dt\|^2$ added in the third step does not depend on $x$ and is absorbed into the constant $C'$.
\begin{equation} \log p(x|y) = -\frac{\|x-y\|^2+2(x-y)^T f\,dt - 2g^2dt\,(x-y)^T s}{2g^2 dt} + C \end{equation}
\begin{equation} \log p(x|y) = -\frac{\|x-y\|^2+2(x-y)^T(f\,dt-sg^2dt)}{2g^2 dt} + C \end{equation}
\begin{equation} \log p(x|y) = -\frac{\|x-y\|^2+2(x-y)^T(f\,dt-sg^2dt)+\|f\,dt-sg^2dt\|^2}{2g^2 dt} + C' \end{equation}
\begin{equation} \log p(x|y) = -\frac{\|x-(y-f\,dt+sg^2dt)\|^2}{2g^2 dt} + C' \end{equation}
So we obtain the reverse-time transition:
\begin{equation} p(x|y) = \mathcal{N}\!\left(y-f\,dt+sg^2dt,\; g^2dt\, I\right) \end{equation}
This enables backward sampling if the score function $s:=\nabla_x \log p(y)$ is known. In practice, we train a neural network to estimate this score at each time step, and then sample backward using the expression above.
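A single draw from this reverse transition is easy to sketch in code. The `score_fn` callable below is a placeholder for whatever score estimate is available (in practice a trained network); the function itself just implements the Gaussian mean and variance derived above.

```python
import numpy as np

def reverse_step(y, t, score_fn, beta_fn, dt, rng):
    """Draw one sample from the reverse transition
    p(x | y) = N(y - f dt + s g^2 dt,  g^2 dt I),
    where f = -0.5 beta(t) y and g^2 = beta(t)."""
    f = -0.5 * beta_fn(t) * y
    g2 = beta_fn(t)
    s = score_fn(y, t)
    mean = y - f * dt + s * g2 * dt
    return mean + np.sqrt(g2 * dt) * rng.standard_normal(y.shape)

# Example with a placeholder score (the score of N(0, I); a trained network
# would go here) and an assumed constant beta:
rng = np.random.default_rng(0)
y = rng.standard_normal(4)
x = reverse_step(y, t=0.9, score_fn=lambda v, t: -v,
                 beta_fn=lambda t: 0.5, dt=1e-3, rng=rng)
```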
Learning the Score Function
From the standard DDPM parameterization:
\begin{equation} x_t=\alpha_t x_0 + \sigma_t \epsilon \;\rightarrow\; p(x_t|x_0) = \mathcal{N}(\alpha_t x_0, \sigma_t^2 I) \end{equation}
\begin{equation} \log p(x_t|x_0) = -\frac{\|x_t-\alpha_t x_0\|^2}{2\sigma_t^2} + C \;\rightarrow\; \nabla_{x_t} \log p(x_t|x_0) = -\frac{x_t-\alpha_t x_0}{\sigma_t^2}=-\frac{\sigma_t \epsilon}{\sigma_t^2} \end{equation}
Therefore:
\begin{equation} \nabla_{x_t} \log p(x_t|x_0) =-\frac{\epsilon}{\sigma_t} \end{equation}
A network that predicts noise $\epsilon$ can therefore be directly rescaled to predict the score function.
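The rescaling is a one-liner. In the sketch below, `eps_model` is a hypothetical stand-in for a trained noise predictor; the sanity check feeds in the oracle $\epsilon=(x_t-\alpha_t x_0)/\sigma_t$ and recovers the exact conditional score.

```python
import numpy as np

def score_from_eps(eps_model, x_t, t, sigma_t):
    # nabla_x log p(x_t | x_0) = -eps / sigma_t, so a model trained to
    # predict eps yields the score after a simple rescaling.
    return -eps_model(x_t, t) / sigma_t

# Sanity check with a "perfect" eps predictor (assumed toy values):
alpha_t, sigma_t = 0.8, 0.6
x0 = np.array([1.0, -2.0])
eps = np.array([0.5, 0.3])
x_t = alpha_t * x0 + sigma_t * eps
oracle = lambda x, t: (x - alpha_t * x0) / sigma_t
s = score_from_eps(oracle, x_t, 0.5, sigma_t)
# s equals -(x_t - alpha_t x0) / sigma_t^2, the exact conditional score
```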
Equivalence Between Conditional and Unconditional Scores
We start from: \begin{equation} p(x)=\int_{x_0} p(x|x_0)p(x_0)dx_0 \;\rightarrow\; \nabla_x p(x) = \int_{x_0} \nabla_x p(x|x_0)\, p(x_0)\, dx_0 \end{equation}
Dividing by $p(x)$ and using $\nabla_x \log p(x)=\frac{\nabla_x p(x)}{p(x)}$ together with $\nabla_x p(x|x_0)=p(x|x_0)\,\nabla_x \log p(x|x_0)$:
\begin{equation} \nabla_x \log p(x) = \frac{\nabla_x p(x)}{p(x)} = \frac{1}{p(x)} \int_{x_0} p(x|x_0)\, \nabla_x \log p(x|x_0)\, p(x_0)\, dx_0 \end{equation}
Hence:
\begin{equation} \nabla_x \log p(x) = \int_{x_0} \frac{p(x|x_0)\, p(x_0)}{p(x)} \nabla_x \log p(x|x_0)\, dx_0= \int_{x_0} p(x_0|x)\, \nabla_x \log p(x|x_0)\, dx_0 \end{equation}
\begin{equation} \nabla_x \log p(x) = \mathbb{E}_{x_0|x}\!\left[\nabla_x \log p(x|x_0)\right] \end{equation} This establishes the relationship between the unconditional score and the conditional score.
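The identity can be verified numerically on a toy model. Below, $x_0$ takes the assumed values $\pm 1$ with equal probability and $p(x|x_0)=\mathcal{N}(x_0,\sigma^2)$; the left side is computed by numerically differentiating $\log p(x)$, the right side by posterior-weighting the conditional scores.

```python
import numpy as np

# Check  grad log p(x) = E_{x0|x}[ grad log p(x|x0) ]  for a toy mixture:
# x0 in {-1, +1} equally likely, p(x|x0) = N(x0, sigma^2). (Assumed values.)
sigma = 0.5
x0s = np.array([-1.0, 1.0])

def p(x):
    return np.mean(np.exp(-(x - x0s)**2 / (2 * sigma**2))) / np.sqrt(2*np.pi*sigma**2)

x = 0.3
# Left side: central-difference derivative of log p
h = 1e-5
lhs = (np.log(p(x + h)) - np.log(p(x - h))) / (2 * h)
# Right side: posterior-weighted conditional scores
post = np.exp(-(x - x0s)**2 / (2 * sigma**2))
post = post / post.sum()                      # p(x0 | x), equal priors cancel
rhs = np.sum(post * (-(x - x0s) / sigma**2))  # E[ grad log p(x|x0) ]
print(lhs, rhs)
```

The two quantities agree to numerical-differentiation precision.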
Solvers
Once we have the continuous-time reverse transition:
\begin{equation} p(x|y) = \mathcal{N}\!\left(y-f\,dt+sg^2dt,\; g^2dt\, I\right) \end{equation}
We start at $t=1$ from pure noise and integrate backward to $t=0$ to recover a sample. Any suitable SDE solver can be used, such as Euler--Maruyama or Heun's method.
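The full backward integration can be sketched end to end for a 1-D toy problem where the exact score is available in closed form, standing in for a trained network. The schedule ($\beta$ linear from $0.1$ to $20$) and the data distribution ($x_0\sim\mathcal{N}(1.5,\,0.2^2)$) are assumptions for illustration; for this choice the VP marginal is $\mathcal{N}(\alpha_t m_0,\ \alpha_t^2 s_0^2 + 1-\alpha_t^2)$ with $\alpha_t=e^{-\frac12\int_0^t\beta}$.

```python
import numpy as np

rng = np.random.default_rng(1)
beta_min, beta_max = 0.1, 20.0       # assumed linear VP schedule
beta = lambda t: beta_min + (beta_max - beta_min) * t

# Toy data x0 ~ N(m0, s0^2): the marginal score is known exactly, so it
# stands in for the learned score network.
m0, s0 = 1.5, 0.2
def alpha(t):
    return np.exp(-0.5 * (beta_min * t + 0.5 * (beta_max - beta_min) * t**2))
def score(x, t):
    var = alpha(t)**2 * s0**2 + 1.0 - alpha(t)**2
    return -(x - alpha(t) * m0) / var

# Euler-Maruyama on the reverse transition N(y - f dt + s g^2 dt, g^2 dt)
n_paths, n_steps = 20000, 1000
dt = 1.0 / n_steps
x = rng.standard_normal(n_paths)     # start at t = 1 from pure noise
for i in range(n_steps, 0, -1):
    t = i * dt
    f = -0.5 * beta(t) * x
    g2 = beta(t)
    x = x - f * dt + score(x, t) * g2 * dt \
        + np.sqrt(g2 * dt) * rng.standard_normal(n_paths)

print(x.mean(), x.std())             # should approach m0 = 1.5, s0 = 0.2
```

With the exact score, the backward pass recovers the data distribution up to discretization error, which is the behavior a trained score network is meant to reproduce.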