December 10, 2025
This page contains working notes and analyses for this ongoing project. Link to code notebook:
Inverse problem: There is an ordinary differential equation (ODE)/dynamical system with unknown parameters. We observe noisy data derived from the system and want to infer the parameters and quantify uncertainty.
If we have a solution to an ODE \(x(t; \theta)\), then there is a deterministic forward map \[ \theta \mapsto x(t; \theta). \]
We observe data \[ y_i = H(x(t_i; \theta)) + \varepsilon_i, \] where \(H\) is an observation operator (often the identity map or some projection) and \(\varepsilon_i\) is a noise term.
The inverse problem is: given \(y_i\), \(i = 1, \dots, n\), infer \(\theta\).
In general, these problems are ill-posed: a solution may fail to exist, fail to be unique, or fail to depend continuously on the data (Hadamard's conditions).
For these types of problems, Bayesian inference serves as a regularization and uncertainty quantification framework.
For now, we stick to a single ordinary differential equation (ODE) in order to introduce the Bayesian inference background. Later, we will see how these ideas extend to the case of dynamical systems containing multiple differential equations. Specifically, we will investigate the SIR model and the Lotka–Volterra system.
An ODE, given by \[ \frac{dx}{dt} = f(x(t); \theta), \quad x(0) = x_0, \] defines a deterministic mapping between parameters and solution trajectories, \[ \theta \mapsto x(\cdot; \theta). \] This map is typically nonlinear in \(\theta\) and, outside of special cases, available only through numerical integration.
We never observe the state \(x(t)\) exactly. Rather, we observe: \[ y_i = H(x(t_i; \theta)) + \varepsilon_i, \text{ for } i = 1, \dots, n, \] where \(\varepsilon_i \sim \mathcal{N}(0, \sigma^2)\).
For example, in a simple case, we may have \(H(x(t_i; \theta)) = x(t_i; \theta)\), the identity map, which means our observation model is simply the state with added Gaussian noise.
Then, our likelihood (observation model) is given by: \[ p(y \mid \theta) = \prod_{i=1}^{n}{\mathcal{N}(y_i \mid x(t_i; \theta), \sigma^2)}. \]
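As a concrete (hypothetical) sketch, this likelihood can be evaluated numerically; here `f` is an ODE right-hand side with signature `f(t, x, theta)`, the identity observation map on a scalar state is assumed, and all names are illustrative placeholders rather than project code:

```python
from scipy.integrate import solve_ivp
from scipy.stats import norm

def log_likelihood(theta, t_obs, y_obs, f, x0, sigma):
    """log p(y | theta) under y_i = x(t_i; theta) + N(0, sigma^2) noise."""
    # Solve the ODE forward in time, evaluating at the observation times.
    sol = solve_ivp(f, (t_obs[0], t_obs[-1]), x0, t_eval=t_obs, args=(theta,))
    # Sum the Gaussian log-densities of the observations around the trajectory.
    return norm.logpdf(y_obs, loc=sol.y[0], scale=sigma).sum()
```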
It is important to note that all “statistical difficulty” behind the parameter inference problem comes from the geometry of the map \[ \theta \mapsto (x(t_1; \theta), \dots, x(t_n; \theta)). \] We call this map the forward operator of the inverse problem.
In the previous section we looked at the form of the data likelihood function, given by the observation model of the ODE. We can now define the prior and posterior in the Bayesian context.
\[ \theta \sim \pi(\theta) \] For ODEs, parameters are typically positive, so common prior choices include the log-normal, half-normal, gamma, and inverse-gamma distributions.
\[ \pi(\theta \mid y) \propto p(y \mid \theta)\pi(\theta) \] For ODE inference problems, since in practice the forward map (likelihood) is nonlinear and partially non-identifiable, the posterior distribution is typically non-Gaussian, often exhibiting ridges, strong correlations between parameters, or multiple modes.
\[ p(y^{rep} \mid y) = \int{p(y^{rep} \mid \theta) \pi(\theta \mid y) d\theta} \] Since this integral is generally intractable, we approximate it numerically and plot the resulting distribution using the following procedure: draw \(\theta^{(s)} \sim \pi(\theta \mid y)\), solve the ODE at each draw, and simulate \(y^{rep,(s)} \sim p(y^{rep} \mid \theta^{(s)})\); the pooled draws approximate the posterior predictive (see the sketch below).
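A minimal Monte Carlo sketch of this procedure, assuming posterior draws `theta_samples` are already available (e.g., from MCMC) and reusing the placeholder names from the likelihood sketch above:

```python
import numpy as np
from scipy.integrate import solve_ivp

def posterior_predictive(theta_samples, t_obs, f, x0, sigma, seed=0):
    """Draw y_rep ~ p(y_rep | y) by pushing posterior draws through the ODE."""
    rng = np.random.default_rng(seed)
    y_rep = []
    for theta in theta_samples:
        # Solve the ODE at this posterior draw ...
        sol = solve_ivp(f, (t_obs[0], t_obs[-1]), x0, t_eval=t_obs, args=(theta,))
        # ... then add fresh observation noise to get a replicated dataset.
        y_rep.append(sol.y[0] + rng.normal(0.0, sigma, size=len(t_obs)))
    return np.asarray(y_rep)  # (n_draws, n_obs); plot quantile bands for UQ
```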
This is what uncertainty quantification (UQ) is all about – obtaining uncertainty bounds for solution trajectories.
A model is structurally identifiable if \[ x(t; \theta_1) = x(t; \theta_2) \text{ for all } t \implies \theta_1 = \theta_2. \] That is, there is only one parameter value that produces the observed trajectory (caveat: in the noise-free, infinite-data limit). Otherwise, the parameters are non-identifiable.
Even if the model is structurally identifiable, under the typical setting of finite, noisy, sparse data, the posterior may still be extremely wide, highly correlated, or exhibit some other degeneracy, so the parameters may be practically non-identifiable.
In general, non-identifiability in Bayesian inference will exhibit some particular patterns: ridges or valleys in the posterior, strong pairwise correlations between parameters, flat directions along which the likelihood barely changes, and marginals that simply return the prior.
Identifiability can often be studied in terms of geometry. Recall that we let \[ G(\theta) = (x(t_1; \theta), \dots, x(t_n; \theta)) \] represent our forward map. Then, we can study the Jacobian of \(G\), given by \[ J(\theta) = \frac{\partial G(\theta)}{\partial \theta}. \] Small singular values of \(J\) correspond to directions in parameter space that the data cannot constrain; model misspecification often flattens or twists the posterior along these directions (Why?).
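A small numerical sketch of this diagnostic, where `forward` is a hypothetical function returning the stacked model outputs \(G(\theta)\):

```python
import numpy as np

def forward_jacobian(forward, theta, eps=1e-6):
    """Finite-difference Jacobian of the forward map G at theta."""
    G0 = np.asarray(forward(theta))
    J = np.empty((G0.size, len(theta)))
    for j in range(len(theta)):
        # Perturb one parameter at a time and difference the outputs.
        theta_pert = np.array(theta, dtype=float)
        theta_pert[j] += eps
        J[:, j] = (np.asarray(forward(theta_pert)) - G0) / eps
    return J

# Small singular values flag parameter directions the data cannot constrain:
# svals = np.linalg.svd(forward_jacobian(forward, theta_hat), compute_uv=False)
```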
Model misspecification = the true governing dynamics are not in the model class being considered/used.
Formally, suppose that the true data are generated via \[ y \sim p_0(y). \] However, our model assumes \[ y \sim p(y \mid \theta), \text{ where } \theta \in \Theta \] and \[ p_0 \notin \{p(\cdot \mid \theta): \theta \in \Theta\}. \] A misspecified model for ODE inverse problems can lead to various harmful consequences, beyond merely inflating uncertainty or noise in inference outcomes. Indeed, inference can become more confident, but confidently wrong. Specifically, from (Kleijn and Van der Vaart), we know that under quite general conditions, \[ \pi(\theta \mid y_{1:n}) \rightarrow \delta_{\theta^*}, \] where \[ \theta^* = \operatorname*{argmin}_\theta {\mathrm{KL}\left(p_0(y) \,\|\, p(y \mid \theta)\right)}. \] That is, the posterior concentrates around the wrong value of the parameter—the “best wrong” parameter—and with false confidence.
Then, the credible intervals will be centered at the wrong value and tend to severely undercover.
Things can get even worse in differential equation settings:

- The forward map is nonlinear.
- The misspecification distorts the geometry drastically (more details on this in the next section).
- The model compensates by bending the parameters it does have: biasing some estimates and absorbing structural error into the noise variance.
- Typical misspecification effects include biased parameter estimates, overconfident (too narrow) posteriors, and predictive distributions that miss key features of the data.
Given samples from the posterior distribution \(\pi(\theta \mid y)\), we can look at various quantities and diagnostics of its geometry: pairwise scatter plots of the draws, posterior correlations, marginal credible-interval widths, and the conditioning of the posterior covariance matrix.
Additionally, we can investigate the posterior predictive distribution to analyze the effects of model misspecification on prediction. We can simulate \[ y^{rep} \sim p(y \mid \theta) \,\text{ where } \theta \sim \pi(\theta \mid y) \] and compare the resulting data to the real data values, looking at trajectories, peaks, timing, shapes, residuals, etc. If our model cannot reproduce these key features of the original data, then the model is wrong.
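One hedged way to turn such a comparison into a number is a posterior predictive \(p\)-value for a chosen feature (here, peak timing; `y_rep` and `y_obs` follow the placeholder naming above):

```python
import numpy as np

def ppc_pvalue(y_rep, y_obs, stat=np.argmax):
    """Share of replicated datasets whose statistic is >= the observed one."""
    t_rep = np.array([stat(rep) for rep in y_rep])
    return float(np.mean(t_rep >= stat(y_obs)))  # near 0 or 1 suggests misfit
```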
Be careful: posterior predictive checks can sometimes fail to detect misspecification!
If we know the true parameter value in the data generating process, we can run calibration and coverage checks to see how misspecification affects our ability to recover the correct parameter.
Parameter coverage: Over repeated experiments, does the \(95\%\) posterior credible interval contain the true parameter \(95\%\) of the time?
Predictive coverage: Does the \(95\%\) posterior predictive band contain the true trajectory \(95\%\) of the time?
Calibration curve: If we plot nominal level \((0.1, 0.2, \dots, 0.9)\) on the \(x\)-axis and empirical coverage on the \(y\)-axis, do we see a nearly diagonal line, reflecting perfect/strong calibration?
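A sketch of the parameter-coverage experiment, where `simulate_data` and `fit_posterior` are hypothetical stand-ins for the project's data generator and inference routine:

```python
import numpy as np

def parameter_coverage(theta_true, n_reps=100, level=0.95, seed=0):
    """Fraction of repeated experiments whose credible interval covers theta_true."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_reps):
        y = simulate_data(theta_true, rng)   # hypothetical data generator
        draws = fit_posterior(y)             # hypothetical posterior sampler
        # Central credible interval at the requested level.
        lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2])
        hits += int(lo <= theta_true <= hi)
    return hits / n_reps  # well-calibrated inference returns roughly `level`
```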
In real science, models are always wrong. In particular, much of science relies upon the use of differential equations to model phenomena, and these equations are always wrong/misspecified. But, many people, including researchers, still trust inference under these misspecified conditions, which can lead to some dangerous overconfidence when interpreting results. We should be wary of the pitfalls of model misspecification and make more conscious efforts to quantify the uncertainty inherent in any modeling done on data. This is much of the driving motivation behind this project…
The Lotka–Volterra system of differential equations models the interactions between a prey species and a predator species in a population.
The model is given by: \[ \frac{dx}{dt} = \alpha x - \beta xy \] \[ \frac{dy}{dt} = -\gamma y + \delta xy \]
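A minimal simulation of this system with `scipy` (the parameter values here are illustrative and match Table 1 below):

```python
import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(t, state, alpha, beta, gamma, delta):
    x, y = state  # prey, predator
    return [alpha * x - beta * x * y, -gamma * y + delta * x * y]

sol = solve_ivp(lotka_volterra, (0.0, 50.0), [40.0, 10.0],
                args=(1.0, 0.1, 1.5, 0.075),
                t_eval=np.linspace(0.0, 50.0, 500))
x_t, y_t = sol.y  # oscillating prey and predator trajectories
```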
The states in this model are \(X(t) = [x(t), y(t)]\), representing the population/density of the prey and predator species, respectively.
The unknown model parameters are \(\alpha\) (prey growth rate), \(\beta\) (predation rate), \(\gamma\) (predator death rate), and \(\delta\) (conversion efficiency of consumed prey into predator growth).
(One important note: this model has a conserved quantity \(V(x, y) = \delta x - \gamma \ln(x) + \beta y - \alpha \ln(y)\), written \(V\) here to avoid a clash with the observation operator \(H\). Solution trajectories therefore live on the level sets of \(V\); this is a Hamiltonian-like conservative system. Since this project is focused on statistical inference, we omit the further interesting details of this system.)
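Continuing the simulation sketch above, we can check numerically that \(V\) drifts only by solver error along the trajectory:

```python
import numpy as np

# V(x, y) = delta*x - gamma*ln(x) + beta*y - alpha*ln(y), with the same
# illustrative parameter values used in the simulation sketch above.
V = 0.075 * x_t - 1.5 * np.log(x_t) + 0.1 * y_t - 1.0 * np.log(y_t)
print(V.max() - V.min())  # small drift, attributable to integration error
```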
We make the following remarks on the qualitative dynamics of the model: solutions oscillate periodically, with predator peaks lagging prey peaks, and trajectories trace closed orbits around the interior equilibrium \((x^*, y^*) = (\gamma/\delta, \alpha/\beta)\).
Crucially, since we observe oscillations, multiple parameter combinations could give us similar period/shape or phase shifts; that is, there could be strong tradeoffs between \(\alpha\) and \(\gamma\), and between \(\beta\) and \(\delta\). This can lead to potential parameter identifiability issues…
In reality, we do not observe the states \(x(t), y(t)\) directly. We may instead observe: noisy abundance measurements of both species, measurements of only one species, or data on a sparse/irregular time grid.
We use the observation model \[ y_t = H(x(t; \theta)) + \varepsilon_t, \] where typically \(H(\cdot)\) is the identity map and \(\varepsilon_t\) is Gaussian noise.
Misspecification enters because real systems will almost certainly exhibit one or more of the following: a saturating functional response (e.g., Holling type II), density dependence or carrying capacities, time-varying rates, and environmental or demographic stochasticity.
SIR = Susceptible–Infected–Recovered
States: \(S(t)\) (susceptible), \(I(t)\) (infected/infectious), and \(R(t)\) (recovered), with total population size \(N\).
Since we assume the population is “closed,” meaning the population size stays constant, we have the constraint \(S(t) + I(t) + R(t) = N\); this effectively reduces the system of three differential equations to a two-dimensional system.
We also assume that once an individual recovers, they are no longer susceptible to contracting the disease.
Parameters: \(\beta\) (transmission rate) and \(\gamma\) (recovery rate); their ratio \(R_0 = \beta/\gamma\) is the basic reproduction number.
Equations: \[ \frac{dS}{dt} = -\beta \frac{SI}{N} \] \[ \frac{dI}{dt} = \beta \frac{SI}{N} - \gamma I \] \[ \frac{dR}{dt} = \gamma I \]
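A minimal simulation of the SIR system, with illustrative values \(\beta = 0.3\), \(\gamma = 0.1\) (so \(R_0 = 3\)) and a single initial infective:

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, state, beta, gamma, N):
    S, I, R = state
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    return [dS, dI, gamma * I]

N = 1000.0
sol = solve_ivp(sir, (0.0, 160.0), [N - 1.0, 1.0, 0.0],
                args=(0.3, 0.1, N), t_eval=np.linspace(0.0, 160.0, 400))
S_t, I_t, R_t = sol.y  # S falls, I peaks once, R saturates below N
```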
Qualitative dynamics: the outbreak grows while \(\beta S/N > \gamma\) (equivalently, \(S > \gamma N/\beta\)), so \(I(t)\) rises to a single peak and then declines, and the epidemic dies out with some susceptibles never infected.
In reality, we do not observe the states \(S(t), I(t), R(t)\) directly or exactly. We may instead observe: reported case counts (only a fraction of true infections), hospitalizations, or deaths, often subject to reporting noise and delays.
Observation model: \[ y_t = H(x(t)) + \varepsilon_t. \] For example, a reasonable count model might be \(y_t \sim \text{Poisson}(\rho I(t))\), where \(\rho\) is a reporting rate.
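A sketch of this Poisson reporting model applied to the simulated \(I(t)\) from the block above; the reporting rate `rho` is an illustrative placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.5                       # hypothetical fraction of infections reported
cases = rng.poisson(rho * I_t)  # noisy observed case counts
```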
The SIR model is a textbook example of practical non-identifiability. For instance, if we can only observe \(I(t)\), then the parameters \(\beta\) and \(\gamma\) are heavily correlated (and the posterior will have a ridge along roughly constant \(R_0\)).
Misspecification is especially relevant for models like this, because real epidemics violate the convenient assumptions baked into the SIR formulation, such as: homogeneous mixing, a constant transmission rate \(\beta\), no latent (exposed) period, a closed population, and permanent immunity after recovery.
So, suppose that we fit the SIR model to a case where the behavior is more aptly described by an SEIR model, or perhaps with a time-varying transmission rate \(\beta(t)\). Then, we will get biased/wrong parameters, falsely confident posteriors, and the wrong uncertainty conclusions!
The code and analysis for this project are ongoing, and this section will be updated and amended regularly.
First, we wish to demonstrate the “baseline” case of Bayesian inference for the Lotka–Volterra model with no structural misspecification. This is the best-case scenario where the model is correctly specified and we are able to conduct our inference procedure to recover the true parameters (approximately) without issues.
In this subsection we introduce the specifications for the simulation study on the effect of misspecification on the Bayesian inference procedure. In particular, we discuss the data generating process, the misspecified models to be fit, other configurable settings for experimentation, the inference procedure, and the final results we will investigate.
We define the true data generating process (DGP) as being given by a generalized Lotka–Volterra model with a saturated functional response incorporated into it: \[ \frac{dx}{dt} = \alpha x - \frac{\beta xy}{1+hx} \] \[ \frac{dy}{dt} = -\gamma y + \frac{\delta xy}{1+hx}, \] where \(h\) is the saturated (Holling type II) functional response parameter that defines the structural misspecification in the model. When \(h = 0\), we recover the standard Lotka–Volterra equations.
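The right-hand side of this DGP, as a sketch; setting `h = 0` recovers the standard Lotka–Volterra model that will be (mis)fit to the data:

```python
def generalized_lv(t, state, alpha, beta, gamma, delta, h):
    x, y = state
    response = x / (1.0 + h * x)  # Holling type II saturating response
    return [alpha * x - beta * response * y,
            -gamma * y + delta * response * y]
```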
The parameters we wish to infer for this model are \(\theta = (\alpha, \beta, \gamma, \delta)\).
Our true data will be generated from the above system, but we assume, incorrectly, that we can fit the standard Lotka–Volterra model (i.e., \(h = 0\) above) to the data. This defines our structural misspecification for this study.
Next, we define the parameter settings and initial conditions for the true DGP. These values were chosen by reconciling various documented sources and best practices for this system and will be held constant across all experiments. (Note: We can choose to fix the values of the initial populations, \(x_0, y_0\), or treat them as parameters and infer them; for now we fix them for ease.)

::: {.table-narrow style="width: 70%; margin: auto;"}
| Parameter | Value |
|-----------|-------|
| \(\alpha\) | 1 |
| \(\beta\) | 0.1 |
| \(\gamma\) | 1.5 |
| \(\delta\) | 0.075 |
| \(x_0\) | 40 |
| \(y_0\) | 10 |

: Table 1: Generalized Lotka–Volterra parameter settings for experiments
:::
For our choices of priors on the parameters to be inferred, we use the following: \[\alpha \sim \text{LogNormal}(\log(1), 0.5)\] \[\beta \sim \text{LogNormal}(\log(0.1), 0.5)\] \[\gamma \sim \text{LogNormal}(\log(1.5), 0.5)\] \[\delta \sim \text{LogNormal}(\log(0.075), 0.5)\]
Lastly, we define the observation model: \[ z_t^x = x(t) + \varepsilon_t^x \] \[ z_t^y = y(t) + \varepsilon_t^y, \] where for simplicity, we suppose \(\varepsilon_t^x, \varepsilon_t^y \sim \mathcal{N}(0, \sigma^2)\) are independent Gaussian noise terms.
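Putting the pieces together, a sketch of the data-generating step using `generalized_lv` from above, the Table 1 values, and one of the Table 2 settings (the observation grid and horizon here are illustrative placeholders):

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(42)
t_obs = np.linspace(0.0, 30.0, 60)   # placeholder dense observation grid
sol = solve_ivp(generalized_lv, (0.0, 30.0), [40.0, 10.0],
                args=(1.0, 0.1, 1.5, 0.075, 0.05),  # the h = 0.05 scenario
                t_eval=t_obs)
sigma = 1.0                           # one of the Table 2 noise levels
z_x = sol.y[0] + rng.normal(0.0, sigma, t_obs.size)
z_y = sol.y[1] + rng.normal(0.0, sigma, t_obs.size)
```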
Experiment grid (table of different misspecification scenarios to test):

::: {.table-narrow style="width: 70%; margin: auto;"}
| Setting | Levels |
|---------|--------|
| Structural misspecification (\(h\)) | 0, 0.02, 0.05, 0.1 |
| Observation time grid density | dense, sparse |
| Noise level (\(\sigma\)) | 0.5, 1 |

: Table 2: Experiment layout for misspecification scenarios
:::