Bayesian Inverse Problems for ODE Systems under Model Misspecification

statistics
bayesian
differential equations
inverse problems
uncertainty quantification
Python
Exploring Model Misspecification, Parameter Identifiability, and Posterior Uncertainty in Bayesian Inverse Problems for the Lotka–Volterra and SIR Models
Published

December 10, 2025

This page contains working notes and analyses for this ongoing project. Link to code notebook:

Background on Bayesian Inverse Problems/Model Misspecification for ODEs

Big Picture

Inverse problem: There is an ordinary differential equation (ODE)/dynamical system with unknown parameters. We observe noisy data derived from the system and want to infer the parameters and quantify uncertainty.

If we have a solution to an ODE \(x(t; \theta)\), then there is a deterministic forward map \[ \theta \mapsto x(t; \theta). \]

We observe data \[ y_i = H(x(t_i; \theta)) + \varepsilon_i, \] where \(H\) is an observation operator (often the identity map or some projection) and \(\varepsilon_i\) is a noise term.

The inverse problem is: Given \(y_i\) for \(i = 1, \dots, n\), infer \(\theta\).

In general, these problems are ill-posed:

  • Many values of \(\theta\) can produce similar trajectories, preventing identifiability of \(\theta\).
  • Noise further amplifies this ambiguity.
  • Models are often (more like always) wrong/misspecified.

For these types of problems, Bayesian inference serves as a regularization and uncertainty quantification framework.

ODE Models as Statistical Objects

For now, we stick to a single ordinary differential equation (ODE) in order to introduce the Bayesian inference background. Later, we will see how these ideas extend to the case of dynamical systems containing multiple differential equations. Specifically, we will investigate the SIR model and the Lotka–Volterra system.

An ODE, given by: \[ \frac{dx}{dt} = f(x(t); \theta), \quad x(0) = x_0, \] defines a deterministic mapping between parameters and solution trajectories, \[ \theta \mapsto x(\cdot; \theta). \] This map is:

  • generally nonlinear
  • smooth in \(\theta\) under mild conditions
  • often very sensitive to \(\theta\) in some directions and nearly insensitive in others – this sensitivity structure is the root of many parameter identifiability issues

We never observe the state \(x(t)\) exactly. Rather, we observe: \[ y_i = H(x(t_i; \theta)) + \varepsilon_i, \text{ for } i = 1, \dots, n, \] where \(\varepsilon_i \sim \mathcal{N}(0, \sigma^2)\).

For example, in a simple case, we may have \(H(x(t_i; \theta)) = x(t_i; \theta)\), the identity map, which means our observation model is simply the state with added Gaussian noise.

Then, our likelihood (observation model) is given by: \[ p(y \mid \theta) = \prod_{i=1}^{n}{\mathcal{N}(y_i \mid x(t_i; \theta), \sigma^2)}. \]
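To make this concrete, here is a minimal sketch of the forward map and the Gaussian likelihood for a toy scalar ODE (exponential decay). The ODE, time grid, noise level, and true parameter are illustrative assumptions, not choices fixed by this project.

Code
import numpy as np
from scipy.integrate import solve_ivp
from scipy.stats import norm

t_obs = np.linspace(0.0, 5.0, 20)   # observation times t_1, ..., t_n
sigma = 0.1                         # assumed known noise scale

def forward(theta, x0=1.0):
    """Forward map: theta -> (x(t_1; theta), ..., x(t_n; theta))."""
    sol = solve_ivp(lambda t, x: -theta * x, (t_obs[0], t_obs[-1]), [x0],
                    t_eval=t_obs)
    return sol.y[0]

def log_likelihood(theta, y):
    """Gaussian observation model with H = identity."""
    return norm.logpdf(y, loc=forward(theta), scale=sigma).sum()

# Generate synthetic data at a "true" theta and compare likelihood values.
rng = np.random.default_rng(0)
y = forward(0.7) + sigma * rng.standard_normal(t_obs.size)
print(log_likelihood(0.7, y), log_likelihood(1.2, y))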

It is important to note that all “statistical difficulty” behind the parameter inference problem comes from the geometry of the map \[ \theta \mapsto (x(t_1; \theta), \dots, x(t_n; \theta)). \] We call this map the forward operator of the inverse problem.

Bayesian Formulation of ODE Parameter Inference

In the previous section we looked at the form of the data likelihood function, given by the observation model of the ODE. We can now define the prior and posterior in the Bayesian context.

Prior

\[ \theta \sim \pi(\theta) \] For ODEs, parameters are typically positive, so common prior choices include the log-normal, half-normal, gamma, and inverse-gamma distributions.

Posterior

\[ \pi(\theta \mid y) \propto p(y \mid \theta)\pi(\theta) \] For ODE inference problems, since in practice the forward map (likelihood) is nonlinear and partially non-identifiable, the posterior distribution is:

  • often strongly correlated
  • sometimes multimodal
  • frequently ridge-like in geometry

Posterior Predictive Distribution

\[ p(y^{rep} \mid y) = \int{p(y^{rep} \mid \theta) \pi(\theta \mid y) d\theta} \] Since this integral is generally intractable, we approximate it by Monte Carlo and plot the resulting distribution using the following procedure:

  1. Sample \(\theta^{(s)} \sim \pi(\theta \mid y)\).
  2. Simulate ODE to get \(x^{(s)}(t)\).
  3. Add Gaussian noise to obtain \(y_{rep}^{(s)}\).

This is what uncertainty quantification (UQ) is all about – obtaining uncertainty bounds for solution trajectories.
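A minimal sketch of this three-step recipe, reusing the toy exponential-decay forward map from the previous snippet; the “posterior samples” here are stand-in illustrative draws, not output from a real sampler.

Code
import numpy as np
from scipy.integrate import solve_ivp

t_obs = np.linspace(0.0, 5.0, 20)
sigma = 0.1

def forward(theta):
    sol = solve_ivp(lambda t, x: -theta * x, (t_obs[0], t_obs[-1]), [1.0],
                    t_eval=t_obs)
    return sol.y[0]

def posterior_predictive(posterior_samples, seed=0):
    rng = np.random.default_rng(seed)
    reps = []
    for theta_s in posterior_samples:    # 1. theta^(s) ~ pi(theta | y)
        x_s = forward(theta_s)           # 2. simulate the ODE
        reps.append(x_s + sigma * rng.standard_normal(x_s.size))  # 3. add noise
    return np.asarray(reps)              # rows are replicated datasets y_rep^(s)

# Pointwise 95% predictive band across time points:
samples = np.random.default_rng(1).normal(0.7, 0.05, size=200)  # stand-in draws
lo, hi = np.percentile(posterior_predictive(samples), [2.5, 97.5], axis=0)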

Identifiability

A model is structurally identifiable if \[ x(t; \theta_1) = x(t; \theta_2) \text{ for all } t \implies \theta_1 = \theta_2. \] That is, there is only one parameter value that produces the observed trajectory (caveat: in the noise-free, infinite-data limit). Otherwise, the parameters are non-identifiable.

Even if the model is structurally identifiable, under the typical setting of finite, noisy, sparse data, the posterior may still be extremely wide, highly correlated, or exhibit some other degeneracy, so the parameters may be practically non-identifiable.

In general, non-identifiability in Bayesian inference will exhibit some particular patterns:

  • flat directions or ridges in the posterior
  • large or near-perfect correlations between parameters
  • very large credible intervals

Identifiability can often be studied in terms of geometry. Recall that we let \[ G(\theta) = (x(t_1; \theta), \dots, x(t_n; \theta)) \] represent our forward map. Then, we can study the Jacobian of \(G\), given by \[ J(\theta) = \frac{\partial G(\theta)}{\partial \theta}. \] Small singular values of \(J\) correspond to directions we cannot infer; model misspecification often flattens or twists the posterior along these directions (Why?). A rough numerical version of this diagnostic is sketched below.
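Here is that sketch: approximate \(J\) by central finite differences and inspect its singular values. The two-parameter toy ODE is an illustrative assumption.

Code
import numpy as np
from scipy.integrate import solve_ivp

t_obs = np.linspace(0.0, 5.0, 20)

def forward(theta):
    # Toy 2-parameter forward map: dx/dt = -theta[0] * x + theta[1].
    sol = solve_ivp(lambda t, x: -theta[0] * x + theta[1],
                    (t_obs[0], t_obs[-1]), [1.0], t_eval=t_obs)
    return sol.y[0]

def jacobian_fd(G, theta, eps=1e-6):
    """Central finite-difference Jacobian of the forward map G."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        cols.append((G(theta + e) - G(theta - e)) / (2 * eps))
    return np.column_stack(cols)         # shape: (n_obs, n_params)

J = jacobian_fd(forward, np.array([0.7, 0.3]))
U, s, Vt = np.linalg.svd(J)
print("singular values:", s)             # tiny values = hard-to-infer directions
print("least-identifiable direction:", Vt[-1])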

Model Misspecification

Model misspecification = the true governing dynamics are not in the model class being considered/used.

Formally, suppose that the true data are generated via \[ y \sim p_0(y). \] However, our model assumes \[ y \sim p(y \mid \theta), \text{ where } \theta \in \Theta \] and \[ p_0 \notin \{p(\cdot \mid \theta): \theta \in \Theta\}. \] A misspecified model for ODE inverse problems can lead to harmful consequences beyond merely inflating uncertainty in inference outcomes; indeed, inference can become more confident, but confidently wrong. Specifically, from (Kleijn and Van der Vaart), we know that under quite general conditions, \[ \pi(\theta \mid y_{1:n}) \rightarrow \delta_{\theta^*}, \] where \[ \theta^* = \operatorname*{argmin}_\theta {\mathrm{KL}\left(p_0(y) \,\|\, p(y \mid \theta)\right)}. \] That is, the posterior concentrates around the wrong value of the parameter—the “best wrong” parameter—and with false confidence.

Then, the credible intervals will be centered at the wrong value and tend to severely undercover.

Misspecification in ODE Inverse Problems

Things can get even worse in differential equations settings:

  • The forward map is nonlinear.

  • The misspecification distorts the geometry drastically (More details on this in the next section).

  • The model compensates by:

    • pushing parameters into “weird” correlated regimes
    • inventing fake explanations of behavior
  • Typical misspecification effects include:

    • strong posterior correlations
    • parameter tradeoffs
    • ridges
    • sometimes multimodality
    • narrow but incorrect posteriors – overconfidence

Posterior Geometry

Given samples from the posterior distribution \(\pi(\theta \mid y)\), we can look at various quantities and diagnostics of its geometry (computed in the sketch after this list):

  • covariance/correlation matrix
  • pair plots
  • condition number
  • entropy proxy
    • \(H \approx \frac{1}{2}\log\det{\Sigma}\) (the Gaussian differential entropy, up to additive constants)
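A short sketch of these diagnostics, with correlated Gaussian draws standing in for real MCMC output:

Code
import numpy as np

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(          # stand-in "posterior" with a ridge
    mean=[1.0, 0.1], cov=[[0.04, 0.018], [0.018, 0.01]], size=5000)

Sigma = np.cov(samples, rowvar=False)       # posterior covariance
corr = np.corrcoef(samples, rowvar=False)   # correlation matrix
eigvals = np.linalg.eigvalsh(Sigma)
cond = eigvals[-1] / eigvals[0]             # condition number
entropy_proxy = 0.5 * np.linalg.slogdet(Sigma)[1]  # (1/2) log det(Sigma)

print(corr)            # off-diagonal near 0.9: a strong parameter tradeoff
print(cond, entropy_proxy)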

Posterior Predictive Checking

Additionally, we can investigate the posterior predictive distribution to analyze the effects of model misspecification on prediction. We can simulate \[ y^{rep} \sim p(y \mid \theta) \,\text{ where } \theta \sim \pi(\theta \mid y) \] and compare the resulting data to the real data values, looking at trajectories, peaks, timing, shapes, residuals, etc. If our model cannot reproduce these key features of the original data, then the model is wrong.

Be careful: Posterior predictive checks can sometimes fail to detect misspecification!

Calibration and Coverage

If we know the true parameter value in the data generating process, we can run calibration and coverage checks to see how misspecification affects our ability to recover the correct parameter.

Parameter coverage: Over repeated experiments, does the \(95\%\) posterior credible interval contain the true parameter \(95\%\) of the time?

  • Under model misspecification, the answer to this will be no.

Predictive coverage: Does the \(95\%\) posterior predictive band contain the true trajectory \(95\%\) of the time?

  • Again, under misspecification, this will collapse (badly).

Calibration curve: If we plot nominal level \((0.1, 0.2, \dots, 0.9)\) on the \(x\)-axis and empirical coverage on the \(y\)-axis, do we see a nearly diagonal line, reflecting perfect/strong calibration?
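A sketch of the parameter-coverage loop, using an exactly solvable Gaussian-mean toy problem as a stand-in for a real ODE inference run (so the reported coverage should land near the nominal level; under misspecification the same loop would reveal undercoverage):

Code
import numpy as np

THETA_TRUE = 0.7

def run_inference(seed):
    # Stand-in for a real sampler: exact posterior draws for the mean of
    # Gaussian data under a flat prior (a correctly specified toy model).
    rng = np.random.default_rng(seed)
    data = rng.normal(THETA_TRUE, 1.0, size=25)
    return rng.normal(data.mean(), 1.0 / np.sqrt(data.size), size=2000)

def empirical_coverage(theta_true, n_repeats=200, level=0.95):
    alpha = (1.0 - level) / 2.0
    hits = 0
    for rep in range(n_repeats):
        samples = run_inference(seed=rep)
        lo, hi = np.quantile(samples, [alpha, 1.0 - alpha])
        hits += (lo <= theta_true <= hi)
    return hits / n_repeats                 # well-calibrated => close to level

print(empirical_coverage(THETA_TRUE))       # expect roughly 0.95 here

For a calibration curve, repeat this over nominal levels \(0.1, 0.2, \dots, 0.9\) and plot empirical coverage against the nominal level.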

Key Takeaways

In real science, models are always wrong. In particular, much of science relies upon the use of differential equations to model phenomena, and these equations are always wrong/misspecified. But, many people, including researchers, still trust inference under these misspecified conditions, which can lead to some dangerous overconfidence when interpreting results. We should be wary of the pitfalls of model misspecification and make more conscious efforts to quantify the uncertainty inherent in any modeling done on data. This is much of the driving motivation behind this project…

Background on the Lotka–Volterra Model

Lotka–Volterra Model

The Lotka–Volterra system of differential equations models the interactions between a prey species and a predator species in a population.

The model is given by: \[ \frac{dx}{dt} = \alpha x - \beta xy \] \[ \frac{dy}{dt} = -\gamma y + \delta xy \]

The states in this model are \(X(t) = [x(t), y(t)]\), representing the population/density of the prey and predator species, respectively.

The unknown model parameters are:

  • \(\alpha\): the prey growth rate when there are no predators
  • \(\beta\): the death rate of prey in the presence of predation
  • \(\gamma\): the death rate of predators when there are no prey
  • \(\delta\): the predator growth rate in the presence of prey

(One important note to make for this model is that there exists a conserved quantity given by \(H(x, y) = \delta x - \gamma \ln(x) + \beta y - \alpha \ln(y)\). So, solution trajectories will live on the level sets of \(H\). This is a Hamiltonian-like conservative system. Since this project is focused on statistical inference, we omit the further interesting details of this system.)

We make the following remarks on the qualitative dynamics of the model:

  • cyclic oscillations
  • closed orbits in the phase space
  • period and amplitude depend on parameter values
  • in an ideal model, no damping and no growth

Crucially, since we observe oscillations, multiple parameter combinations could give us similar period/shape or phase shifts; that is, there could be strong tradeoffs between \(\alpha\) and \(\gamma\), and between \(\beta\) and \(\delta\). This can lead to potential parameter identifiability issues…

In reality, we do not observe the states \(x(t), y(t)\) directly or cleanly. Instead, observations may involve:

  • only prey or only predators
  • measurement noise
  • irregular or biased sampling

We use the observation model \[ y_t = H(x(t; \theta)) + \varepsilon_t, \] where typically \(H(\cdot)\) is the identity map and \(\varepsilon_t\) is Gaussian noise.
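A minimal sketch of simulating the Lotka–Volterra system and generating identity-map observations with Gaussian noise; the parameter values, time grid, and noise scale are illustrative (they match the experiment settings used later in these notes).

Code
import numpy as np
from scipy.integrate import solve_ivp

def lv_rhs(t, state, alpha, beta, gamma, delta):
    x, y = state
    return [alpha * x - beta * x * y, -gamma * y + delta * x * y]

theta = (1.0, 0.1, 1.5, 0.075)          # (alpha, beta, gamma, delta)
t_obs = np.linspace(0.0, 20.0, 50)
sol = solve_ivp(lv_rhs, (0.0, 20.0), [40.0, 10.0], args=theta,
                t_eval=t_obs, rtol=1e-8)

sigma = 1.0
rng = np.random.default_rng(0)
obs = sol.y.T + sigma * rng.standard_normal(sol.y.T.shape)  # noisy (x, y) pairs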

Misspecification enters because real systems will certainly have one or more of the following:

  • carrying capacity (like logistic growth assumptions)
  • damping
  • seasonal forcing
  • noise!

Background on the SIR Model

SIR Model

SIR = Susceptible–Infected–Recovered

  • models the spread of an infectious disease inside a closed population of \(N\) individuals

States:

  • \(S(t)\): the number of individuals that are susceptible to the disease
  • \(I(t)\): the number of individuals that are infected by the disease
  • \(R(t)\): the number of individuals that have recovered from the disease

Since we assume the population is “closed,” meaning the population size stays constant, we have the constraint \(S(t) + I(t) + R(t) = N\), which effectively reduces this system of 3 differential equations to a 2D system.

We also assume that once an individual recovers, they are no longer susceptible to contracting the disease.

Parameters:

  • \(\beta\): transmission rate (controls how fast the disease spreads)
  • \(\gamma\): recovery rate (controls how fast individuals recover from the disease)
  • \(R_0 := \frac{\beta}{\gamma}\) (derived from the other parameters): basic reproduction number
    • Often only \(R_0\) is specified or measured, not \(\beta\) and \(\gamma\) separately → structural identifiability issues!

Equations: \[ \frac{dS}{dt} = -\beta \frac{SI}{N} \] \[ \frac{dI}{dt} = \beta \frac{SI}{N} - \gamma I \] \[ \frac{dR}{dt} = \gamma I \]

Qualitative dynamics:

  • Initially, \(I(t)\) has exponential growth if \(R_0 > 1\).
  • Peak infections occur when \(S(t) = \frac{\gamma}{\beta} N = \frac{N}{R_0}\) (checked numerically in the sketch after this list).
  • Eventually, \(I(t) \rightarrow 0\), meaning the disease dies out.
  • The trajectory of \(I(t)\) is controlled by \(R_0\) – skewed, one peak/hump.
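A quick numerical check of the peak condition, with illustrative parameter values (\(R_0 = 5\)): \(I(t)\) should peak where \(S(t)\) crosses \(N / R_0\).

Code
import numpy as np
from scipy.integrate import solve_ivp

N, beta, gamma = 1000.0, 0.5, 0.1       # R_0 = beta / gamma = 5

def sir_rhs(t, u):
    S, I, R = u
    return [-beta * S * I / N, beta * S * I / N - gamma * I, gamma * I]

t = np.linspace(0.0, 100.0, 2001)
sol = solve_ivp(sir_rhs, (0.0, 100.0), [N - 1.0, 1.0, 0.0],
                t_eval=t, rtol=1e-8)
S, I, _ = sol.y

k = I.argmax()
print(S[k], N * gamma / beta)           # both close to N / R_0 = 200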

In reality, we do not observe the states \(S(t), I(t), R(t)\) directly or exactly. We may instead observe:

  • Daily new cases
  • Reported infected (prone to under-reporting bias)
  • Cumulative cases

Observation model: \[ y_t = H(x(t)) + \varepsilon_t \] For example, a reasonable count-data model might be \(y_t \sim \mathrm{Poisson}(\rho I(t))\), where \(\rho\) is a reporting rate.

The SIR model is a textbook example of practical non-identifiability. For instance, if we can only observe \(I(t)\), then the parameters \(\beta\) and \(\gamma\) are heavily correlated (and the posterior will have a ridge along roughly constant \(R_0\)).

Misspecification is especially relevant in models like this because real epidemics violate the convenient assumptions built into the SIR formulation:

  • Homogeneous mixing
  • Constant \(\beta\)
  • Constant population (no entering or exiting, i.e., births or deaths)
  • No incubation period

So, suppose that we fit the SIR model to a case where the behavior is more aptly described by an SEIR model, or perhaps with a time-varying transmission rate \(\beta(t)\). Then, we will get biased/wrong parameters, falsely confident posteriors, and the wrong uncertainty conclusions!

Code and Analysis

The code and analysis for this project are currently ongoing and this section will be updated and amended regularly.

1. The Correct Model

First, we wish to demonstrate the “baseline” case of Bayesian inference for the Lotka–Volterra model with no structural misspecification. This is the best-case scenario where the model is correctly specified and we are able to conduct our inference procedure to recover the true parameters (approximately) without issues.

Code
def placeholder_code():
    pass
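Until the full analysis lands in the notebook, here is a minimal random-walk Metropolis sketch of this baseline: log-normal priors centered at the truth, a Gaussian likelihood on both states, and multiplicative proposals to respect positivity. All tuning constants are illustrative assumptions, not the project's final settings.

Code
import numpy as np
from scipy.integrate import solve_ivp
from scipy.stats import lognorm, norm

t_obs = np.linspace(0.0, 20.0, 50)
sigma = 1.0
prior_medians = (1.0, 0.1, 1.5, 0.075)   # LogNormal(log(m), 0.5) priors

def forward(theta):
    a, b, g, d = theta
    rhs = lambda t, u: [a * u[0] - b * u[0] * u[1],
                        -g * u[1] + d * u[0] * u[1]]
    sol = solve_ivp(rhs, (0.0, 20.0), [40.0, 10.0], t_eval=t_obs, rtol=1e-8)
    return sol.y.T                       # shape (n_times, 2)

def log_post(theta, data):
    if np.any(theta <= 0):
        return -np.inf
    lp = sum(lognorm.logpdf(th, s=0.5, scale=m)
             for th, m in zip(theta, prior_medians))
    return lp + norm.logpdf(data, loc=forward(theta), scale=sigma).sum()

rng = np.random.default_rng(0)
theta_true = np.array([1.0, 0.1, 1.5, 0.075])
data = forward(theta_true) + sigma * rng.standard_normal((t_obs.size, 2))

theta, lp = theta_true.copy(), log_post(theta_true, data)  # start near truth
chain = []
for _ in range(2000):
    prop = theta * np.exp(0.02 * rng.standard_normal(4))   # stays positive
    lp_prop = log_post(prop, data)
    # Multiplicative proposal => Metropolis-Hastings Jacobian correction.
    if np.log(rng.uniform()) < lp_prop - lp + np.log(prop / theta).sum():
        theta, lp = prop, lp_prop
    chain.append(theta)
chain = np.asarray(chain)               # posterior draws for (a, b, g, d)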

2. Misspecified Models

In this subsection we introduce the specifications for the simulation study on the effect of misspecification on the Bayesian inference procedure. In particular, we discuss the data generating process, the misspecified models to be fit, other configurable settings for experimentation, the inference procedure, and the final results we will investigate.

We define the true data generating process (DGP) as being given by a generalized Lotka–Volterra model with a saturated functional response incorporated into it: \[ \frac{dx}{dt} = \alpha x - \frac{\beta xy}{1+hx} \] \[ \frac{dy}{dt} = -\gamma y + \frac{\delta xy}{1+hx}, \] where \(h\) is the saturated (Holling type II) functional response parameter that defines the structural misspecification in the model. When \(h = 0\), we recover the standard Lotka–Volterra equations.
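As a sketch, the DGP right-hand side can be written so that \(h = 0\) recovers the standard model; this plugs into scipy's solve_ivp via its args keyword.

Code
def glv_rhs(t, state, alpha, beta, gamma, delta, h):
    """Generalized Lotka-Volterra RHS with Holling type II saturation."""
    x, y = state
    interact = x * y / (1.0 + h * x)    # h = 0 gives the standard LV terms
    return [alpha * x - beta * interact, -gamma * y + delta * interact]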

The parameters we wish to infer for this model are \(\theta = (\alpha, \beta, \gamma, \delta)\).

Our true data will be generated from the above system, but we assume, incorrectly, that we can fit the standard Lotka–Volterra model (i.e., \(h = 0\) above) to the data. This defines our structural misspecification for this study.

Next, we define the parameter settings and initial conditions for the true DGP. These values were chosen by reconciling various documented sources and best practices for this system and will be held constant across all experiments. (Note: We can choose to fix the values of the initial populations, \(x_0, y_0\), or treat them as parameters and infer them; for now we fix them for ease.)

::: {.table-narrow style="width: 70%; margin: auto;"}

| Parameter | Value |
|-----------|-------|
| \(\alpha\) | 1 |
| \(\beta\) | 0.1 |
| \(\gamma\) | 1.5 |
| \(\delta\) | 0.075 |
| \(x_0\) | 40 |
| \(y_0\) | 10 |

: Table 1: Generalized Lotka–Volterra parameter settings for experiments

:::

For our choices of priors on the parameters to be inferred, we use the following: \[\alpha \sim \text{LogNormal}(\log(1), 0.5)\] \[\beta \sim \text{LogNormal}(\log(0.1), 0.5)\] \[\gamma \sim \text{LogNormal}(\log(1.5), 0.5)\] \[\delta \sim \text{LogNormal}(\log(0.075), 0.5)\]

Lastly, we define the observation model: \[ z_t^x = x(t) + \varepsilon_t^x \] \[ z_t^y = y(t) + \varepsilon_t^y, \] where for simplicity, we suppose \(\varepsilon_t^x, \varepsilon_t^y \sim \mathcal{N}(0, \sigma^2)\) are independent Gaussian noise terms.

Experiment grid (table of different misspecification scenarios to test):

::: {.table-narrow style="width: 70%; margin: auto;"}

| Settings | Levels |
|----------|--------|
| Structural misspecification (\(h\)) | 0, 0.02, 0.05, 0.1 |
| Observation time grid density | dense, sparse |
| Noise level (\(\sigma\)) | 0.5, 1 |

: Table 2: Experiment layout for misspecification scenarios

:::

Code
def placeholder_code():
    pass
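Until that code is filled in, here is a rough sketch of how the experiment grid in Table 2 might be iterated. The grid values come from the table; the run_experiment hook named in the comment is a hypothetical placeholder for the eventual notebook code.

Code
import itertools
import numpy as np

h_levels = [0.0, 0.02, 0.05, 0.1]
time_grids = {"dense": np.linspace(0.0, 20.0, 100),
              "sparse": np.linspace(0.0, 20.0, 20)}
noise_levels = [0.5, 1.0]

for h, (grid_name, t_obs), sigma in itertools.product(
        h_levels, time_grids.items(), noise_levels):
    # run_experiment(h, t_obs, sigma) would: simulate the generalized LV
    # DGP, observe on t_obs with noise sigma, fit the standard LV model,
    # and record posterior and coverage diagnostics.
    print(f"h={h}, grid={grid_name} ({t_obs.size} points), sigma={sigma}")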