blog_03-demand-inversion-as-optimal-transport

Demand inversion via optimal transport

January 19, 2026

Part of the blog “Math of Choice”, based on Alfred Galichon’s forthcoming book, Discrete Choice Models: Mathematical Methods, Econometrics, and Data Science, Princeton University Press, April 2026.

In the first two entries of this series (Part 1 and Part 2), we adopted the perspective of the theorist. We started with a known structure of preferences (systematic utilities $U$ and a distribution of random shocks $\mathcal{P}$) and asked: what are the resulting market shares $\boldsymbol{\pi}(U)$? We found that this problem is equivalent to maximizing a social welfare function regularized by an entropy term.

Today, we flip the script. We take the perspective of the econometrician. In the real world, we observe the market shares $\pi$ (the data) and we want to recover the underlying systematic utilities $U$ (the preferences) that generated them. This is the demand inversion problem.

While this sounds like a standard statistical estimation task, it turns out to be a geometric problem in disguise. As we will see, recovering preferences from choices is equivalent to solving an Optimal Transport problem.

The Econometrician’s Problem

The demand inversion problem can be stated simply: given an observed vector of market shares $\pi$ and a distribution of unobserved heterogeneity $\mathcal{P}$, find the vector of systematic utilities $U$ such that the predicted market shares match the observed ones:

$$ \boldsymbol{\pi}(U) = \pi $$

Mathematically, we are looking for the inverse map $\boldsymbol{\pi}^{-1}(\pi)$. But does this inverse exist? Is it unique? Intuitively, if the distribution of random shocks $\mathcal{P}$ has “holes” or gaps, we might not be able to find a utility vector that perfectly matches the data. Conversely, if the distribution has “flat spots,” multiple utility vectors might yield the same market shares.

To ensure a well-behaved inversion, we typically rely on two assumptions:

Continuity: the distribution $\mathcal{P}$ has a density (no mass points), which ensures demand is continuous.
Full support: the random shocks cover the entire space (no gaps), which ensures uniqueness.

Under these conditions, the inverse demand is well-defined. But how do we compute it?

Inversion via Convex Optimization

Recall from our previous post on Generalized Entropy that the relationship between utility $U$ and market shares $\pi$ is governed by convex duality. Specifically, we defined the Generalized Entropy of Choice, $G^*(\pi)$, as the Legendre-Fenchel transform of the welfare function $G(U)$, that is:

$$ G^*(\pi) = \max_{U \in \mathbb{R}^Y} \left\{ \sum_{y \in [Y]} \pi_y U_y – G(U) \right\}. $$

A fundamental result of convex analysis is that the gradient of a conjugate function inverts the gradient of the original function. Since $\pi = \nabla G(U)$, it follows that:

$$ U = \nabla G^*(\pi) .$$

Note that $\pi = \nabla G(U)$ arises as the envelope theorem in the expression of $G^\ast$, while $ U = \nabla G^*(\pi) $ arises as the optimality conditions in the same expression. This is a powerful insight, because it transforms a system of non-linear equations into an optimization problem. To find the vector $U$ that explains the data $\pi$, we do not need to root-find; instead, we solve:

$$ \boldsymbol{\pi}^{-1}(\pi) = \arg \max_{U \in \mathbb{R}^Y} \{ \pi^\top U – G(U) \} $$

This formulation allows us to use the very rich toolbox of optimization algorithms to recover the systematic utilities.

The Optimal Transport Connection

But we can go deeper. To understand what this inversion actually does to the data, we need to look at the structure of $G^*(\pi)$. In 2010, while working on an early version of our “Cupid’s invisible hands,” Bernard Salanié and I established a connection that bridges econometrics and computational geometry. We showed that calculating the generalized entropy $G^*(\pi)$—and consequently performing the demand inversion—is exactly an Optimal Transport (OT) problem.

The Intuition:
Imagine the random utility shocks $\varepsilon$ as a pile of sand distributed according to $\mathcal{P}$. The consumers are trying to assign these shocks to specific choices $y$ in a way that is consistent with the aggregate market shares $\pi$. The “cost” of the assignment is determined by the value of the shocks. The goal is to maximize the expected utility of the assignment.

Inversion Theorem (Galichon and Salanié):
The negative entropy $-G^*(\pi)$ is the value of the following optimal transport problem:

$$ -G^*(\pi) = \max_{\lambda \in \mathcal{M}(\mathcal{P}, \pi)} \mathbb{E}_{\lambda} [\varepsilon_{\tilde{y}}] $$

where $\mathcal{M}(\mathcal{P}, \pi)$ is the set of joint probability distributions (couplings) where the shocks $\varepsilon$ follow distribution $\mathcal{P}$ and the choices $\tilde{y}$ follow distribution $\pi$.

This is a Monge-Kantorovich transport problem. We are transporting the mass of unobserved heterogeneity $\mathcal{P}$ onto the discrete set of options with masses $\pi$. The “dual” formulation of this problem recovers our systematic utilities:

$$
\begin{aligned}
-G^*(\pi) = \min_{u, U} \quad & \int u(\varepsilon) d\mathcal{P}(\varepsilon) – \sum_{y \in [Y]} \pi_y U_y \\
\text{s.t.} \quad & u(\varepsilon) – U_y \geq \varepsilon_y \quad \forall \varepsilon, y
\end{aligned}
$$

Here, the dual variable $U_y$ corresponds exactly to the systematic utility of alternative $y$. This means that inverting a demand system is mathematically equivalent to computing the optimal transport cost between the distribution of noise and the distribution of choices.

Computational Implications

Why does this matter? Beyond the theoretical elegance, this result opens the door to powerful computational methods. Since optimal transport problems are linear programs, we can use the vast arsenal of algorithms developed for OT to solve discrete choice problems.

In practice, we can approximate the integral over $\mathcal{P}$ by drawing a sample of simulated consumers (random shocks). The demand inversion then becomes a discrete linear programming problem (or a semi-discrete one), which can be solved efficiently even for complex, non-standard distributions of heterogeneity where no closed-form formula (like the logit) exists.

This framework allows us to break free from the restrictive assumptions of the Gumbel distribution and model complex substitution patterns, all while maintaining a tractable estimation procedure.

In the next post, we will leave the world of exact integrals for the world of sampling and simulation. We will see how to turn our theoretical insights into tractable algorithms using Monte Carlo simulation and Linear Programming.

Reference

Galichon, Alfred. 2026. Discrete Choice Models: Mathematical Methods, Econometrics, and Data Science. Princeton University Press. Chapter 1.

← Previous post
Math of Choice
Next post →

January 16th, 2026

category: Uncategorized

Alfred Galichon

Econometrics, Quantitative Economics, Data Science