Econometrics, Quantitative Economics, Data Science



Demand inversion via optimal transport

January 19, 2026

Part of the blog “Math of Choice”, based on Alfred Galichon’s forthcoming book, Discrete Choice Models: Mathematical Methods, Econometrics, and Data Science, Princeton University Press, April 2026.

In the first two entries of this series (Part 1 and Part 2), we adopted the perspective of the theorist. We started with a known structure of preferences (systematic utilities \(U\) and a distribution of random shocks \(\mathcal{P}\)) and asked: what are the resulting market shares \(\boldsymbol{\pi}(U)\)? We found that these market shares solve an aggregate optimization problem: maximizing the expected systematic utility regularized by a (generalized) entropy term.

Today, we flip the script. We take the perspective of the econometrician. In the real world, we observe the market shares \(\pi\) (the data) and we want to recover the underlying systematic utilities \(U\) (the preferences) that generated them. This is the demand inversion problem.

While this sounds like a standard statistical estimation task, it turns out to be a geometric problem in disguise. As we will see, recovering preferences from choices is equivalent to solving an Optimal Transport problem.

The Econometrician’s Problem

The demand inversion problem can be stated simply: given an observed vector of market shares \(\pi\) and a distribution of unobserved heterogeneity \(\mathcal{P}\), find the vector of systematic utilities \(U\) such that the predicted market shares match the observed ones:

$$ \boldsymbol{\pi}(U) = \pi $$

Mathematically, we are looking for the inverse map \(\boldsymbol{\pi}^{-1}(\pi)\). But does this inverse exist? Is it unique? Intuitively, if the distribution of random shocks \(\mathcal{P}\) has mass points, demand can jump discontinuously, and some vectors of market shares may not be reachable by any utility vector. Conversely, if the distribution has gaps (regions carrying no mass), small changes in utilities may move no consumer from one option to another, so multiple utility vectors yield the same market shares.

To ensure a well-behaved inversion, we typically rely on two assumptions:

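  • Continuity: the distribution \(\mathcal{P}\) has a density (no mass points), which ensures demand is continuous, so that market shares cannot jump over the observed values.
  • Full support: the random shocks cover the entire space (no gaps), which ensures uniqueness (up to an additive constant, since adding the same constant to every \(U_y\) leaves the market shares unchanged).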
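To see concretely why gaps break uniqueness, here is a minimal Python sketch of a hypothetical binary choice in which option 0 yields utility zero and option 1 yields \(U_1 + \varepsilon\). The gap distribution (uniform on \([-2,-1] \cup [1,2]\)) and the helper names `share_with_gap` and `share_logistic` are illustrative assumptions, not taken from the book; the full-support logistic case is shown for contrast.

```python
import numpy as np

def share_with_gap(U1):
    """Share of option 1 when it is chosen iff U1 + eps >= 0.

    Illustrative assumption: eps is uniform on [-2, -1] U [1, 2]
    (density 1/2 on each piece), so P has a gap on (-1, 1).
    """
    t = -U1  # option 1 is chosen iff eps >= t
    # Probability mass of each piece lying above the threshold t.
    lower = 0.5 * np.clip(-1.0 - np.maximum(t, -2.0), 0.0, 1.0)
    upper = 0.5 * np.clip(2.0 - np.maximum(t, 1.0), 0.0, 1.0)
    return lower + upper

def share_logistic(U1):
    """Same binary choice with full-support logistic shocks (binary logit)."""
    return 1.0 / (1.0 + np.exp(-U1))

for U1 in [-0.5, 0.0, 0.5, 1.5]:
    print(U1, share_with_gap(U1), share_logistic(U1))
```

In the gap model, every \(U_1 \in [-1, 1]\) produces the same share 0.5, so observed shares cannot tell these utilities apart; with logistic shocks, the share is strictly increasing in \(U_1\) and the inversion is unique.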

Under these conditions, the inverse demand is well-defined. But how do we compute it?

Inversion via Convex Optimization

Recall from our previous post on Generalized Entropy that the relationship between utility \(U\) and market shares \(\pi\) is governed by convex duality. Specifically, we defined the Generalized Entropy of Choice, \(G^*(\pi)\), as the Legendre-Fenchel transform of the welfare function \(G(U)\), that is:

$$ G^*(\pi) = \max_{U \in \mathbb{R}^Y} \left\{ \sum_{y \in [Y]} \pi_y U_y - G(U) \right\}. $$

A fundamental result of convex analysis is that the gradient of a conjugate function inverts the gradient of the original function. Since \(\pi = \nabla G(U)\), it follows that:

$$ U = \nabla G^*(\pi) .$$

Note that \(\pi = \nabla G(U)\) is the first-order optimality condition of the maximization problem defining \(G^\ast\), while \( U = \nabla G^*(\pi) \) follows from the envelope theorem applied to the same expression. This is a powerful insight, because it transforms a system of non-linear equations into an optimization problem. To find the vector \(U\) that explains the data \(\pi\), we do not need to solve \(\boldsymbol{\pi}(U) = \pi\) by root-finding; instead, we solve:

$$ \boldsymbol{\pi}^{-1}(\pi) = \arg \max_{U \in \mathbb{R}^Y} \{ \pi^\top U - G(U) \} $$

This formulation allows us to use the very rich toolbox of optimization algorithms to recover the systematic utilities.
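As a minimal illustration, assuming the logit welfare \(G(U) = \log \sum_y \exp(U_y)\) from Part 1, the sketch below solves this concave program by plain gradient ascent. The gradient of \(\pi^\top U - G(U)\) is \(\pi - \nabla G(U)\), so the iterations stop exactly when predicted shares match observed shares. The helper `invert_logit`, the step size and the iteration count are illustrative choices; for another model, `softmax` would be replaced by that model's (possibly simulated) market shares.

```python
import numpy as np

def softmax(U):
    """Logit market shares, i.e. the gradient of G(U) = log(sum(exp(U)))."""
    z = np.exp(U - U.max())  # shift by the max for numerical stability
    return z / z.sum()

def invert_logit(pi, n_iter=500, step=1.0):
    """Maximize pi.U - G(U) by gradient ascent (hypothetical helper).

    The gradient pi - softmax(U) vanishes exactly when softmax(U) = pi.
    """
    U = np.zeros(len(pi))
    for _ in range(n_iter):
        U += step * (pi - softmax(U))
    return U - U[0]  # U is identified only up to an additive constant

pi = np.array([0.5, 0.3, 0.2])
U = invert_logit(pi)
print(U, np.log(pi / pi[0]))  # closed-form logit inverse, normalized the same way
print(softmax(U))             # reproduces the observed shares
```

In the logit case the answer is known in closed form (\(U_y = \log \pi_y\) up to a constant), which is what makes this sketch easy to check.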

The Optimal Transport Connection

But we can go deeper. To understand what this inversion actually does to the data, we need to look at the structure of \(G^*(\pi)\). In 2010, while working on an early version of our “Cupid’s invisible hands,” Bernard Salanié and I established a connection that bridges econometrics and computational geometry. We showed that calculating the generalized entropy \(G^*(\pi)\)—and consequently performing the demand inversion—is exactly an Optimal Transport (OT) problem.

The Intuition:
Imagine the random utility shocks \(\varepsilon\) as a pile of sand distributed according to \(\mathcal{P}\). The task is to assign each shock vector \(\varepsilon\) to a choice \(y\) so that the total mass sent to each \(y\) equals its market share \(\pi_y\). Assigning \(\varepsilon\) to \(y\) yields a surplus of \(\varepsilon_y\), and the goal is to find the assignment that maximizes the expected surplus.

Inversion Theorem (Galichon and Salanié):
The negative entropy \(-G^*(\pi)\) is the value of the following optimal transport problem:

$$ -G^*(\pi) = \max_{\lambda \in \mathcal{M}(\mathcal{P}, \pi)} \mathbb{E}_{\lambda} [\varepsilon_{\tilde{y}}] $$

where \(\mathcal{M}(\mathcal{P}, \pi)\) is the set of joint probability distributions (couplings) where the shocks \(\varepsilon\) follow distribution \(\mathcal{P}\) and the choices \(\tilde{y}\) follow distribution \(\pi\).

This is a Monge-Kantorovich transport problem. We are transporting the mass of unobserved heterogeneity \(\mathcal{P}\) onto the discrete set of options with masses \(\pi\). The “dual” formulation of this problem recovers our systematic utilities:

$$
\begin{aligned}
-G^*(\pi) = \min_{u, U} \quad & \int u(\varepsilon) d\mathcal{P}(\varepsilon) - \sum_{y \in [Y]} \pi_y U_y \\
\text{s.t.} \quad & u(\varepsilon) - U_y \geq \varepsilon_y \quad \forall \varepsilon, y
\end{aligned}
$$

Here, the dual variable \(U_y\) corresponds exactly to the systematic utility of alternative \(y\). This means that inverting a demand system is mathematically equivalent to computing the optimal transport cost between the distribution of noise and the distribution of choices.
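The theorem can be checked numerically. The sketch below, assuming SciPy's `linprog` with the HiGHS solver, replaces \(\mathcal{P}\) by the empirical distribution of simulated shocks and solves the primal transport problem. With i.i.d. centered Gumbel shocks (the logit case), the optimal value \(-G^*(\pi)\) should approximate the Shannon entropy \(-\sum_y \pi_y \log \pi_y\), up to simulation error. The function name `ot_entropy`, the sample size and the shares are illustrative.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse import coo_matrix

def ot_entropy(pi, eps):
    """Value of max over couplings lambda in M(P_N, pi) of E_lambda[eps_y].

    P_N is the empirical distribution of the N rows of eps (weight 1/N each);
    the variable lambda[i, y] is stored at position i * Y + y.
    """
    N, Y = eps.shape
    cols = np.arange(N * Y)
    i_idx, y_idx = np.divmod(cols, Y)
    # Rows 0..N-1: sum_y lambda[i, y] = 1/N.  Rows N..N+Y-1: sum_i lambda[i, y] = pi[y].
    A_eq = coo_matrix(
        (np.ones(2 * N * Y),
         (np.concatenate([i_idx, N + y_idx]), np.concatenate([cols, cols]))),
        shape=(N + Y, N * Y),
    ).tocsr()
    b_eq = np.concatenate([np.full(N, 1.0 / N), pi])
    # linprog minimizes, so maximize sum_{i,y} lambda[i, y] * eps[i, y] via its negative.
    res = linprog(-eps.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    assert res.success, res.message
    return -res.fun

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])
eps = rng.gumbel(size=(3000, 3)) - np.euler_gamma   # i.i.d. centered Gumbel shocks
value = ot_entropy(pi, eps)
print(value, -np.sum(pi * np.log(pi)))              # OT value vs Shannon entropy
```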

Computational Implications

Why does this matter? Beyond the theoretical elegance, this result opens the door to powerful computational methods. Since optimal transport problems are linear programs, we can use the vast arsenal of algorithms developed for OT to solve discrete choice problems.

In practice, we can approximate the integral over \(\mathcal{P}\) by drawing a sample of simulated consumers (random shocks). The demand inversion then becomes a finite linear programming problem (keeping \(\mathcal{P}\) continuous instead yields a semi-discrete transport problem), which can be solved efficiently even for complex, non-standard distributions of heterogeneity where no closed-form formula (like the logit) exists.
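A minimal sketch of this procedure, again assuming SciPy's `linprog` with HiGHS: it solves the dual program above with \(\mathcal{P}\) replaced by \(N\) simulated draws and returns the recovered utilities, normalized so that the first one is zero. Drawing Gumbel shocks lets us compare the result with the logit inverse \(\log(\pi_y/\pi_1)\); the helper name `invert_demand_ot` and all numerical values are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse import coo_matrix

def invert_demand_ot(pi, eps):
    """Recover U from shares pi by solving the discretized dual OT problem

        min_{u, U}  (1/N) sum_i u_i - sum_y pi_y U_y
        s.t.        u_i - U_y >= eps[i, y]   for all i, y,

    where eps is an (N, Y) array of simulated shocks (weight 1/N each).
    Returns U normalized so that U[0] = 0.
    """
    N, Y = eps.shape
    rows = np.arange(N * Y)
    i_idx, y_idx = np.divmod(rows, Y)  # constraint row i * Y + y corresponds to the pair (i, y)
    # Variables x = (u_1, ..., u_N, U_1, ..., U_Y); each constraint is -u_i + U_y <= -eps[i, y].
    A_ub = coo_matrix(
        (np.concatenate([-np.ones(N * Y), np.ones(N * Y)]),
         (np.concatenate([rows, rows]), np.concatenate([i_idx, N + y_idx]))),
        shape=(N * Y, N + Y),
    ).tocsr()
    c = np.concatenate([np.full(N, 1.0 / N), -pi])
    bounds = [(None, None)] * (N + Y)
    bounds[N] = (0.0, 0.0)  # U is identified only up to a constant: fix U[0] = 0
    res = linprog(c, A_ub=A_ub, b_ub=-eps.ravel(), bounds=bounds, method="highs")
    assert res.success, res.message
    return res.x[N:]

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])
eps = rng.gumbel(size=(4000, 3))   # Gumbel draws, so the logit inverse serves as a benchmark
U = invert_demand_ot(pi, eps)
print(U, np.log(pi / pi[0]))
```

Nothing in `invert_demand_ot` relies on the Gumbel assumption: replacing the draws of `eps` by, for instance, correlated normal shocks gives a simulated inversion of a probit model.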

This framework allows us to break free from the restrictive assumptions of the Gumbel distribution and model complex substitution patterns, all while maintaining a tractable estimation procedure.

In the next post, we will leave the world of exact integrals for the world of sampling and simulation. We will see how to turn our theoretical insights into tractable algorithms using Monte Carlo simulation and Linear Programming.

Reference

Galichon, Alfred. 2026. Discrete Choice Models: Mathematical Methods, Econometrics, and Data Science. Princeton University Press. Chapter 1.


Beyond the logit model: generalized entropy

The role of entropy in discrete choice models, part 2: generalized entropy

January 12, 2026

Part of the blog “Math of Choice”, based on Alfred Galichon’s forthcoming book, Discrete Choice Models: Mathematical Methods, Econometrics, and Data Science, Princeton University Press, April 2026.

In our previous post, we explored how the popular logit model aggregates individual random utility into a clean, macroscopic entropy term—specifically, the famous Gibbs-Shannon entropy. But does this beautiful connection hold if we step outside the specific assumptions of the logit model?

The answer is yes. In this post, we show that every Random Utility Model (RUM) aggregates into a specific form of entropy. Whether the underlying noise is normal (probit), uniform, or something exotic, society as a whole still behaves as if it were maximizing a welfare function regularized by a “Generalized Entropy of Choice”.

The general setting

Let’s move beyond the specific Gumbel distribution used in the logit case. Consider a general framework where the vector of random shocks \(\varepsilon\) is drawn from an arbitrary continuous distribution \(\mathcal{P}\). The individual still maximizes their utility \(U_y + \varepsilon_y\). At the aggregate level, we define the social welfare function \(G(U)\) as the expected maximum utility:

$$ G(U) = \mathbb{E}_{\mathcal{P}} \left[ \max_{y \in [Y]} \{ U_y + \varepsilon_y \} \right]. $$

For any such distribution \(\mathcal{P}\), the gradient of this function yields the market shares \(\boldsymbol{\pi}\): this is the Daly-Zachary-Williams theorem, seen in the previous post.

To see why entropy appears in general models, we must first look at the geometric properties of the social welfare function. A key property of this function is that \(G\) is a convex function of the systematic utilities \(U\). Why? Because the maximum function is convex, and taking an expectation (which is a weighted sum) preserves convexity. This convexity is crucial because, in mathematics, convex functions always have a “dual” representation. Just as the logit model had a dual representation involving Shannon entropy, the general welfare function \(G(U)\) has a dual function \(G^*(\pi)\) defined by the Legendre-Fenchel transform, or convex conjugate:

$$ G^*(\pi) = \max_{U \in \mathbb{R}^Y} \left\{ \sum_{y \in [Y]} \pi_y U_y - G(U) \right\}. $$

This function \(G^*(\pi)\) is what we call the generalized entropy of choice. Its domain is the set of valid market share vectors \(\pi\), that is, vectors with nonnegative entries such that \(\sum_{y \in [Y]} \pi_y = 1\). Outside of this set, it takes the value \(+\infty\): for instance, adding the same constant \(c\) to every \(U_y\) raises \(G(U)\) by exactly \(c\), so the maximum is unbounded whenever the shares do not sum to one.

The variational principle

The variational principle in convex analysis states that a convex function is characterized by its convex conjugate: a (closed) convex function is the convex conjugate of its convex conjugate. This duality allows us to flip the problem around. Instead of defining welfare \(G(U)\) as an expectation of maxima, we can express it as an optimization problem over market shares. This is the aggregate choice problem:

$$ G(U) = \max_{\boldsymbol{\pi}} \left\{ \sum_{y \in [Y]} \boldsymbol{\pi}_y U_y - G^*(\boldsymbol{\pi}) \right\}, $$

or, keeping in mind what the domain of \(G^\ast\) is, we get the equivalent expression:

$$
\begin{aligned}
G(U) = \max_{\pi \in \mathbb{R}^Y} & \left\{ \sum_{y \in [Y]} \pi_y U_y - G^*(\pi) \right\} \\
\text{s.t. } & \pi_y \geq 0, \sum_{y \in [Y]} \pi_y = 1.
\end{aligned}
$$

This result (Proposition 1.3.1 in the book) reveals the hidden structure of discrete choice. It tells us that the aggregate market behaves as if a single representative agent is maximizing a net utility consisting of two terms:

  • the expected systematic utility \(\sum \pi_y U_y\): the expected utility reward from the systematic utility of the options.
  • the (generalized) entropic regularization \(-G^*(\pi)\): a penalty for concentrating market share too heavily on any single option.

The meaning of generalized entropy

What does this abstract mathematical object represent physically? It turns out to have a striking interpretation. The generalized entropy \(G^*(\boldsymbol{\pi})\) is equal to minus the expected heterogeneity required to rationalize the market shares \(\boldsymbol{\pi}\).

To understand the physical meaning of \(G^\ast\), let us define \(y^\star(\varepsilon)\) as the optimal choice of an agent with utility shock \(\varepsilon\). Let us also define \(\pi^\star\) as the vector of market shares that maximizes the aggregate choice problem, which corresponds to the observed demand \(\boldsymbol{\pi}(U)\). We can write the social welfare \(G(U)\) in two equivalent ways. From the microscopic perspective, it is the expected utility of the optimal choice, \( G(U) = \mathbb{E}[ U_{y^\star(\varepsilon)} + \varepsilon_{y^\star(\varepsilon)} ] \). From the macroscopic perspective, using the variational formula with the optimal \(\pi^\star\), it is \( G(U) = \sum_{y} \pi^\star_y U_y - G^*(\pi^\star) \). Since the average systematic utility \(\sum \pi^\star_y U_y\) matches the expected individual systematic utility \(\mathbb{E}[ U_{y^\star(\varepsilon)} ]\), comparing the two expressions reveals that the entropic penalty must balance the expected random utility:

Claim (Proposition 1.3.1 p. 21). Mathematically, we have:

$$ G^*(\boldsymbol{\pi}) = - \mathbb{E} \left[ \varepsilon_{y^\star(\varepsilon)} \right], $$

where \(y^\star(\varepsilon)\) is the optimal choice for a consumer with shock \(\varepsilon\) when the systematic utilities \(U\) are such that \(\boldsymbol{\pi}(U) = \boldsymbol{\pi}\). In other words, \(-G^*(\boldsymbol{\pi})\) measures the expected random utility \(\mathbb{E}[\varepsilon_{y^\star(\varepsilon)}]\) that the noise must contribute to sustain the observed behavior.
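In the logit case, where (as in Part 1) \(G^*(\boldsymbol{\pi}) = \sum_y \pi_y \log \pi_y\), the claim can be checked by simulation. A minimal sketch, assuming i.i.d. centered Gumbel shocks and arbitrary illustrative utilities: each simulated consumer picks \(y^\star(\varepsilon)\), and the average of \(-\varepsilon_{y^\star(\varepsilon)}\) is compared with the closed form.

```python
import numpy as np

rng = np.random.default_rng(1)
U = np.array([1.0, 0.5, -0.2])                        # illustrative systematic utilities
eps = rng.gumbel(size=(200_000, 3)) - np.euler_gamma  # i.i.d. centered Gumbel shocks (logit)

y_star = np.argmax(U + eps, axis=1)                   # optimal choice y*(eps) of each consumer
pi = np.exp(U) / np.exp(U).sum()                      # logit market shares pi(U)

mc = -eps[np.arange(len(eps)), y_star].mean()         # Monte Carlo estimate of -E[eps_{y*}]
exact = np.sum(pi * np.log(pi))                       # G*(pi) in the logit case
print(mc, exact)
```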

Examples of Generalized Entropy:

  • Logit model: As we saw in part 1, if \(\varepsilon\) is Gumbel-distributed, \(G^*(\pi)\) is the Gibbs-Shannon entropy (up to a sign flip): \(\sum \pi_y \log \pi_y\).
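  • Binomial model: In a binary choice where option 0 yields utility zero and option 1 yields \(U_1 + \varepsilon\), with \(\varepsilon\) having cumulative distribution function \(F\), the entropy takes the form of an integral of the quantile function: \( G^*(\pi_1) = -\int_{0}^{\pi_1} F^{-1}(1-m) dm \).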
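The binomial formula can be evaluated by numerical quadrature. A minimal sketch, assuming SciPy: `dist.isf(m)` is SciPy's inverse survival function, which equals \(F^{-1}(1-m)\) and stays accurate near \(m = 0\), where the integrand diverges. Since the difference of two independent standard Gumbel variables is logistic, the logistic case should reproduce the binary Shannon term \(\pi_1 \log \pi_1 + (1-\pi_1)\log(1-\pi_1)\); the normal case gives a binary probit entropy with no such closed form. The helper name `binary_entropy` is illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import logistic, norm

def binary_entropy(pi1, dist):
    """G*(pi1) = -int_0^{pi1} F^{-1}(1 - m) dm for a binary choice with shock c.d.f. F.

    dist.isf(m) is the inverse survival function, i.e. F^{-1}(1 - m);
    it stays accurate near m = 0, where the integrand diverges.
    """
    value, _abserr = quad(dist.isf, 0.0, pi1)
    return -value

pi1 = 0.3
shannon = pi1 * np.log(pi1) + (1 - pi1) * np.log(1 - pi1)  # binary logit entropy
print(binary_entropy(pi1, logistic), shannon)  # logistic shocks: the two should agree
print(binary_entropy(pi1, norm))               # normal shocks: a binary probit model
```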

Conclusion

The “disorder” of individual choices always aggregates into a coherent structure at the macro level. While the logit model gives us the most familiar version (Gibbs entropy), every discrete choice model has its own unique entropy signature. By understanding this link, we can use powerful tools from convex analysis and optimal transport to analyze demand and welfare.

In the next post, we will discuss the connection between the inversion of discrete choice models and optimal transport in detail, showing how this geometric framework provides powerful computational tools for econometrics.

Reference

[DCM] Galichon, Alfred. 2026. Discrete Choice Models: Mathematical Methods, Econometrics, and Data Science. Princeton University Press. Chapter 1.



Code

Alfred Galichon’s github profile.

Alfred Galichon’s dockerhub profile.

Code for the math+econ+code series.
These are the github repositories for the ‘math+econ+code’ masterclass series.

TraME: Transportation Methods for Econometrics
The TraME website is here, and sources are available on GitHub here.

Vector Quantile Regression
R and Matlab implementation of “Vector Quantile Regression” (Carlier, Chernozhukov and Galichon, Annals of Statistics, 2016).

Programming examples for Optimal Transport Methods in Economics.
These R programs are the programming examples described in Optimal Transport Methods in Economics.

The role of entropy in the logit model

The role of entropy in discrete choice models, part 1: the logit case

January 5, 2026

Part of the blog “Math of Choice”, based on Alfred Galichon’s forthcoming book, Discrete Choice Models: Mathematical Methods, Econometrics, and Data Science, Princeton University Press, April 2026.

Welcome to the first installment of a blog series exploring the rich mathematical landscape of discrete choice analysis. This series aims to go beyond the standard econometric recipes to uncover structural, geometric, and interdisciplinary connections—spanning optimal transport, convex analysis, and information theory—that often remain hidden in traditional textbook treatments. Whether you are an economist, a data scientist, or a mathematician, these posts will offer complementary insights into the mechanics of choice.

In this first blog post, we review the basics of random utility models with a focus on the logit model, and demonstrate the role played by entropy.

Basics of random utility models

In a random utility model, a consumer \(i\) chooses one option \(y\) from a set of alternatives \([Y] = \{1, \dots, Y\}\). The utility that consumer \(i\) derives from option \(y\) is defined as:

$$ U_{iy} = U_y + \varepsilon_{iy},$$

where \(U_y\) is the systematic utility—the part of utility based on observable attributes that is common to all consumers, and \(\varepsilon_{iy}\) is the random utility—an individual-specific random shock, drawn from a known probability distribution \(\mathcal P\). While the econometrician only sees the aggregate outcome (the “macro level”), the individual knows their specific \(\varepsilon_{iy}\) term (the “micro level”) and acts rationally to maximize their own utility.

Welfare. Consumer \(i\) solves the following optimization problem, which defines the indirect utility of consumer \(i\):
$$ u_i = \max_{y \in [Y]} \{ U_y + \varepsilon_{iy} \}. $$
To define the social welfare, we aggregate (or rather, we average) these values across the population. The welfare function \(G(U)\) is defined as the expectation with respect to \( \mathcal{P} \), the distribution of the vector of utility shocks \( (\varepsilon_{y}) \), of the indirect utility:

$$ G(U) = \mathbb{E}_{\mathcal{P}} \left[ \max_{y \in [Y]} \{ U_y + \varepsilon_{y} \} \right].$$

Market shares. The market share map \(\boldsymbol{\pi}(U) \) associates to each vector of systematic utilities \(U\) the vector of market shares, whose entry \(\boldsymbol{\pi}_y(U)\) is the fraction of consumers choosing option \(y\). Under suitable assumptions which ensure that the agent is almost never indifferent between two options, it is defined by:
$$\boldsymbol{\pi}_y(U) = \mathbb{P}_{\mathcal{P}} \left( U_y + \varepsilon_y \geq U_z + \varepsilon_z, \forall z \in [Y] \right).$$
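Both objects are easy to approximate by simulation. A minimal sketch, assuming NumPy and, for illustration only, i.i.d. standard normal shocks (a probit model): each row of `eps` is one simulated consumer, welfare is the average indirect utility, and market shares are the fractions of consumers choosing each option. The helper name `simulate_rum` and the utilities are hypothetical.

```python
import numpy as np

def simulate_rum(U, eps):
    """Monte Carlo estimates of the welfare G(U) and the market shares pi(U).

    eps is an (N, Y) array of draws from P; each row is one simulated consumer.
    """
    utilities = U + eps                          # U_y + eps_iy for every consumer i and option y
    welfare = utilities.max(axis=1).mean()       # average of the indirect utilities u_i
    choices = utilities.argmax(axis=1)           # optimal option of each consumer
    shares = np.bincount(choices, minlength=len(U)) / len(eps)
    return welfare, shares

rng = np.random.default_rng(0)
U = np.array([1.0, 0.5, -0.2])                   # illustrative systematic utilities
eps = rng.standard_normal(size=(100_000, 3))     # illustrative choice of P: i.i.d. standard normal shocks
welfare, shares = simulate_rum(U, eps)
print(welfare, shares)
```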
There is an important connection between the welfare function \(G(U)\) and the market shares map \(\boldsymbol{\pi}(U) \). Imagine we increase \(U_y\) for a single \( y\in [Y]\) by a tiny amount \(\delta\). Then:

  • consumers who were already choosing option \(y\) (a proportion \(\pi_y\)) will see their welfare increase by exactly \(\delta\);
  • however, some consumers might switch to option \(y\) from other options because of this increase. But since they were previously within \(\delta\) of indifference between their old choice and \(y\), each switcher gains at most \(\delta\), and switchers form a mass of order \(\delta\), so the net gain in welfare from switching is of second order, \(O(\delta^2)\).

Therefore, to a first-order approximation, the total increase in social welfare is simply the proportion of existing users multiplied by the utility increase: \(\Delta G \approx \pi_y \cdot \delta\). Taking the limit as \(\delta \to 0\) shows that \({\partial G} / {\partial U_y} = \pi_y\), which leads to:

Theorem (Daly-Zachary-Williams, th. 1.2.1 p. 16). Under standard regularity assumptions, the welfare function \(G\) is differentiable with respect to the systematic utilities \(U\), and its gradient is exactly the vector of market shares:

$$ \nabla G(U) = \boldsymbol{\pi}(U). $$
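The theorem can be checked on simulated data. A minimal sketch, assuming NumPy and illustrative normal shocks: the same draws are reused for every evaluation (common random numbers), so a forward finite difference of the simulated welfare in direction \(y\) counts exactly the consumers already choosing \(y\), matching the simulated share up to rounding and the rare consumer within \(\delta\) of indifference. The names `simulated_welfare` and `delta` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
U = np.array([1.0, 0.5, -0.2])                 # illustrative systematic utilities
eps = rng.standard_normal(size=(100_000, 3))   # simulated shocks, reused for every evaluation

def simulated_welfare(U):
    """Simulated G(U): average over consumers of max_y (U_y + eps_y)."""
    return (U + eps).max(axis=1).mean()

# Simulated market shares at U.
shares = np.bincount((U + eps).argmax(axis=1), minlength=3) / len(eps)

# Forward finite differences of the simulated welfare, one option at a time.
delta = 1e-6
grad = np.array([
    (simulated_welfare(U + delta * np.eye(3)[y]) - simulated_welfare(U)) / delta
    for y in range(3)
])
print(grad)
print(shares)
```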

The logit specialization

The logit model arises from a specific assumption about these random shocks: the \(\varepsilon_{iy}\) terms are independent and follow a centered Gumbel distribution. As a reminder, the centered, i.e. zero-mean, Gumbel distribution has the c.d.f. \(F(x) = \exp(-\exp(-(x+\gamma)))\), where \(\gamma \approx 0.577\) is the Euler-Mascheroni constant. In this specific case, the integral defining the welfare function \(G(U)\) has a beautiful closed-form solution known as the “log-sum-exp” function:
$$ G(U) = \log \left( \sum_{y \in [Y]} \exp(U_y) \right) $$
and by the Daly-Zachary-Williams theorem, the gradient of this welfare function gives us the market shares, which is the familiar Gibbs distribution:

$$ {\boldsymbol\pi}_y(U) = \frac{\exp(U_y)}{\sum_{z \in [Y]} \exp(U_z)}. $$
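Both closed forms can be compared with a direct simulation. A minimal sketch, assuming NumPy and SciPy (`scipy.special.logsumexp` and `scipy.special.softmax`), with illustrative utilities: centered Gumbel draws are obtained by subtracting \(\gamma\) (`np.euler_gamma`) from standard Gumbel draws.

```python
import numpy as np
from scipy.special import logsumexp, softmax

rng = np.random.default_rng(0)
U = np.array([1.0, 0.5, -0.2])                        # illustrative systematic utilities
eps = rng.gumbel(size=(200_000, 3)) - np.euler_gamma  # i.i.d. centered Gumbel shocks
utilities = U + eps

G_mc = utilities.max(axis=1).mean()                   # simulated welfare
pi_mc = np.bincount(utilities.argmax(axis=1), minlength=3) / len(eps)  # simulated shares

print(G_mc, logsumexp(U))   # welfare: simulation vs log-sum-exp
print(pi_mc, softmax(U))    # shares: simulation vs Gibbs formula
```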

Aggregation of random utility. This brings us to the core insight of this post: how does the random noise \(\varepsilon\) aggregate at the macroscopic (i.e. the econometrician’s) level? In the logit model, i.e., when \( \mathcal{P} \) is the distribution of i.i.d. centered Gumbel variables, we can express the social welfare \(G(U)\) as:

Claim (Proposition 1.3.2 p. 21). We have:
$$ G(U) = \mathbb{E}_{\mathcal{P}} \left[ \max_{y \in [Y]} \{ U_y + \varepsilon_{y} \} \right] = \max_{\pi \geq 0: \sum_y \pi_y =1} \left\{ \sum_{y \in [Y]} \pi_y U_y - \sum_{y \in [Y]} \pi_y \log \pi_y \right\}.$$

This equation will be explained in detail in the next post, in a more general setting, using convex duality. But we can already see that it tells a powerful story: the market as a whole acts as if a representative agent is maximizing a trade-off between standard utility and entropy, made of two terms:

  • the first term \( \sum_y \pi_y U_y \) represents order: expected individuals’ systematic utilities;
  • the second term \(-\sum_y \pi_y \log \pi_y\) represents disorder: the Shannon entropy.

In the logit model, the Gumbel-distributed individual shocks at the microscopic level aggregate perfectly into Shannon entropy at the macroscopic level. The “randomness” of the individual has become the “entropy” of the group.
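The variational formula of the claim above can be checked without any simulation. A minimal sketch, assuming SciPy, with illustrative utilities: the objective \(\sum_y \pi_y U_y - \sum_y \pi_y \log \pi_y\) evaluated at the Gibbs distribution equals the log-sum-exp welfare, and randomly drawn interior points of the simplex (Dirichlet draws) never do better.

```python
import numpy as np
from scipy.special import logsumexp, softmax

def objective(pi, U):
    """Expected systematic utility plus Shannon entropy: pi.U - sum_y pi_y log pi_y."""
    return pi @ U - np.sum(pi * np.log(pi))

U = np.array([1.0, 0.5, -0.2])        # illustrative systematic utilities
best = objective(softmax(U), U)       # value at the Gibbs distribution
print(best, logsumexp(U))             # both equal G(U)

# Random interior points of the simplex never do better.
rng = np.random.default_rng(0)
others = rng.dirichlet(np.ones(3), size=1000)
print(all(objective(p, U) <= best + 1e-12 for p in others))
```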

In the next post, we will see how this logic generalizes to random utility models beyond logit, where different assumptions on \(\varepsilon\) yield different forms of “generalized” entropy.

Reference

[DCM] Galichon, Alfred. 2026. Discrete Choice Models: Mathematical Methods, Econometrics, and Data Science. Princeton University Press. Chapter 1.


Applied microeconometrics, Fall 2025

Applied microeconometrics

PhD Course, NYU Economics

Fall 2025

Alfred Galichon

This course will revisit some classical topics in microeconometrics (such as random utility models, dynamic discrete choice, demand estimation, matching models, and bundle choice problems) through the lens of advanced computational methods (large-scale optimization and machine learning). An important part of the course is dedicated to gaining familiarity with computational libraries such as scikit-learn, PyTorch, OpenAI Gym, ChatGPT, Gurobi, and others.

Lectures are delivered under a mix of in-person and online format. The language used is Python. Students not familiar with Python should contact the instructor to be provided a crash course before the start of classes.

Part 1. Random utility models

Content:
Poisson regression and logistic regression as generalized linear models, Lasso and Elastic Net, Min-Max Regret. Computation using scikit-learn and TensorFlow.

Lectures:

  • L1
  • L2
  • L3
  • L4

References:

  • An Introduction to Statistical Learning with Applications in Python by James, Witten, Hastie, Tibshirani and Taylor
  • The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman.
  • Generalized Linear Models by McCullagh and Nelder.

Applications:

Part 2. Dynamic discrete choice models
Content:
Rust, Markov Decision Processes, Multi-armed bandits, Q-Learning. Computation using OpenAI Gym.

Lectures:

  • L5
  • L6
  • L7
  • L8

References:

  • Rust, J. (1987). Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher. Econometrica.
  • Dynamic Programming and Optimal Control by Dimitri P. Bertsekas.
  • Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Applications:

Part 3. Characteristics models
Content:
Pure characteristics model, random coefficient logit model, power diagrams, matching models. Berry, Levinsohn, Pakes. Simulation (Probit, GHK), stochastic GD. Computation using pyopt package, PyBLP, PyTorch.

Lectures:

  • L9
  • L10
  • L11
  • L12

References:

  • Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
  • Train, K. (2009). Discrete Choice Methods with Simulation.
  • Galichon, A. (2016). Optimal Transport Methods in Economics.

Applications:

* automotive pricing https://www.kaggle.com/code/rkamath1/exploratory-analysis-tests-regression/input

https://pyblp.readthedocs.io/en/stable/_notebooks/tutorial/blp.html

* marriage market: https://github.com/TraME-Project/TraME-Datasets/


Discrete Choice Models

Mathematical Methods, Econometrics, and Data Science

Princeton University Press (to appear Spring 2026)

This text provides an overview of discrete choice models with in-depth coverage of the random utility model framework, logistic regressions, generalized linear models and applications to the gravity equation, empirical models of matching, and hedonic models. Multivariate extreme value theory is reviewed with applications to the nested logit model and other generalizations. The characteristics approach is covered, as well as BLP demand estimation and dynamic discrete choice methods. Equilibrium in models with non-transferable utility is discussed. The book features exercises and problem sets, and it includes a rich mathematical appendix, as well as extensive Python code examples.

The table of contents and the book’s preface are available here.

A slide deck will be posted soon.

Sign up here to be updated on news regarding the release of the book and accompanying material.


Inference on the equilibrium flow problem

Research retreat and mini-workshop held at the IASC Cargese, Corsica, France, April 21-26, 2025

Organizers: Alfred Galichon (New York University and Sciences Po) and Antoine Jacquet (Sciences Po)

Funded by the European Research Council grant ERC-CoG No. 866274 “EQUIPRICE”.

This event combines a research retreat and a mini-workshop, both seeking to cross perspectives in the construction of a general framework for estimation of the “equilibrium flow problem”. The “equilibrium flow problem” extends minimum cost flow problems in order to provide a unified network-based framework to analyze problems such as matching problems, multinomial choice problems, hedonic pricing problems, shortest path problems, dynamic programming problems, and international trade flows. The equilibrium flow problem is a far-reaching extension of Optimal Transport, and thus this project can be seen as an extension of the increasingly popular topic of “Inverse Optimal Transport”.
We have been studying the mathematical properties of the problem, with questions such as the existence of equilibrium prices, their possible uniqueness, and lattice structure, with special attention paid to the Nontransferable Utility (NTU) limit of the problem, which, in the bipartite case, yields the Gale and Shapley stable marriage problem. An inferential theory is built using maximum likelihood estimation and minimax-regret estimators. Model selection is incorporated into estimation procedures; more precisely, the proximal mapping operator will be inserted between two iterative phases of these procedures. Several applications are developed: one to the gravity equation in international trade, one to hedonic models, and one to matching models.

Investigators and speakers:

  • Jean-David Benamou (INRIA)
  • Guillaume Carlier (Dauphine)
  • Alfred Galichon (NYU and Sciences Po)
  • Pierre Jacob (ESSEC)
  • Antoine Jacquet (Sciences Po)
  • Jean-Bernard Lasserre (Toulouse School of Economics)
  • Guillaume Pouliot (University of Chicago)
  • Maxime Sylvestre (Dauphine)

Talks are accessible to the public (on Zoom), but registration is mandatory by emailing antoine.jacquet@sciencespo.fr.

Sunday April 20, 2025

Afternoon
Arrival and welcome reception.

Monday April 21, 2025

Morning
11am – 12:30pm: opening session, “leveling the playing field: what are the main results we are after?” (led by Antoine Jacquet)

Afternoon
2:30pm – 3:30pm: Antoine Jacquet, “Computation of NTU aggregate equilibria with a finite-time Newton-Jacobi method.”
3:30pm – 4pm: coffee break.
4pm – 6pm: group work session.

Tuesday April 22, 2025

Morning
9:30am – 10:30am: Guillaume Carlier, “Characterizations of the convex order: old and new.”
10:30am – 11am: coffee break.
11am – 12:30pm: group work session.

Afternoon
1:30pm – 3:30pm: Maxime Sylvestre, “Convergence of a hybrid scheme for the computation of weak inverse optimal transport”.
3:30pm – 4pm: coffee break.
4pm – 6pm: group work session.

Wednesday April 23, 2025

Morning
9:30am – 10:30am: Jean-Bernard Lasserre, “Gaussian mixtures closest to a given measure via optimal transport”.
10:30am – 11am: coffee break.
11am – 12:30pm: group work session.

Afternoon
1:30pm – 3:30pm: Guillaume Pouliot, “Distributionally Robust Optimal Transport”.
4pm – 6pm: group work session.

Thursday April 24, 2025

Morning
9:30am – 10:30am: Jean-David Benamou, “Stochastic Optimal Transport: A Numerical Entropic Regularisation Approach.”
10:30am – 11am: coffee break.
11am – 12:30pm: group work session.

Afternoon
1:30pm – 2:30pm: Pierre Jacob, “Optimal transport and related problems in the field of Monte Carlo methods.”
2:30pm – 3:30pm: Alfred Galichon, “Some partial results on the inference in ITU matching models using LCP theory.”
3:30pm – 4pm: coffee break.
4pm – 6pm: group work session.

Friday April 25, 2025

Morning
9:30am – 10:30am: concluding session (led by Maxime Sylvestre): perspectives and open problems.
10:30am – 11am: coffee break.
11am – 12:30pm: group work session.

Afternoon
2:30pm – 6pm: group work session.

Saturday April 26, 2025

Morning
Departure

Funded by the EU

Applied microeconometrics, Spring 2025

Applied microeconometrics

PhD Course, NYU Economics

Spring 2025

Alfred Galichon

This course will revisit some classical topics in microeconometrics (such as random utility models, dynamic discrete choice, demand estimation, matching models, and bundle choice problems) through the lens of advanced computational methods (large-scale optimization and machine learning). An important part of the course is dedicated to gaining familiarity with computational libraries such as scikit-learn, PyTorch, OpenAI Gym, ChatGPT, Gurobi, and others.

Lectures are delivered under a mix of in-person and online format. The language used is Python. Students not familiar with Python should contact the instructor to be provided a crash course before the start of classes.

Part 1. Random utility models

Content:
Poisson regression and logistic regression as generalized linear models, Lasso and Elastic Net, Min-Max Regret. Computation using scikit-learn and TensorFlow.

Lectures:

  • L1
  • L2
  • L3
  • L4

References:

  • An Introduction to Statistical Learning with Applications in Python by James, Witten, Hastie, Tibshirani and Taylor
  • The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman.
  • Generalized Linear Models by McCullagh and Nelder.

Applications:

Part 2. Dynamic discrete choice models
Content:
Rust, Markov Decision Processes, Multi-armed bandits, Q-Learning. Computation using OpenAI Gym.

Lectures:

  • L5
  • L6
  • L7
  • L8

References:

  • Rust, J. (1987). Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher. Econometrica.
  • Dynamic Programming and Optimal Control by Dimitri P. Bertsekas.
  • Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Applications:

Part 3. Characteristics models
Content:
Pure characteristics model, random coefficient logit model, power diagrams, matching models. Berry, Levinsohn, Pakes. Simulation (Probit, GHK), stochastic GD. Computation using pyopt package, PyBLP, PyTorch.

Lectures:

  • L9
  • L10
  • L11
  • L12

References:

  • Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
  • Train, K. (2009). Discrete Choice Methods with Simulation.
  • Galichon, A. (2016). Optimal Transport Methods in Economics.

Applications:

* automotive pricing https://www.kaggle.com/code/rkamath1/exploratory-analysis-tests-regression/input

https://pyblp.readthedocs.io/en/stable/_notebooks/tutorial/blp.html

* marriage market: https://github.com/TraME-Project/TraME-Datasets/


Cemmap masterclass, June 3-4, 2024

These lectures will introduce the optimal transport (OT) toolbox, with two applications in econometrics. The first one will pertain to the estimation of matching models. We start by introducing the discrete OT problem, its entropic regularization, and inverse OT, as well as its estimation using generalized linear models. The second application will deal with quantile methods. The one-dimensional OT problem will be discussed, as well as its connections with the notions of quantile and rank. The connection with quantile regression will then be discussed, and the ‘vector quantile regression’ problem will be introduced.

Part I. Introduction (3h)
S1. Monge-Kantorovich duality (1h30)

S2. Computational optimal transport (1h30)

https://www.math-econ-code.org/optimal-assignment

Part II. OT and matching models (3h)
S3. Matching with Transferable Utility and random utility (1h30)

https://www.math-econ-code.org/regularized-optimal-transport

S4. Estimation of matching models (1h30)

https://www.math-econ-code.org/matching-estimation

Part III. OT and quantiles (2h)
S5. 1D optimal transport and quantiles (1h)

https://www.math-econ-code.org/one-dimensional-assignment

S6. Connection with quantile regression (1h)

https://www.math-econ-code.org/quantile-regression

Applied microeconometrics, Spring 2024

Applied microeconometrics

PhD Course, NYU Economics

Spring 2024

Alfred Galichon

This course will revisit some classical topics in microeconometrics (such as random utility models, dynamic discrete choice, demand estimation, matching models, and bundle choice problems) through the lens of machine learning and state-of-the-art optimization methods. An important part of the course is dedicated to gaining familiarity with computational libraries and tools such as scikit-learn, PyTorch, OpenAI Gym, ChatGPT, Gurobi, and others.

Lectures are delivered in a mix of in-person and online formats. The language used is Python. Students not familiar with Python should contact the instructor to receive a crash course before the start of classes.

Part 1. Random utility models meet Machine learning

Content:
Poisson regression and logistic regression as generalized linear models, Lasso and Elastic Net, Min-Max Regret. Computation using scikit-learn and TensorFlow.

Lectures:

  • L1: Tue 1/30, 11:45am-1:45pm (19W4, 802 and Zoom)
  • L2: Thu 2/1, 1pm-3pm (19W4, 802 and Zoom)
  • L3: Tue 2/6, 11:45am-1:45pm (Zoom)
  • L4: Thu 2/15, 1pm-3pm (Zoom)

References:

  • An Introduction to Statistical Learning with Applications in Python by James, Witten, Hastie, Tibshirani, and Taylor.
  • The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman.
  • Generalized Linear Models by McCullagh and Nelder.

Applications:

Part 2. Dynamic discrete choice models meet Reinforcement Learning
Content:
Rust, Markov Decision Processes, Multi-armed bandits, Q-Learning. Computation using OpenAI Gym and Stable Baselines.

Lectures:

  • L5: Thu 2/29, 1pm-3pm (Zoom)
  • L6: Thu 3/7, 1pm-3pm (Zoom)
  • L7: Wed 3/13, 3:30pm-5:30pm (19W4, 802 and Zoom)
  • L8: Thu 3/14, 1pm-3pm (19W4, 802 and Zoom)

References:

  • Rust, J. (1987). Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher. Econometrica.
  • Dynamic Programming and Optimal Control by Dimitri P. Bertsekas.
  • Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Applications:

Part 3. Characteristics models meet Deep Learning and Optimal Transport
Content:
Pure characteristics model, random coefficient logit model, power diagrams, matching models. Simulation (Probit, GHK), stochastic gradient descent. Computation using the pyopt package, PyBLP, and PyTorch.

Lectures:

  • L9: Tue 4/2, 11:45am-1:45pm (Zoom)
  • L10: Tue 4/9, 11:45am-1:45pm (Zoom)
  • L11: Tue 4/16, 11:45am-1:45pm (19W4, 802 and Zoom)
  • L12: Thu 4/18, 1pm-3pm (19W4, 802 and Zoom)

References:

  • Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
  • Train, K. (2009). Discrete Choice Methods with Simulation.
  • Galichon, A. (2016). Optimal Transport Methods in Economics.

Applications:

  • Automotive pricing: https://www.kaggle.com/code/rkamath1/exploratory-analysis-tests-regression/input
  • PyBLP tutorial: https://pyblp.readthedocs.io/en/stable/_notebooks/tutorial/blp.html
  • Marriage market: https://github.com/TraME-Project/TraME-Datasets/

Part 4. Recent advances on Bundle choice
Content:
Bundle choice, assortment problem, one-to-many matching, gross substitutes, greedy algorithm.

Lectures:

  • L13: Tue 4/23, 11:45am-1:45pm (Zoom)
  • L14: Thu 4/25, 1pm-3pm (Zoom)
  • L15: Thu 5/2, 1pm-3pm (Zoom)

Application: