Multivariate and Propensity Score Matching Software for Causal Inference for Stata

Jasjeet S. Sekhon
Erin Hartman

This website distributes "Matching", a Stata package for estimating causal effects by multivariate and propensity score matching. The package provides functions for multivariate and propensity score matching and for finding optimal balance based on a genetic search algorithm. Also see "Genetic Optimization Using Derivatives: Theory and Application to Nonlinear Models" for more information on the genetic matching algorithm. A variety of univariate and multivariate tests to determine whether balance has been obtained are also provided. These tests can also be used to determine whether an experiment or quasi-experiment is balanced on baseline covariates.
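
To make the idea of a univariate balance test concrete, here is a minimal Python sketch of the two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distribution functions of a covariate in the treated and control groups. It is an illustration only: which tests the Stata package reports is not stated on this page, and the function name and toy ages below are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # occurs at one of the observed values.
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical toy data: ages in a treated group and two candidate control groups.
treated_age = [22, 25, 27, 30, 35]
balanced_age = [23, 25, 26, 31, 34]
unbalanced_age = [40, 42, 45, 50, 55]

ks_balanced = ks_statistic(treated_age, balanced_age)
ks_unbalanced = ks_statistic(treated_age, unbalanced_age)
```

A value near 0 means the two groups have similar distributions of the covariate; a value of 1 means the distributions do not overlap at all.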

For an introduction to the package with documentation and examples, please see genmatch. The following two papers provide examples where the R version of genmatch is able to recover experimental benchmarks using observational data: "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies" and "A New Non-Parametric Matching Method for Bias Adjustment with Applications to Economic Evaluations".

To install:

MAKE SURE THAT YOU HAVE R INSTALLED ON YOUR COMPUTER. R can be downloaded at: http://www.r-project.org/

Intel-based Mac OS X binary package:
Download: Matching_Stata_v0.1.zip
Unzip the Matching_Stata_v0.1.zip file. This will create a folder called Matching_Stata_v0.1 in the current directory. Copy all of the contents of this folder (the genmatch folder, genmatch.ado, genmatch.hlp and genmatchCleanup.class) into your personal ado directory (which can be found by typing personal in the Stata command line) or another ado location. This installs genmatch.ado, genmatch.hlp and the other necessary supporting files in your personal ado directory.

Alternative Intel-based Mac OS X installation:
Download: Matching_Stata_v0.1.zip
Copy Matching_Stata_v0.1.zip into your personal ado directory (which can be found by typing personal in the Stata command line) or another ado location. Unzip the file by typing unzip Matching_Stata_v0.1.zip in Terminal (make sure you are in the directory where you placed the zipped file). This installs genmatch.ado, genmatch.hlp and the other necessary supporting files in your personal ado directory.

At this time the code is available only for Intel-based Mac OS X. A Windows version is currently being developed. Please check back for updates.


The package includes the following main user-exposed functions:
genmatch: finds optimal balance using multivariate matching where a genetic search algorithm determines the weight each covariate is given. The user can choose which function of covariate balance to optimize from a list or provide one of her own.

COMING SOON: match: performs multivariate and propensity score matching.
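
The role of the covariate weights in genmatch can be illustrated with a small Python sketch. It is a simplification, not the package's code: it uses a weighted Euclidean distance on covariates scaled by their standard deviations rather than the generalized Mahalanobis distance described in the Genetic Matching paper, and it tries two fixed weight vectors by hand instead of running a genetic search. All names and the toy data are hypothetical.

```python
import math
from statistics import pstdev


def weighted_distance(x, y, weights, scales):
    """Weighted Euclidean distance on covariates divided by their scales."""
    return math.sqrt(sum(
        w * ((a - b) / s) ** 2
        for a, b, w, s in zip(x, y, weights, scales)
    ))


def nearest_control(treated_unit, controls, weights, scales):
    """Index of the control unit closest to treated_unit under the weights."""
    return min(
        range(len(controls)),
        key=lambda j: weighted_distance(treated_unit, controls[j], weights, scales),
    )


# Hypothetical toy data: each unit is (age, years of education).
treated = (30.0, 12.0)
controls = [(31.0, 8.0), (40.0, 12.0)]

# Scale each covariate by its standard deviation across all units.
units = [treated] + controls
scales = [pstdev(column) for column in zip(*units)]

# All weight on age picks the control closest in age;
# all weight on education picks the control closest in education.
match_age = nearest_control(treated, controls, (1.0, 0.0), scales)
match_educ = nearest_control(treated, controls, (0.0, 1.0), scales)
```

Different weights select different matches for the same treated unit. A genetic search, as described above, would look for the weights whose resulting matches give the best value of the chosen balance function.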

lalonde.dta is the sample data set used in the genmatch examples. It can also be obtained by typing:
. use http://ekhartman.berkeley.edu/stata/matching/lalonde

The package is under active development so please check back for updates. Please cite the software as follows:

Erin Hartman and Jasjeet S. Sekhon. "Matching: Stata version of Multivariate and Propensity Score Matching Software for Causal Inference." URL: http://ekhartman.berkeley.edu/stata/matching.html

and

Sekhon, Jasjeet S. Forthcoming. "Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R." Journal of Statistical Software.


The following paper describes genmatch in detail and discusses its theoretical properties: "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies." The paper presents Monte Carlo experiments, run with the R version of genmatch, which illustrate GenMatch's properties, and provides real data examples where GenMatch recovers the experimental benchmark. Also see the paper entitled "A New Non-Parametric Matching Method for Bias Adjustment with Applications to Economic Evaluations," where genmatch is used to recover another experimental benchmark.

Also see the paper "Alternative Balance Metrics for Bias Reduction in Matching Methods for Causal Inference," which critically reviews various ways to measure balance. Cumulative probability distribution functions of standardized statistics are advocated as balance metrics. Formal hypothesis tests of balance, although common in the matching literature, should not be conducted, because no measure of balance is a monotonic function of bias and because balance should be optimized without limit. However, descriptive measures of discrepancy ignore information related to bias that is captured by probability distribution functions of standardized statistics.

The R version of the Matching software was used to produce the following working paper: "The Varying Role of Voter Information Across Democratic Societies". The robust propensity score methods discussed in the paper will be included in a future version. The core matching estimator implemented is that of Alberto Abadie and Guido Imbens. This algorithm provides principled standard errors when matching is done with covariates or a known propensity score. Ties are handled in a deterministic and coherent fashion. For details see Large Sample Properties of Matching Estimators for Average Treatment.
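
As a rough illustration of what deterministic tie handling can look like, the Python sketch below estimates the average treatment effect on the treated by one-to-one nearest-neighbor matching with replacement on a single covariate. When several controls are equally close, it averages their outcomes instead of picking one at random. That tie rule is an assumption made for this example, not a statement of how the package resolves ties, and the sketch does not compute the Abadie-Imbens standard errors. The function name and data are hypothetical.

```python
def att_nearest_neighbor(treated, controls):
    """Average effect on the treated from nearest-neighbor matching.

    treated and controls are lists of (covariate, outcome) pairs.
    Controls tied for nearest are averaged, so the result does not
    depend on the order of the data.
    """
    effects = []
    for x_t, y_t in treated:
        best = min(abs(x_t - x_c) for x_c, _ in controls)
        tied_outcomes = [y_c for x_c, y_c in controls if abs(x_t - x_c) == best]
        effects.append(y_t - sum(tied_outcomes) / len(tied_outcomes))
    return sum(effects) / len(effects)


# Hypothetical toy data: (covariate, outcome).
treated = [(1.0, 10.0), (3.0, 14.0)]
controls = [(0.0, 6.0), (2.0, 8.0), (5.0, 9.0)]

att = att_nearest_neighbor(treated, controls)
```

The first treated unit is equally close to two controls, so its counterfactual outcome is the mean of their outcomes; the second has a unique nearest control.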

For some Stata issues, see this WEBPAGE.

Significant performance enhancements were provided by Nate Begeman (Mac OS X Performance Group at Apple). "Matching" also relies on a modified version of the Scythe Statistical Library developed by Andrew Martin, Kevin Quinn and Daniel Pemstein; this modified version is included in the "Matching" package.

For more details on matching and causal inference see Jonathan Wand's Reading List.
