R package for marginal effects of Heckman sample selection models

  1. Installation
  2. Example
  3. References

The Heckman model is used to model dependent variables subject to sample selection. In many contexts, including the original studies by James Heckman (1976, 1979), the researcher is interested in the latent variable, which is observable only under special conditions. Typical examples of such variables are bids and shadow prices. In these cases, the estimates of interest are simply the coefficients of the outcome equation, which the Heckman model estimates consistently.

In other contexts, the researcher is interested in the manifest variable, which equals zero for the cases in which the latent variable is not selected. An important example is the analysis of individual expenditure on specific goods and services, such as food, cigarettes, and tourism. For instance, in studies of individual tourism demand, the researcher is frequently interested in total tourism expenditure during a given period, including zeros for those who did not travel. In these cases, the demand level that maximizes the traveler's utility is observed only for individuals who travel. In other words, we only know how much people are willing to spend on a trip when they actually take it. The optimal demand of those who stay at home is never observed. Moreover, the optimal demand is likely to be correlated with the propensity to travel. People willing to spend a lot on a trip are less likely to travel, while less ambitious consumers have a higher tendency to put travel plans into practice.

When the interest lies in the manifest variable, including zeros, estimating marginal effects with the Heckman model is tricky. For independent variables included in both the outcome and the selection equations, marginal effects are not simply equal to the estimated coefficients, because the effect of a variable on the probability of selection adds a nonlinear term to its effect on the outcome. An algorithm for estimating the total marginal effect of an independent variable included in both equations of the model is presented by Saha, Capps & Byrne (1997). This solution applies to dependent variables measured in level (such as units or $). When the dependent variable is measured in logarithms, the solution is more complex, as shown by Hoffmann & Kassouf (2005).
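As a rough illustration of why the coefficients alone are not enough, the derivation below sketches the level case under the usual bivariate normal assumption. The notation (x, z, beta, gamma, rho, sigma, w) is introduced here for illustration only and is not taken from the package; the cited papers and the package may arrange or label the terms differently.

```latex
% Assumed setup (illustrative notation):
%   outcome:   y^* = x\beta + \varepsilon,   selection: s^* = z\gamma + u,
%   (\varepsilon, u) bivariate normal, Var(u) = 1, Var(\varepsilon) = \sigma^2, Corr = \rho,
%   y = y^* if s^* > 0 and y = 0 otherwise.
\begin{aligned}
E[y] &= \Phi(w)\,x\beta + \rho\sigma\,\phi(w), \qquad w = z\gamma, \\
\frac{\partial E[y]}{\partial x_k}
     &= \underbrace{\Phi(w)\,\beta_k}_{\text{through the outcome equation}}
      + \underbrace{\gamma_k\,\phi(w)\,\bigl(x\beta - \rho\sigma\,w\bigr)}_{\text{through the selection equation}} .
\end{aligned}
```

The second term vanishes only when the variable is absent from the selection equation (gamma_k = 0), which is why a variable appearing in both equations has a total effect that differs from its outcome coefficient.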

Until now, there was no R command to estimate the marginal effects of the Heckman model. In Stata, there is a command for marginal effects on dependent variables measured in level, but not for variables measured in logarithms. The heckitmfx package fills this gap: it offers estimates of marginal effects for the Heckman model with dependent variables measured both in level and in logarithms.




Suppose that the optimal tourism demand level, conditional on traveling, is a function of individual income, education level, and weather conditions during the trip. The individual decides to travel if the utility of the optimal demand level is higher than the utility of not traveling at all. Travel utility is a function of individual income, education level, and health status. If the individual chooses to travel, the optimal level of demand is observed. Otherwise, the optimal level of demand remains unobserved, that is:

demand = 0        if the individual does not travel
demand = demand*  if the individual travels

where demand* is the optimal demand level for travelers, that is, conditional on actually traveling. Let's run this model using the heckitmfx package. First, load the data; the examples below use tourexp, the sample dataset provided in the package.


The outcome equation (optimal tourism demand) and the selection equation (travel or not) are:

outcome <- "expenditure ~ income + education + tripweather"
selection <- "participation ~ income + education + health"

Finally, estimate the model using the heckitmfx_level command:

heckitmfx_level(tourexp, selection, outcome)

Using the sample dataset provided in the package (tourexp), the output of the model is:

Since the weather is included only in the outcome equation, its total marginal effect equals its partial marginal effect (i.e., the marginal effect on demand conditional on traveling). Although health is included only in the selection equation, its partial marginal effect on the outcome is nonzero because of the correlation between demand and the probability of traveling. The partial effects of education and income are positive in both equations. A one-unit increase in income is associated with an $18.32 increase in tourism expenditure due to the higher probability of traveling and a $24.46 increase in the optimal tourism demand. As a result, a one-unit increase in income is associated with a $42.79 increase in total tourism expenditure.
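The way a total effect splits into a selection part and an outcome part can be sketched numerically in base R. This is only an illustration under the standard bivariate normal assumption: all parameter values below are hypothetical, the helper functions are not part of heckitmfx, and the package may decompose the effect differently. The analytical total is checked against a finite-difference derivative of the unconditional mean.

```r
# Hypothetical parameter values, chosen only for illustration
beta  <- c(intercept = 200, income = 25)    # outcome equation
gamma <- c(intercept = -0.5, income = 0.1)  # selection (probit) equation
rho   <- -0.4   # correlation between the two error terms
sigma <- 150    # standard deviation of the outcome error

# Unconditional mean of the manifest variable (zeros included):
# E[y] = Phi(w) * xb + rho * sigma * phi(w), with w the selection index
expected_demand <- function(income) {
  xb <- beta[["intercept"]] + beta[["income"]] * income
  w  <- gamma[["intercept"]] + gamma[["income"]] * income
  pnorm(w) * xb + rho * sigma * dnorm(w)
}

# Analytical decomposition of the total marginal effect of income
marginal_effect <- function(income) {
  xb <- beta[["intercept"]] + beta[["income"]] * income
  w  <- gamma[["intercept"]] + gamma[["income"]] * income
  outcome_part   <- pnorm(w) * beta[["income"]]
  selection_part <- gamma[["income"]] * dnorm(w) * (xb - rho * sigma * w)
  c(selection = selection_part,
    outcome   = outcome_part,
    total     = selection_part + outcome_part)
}

me <- marginal_effect(income = 5)

# Central finite difference of E[y] as an independent check of the total
h <- 1e-5
numeric_total <- (expected_demand(5 + h) - expected_demand(5 - h)) / (2 * h)

print(me)
print(numeric_total)
```

The two totals should agree up to numerical error, which confirms that the two parts add up to the derivative of expected expenditure with respect to income.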

If our latent dependent variable were measured in logarithms, the outcome equation would be:

outcome <- "log(expenditure) ~ income + education + tripweather"

In this case, the output using the embedded dataset of the package is:

A one-unit increase in income is associated with a 5.76% increase in tourism expenditure due to its influence on the probability of traveling and a 2.46% increase in expenditure due to the larger optimal demand of the traveler. As a result, the income semi-elasticity of tourism demand is 8.2%.
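For the logarithmic case, a similar sketch shows how a semi-elasticity can be read as the sum of a selection part and an outcome part. It assumes bivariate normal errors, under which log E[y] = xb + sigma^2/2 + log Phi(w + rho * sigma). The parameter values and helper functions are hypothetical and are not the package's implementation, which may follow a different formulation.

```r
# Hypothetical parameter values, chosen only for illustration
beta  <- c(intercept = 5, income = 0.02)    # outcome equation, log(expenditure)
gamma <- c(intercept = -0.5, income = 0.1)  # selection (probit) equation
rho   <- -0.4   # correlation between the two error terms
sigma <- 0.8    # standard deviation of the log-outcome error

# Log of the unconditional mean of expenditure (zeros included)
log_expected_demand <- function(income) {
  xb <- beta[["intercept"]] + beta[["income"]] * income
  w  <- gamma[["intercept"]] + gamma[["income"]] * income
  xb + sigma^2 / 2 + pnorm(w + rho * sigma, log.p = TRUE)
}

# Semi-elasticity of expected expenditure with respect to income
semi_elasticity <- function(income) {
  w <- gamma[["intercept"]] + gamma[["income"]] * income
  a <- w + rho * sigma
  selection_part <- gamma[["income"]] * dnorm(a) / pnorm(a)
  outcome_part   <- beta[["income"]]
  c(selection = selection_part,
    outcome   = outcome_part,
    total     = selection_part + outcome_part)
}

se <- semi_elasticity(income = 5)

# Central finite difference of log E[y] as an independent check
h <- 1e-5
numeric_total <- (log_expected_demand(5 + h) - log_expected_demand(5 - h)) / (2 * h)

print(round(100 * se, 2))  # expressed in percent per unit of income
print(100 * numeric_total)
```

Multiplying by 100 expresses each part as a percentage change per one-unit increase in income, which matches the way the results above are reported.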


Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5(4), 475–492. https://www.nber.org/chapters/c10491

Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47(1), 153–161. https://doi.org/10.2307/1912352

Hoffmann, R., & Kassouf, A. L. (2005). Deriving conditional and unconditional marginal effects in log earnings equations estimated by Heckman’s procedure. Applied Economics, 37(11), 1303–1311. https://doi.org/10.1080/00036840500118614

Saha, A., Capps, O., & Byrne, P. J. (1997). Calculating marginal effects in models for zero expenditures in household budgets using a Heckman-type correction. Applied Economics, 29(10), 1311–1316. https://doi.org/10.1080/00036849700000021