PROBIT


PROBIT obtains estimates of the linear probit model, where the dependent variable takes on only two values. Options allow you to obtain and save the inverse Mills ratio as a series so that the sample selection correction due to Heckman can be estimated (also see the SAMPSEL command).

PROBIT (FEI,FEPRINT,MILLS=<name for output inverse Mills ratio>,
NHERMITE=<number of points for Hermite quadrature>,REI,nonlinear options)
<dependent variable> <list of independent variables> ;

Usage

The basic PROBIT statement is like the OLSQ statement: first list the dependent variable and then the independent variables. If you wish to have an intercept term in the regression (usually recommended), include the special variable C or CONSTANT in your list of independent variables. You may have as many independent variables as you like subject to the overall limits on the number of arguments per statement and the amount of working space, as well as the number of data observations you have available.

The observations over which the regression is computed are determined by the current sample. If any of the observations have missing values within the current sample, PROBIT will print a warning message and will drop those observations. PROBIT also checks for complete or quasi-complete sample separation by one of the right hand side variables; such models are not identified.
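The separation check described above can be sketched outside TSP. The Python function below is an illustration only, not the test PROBIT actually runs; the name separation_status and the threshold rule are assumptions. It examines one right hand side variable at a time and reports whether a single cut point splits the zeros from the ones, either strictly (complete) or with ties at the cut point (quasi-complete).

```python
def separation_status(x, y):
    """Classify how one regressor x separates a binary outcome y.

    Returns "complete", "quasi-complete", or "none".
    """
    # Same coding rule as PROBIT: positive values count as one.
    x0 = [xi for xi, yi in zip(x, y) if yi <= 0]
    x1 = [xi for xi, yi in zip(x, y) if yi > 0]
    if not x0 or not x1:
        return "none"  # no variation in y, which is a different problem
    # Try both orientations: zeros below the ones, then ones below the zeros.
    for low, high in ((x0, x1), (x1, x0)):
        if max(low) < min(high):
            return "complete"
        # Ties at the cut point, with at least one observation strictly off it.
        if max(low) == min(high) and min(low) < max(high):
            return "quasi-complete"
    return "none"


print(separation_status([1, 2, 3, 4], [0, 0, 1, 1]))
print(separation_status([1, 2, 2, 3], [0, 0, 1, 1]))
print(separation_status([1, 3, 2, 4], [0, 0, 1, 1]))
```

A dummy variable that equals one only when the dependent variable is one is the typical quasi-complete case: the two groups share the value zero but never cross it.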

The list of independent variables on the PROBIT command may include variables with explicit lags and leads as well as PDL (Polynomial Distributed Lag) variables. These distributed lag variables are a way to reduce the number of free coefficients when entering a large number of lagged variables in a regression by imposing smoothness on the coefficients. See the PDL section for a description of how to specify such variables.

The dependent variable need not be a strictly zero/one variable. Positive values are treated as one and zero or negative values are treated as zero.

The FEI and REI options compute estimates for models with fixed and random effects for individuals, respectively. FREQ (PANEL) must be in effect.

For fixed effects, a very efficient algorithm is used, so large unbalanced panels can easily be handled. The FEPRINT option prints a table of the effects, their standard errors, and t-statistics. Individuals whose dependent variable values are all zero or all one are allowed, although their data are not informative for the slopes. The fixed effects for such individuals will be either a very large negative number (in the case of all zeros) or a very large positive number (in the case of all ones). These values yield the correct probability for these observations (zero or one). Note that this estimator has a finite-T bias, so the number of time periods per individual should not be too small.

The random effects model is estimated by maximum likelihood; see the Method section below for details.

Output

The output of PROBIT begins with an equation title and the name of the dependent variable. Starting values and diagnostic output from the iterations will be printed. Final convergence status is printed.

This is followed by the mean of the dependent variable, number of positive observations, sum of squared residuals, R-squared, and a table of right hand side variable names, estimated coefficients, standard errors and associated t-statistics.

PROBIT also stores some of these results in data storage for later use. The table below lists the results available after a PROBIT command.

variable	type	length	description
@LOGL	scalar	1	Log of likelihood function
@IFCONV	scalar	1	Convergence status (1 = success)
@NOB	scalar	1	Number of observations
@NPOS	scalar	1	Number of positive observations
@SRSQ	scalar	1	Scaled R-squared for binary probit
@RSQ	scalar	1	Squared correlation between Y and @FIT
@SSR	scalar	1	Sum of squared residuals
@RNMS	list	#params	List of parameter names
@GRAD	vector	#params	Gradient of likelihood function at maximum
@COEF	vector	#params	Estimated values of parameters
@SES	vector	#params	Standard errors of estimated parameters
@T	vector	#params	T-statistics
%T	vector	#params	p-values for T-statistics
@VCOV	matrix	#par*#par	Estimated variance-covariance of estimated parameters
@DPDX	matrix	#vars*2	Matrix of mean probability derivatives for the two values of the dependent variable
@MILLS	series	#obs	Inverse Mills ratios
@FIT	series	#obs	Fitted probabilities
@NCOEFAI	scalar	1	Number of fixed effects
@NCIDAI	scalar	1	Number of identified fixed effects
@AI	series	#obs	Estimated fixed effects stored as a series (for FEI)
@COEFAI	vector	#individuals	Estimated fixed effects (for FEI)
@SESAI	vector	#individuals	Standard errors for fixed effects (for FEI)
@TAI	vector	#individuals	T-statistics for fixed effects (for FEI)
%TAI	vector	#individuals	p-values corresponding to T-statistics for fixed effects (for FEI)

If the regression includes a PDL variable, the following will also be stored:

@SLAG	scalar	1	Sum of the lag coefficients
@MLAG	scalar	1	Mean lag coefficient (number of time periods)
@LAGF	vector	#lags	Estimated lag coefficients, after "unscrambling"

Method

PROBIT uses analytic first and second derivatives to obtain maximum likelihood estimates via the Newton-Raphson algorithm. This algorithm usually converges fairly quickly. TSP uses zeros for starting parameter values, unless @START is used to override this (see the NONLINEAR entry). As in other regression procedures in TSP, estimation is done using a generalized inverse in the case of multicollinearity of the independent variables.
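As an illustration of the iteration described above, here is a minimal Python sketch of Newton-Raphson for the probit log likelihood, starting from zeros and using analytic first and second derivatives. It is not TSP's code: the function names and the small data set are made up for the example, and it uses a plain linear solve rather than the generalized inverse that TSP applies under multicollinearity.

```python
import math


def norm(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cnorm(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def score_and_information(X, y, beta):
    """Gradient and negative Hessian of the probit log likelihood."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        q = 1.0 if yi > 0 else -1.0            # positive values count as one
        xb = sum(b * v for b, v in zip(beta, xi))
        lam = q * norm(q * xb) / cnorm(q * xb)  # score weight
        w = lam * (lam + xb)                    # positive curvature weight
        for a in range(k):
            grad[a] += lam * xi[a]
            for c in range(k):
                info[a][c] += w * xi[a] * xi[c]
    return grad, info


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def probit_newton(X, y, tol=1e-10, max_iter=50):
    """Return (coefficients, converged flag)."""
    beta = [0.0] * len(X[0])                    # zero starting values
    for _ in range(max_iter):
        grad, info = score_and_information(X, y, beta)
        step = solve(info, grad)                # Newton step
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, True
    return beta, False


# Made-up data: a constant plus one regressor, with overlapping outcomes.
x = [-2.0, -1.0, 0.0, 1.0, 2.0, -1.5, 0.5, 1.5]
y = [0, 0, 1, 0, 1, 1, 0, 1]
X = [[1.0, v] for v in x]
beta, converged = probit_newton(X, y)
print(beta, converged)
```

Because the probit log likelihood is globally concave, full Newton steps from zero usually settle within a handful of iterations on data that are not separated.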

The numerical implementation involves evaluating the normal density and cumulative normal distribution functions. The cumulative normal distribution function is computed from an asymptotic expansion, since it has no closed form. See the reference under the CDF command for the actual method used to evaluate CNORM(). The ratio of the density to the distribution function is also known as the inverse Mills ratio. This is used in the derivatives and with the MILLS= option.

@MILLS is actually the expectation of the structural residual, where the model is given by

y* = XB + e ,   D = 1 if y* > 0 ,   D = 0 otherwise ,

and e is a standard normal disturbance. @MILLS is the value of one of the following two expressions, depending on whether D=0 or 1:

D=0:   E(e|D=0) = -NORM(XB)/(1-CNORM(XB)) = -DLCNORM(-XB)
D=1:   E(e|D=1) = NORM(XB)/CNORM(XB) = DLCNORM(XB)

where NORM is the normal density, CNORM is the cumulative normal and DLCNORM is the derivative of the log cumulative normal with respect to its argument.

Before estimation, PROBIT checks for univariate complete and quasi-complete separation of the data and flags this condition. The model is not identified in this case, because one or more of the independent variables perfectly predict the dependent variable for some of the observations, and therefore their coefficients would slowly iterate to plus or minus infinity if estimation were allowed to proceed.
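The inverse Mills ratio calculation described in this section can be checked numerically. The Python sketch below is an illustration, not TSP code: the lowercase functions norm, cnorm and dlcnorm are stand-ins for the TSP functions of the same names, and mills is a hypothetical helper that follows the explicit formula given in the Examples section.

```python
import math


def norm(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cnorm(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def dlcnorm(z):
    """Derivative of log(cnorm(z)) with respect to z."""
    return norm(z) / cnorm(z)


def mills(xb, d):
    """Expected structural residual given the index xb and the outcome d."""
    # Positive d is treated as one, as in PROBIT.
    return dlcnorm(xb) if d > 0 else -dlcnorm(-xb)


xb = 0.4
print(mills(xb, 1), mills(xb, 0))
# Weighted by the outcome probabilities, the two values average to zero.
print(cnorm(xb) * mills(xb, 1) + (1.0 - cnorm(xb)) * mills(xb, 0))
```

The last line reflects the fact that the unconditional expectation of the residual is zero, which is a convenient sanity check on the signs.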

The scaled R-squared is a measure of goodness of fit relative to a model with just a constant term; it replaced the Kullback-Leibler R-squared beginning with TSP 4.5 since it has somewhat better properties for discrete dependent variable problems. See the Estrella (1998) article.
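For orientation, the measure proposed in Estrella (1998) can be computed from the log likelihood of the fitted model, the log likelihood of a model with only a constant, and the number of observations. The Python sketch below implements that published formula. Whether @SRSQ is computed in exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def constant_only_logl(npos, nob):
    """Log likelihood of a binary model with only a constant term.

    The fitted probability is then the sample share of positive
    observations, so the value does not depend on the link function.
    """
    p = npos / nob
    return npos * math.log(p) + (nob - npos) * math.log(1.0 - p)


def estrella_r2(logl, logl0, nob):
    """Estrella (1998): 1 - (logL/logL0) ** (-(2/n) * logL0)."""
    return 1.0 - (logl / logl0) ** (-(2.0 / nob) * logl0)


# Made-up numbers for illustration: 100 observations, 40 of them positive.
logl0 = constant_only_logl(40, 100)
print(estrella_r2(-50.0, logl0, 100))
```

The measure is zero when the regressors add nothing to the constant-only fit and approaches one as the log likelihood approaches zero.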

The Probit random effects model estimated is the following:

This normalization means that the slope estimates are normalized the same way as the results from the usual Probit command. The parameter RHO is estimated and corresponds to the share of the variance that is within individual. The likelihood function involves computing a multivariate integral, which is done by Hermite quadrature with 20 points by default; when RHO is high, it may be necessary to increase the number of points using the NHERMITE option.
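To show how Hermite quadrature enters the likelihood, the Python sketch below evaluates the contribution of one individual under an assumed specification: y*(i,t) = X(i,t)B + u(i) + e(i,t), with u(i) normal with variance RHO and e(i,t) normal with variance 1-RHO, so that the total variance is one. This form, the reading of RHO as the variance share of the individual effect, and the function name are assumptions made for the example and may differ from the model TSP estimates. A 5-point rule is hard-coded for brevity, whereas PROBIT uses 20 points by default.

```python
import math

# 5-point Gauss-Hermite nodes and weights for the weight function exp(-x*x).
GH5 = [
    (-2.0201828704560856, 0.01995324205904591),
    (-0.9585724646138185, 0.3936193231522412),
    (0.0, 0.9453087204829419),
    (0.9585724646138185, 0.3936193231522412),
    (2.0201828704560856, 0.01995324205904591),
]


def cnorm(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def re_probit_likelihood(xb, d, rho, rule=GH5):
    """Likelihood contribution of one individual.

    xb  : index values X(i,t)B for each time period
    d   : observed outcomes (positive values count as one)
    rho : assumed variance share of the individual effect, 0 <= rho < 1
    """
    scale = math.sqrt(1.0 - rho)
    total = 0.0
    for node, weight in rule:
        u = math.sqrt(2.0 * rho) * node   # individual effect at this node
        prod = 1.0
        for xb_t, d_t in zip(xb, d):
            q = 1.0 if d_t > 0 else -1.0
            prod *= cnorm(q * (xb_t + u) / scale)
        total += weight * prod
    # Change of variables from the normal density to exp(-x*x).
    return total / math.sqrt(math.pi)


print(re_probit_likelihood([0.3, -0.2, 0.8], [1, 0, 1], rho=0.3))
```

Conditional on the individual effect the periods are independent, so the integrand is a product over time and only a one-dimensional integral over the effect has to be approximated. With a single period the value is close to CNORM(XB) under the assumed normalization, which is why the slopes are scaled as in the ordinary probit.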

Options

FEI/NOFEI specifies that the fixed effects Probit model should be computed. FREQ (PANEL) must be in effect.

FEPRINT/NOFEPRIN specifies whether the estimated effects and their standard errors should be printed.

MILLS= the name of a series used to store the inverse Mills ratio series evaluated at the estimated parameters. The default is @MILLS.

NHERMITE= number of points for the Hermite quadrature in computing the integral for the random effects Probit model. The default is 20. The value set is retained throughout the TSP run.

REI/NOREI specifies that the random effects Probit model should be computed. FREQ (PANEL) must be in effect.

The usual nonlinear estimation options can be used. See the NONLINEAR entry.

Examples

Standard probit model:

PROBIT MOVE C WAGE1 WAGE2 COST1 COST2;

Heckman sample selection model (see the SAMPSEL command for ML estimation of this model):

PROBIT (MILLS=RMILL) WORK C OCC1 OCC2 TENURE MSTAT AGE;
SELECT WORK;
OLSQ LWAGE C SCHOOL EXPER IQ UNION OCC1 OCC2 RMILL;

Computing fitted probabilities and inverse Mills ratios explicitly:

PROBIT MOVE C WAGE1 WAGE2 COST1 COST2;
FORCST XB;
MOVEP = CNORM(XB);
MILLSR = MOVE * DLCNORM(XB) + (1-MOVE) * (-DLCNORM(-XB));

References

Amemiya, Takeshi, "Qualitative Response Models: A Survey," Journal of Economic Literature 19, December 1981, pp. 1483-1536.

Cameron, A. Colin, and Frank A. G. Windmeijer, "An R-squared Measure of Goodness of Fit for Some Common Nonlinear Regression Models," Journal of Econometrics 77, 1997, pp. 329-342.

Estrella, Arturo, "A New Measure of Fit for Equations with Dichotomous Dependent Variables," Journal of Business and Economic Statistics, April 1998, pp. 198-205.

Maddala, G. S., Limited-dependent and Qualitative Variables in Econometrics, Cambridge University Press, New York, 1983, pp. 22-27, 221-223, 231-234, 257-259, 365.