BJEST

BJEST estimates the parameters of an ARIMA (AutoRegressive Integrated Moving Average) univariate time series model by the method of conditional or exact maximum likelihood. The technical details of the method used are described in Box and Jenkins, Chapter 7. TSP uses the notation of Box and Jenkins to describe the time series model.

BJEST (CONSTANT, CUMPLOT, EXACTML, NAR=<number of AR parameters>,
       NBACK=<number of back-forecasted residuals>, NDIFF=<degree of differencing>,
       NLAG=<number of autocorrelations>, NMA=<number of MA parameters>,
       NSAR=<number of seasonal AR parameters>,
       NSDIFF=<degree of seasonal differencing>,
       NSMA=<number of seasonal MA parameters>, NSPAN=<span of seasonal>,
       PLOT, START, nonlinear options)
       [<series name>] [START <parameter name> <parameter value> ......]
       [FIX <parameter name> <parameter value> ......] [ZERO <parameter names> .....]
       [ZFIX <parameter names> .....] ;

Usage

To estimate an ARIMA model, use the BJEST command followed by the options you want in parentheses and then the name of the series and specification of the starting values, if you want to override the default starting values.

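The general form of the model which is estimated is the following:

Gamma(B) Phi(B) w(t) = Delta(B) Theta(B) a(t)

w(t) is the input series after ordinary and seasonal differencing, and a(t) is the underlying "white noise" process. B is the "backshift" or lag operator:

B w(t) = w(t-1),   B^m w(t) = w(t-m)

The order of these polynomials is specified by the options NSAR, NAR, NSMA, and NMA respectively. Gamma(B) and Delta(B) are the seasonal AR and MA polynomials and Phi(B) and Theta(B) are the ordinary AR and MA polynomials. All the polynomials are of the form

Phi(B) = 1 - PHI(1)*B - PHI(2)*B^2 - ... - PHI(p)*B^p

where p is the order of the polynomial. In the seasonal polynomials, following the Box and Jenkins notation, each lag is one seasonal span (NSPAN periods) rather than one period. Note that the coefficients all have minus signs in front of them.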
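The sign convention can be illustrated outside TSP. The following is a minimal Python sketch, not TSP code; the function name apply_lag_poly is hypothetical and is used only for this illustration. It applies a polynomial of the form 1 - c(1)*B - ... - c(p)*B^p to a series, so a positive coefficient is subtracted from the current value.

```python
def apply_lag_poly(coefs, series):
    """Apply 1 - c1*B - c2*B^2 - ... - cp*B^p to a series.

    Returns the filtered values for t = p, ..., len(series) - 1,
    the periods for which every required lag is available.
    """
    p = len(coefs)
    out = []
    for t in range(p, len(series)):
        value = series[t]
        for lag, c in enumerate(coefs, start=1):
            # Minus sign in front of every coefficient (Box-Jenkins convention).
            value -= c * series[t - lag]
        out.append(value)
    return out


w = [1.0, 2.0, 4.0, 8.0]
print(apply_lag_poly([0.5], w))  # [1.5, 3.0, 6.0]
```

With the coefficient 0.5, each output value is w(t) - 0.5*w(t-1).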

By default, the BJEST procedure uses conditional sum of squares estimation to find the best values of the coefficients of these polynomials, consistent with the unobserved variable a(t) being independently and identically distributed. The options specify the exact model which is to be estimated and can also be used to control the iteration process.

As with the other iterative estimation techniques in TSP, it is helpful to specify reasonable starting values for the parameters. Unlike the other nonlinear TSP procedures, the starting values for BJEST can be specified directly in the command, not with a PARAM statement, since the parameters have certain fixed predetermined names in the Box-Jenkins notation.

There are several ways to specify starting values in BJEST. The easiest way is to let TSP "guess" reasonable starting values for the parameters. TSP will choose starting values, for the non-seasonal parameters only, based on the autocorrelations of the time series. You can suppress this feature with NOSTART. In that case, if no other information is supplied (see below), TSP will use zero as the starting value for all the parameters; in practice, this generally leads to slower convergence of the parameter estimates. You can also supply starting values in an @START vector.

You may also specify the starting values for any or all of the parameters yourself. For a model with more than one or two parameters, it may be difficult to choose starting values, since the individual parameters often do not possess a simple interpretation. However, you may wish to specify the starting values of at least some of the parameters based, perhaps, on previous estimates. In this case any parameters for which no starting value is supplied will be given the default value (TSP's guess if the START option is on, or zero for NOSTART, or the @START value).

User-supplied starting values are given after the name of the series on the command. Follow the series name with the keyword START and then a series of pairs of parameter names and starting values. The names available are:

AR(lag) or PHI(lag)

an ordinary autoregressive parameter.

MA(lag) or THETA(lag)

an ordinary moving average parameter.

SAR(lag) or GAMMA(lag)

a seasonal autoregressive parameter.

SMA(lag) or DELTA(lag)

a seasonal moving average parameter.

The lag in parentheses should be an actual number giving the position of the coefficient in the lag polynomial, i.e., AR(1) means the coefficient which multiplies the series lagged once (AR(0) is always unity).

Sometimes it is useful to fix a parameter at a certain value while others are being estimated. This may be done to limit the number of parameters being estimated, or to incorporate prior information about a parameter value, or to isolate an estimation problem which is specific to one or a few parameters. The keyword FIX followed by pairs of parameter names and values can be used to achieve this. FIX and START can be used on the same BJEST command to give starting values to some parameters and hold others fixed. The ZERO keyword is used to override the automatic starting values for some parameter(s) with zero(s). The ZFIX keyword fixes some parameter(s) to zero.

BJEST can also be used to check the roots of polynomials for stationarity/invertibility, without doing any estimation. Just supply the coefficient values in @START, use NAR=p and/or NMA=q, and do not supply a dependent variable. The PRINT option will print the roots, or the @ARSTAT and @MAINV variables can be used.
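The root check described above can be reproduced by hand for low orders. The following Python sketch is an illustration only, not the routine TSP uses; the function names lag_poly_roots and all_outside_unit_circle are hypothetical. It finds the roots of 1 - c(1)*z - c(2)*z^2 and tests whether all of them lie outside the unit circle.

```python
import cmath


def lag_poly_roots(c1, c2=0.0):
    """Roots of 1 - c1*z - c2*z**2 (order 1 or 2 only)."""
    if c2 == 0.0:
        if c1 == 0.0:
            return []                      # constant polynomial, no roots
        return [complex(1.0 / c1)]
    # Rewrite as c2*z**2 + c1*z - 1 = 0 and use the quadratic formula.
    disc = cmath.sqrt(c1 * c1 + 4.0 * c2)
    return [(-c1 + disc) / (2.0 * c2), (-c1 - disc) / (2.0 * c2)]


def all_outside_unit_circle(roots):
    """True if every root has modulus greater than one."""
    return all(abs(r) > 1.0 for r in roots)


print(all_outside_unit_circle(lag_poly_roots(0.5, 0.3)))  # True
print(all_outside_unit_circle(lag_poly_roots(0.9, 0.3)))  # False
```

The first coefficient pair satisfies the condition and the second does not, because one of its roots has modulus below one.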

Output

The output of BJEST begins with a printout of the starting values, followed by an iteration log with one line per iteration giving the value of the objective function and the convergence criterion. If the PRINT option is on, the current values of the options and the exact time series process being used are printed. In the iteration log, parameter values and changes are printed for each iteration when PRINT is on.

When convergence of the iterative process has been achieved or the maximum number of iterations reached, a message to that effect is printed, and the final results are displayed. These include the conventional statistics on the model: the standard error of a(t), the R-squared, and the F-statistic for the hypothesis that all the parameters are zero. The parameter estimates and their standard errors are shown in the usual regression output.

When the PRINT option is on and the order of any polynomial is greater than one, its roots and their moduli are shown so that you can check that they are outside the unit circle (as is required for stationarity of the AR polynomials and invertibility of the MA polynomials). A table of the autocorrelations and Ljung-Box modified Q-statistics of the residuals is printed after this. If the PRINT option is on, the exact model estimated is again printed in lag notation, along with some summary statistics for the residuals. Then comes a printout of the lagged cross correlations between the differenced series w(t) and the white noise (residual) series a(t).
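For readers who want to see what the Q-statistics measure, the following Python sketch computes the Ljung-Box statistic Q(k) = n*(n+2) * sum over j=1..k of r(j)^2/(n-j), where r(j) is the lag-j residual autocorrelation. It is an illustration under stated assumptions, not TSP's code: the function name ljung_box_q is hypothetical, and the residual mean is subtracted before the autocorrelations are formed, which may differ from what BJEST does.

```python
def ljung_box_q(residuals, nlag):
    """Ljung-Box Q-statistics for lags 1..nlag.

    Assumption: autocorrelations are computed around the sample mean.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [x - mean for x in residuals]
    c0 = sum(d * d for d in dev)           # lag-0 sum of squares
    q_stats = []
    running = 0.0
    for k in range(1, nlag + 1):
        ck = sum(dev[t] * dev[t - k] for t in range(k, n))
        rk = ck / c0                       # lag-k autocorrelation
        running += rk * rk / (n - k)
        q_stats.append(n * (n + 2) * running)
    return q_stats


q = ljung_box_q([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], 2)
print(q)
```

Large values of Q relative to the chi-squared distribution indicate that the residuals are still autocorrelated.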

If the PLOT option is on, a time series plot of the residuals a(t) is printed. If the CUMPLOT option is on, a normalized cumulative periodogram is also plotted (see the CUMPLOT option under Options).
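The cost of the cumulative periodogram can be seen from a direct computation. The Python sketch below is an illustration, not TSP's implementation; the function name cumulative_periodogram is hypothetical, and the normalization (dividing by the sum of the computed ordinates so that the curve ends at one) is an assumption. Each of roughly n/2 frequencies requires a pass over all n residuals, which is why the work grows with the square of the number of observations.

```python
import math


def cumulative_periodogram(resid):
    """Normalized cumulative periodogram at frequencies i/n, i = 1..(n-1)//2."""
    n = len(resid)
    m = (n - 1) // 2
    ordinates = []
    for i in range(1, m + 1):              # one pass over the data per frequency
        c = 0.0
        s = 0.0
        for t, a in enumerate(resid):
            angle = 2.0 * math.pi * i * t / n
            c += a * math.cos(angle)
            s += a * math.sin(angle)
        ordinates.append(2.0 / n * (c * c + s * s))
    total = sum(ordinates)
    if total == 0.0:
        return [0.0] * m
    cumulative = []
    running = 0.0
    for value in ordinates:
        running += value
        cumulative.append(running / total)  # assumed normalization: ends at 1
    return cumulative


# A pure cycle of period 4: all of the power sits at the second frequency.
cycle = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
print(cumulative_periodogram(cycle))
```

For white noise the curve should rise roughly along a straight line; a jump such as the one produced by the cyclical series above signals remaining periodicity.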

The following variables are stored (statistics based on differenced variable if differencing is specified):

variable  type    length         description

@LHV      list    1              Name of the dependent variable
@RNMS     list    #vars          Names of right hand side parameters
@SSR      scalar  1              Sum of squared residuals
@S        scalar  1              Standard error of the regression
@S2       scalar  1              Standard error squared
@YMEAN    scalar  1              Mean of the dependent variable
@SDEV     scalar  1              Standard deviation of the dependent variable
@DW       scalar  1              Durbin-Watson statistic
@RSQ      scalar  1              R-squared
@ARSQ     scalar  1              Adjusted R-squared
@IFCONV   scalar  1              =1 if convergence achieved, =0 otherwise
@LOGL     scalar  1              Log of likelihood function
@NCOEF    scalar  1              Number of coefficients
@NCID     scalar  1              Number of identified coefficients
@COEF     vector  #coefs         Coefficient estimates
@SES      vector  #coefs         Standard errors of coefficient estimates
@T        vector  #coefs         T-statistics on coefficients
@GRAD     vector  #coefs         Values of the gradient at convergence
@VCOV     matrix  #coefs*#coefs  Variance-covariance of estimated coefficients
@QSTAT    vector  NLAG           Ljung-Box modified Q-statistics
%QSTAT    vector  NLAG           P-values for Q-statistics
@ARSTAT   scalar  1              1 if AR polynomial is stationary
@MAINV    scalar  1              1 if MA polynomial is invertible
@FIT      series  #obs           Fitted values of the dependent variable
@RES      series  #obs           Residuals = actual - fitted values of the dependent variable
@ARRTRE   vector  NAR            Real parts of the AR roots
@ARRTIM   vector  NAR            Imaginary parts of the AR roots
@ARRTMO   vector  NAR            Moduli of the AR roots
@MARTRE   vector  NMA            Real parts of the MA roots
@MARTIM   vector  NMA            Imaginary parts of the MA roots
@MARTMO   vector  NMA            Moduli of the MA roots
@SARRTRE  vector  NSAR           Real parts of the seasonal AR roots
@SARRTIM  vector  NSAR           Imaginary parts of the seasonal AR roots
@SARRTMO  vector  NSAR           Moduli of the seasonal AR roots
@SMARTRE  vector  NSMA           Real parts of the seasonal MA roots
@SMARTIM  vector  NSMA           Imaginary parts of the seasonal MA roots
@SMARTMO  vector  NSMA           Moduli of the seasonal MA roots

Method

The method used by BJEST (for the default method NOEXACTML) to estimate the parameters is essentially the one described by Box and Jenkins in their book. It uses a conventional nonlinear least squares algorithm with numerical derivatives. The major difference between the estimation of time series models and estimation in the traditional (nonlinear least squares) way relates to the use of "back-forecasted" residuals. The likelihood function for the ARIMA model depends on the infinite past sequence of residuals. If we estimate the time series model by simply setting the values of these past residuals to zero, their unconditional expectation, we might seriously misestimate the parameters if the initial disturbance, a(0), happens to be very different from zero.

The solution to this problem, suggested by Box and Jenkins, is to invert the representation of the time series process, i.e., write the same process as if the future outcomes were determining the past. Thus it describes the relationships which the time series will, ex post, exhibit. This representation of the backward process constructs back-forecasts of the disturbance series a(t), and these calculated residuals are then used in the likelihood function. By using a reasonable number of these backcasted residuals, the problems introduced by an unusually high positive or negative value for the first disturbance in the time series can be eliminated.
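The sensitivity to the initial disturbance can be seen in a small example. The following Python sketch is an illustration only, not BJEST's algorithm, and it does no backcasting; the function name conditional_residuals is hypothetical. It computes the residuals of an ARMA(1,1) model, w(t) - phi*w(t-1) = a(t) - theta*a(t-1), conditional on an assumed value for the pre-sample residual a(0).

```python
def conditional_residuals(w, phi, theta, a_start=0.0):
    """Residuals a(1), ..., a(n-1) of w(t) - phi*w(t-1) = a(t) - theta*a(t-1),
    conditional on the pre-sample residual a(0) = a_start."""
    a_prev = a_start
    resid = []
    for t in range(1, len(w)):
        a_t = w[t] - phi * w[t - 1] + theta * a_prev
        resid.append(a_t)
        a_prev = a_t
    return resid


w = [0.0, 1.0, 0.0, 1.0]
print(conditional_residuals(w, 0.0, 0.5))               # [1.0, 0.5, 1.25]
print(conditional_residuals(w, 0.0, 0.5, a_start=2.0))  # [2.0, 1.0, 1.5]
```

The two runs differ only in a(0). The gap between their residuals shrinks by a factor of theta each period (1.0, 0.5, 0.25 here), so a wrong starting value matters most in short series or when theta is close to one in absolute value.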

If the process is a pure moving average process, this backcast becomes zero after a fixed number of time periods; consequently, you can set NBACK to a fairly small number in this case.

When the EXACTML option is used, no backcasting is done; the AS 197 algorithm (Mélard 1984) is used.

Options

Note that for all the Box-Jenkins procedures (BJIDENT, BJEST, and BJFRCST), TSP remembers the options from the previous Box-Jenkins command (except for nonlinear options), so that you only need to specify the ones you want to change.

CONSTANT/NOCONS specifies whether a constant term is to be included in the model.

CUMPLOT/NOCUMPLO specifies whether a cumulative periodogram of the residuals is to be plotted. The number of computations required for this plot goes up with the square of the number of observations, so that it may be better to forego this option if the number of observations is large.

EXACTML/NOEXACT specifies exact (versus conditional) maximum likelihood estimation. EXACTML is recommended for models with a unit root in the MA polynomial. This option is not yet fully integrated with all the other options; it does not support NSAR>0 or NSMA>0, or automatic computation of starting values (use @START). It also does not (yet) impose invertibility on the MA polynomial for NMA>1.

NAR= the number of autoregressive parameters in the model. The default is zero.

NBACK= the number of back-forecasted residuals to be calculated. The default is 100.

NDIFF= the degree of differencing to be applied to the series. The default is zero.

NLAG= the number of autocorrelations/Q-statistics to calculate. The default is 20.

NMA= the number of moving average parameters to be estimated. The default is zero.

NSAR= the number of seasonal autoregressive parameters to be estimated. The default is zero.

NSDIFF= the degree of seasonal differencing to be applied to the series. The default is zero (no differencing).

NSMA= the number of seasonal moving average parameters to be estimated. The default is zero.

NSPAN= the span (number of periods) of the seasonal cycle, i.e., for quarterly data, NSPAN should be 4. The default is the current frequency (that is, 1 for annual, 4 for quarterly, 12 for monthly).

PLOT/NOPLOT specifies whether the residuals are to be plotted.

START/NOSTART specifies whether the procedure should supply its own starting values for the parameters.

Nonlinear options control iteration and printing. They are explained in the NONLINEAR entry.
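The effect of the NDIFF, NSDIFF and NSPAN options on the input series can be sketched as follows. This Python fragment is an illustration only, not TSP code; the function names difference and prepare_series are hypothetical. It applies ordinary differencing ndiff times and seasonal differencing nsdiff times at span nspan. The two differencing operators commute, so the order in which they are applied does not change the result.

```python
def difference(series, span=1):
    """Return series(t) - series(t - span) for every t where the lag exists."""
    return [series[t] - series[t - span] for t in range(span, len(series))]


def prepare_series(series, ndiff=0, nsdiff=0, nspan=1):
    """Apply ordinary and seasonal differencing to obtain w(t)."""
    w = list(series)
    for _ in range(ndiff):
        w = difference(w, 1)        # ordinary differencing
    for _ in range(nsdiff):
        w = difference(w, nspan)    # seasonal differencing at span nspan
    return w


x = [1, 2, 4, 7, 11, 16, 22, 29]
print(prepare_series(x, ndiff=1))                     # [1, 2, 3, 4, 5, 6, 7]
print(prepare_series(x, ndiff=1, nsdiff=1, nspan=4))  # [4, 4, 4]
```

Each ordinary difference costs one observation and each seasonal difference costs nspan observations, so the eight input values above leave three values of w(t).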

Examples

This example estimates a simple ARMA(1,1) model with no seasonal component; no plots are produced of the results.

BJEST (NAR=1,NMA=1,NOPLOT,NOCUMPLO) AR9MA5 ;

This example uses the Nelson (1973) auto sales data; a logarithmic transformation of the series is made before estimation.

GENR LOGAUTO = LOG(AUTOSALE) ;

BJEST (NDIFF=1,NSDIFF=1,NMA=2,NSMA=1,NSPAN=12,NBACK=15) LOGAUTO
       START THETA(1) 0.12 THETA(2) 0.20 DELTA(1) 0.82 ;

Note that NBACK is specified as 15 since the backcasted residuals fall to exactly zero after NSPAN+NSMA+NMA periods in a pure moving average model.

The next example estimates a third order autoregressive process with one parameter fixed:

BJEST (NAR=3) GNP START PHI(2) -0.5 PHI(3) 0.1 FIX PHI(1) 0.9 ;

The model being estimated is

GNP(t) - 0.9*GNP(t-1) - PHI(2)*GNP(t-2) - PHI(3)*GNP(t-3) = a(t)

Exact ML estimation:

MMAKE @START .1 .1 .1;

BJEST(NAR=2,NMA=1,NDIFF=1,EXACTML) Y;

References

Box, George E. P. and Gwilym M. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, New York, 1976.

Ljung, G. M. and George E. P. Box, "On a measure of lack of fit in time series models," Biometrika 65, 1978, pp. 297-303.

Mélard, G., "Algorithm AS 197: A Fast Algorithm for the Exact Likelihood of Autoregressive-moving Average Models," Applied Statistics, 1984, pp. 104-109. (code available on StatLib)

Nelson, Charles, Applied Time Series Analysis for Managerial Forecasting, Holden-Day, New York, 1973.

Pindyck, Robert S. and Daniel L. Rubinfeld, Econometric Models and Economic Forecasts, McGraw-Hill Book Co., New York, 1976, Chapter 15.

Statlib, http://lib.stat.cmu.edu/apstat/