### Friday, March 10, 2006

## Re: st: qvf command for count data

On 3/10/06, Hugh Colaco <Hugh.Colaco@business.uconn.edu> wrote:

> qvf y x1 x2 x3 x4 (z1 x3 x4), family(nbinomial) robust cluster (A);
>
> ivreg y x1 x2 x3 x4 (z1 = x3 x4), robust cluster (A);

You seem to be misspecifying both the -ivreg- and -qvf- calls at a very basic level: which variables are the included instruments, and which are the excluded ones? Do you mean z1 to be an excluded instrument for two endogenous variables, x3 and x4? If so, your equation is not identified, since you need at least as many excluded instruments as endogenous regressors. Note that your -ivreg- syntax regresses y on x1, x2, and z1 (where z1 is instrumented by x3 and x4), though I don't think it will run exactly as written:

    . net from http://www.stata-journal.com/software/sj3-4
    . net inst st0049
    . clear
    . set obs 1000
    . gen x1 = uniform()
    . gen x2 = uniform()
    . gen x3 = uniform()
    . gen err = invnorm(uniform())
    . gen y = 1+2*x1+3*x2+4*x3+err
    . gen x4 = uniform()
    . gen z1 = .8*x3 + .6*invnorm(uniform())
    . ivreg y x1 x2 x3 x4 (z1 = x3 x4)
    equation not identified; must have at least as many instruments not in the
    regression as there are instrumented variables
    r(481);

    . qvf y x1 x2 x3 x4 (x1 x2 x4 z1)

    IV Generalized linear models                   No. of obs      =       1000
    Optimization     : MQL Fisher scoring          Residual df     =        995
                       (IRLS EIM)                  Scale param     =   2.137276
    Deviance         =  2126.589444                (1/df) Deviance =   2.137276
    Pearson          =   2126.58962                (1/df) Pearson  =   2.137276

    Variance Function: V(u) = 1                    [Gaussian]
    Link Function    : g(u) = u                    [Identity]
    Standard Errors  : OIM Sandwich

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |   1.914558    .106234    18.02   0.000     1.706343    2.122773
              x2 |   2.912829   .1086845    26.80   0.000     2.699811    3.125846
              x4 |   .2775132   .1095646     2.53   0.011     .0627706    .4922558
              x3 |   4.106679   .3455157    11.89   0.000     3.429481    4.783877
           _cons |     .93817    .193988     4.84   0.000     .5579605     1.31838
    ------------------------------------------------------------------------------
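The two commands take their instrument lists differently, which is easy to trip over. As a rough template only, assuming x3 is the single endogenous regressor and z1 its single excluded instrument (the poster's actual model may differ), and carrying over the family(), robust, and cluster() options and the cluster variable A from the poster's own call, the two specifications might look like this on the poster's data:

```stata
* Template for the poster's data, not for the simulated data above:
* y is assumed to be a count and A is assumed to be the cluster variable.

* -ivreg-: exogenous regressors outside the parentheses,
* (endogenous = excluded instruments) inside
ivreg y x1 x2 x4 (x3 = z1), robust cluster(A)

* -qvf-: all regressors first, then the full instrument list in parentheses
* (exogenous regressors repeated, endogenous x3 replaced by z1)
qvf y x1 x2 x4 x3 (x1 x2 x4 z1), family(nbinomial) robust cluster(A)
```

In both forms there is one excluded instrument for one endogenous regressor, so the order condition is just satisfied.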

Try using -ivreg2- instead. It has good first-stage diagnostics, and the fact that your endogenous variable is a count does not make the standard IV estimator inconsistent; you just lose a little efficiency by disregarding that fact. Note that many of the classic RHS endogenous variables are counts (e.g. educational attainment), and most researchers would use -ivreg2- for these models.

    . ssc install ivreg2
    . ivreg2 y x1 x2 x4 (x3=z1), ffirst

    Summary results for first-stage regressions
    -------------------------------------------

                      Shea
    Variable    | Partial R2  |  Partial R2    F(  1,   995)    P-value
    x3          |    0.1009   |    0.1009         111.65        0.0000

    Underidentification tests:
                                                      Chi-sq(1)   P-value
    Anderson canon. corr. likelihood ratio stat.        106.35    0.0000
    Cragg-Donald N*minEval stat.                        112.21    0.0000
    Ho: matrix of reduced form coefficients has rank=K-1 (underidentified)
    Ha: matrix has rank>=K (identified)

    Weak identification statistics:
    Cragg-Donald (N-L)*minEval/L2 F-stat                111.65

    Anderson-Rubin test of joint significance of
    endogenous regressors B1 in main equation, Ho:B1=0
      F(1,995)=      67.79     P-val=0.0000
      Chi-sq(1)=     68.13     P-val=0.0000

    Number of observations N          =       1000
    Number of regressors   K          =          5
    Number of instruments  L          =          5
    Number of excluded instruments L2 =          1

    Instrumental variables (2SLS) regression
    ----------------------------------------
                                                          Number of obs =     1000
                                                          F(  4,   995) =   292.31
                                                          Prob > F      =   0.0000
    Total (centered) SS     =  3276.986562                Centered R2   =   0.7013
    Total (uncentered) SS   =  33323.59494                Uncentered R2 =   0.9706
    Residual SS             =  978.9706817                Root MSE      =    .9894

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x3 |   4.106679    .337572    12.17   0.000      3.44505    4.768308
              x1 |   1.914558   .1075383    17.80   0.000     1.703787    2.125329
              x2 |   2.912829   .1075683    27.08   0.000     2.701999    3.123658
              x4 |   .2775132   .1073605     2.58   0.010     .0670905    .4879358
           _cons |     .93817   .1888342     4.97   0.000     .5680617    1.308278
    ------------------------------------------------------------------------------
    Anderson canon. corr. LR statistic (identification/IV relevance test): 106.350
                                                       Chi-sq(1) P-val =    0.0000
    ------------------------------------------------------------------------------
    Sargan statistic (overidentification test of all instruments):           0.000
                                                     (equation exactly identified)
    ------------------------------------------------------------------------------
    Instrumented:         x3
    Included instruments: x1 x2 x4
    Excluded instruments: z1
    ------------------------------------------------------------------------------
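On the point that a count-valued endogenous regressor does not by itself break linear IV, a small simulation can make the argument concrete. The following is only a sketch under stated assumptions: every variable name and parameter value is made up for illustration, it assumes a Stata version that provides rnormal() and rpoisson(), and it assumes -ivreg2- is already installed.

```stata
* Hypothetical simulation: all names and parameter values are illustrative.
* Assumes rnormal() and rpoisson() are available and -ivreg2- is installed.
clear
set seed 20060310
set obs 5000
* excluded instrument
gen z = rnormal()
* unobserved confounder that drives both x and y
gen u = rnormal()
* endogenous count regressor
gen x = rpoisson(exp(.5*z + .5*u))
* outcome: the true coefficient on x is 2
gen y = 1 + 2*x + 2*u + rnormal()
* OLS ignores the correlation between x and the error term (2*u plus noise)
regress y x
* linear 2SLS treats the count x like any other endogenous regressor
ivreg2 y (x = z), ffirst
```

If the argument holds, the -ivreg2- coefficient on x should land near 2, while the OLS coefficient should be noticeably larger because u raises both x and y.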


Tag: statalist