### Thursday, March 02, 2006

## RE: st: fixed effects with clustering when the number of levels of variable to be absorbed exceeds number of clusters

Daniel,

To be honest, I'm not sure about the answer to your question.

Rather than me hazarding a guess, maybe someone else on the list would like to take a stab at it...?

--Mark

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
> Daniel Simon
> Sent: 01 March 2006 18:20
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: st: fixed effects with clustering when the
> number of levels of variable to be absorbed exceeds number of clusters
>
> thanks a lot, Mark. once again, extremely helpful. one last
> question: If I'm only interested in testing significance of a
> handful of regressors is this less of a concern?
> Thanks again for your thoughtful replies. Daniel
>
> At 04:49 PM 3/1/2006 +0000, you wrote:
> >Daniel,
> >
> >This is a tricky question, at least for me, and I don't know the
> >complete answer.
> >
> >The situation you describe is definitely a problem if you want to test
> >lots of parameter restrictions. If you try, say, to test the joint
> >significance of all your regressors, you will fail, because you have
> >more (restrictions on) regressors than clusters. You will probably
> >also see that the F statistic automatically reported by areg or xtreg
> >is missing and highlighted in blue, and if you click on it you'll get a
> >longish discussion that includes the following:
> >
> >"There is no mechanical problem with your model, but you need to
> >consider carefully whether any of the reported standard errors mean
> >anything. The theory that justifies the standard error calculation is
> >asymptotic in the number of clusters, and we have just established that
> >you are estimating at least as many parameters as you have clusters.
> >
> >Putting that concern aside, the model test statistic issue is that you
> >cannot simultaneously test that all coefficients are zero because there
> >is insufficient information. You could test a subset, but not all, and
> >so Stata refuses to report the overall model test statistic."
> >
> >The full help message is available as -help j_robustsingular-.
> >
> >However ... there is some ambiguity in the statement above, since it
> >implies that it's *possible* that none of the SEs mean anything. I
> >used to think this was automatically the case if the cluster-robust
> >var-cov matrix is not full rank, but now I'm not sure. It may be the
> >case that, for example, you can still get valid tests of one or a few
> >coefficients even if you can't test them all jointly. I've been
> >meaning to go searching through the literature to find the references
> >on this but haven't had the time....
> >
> >Cheers,
> >Mark
> >
> >
> > > -----Original Message-----
> > > From: owner-statalist@hsphsun2.harvard.edu
> > > [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Daniel
> > > Simon
> > > Sent: 01 March 2006 15:54
> > > To: statalist@hsphsun2.harvard.edu
> > > Subject: RE: st: fixed effects with clustering when the number of
> > > levels of variable to be absorbed exceeds number of clusters
> > >
> > > Mark - thanks, this is very helpful, as usual. Now, I have a
> > > follow-up. If, in addition to the set of fixed effects that I am
> > > absorbing, I have another set of dummies that I am including
> > > manually with i. and there are about as many of these i. fixed effects as
> > > there are clusters, then this will pose a problem. Is that correct?
> > > For example, if in my individual fixed effects model where I cluster
> > > on state, I also want to include fixed effects for age (e.g. a
> > > separate dummy for each value of age in years in my dataset), and I
> > > have forty different age dummies, then the number of age dummies is
> > > close to the number of clusters. In this situation, is there some
> > > way to assess whether the estimates of the std errors are
> > > problematic? And is there some alternative way to proceed?
> > >
> > > Thanks again. Daniel
> > >
> > > At 03:19 PM 3/1/2006 +0000, you wrote:
> > > >Daniel,
> > > >
> > > > >What you need to be aware of is that the asymptotics justifying the
> > > > >cluster-robust estimator require the number of clusters to go off to
> > > > >infinity. I don't think Austin's comment is quite right, at least in
> > > > >the context you've cited it. The number of fixed effects can be much
> > > > >bigger than the number of clusters, and that won't by itself cause
> > > > >a problem - after all, the fixed effects are not actually being estimated.
> > > > >What *will* cause problems is if you have very few clusters, esp.
> > > > >if compared to the number of parameters that you *are* estimating.
> > > > >In your example, you want to cluster by state. 50 is not very far on the
> > > > >way to infinity, but maybe it's enough for your purposes.
> > > > >But if you also have lots of parameters that you want to test, then you
> > > > >will start running into serious problems (nb: the rank of the
> > > > >cluster-robust var-cov matrix can be no greater than the number of
> > > > >clusters, however many parameters are estimated).
> > > >
> > > >Hope this helps.
> > > >
> > > >--Mark
> > > >
> > > > > -----Original Message-----
> > > > > From: owner-statalist@hsphsun2.harvard.edu
> > > > > [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
> > > > > Daniel Simon
> > > > > Sent: 01 March 2006 15:02
> > > > > To: statalist@hsphsun2.harvard.edu
> > > > > > Subject: Re: st: fixed effects with clustering when the number
> > > > > > of levels of variable to be absorbed exceeds number of clusters
> > > > > >
> > > > > > Sorry - I made a mistake in the subject line of my last message. It
> > > > > > is now correct. Daniel
> > > > >
> > > > > At 09:59 AM 3/1/2006 -0500, you wrote:
> > > > > > >Hi Austin - thanks for pointing out that "the number of levels of
> > > > > > >the absorb() variable should not exceed the number of clusters."
> > > > > > >I have two questions about this: (1) I assume that the same holds
> > > > > > >true for xtreg, fe with clustering (given that this yields identical
> > > > > > >std errors to areg with clustering). Is this assumption correct?
> > > > > > >(2) Does anyone have suggestions for the most efficient way to
> > > > > > >estimate fixed-effects models with clustering when there are
> > > > > > >thousands of fixed effects but clustering occurs on a variable with
> > > > > > >many fewer units? For example, if I have a panel dataset tracking
> > > > > > >thousands of individuals over time and I want to examine the impact
> > > > > > >of a state policy variable, then I would want to estimate a model
> > > > > > >with individual fixed effects but I would also want to cluster by
> > > > > > >state.
> > > > > > >What would be a sensible way to proceed in this situation?
> > > > > >
> > > > > >Thanks. Daniel
> > > > > >
> > > > > >At 02:06 PM 2/28/2006 -0500, you wrote:
> > > > > > >>Perhaps I should ignore this question in the same way you have
> > > > > > >>ignored the advice in the Statalist FAQ on how to write a
> > > > > > >>well-formed question (in particular, you give no indication what
> > > > > > >>command you used or what error message you got, much less show us
> > > > > > >>the output), but you should certainly read:
> > > > > > >> -help xtreg- -help xtdata- and -help areg- for starters. Note
> > > > > > >>also that you may want to cluster on id, assuming your fixed
> > > > > > >>effects are individual id and year effects, to allow for arbitrary
> > > > > > >>serial correlation within panel, and -cluster- implies -robust-.
> > > > > > >>But see the various FAQs on the subject, and such advice as
> > > > > > >>appears in the relevant help files, e.g.
> > > > > > >> Note: Exercise caution when using the cluster() option with areg.
> > > > > > >> The effective number of degrees of freedom for the robust variance
> > > > > > >> estimator is (n_g - 1), where n_g is the number of clusters. Thus
> > > > > > >> the number of levels of the absorb() variable should not exceed the
> > > > > > >> number of clusters.
> > > > > >>
> > > > > >>On 2/28/06, Yasmine Kent <yasmine_kent@yahoo.co.uk> wrote:
> > > > > >> > Hi,
> > > > > >> >
> > > > > >> > Apologies if this is a basic question...
> > > > > >> >
> > > > > > >> > I would like to obtain ROBUST standard errors and t-statistics in a
> > > > > > >> > panel data regression that I am running (with 2-way fixed effects).
> > > > > > >> > The 'robust' command does not appear to work with panel data, it
> > > > > > >> > gives an error message. Theoretically, I thought that it should be
> > > > > > >> > possible to get these. Is there another command I should use
> > > > > > >> > instead? (I am using Stata 8).
> > > > > >> >
> > > > > >> > Thank you!
> > > > > >> > Yasmine
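
The point made in the thread, via -help j_robustsingular-, that "you could test a subset, but not all" can be illustrated numerically. What follows is not Stata code and is not from the thread: it is a minimal Python sketch under stated assumptions. The per-cluster score vectors in `cluster_scores` are made up (3 clusters, 4 parameters, summing to zero across clusters as least-squares scores do), and the "bread" of the sandwich estimator is taken to be the identity, which leaves the rank unchanged. The helper names `rank`, `meat`, and `sub_block` are hypothetical.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix via Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next(
            (r for r in range(found, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue  # no pivot in this column
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def meat(scores):
    """Sum over clusters of the outer product s_g s_g' (the sandwich 'meat')."""
    k = len(scores[0])
    return [[sum(s[i] * s[j] for s in scores) for j in range(k)] for i in range(k)]


def sub_block(matrix, idx):
    """Rows and columns of `matrix` for the coefficients listed in `idx`."""
    return [[matrix[i][j] for j in idx] for i in idx]


# ASSUMPTION: illustrative score vectors, one row per cluster (3 clusters),
# one column per estimated parameter (4 parameters). Rows sum to zero.
cluster_scores = [
    [1, 0, 2, 1],
    [0, 1, -1, 2],
    [-1, -1, -1, -3],
]

V = meat(cluster_scores)                      # 4 x 4, identity bread assumed
full_rank = rank(V)                           # limited by the number of clusters
subset_rank = rank(sub_block(V, [0, 1]))      # block for the first two coefficients

print(full_rank, subset_rank)  # prints: 2 2
```

Here the 4 x 4 matrix has rank 2 (number of clusters minus one), so a joint test of all four coefficients is not available, while the 2 x 2 block for two coefficients is nonsingular. This shows only the mechanical rank issue; it says nothing about whether standard errors based on so few clusters are reliable, which is the question left open above.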

