Wednesday, March 08, 2006
Re: st: RE: how to choose between geographical identifiers??
I'm not a geographer, but I think this is an interesting question. You could just regress wage on a full set of dummies twice, once for LAD and once for TTWA, and compare the R-squared values, though that is unlikely to convince you or anyone else that one division is more useful than the other. I would start by calculating the mean and standard deviation of log wage, and the population, for each LAD and each TTWA. Then I would make two graphs, one per division, of the standard deviations against the means with marker size given by population, just to get a sense of what kind of variation in wages each division captures. Sometimes a picture gives you a better sense of the data than numerous tabular results.
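Here is a rough sketch of both steps in Python, using only the standard library. It relies on the fact that regressing on a full set of dummies gives fitted values equal to the cell means, so R-squared is one minus the within-cell share of the total sum of squares. The function names and the toy data are made up for illustration; they are not from the original question.

```python
from collections import defaultdict
from math import sqrt

def cell_summary(logwage, area):
    """Return {area: (n, mean, sd)} of log wage within each area."""
    cells = defaultdict(list)
    for y, a in zip(logwage, area):
        cells[a].append(y)
    out = {}
    for a, ys in cells.items():
        n = len(ys)
        m = sum(ys) / n
        # Sample SD; left as None for cells with a single observation.
        sd = sqrt(sum((y - m) ** 2 for y in ys) / (n - 1)) if n > 1 else None
        out[a] = (n, m, sd)
    return out

def dummy_r2(logwage, area):
    """R-squared and adjusted R-squared from a regression on area dummies."""
    n = len(logwage)
    grand = sum(logwage) / n
    sst = sum((y - grand) ** 2 for y in logwage)
    stats = cell_summary(logwage, area)
    # Residuals are deviations from the cell means.
    ssw = sum((y - stats[a][1]) ** 2 for y, a in zip(logwage, area))
    k = len(stats)
    r2 = 1 - ssw / sst
    adj = 1 - (ssw / (n - k)) / (sst / (n - 1)) if n > k else None
    return r2, adj

# Made-up log wages with two hypothetical area codings per worker.
logwage = [2.0, 2.2, 2.1, 2.9, 3.1, 3.0, 2.5, 2.6]
lad = ["A", "A", "B", "B", "C", "C", "D", "D"]
ttwa = ["X", "X", "X", "Y", "Y", "Y", "Z", "Z"]

r2_lad, adj_lad = dummy_r2(logwage, lad)
r2_ttwa, adj_ttwa = dummy_r2(logwage, ttwa)
print("LAD: ", r2_lad, adj_lad)
print("TTWA:", r2_ttwa, adj_ttwa)

# (n, mean, sd) triples are what the SD-against-mean graph would plot,
# with n setting the marker size.
for a, (n, m, sd) in sorted(cell_summary(logwage, ttwa).items()):
    print(a, n, m, sd)
```

The adjusted R-squared is included only because the two divisions generally have different numbers of categories; it is one possible penalty, not the only one.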
I think your criterion is really a kind of entropy-minimizing one, since you want neither geocodes to 8 decimal places (one category for each worker, which leaves very little variation within cells but a lot of categories) nor a single country identifier (one cell with a lot of variation within it). So the size of the grid, in terms of population in each LAD/TTWA, is important, not just how homogeneous people are within each LAD/TTWA.
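To make the two extremes concrete, here is a minimal Python sketch that reports, for a given grid, the number of cells and the within-cell share of the total sum of squares. The helper `within_share`, the log wages, and the intermediate grid are all invented for illustration, and this is only one simple way to look at the trade-off, not an entropy measure as such.

```python
from collections import defaultdict

def within_share(y, cell):
    """Return (number of cells, within-cell share of total sum of squares)."""
    groups = defaultdict(list)
    for v, c in zip(y, cell):
        groups[c].append(v)
    grand = sum(y) / len(y)
    sst = sum((v - grand) ** 2 for v in y)
    ssw = 0.0
    for vs in groups.values():
        m = sum(vs) / len(vs)
        ssw += sum((v - m) ** 2 for v in vs)
    return len(groups), ssw / sst

logwage = [2.0, 2.2, 2.1, 2.9, 3.1, 3.0, 2.5, 2.6]  # made-up values
per_worker = list(range(len(logwage)))               # one cell per worker
country = ["UK"] * len(logwage)                      # one cell for everyone
ttwa = ["X", "X", "X", "Y", "Y", "Y", "Z", "Z"]      # hypothetical grid

for name, cell in [("per worker", per_worker),
                   ("TTWA", ttwa),
                   ("country", country)]:
    k, share = within_share(logwage, cell)
    print(f"{name}: {k} cells, within-cell share {share:.3f}")
```

The finest grid drives the within-cell share to zero only by using as many cells as workers, and the coarsest grid uses one cell but explains nothing, so any useful criterion has to weigh the two together.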
I'll be interested in what others with more experience in this area have to say on how they would approach this problem. Nick--how would you measure minimal structure in residuals here?
On 3/8/06, Nick Cox <email@example.com> wrote:
> I am a geographer but I don't know much about (what is
> usually called human) geography. I regarded it as my main field
> of interest between 1968 and 1969, but no longer. There aren't
> many geographers on this list, I think.
>
> However, your question is not really geographical. I guess
> from this that you are using lots of dummies in each case
> and for once the answer is whichever set of dummies gives
> you a better model, according to your criteria of model
> excellence (my favourite criterion is usually minimal
> structure in residuals).
>
> In broad terms both LADs and TTWAs are fairly heterogeneous
> as both spring from a idea of an area functioning together
> rather than formal similarity of anything. So knowing the
> area might not help enormously in predicting wage. But
> whichever spatial subdivision has a finer mesh should
> prove better.
>
> Nick
> firstname.lastname@example.org
>
> Ada Ma
>
> > I have a bunch of wage observations and all the observations are
> > attached with two geographical identifiers - local authority districts
> > (LADs) and travel to work areas (TTWAs). I want to find out how wages
> > vary across different areas in UK.
> >
> > Now I can run wage estimations using either one of the two categorical
> > variables as explanatory variable. I would however like to find out
> > which categorical variable fits the data better. How do I compare the
> > two sets of results given that the explanatory variables are quite
> > different?
> >
> > Could you recommend what kind of tests I should use and if you are a
> > geographer, could you tell me are there any criteria that are used by
> > geographers to choose between different definitions of geographies
> > (regions, as opposed to LADs, as opposed to TTWAs, etc.)
> >
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/