Wednesday, January 04, 2006

st: Spell or episode data with overlaps in case-control studies

As part of my dissertation in epidemiology, I have occupational data from a case-control study that have left me baffled. I have work histories for cases and controls, with one record per respondent, thus:

id jobcode1 industrycode1 firstyear1 lastyear1 jobcode2 industrycode2 firstyear2 lastyear2 . . .

so the record for a respondent who worked as a bank teller from 1979 to 1985 and then as a bank manager from 1985 to 1990 would look like this

id       jobcode1 indcode1 firstyr1 lastyr1 jobcode2 indcode2 firstyr2 lastyr2 . . .
31040-70 383      702      1979     1985    019      702      1985     1990    . . .

I need to figure out duration within job class as a predictor for case/control status. "forvalues" has come in handy to make job classes out of job and industry codes, for example:

*engineers, architects, draughtsmen
forvalues i = 1/16 {
    gen tempjobnum`i'=jobnum`i'
    replace tempjobnum`i'=. if (jobnum`i'==185 & indnum`i'==681)
}
egen engarchdra=eqany(tempjobnum*),v(043 044 046 049 053 055 to 057 059 127 173 185 217)
drop tempjobnum*

But I can't figure out how to get at duration in a certain job class, especially since jobs may overlap within (or outside) class or even job code. In the example above, if the respondent had instead worked her main job as a bank teller for Bank of Mystery from 1983 to 1989, but had also worked part-time as a bank teller at First Bank of Other from 1983 to 1987, she would actually have worked in that job class (and job code) for six years, but summing her years would overestimate her time worked as 10 years. I have the idea that creating a flag variable for such overlap using the same kind of indexed "forvalues" syntax is the way to go, but I'm not at all sure how to go about using the flag, especially since the job class is conditioned on the industry class.
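To make the double counting concrete, here is a rough sketch of the kind of flag I have in mind, on made-up data for just the two overlapping bank-teller spells. The variable names naive, covered and wk are hypothetical, the year range is hard-coded for this toy case, and it assumes a spell from firstyr to lastyr counts as lastyr minus firstyr years. It ignores the job class and industry conditions entirely, so I don't know whether it scales to the real problem.

```stata
* Toy data (made up): the two overlapping bank-teller spells
clear
input id firstyr1 lastyr1 firstyr2 lastyr2
1 1983 1989 1983 1987
end

* Naive total: sum of spell lengths, which counts 1983-1987 twice
gen naive = (lastyr1 - firstyr1) + (lastyr2 - firstyr2)

* Flag each calendar year once if any spell spans it, then add up the flags
gen covered = 0
forvalues y = 1983/1988 {
    gen byte wk`y' = 0
    forvalues i = 1/2 {
        replace wk`y' = 1 if firstyr`i' <= `y' & `y' < lastyr`i'
    }
    replace covered = covered + wk`y'
    drop wk`y'
}

* For this respondent naive is 10 and covered is 6
list id naive covered
```

The per-year flag is set to 1 at most once no matter how many spells span that year, which is what keeps the overlap from being counted twice.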

I tried thinking about it from the perspective described in the FAQ "How do I convert my spell-type data into a survival dataset?" but of course there's no failure event in case-control data. Please forgive me if I'm missing an obvious source of this information - I've looked through the manuals and archives, but I may be looking the wrong way, in which case a redirect would be much appreciated. Oh, and I'm still using Stata 8 because changing versions mid-analysis is kind of scary, but I'm open to it if my answers lie there.

Thanks very much,
Jennifer Marino
University of Washington & Fred Hutchinson Cancer Research Center

