Friday, February 24, 2006

st: Re: simple way to create missing data that is "missing at random" from a small dataset

Thanks, Maarten, for giving me more detail on your command. I worked with the constant and now have the correct proportion of missingness, although I'm not sure what the implications are of the standard deviation and the maximum value of p (.549). Now that I better understand what the command is doing, I will continue to work with the values and look at the outcomes. I really appreciate your help!

. gen p = invlogit( -8 +.1*age )

. sum p

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           p |       332    .0999268      .113432  .0054863    .549834

. replace bmi = . if uniform() < p
(27 real changes made, 27 to missing)
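The tuning procedure Maarten describes in the quoted message below (try several constants, check the mean and spread of p, then delete values) could be sketched roughly as follows. This is only an illustration under stated assumptions: the data are a hypothetical stand-in generated to resemble the age and BMI summaries in the thread, the seed is arbitrary, and `runiform()` is used as the current name for the older `uniform()` function shown in the session above.

```stata
* Hypothetical stand-in data: 332 observations with ages between 28 and 82
clear
set seed 12345
set obs 332
gen age = min(max(round(rnormal(52, 12.7)), 28), 82)
gen bmi = rnormal(31, 6.2)

* Try several constants; look for a mean p near .1 and a non-trivial sd
foreach c in -10 -9 -8 -7 -6 {
    quietly gen double p = invlogit(`c' + .1*age)
    quietly summarize p
    display "constant = `c'   mean p = " %6.4f r(mean) "   sd p = " %6.4f r(sd)
    drop p
}

* Apply the chosen constant (-8, as in the session above) and delete values
gen double p = invlogit(-8 + .1*age)
replace bmi = . if runiform() < p

* Check the realized share of missing values and whether it tracks age
gen byte miss = missing(bmi)
summarize miss
tabstat age, by(miss) statistics(mean n)
```

Because the deletion depends only on age and a random draw, never on bmi itself, the missingness in this sketch is MAR given age. The realized share of missing values will vary around the mean of p from one seed to the next, especially in a dataset this small.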

Maarten Buis wrote:

>Suzy:
>
>No problem, but if you find my reply puzzling then chances are that someone else on statalist
>might find it puzzling too, so I also sent my reply (and your full question underneath) to the
>statalist.
>
>The variable p is the probability of missingness, so the mean of p should be .1 if you want
>approximately 10% missingness. Your mean is .99, so most people will be made missing. -invlogit-
>transforms a linear function of "explanatory variables" (in your case .1*age) to lie between zero
>and one according to 1/(1+exp(-xb)), so the values you plug in (in your case .1 for age and 0 for
>the constant) are "logistic regression coefficients". I would play around with values of the
>constant so that you get a mean p of about .1 (the more negative the constant the lower the
>probability). For instance, look at the mean of p if you do -gen p = invlogit(-10 + .1*age)-
>
>Afterwards I would look if there is enough variation in the values of p. If the value of p is
>approximately constant then the influence of age on the probability of missingness is probably not
>strong enough to show up in your simulations. If p is approximately constant you should increase
>the parameter of age. This might then mess up the mean probability of missingness a bit, so then
>it would be good to check if the mean probability of missingness is still close to .1
>
>HTH,
>Maarten
>
>--- Suzy <scott_788@wowway.com> wrote:
>
>>Dear Maarten:
>>
>>Hope you don't mind the direct e-mail. I tried your code based on my
>>dataset and what I thought I should do and all of my BMI observations
>>went missing rather than say 5-10%. I have obviously done something
>>wrong with it. I'm hoping you can help. I would like about 10% of the
>>BMI variable to be missing. I want the missingness to be associated with
>>older age, but not dependent on the value of BMI - thus hopefully
>>satisfying the MAR assumption.
>>
>>I've included the summary stats of the variables, the code you provided
>>(I modified it somewhat) and the result...
>>can you see what I did wrong??
>>
>>summarize
>>
>>    Variable |       Obs        Mean    Std. Dev.       Min        Max
>>-------------+--------------------------------------------------------
>>         sex |       332    .4849398     .5005275         0          1
>>        race |       332    .3253012     .4691944         0          1
>>         age |       332    52.06024      12.6857        28         82
>>        fhdm |       332    .3373494     .4735189         0          1
>>         bmi |       332    30.98795      6.18837        18         48
>>-------------+--------------------------------------------------------
>>       dmcat |       332    .2771084     .4482461         0          1
>>
>>. gen p = invlogit(.1*age)
>>
>>. sum p
>>
>>    Variable |       Obs        Mean    Std. Dev.       Min        Max
>>-------------+--------------------------------------------------------
>>           p |       332    .9894261     .0121324  .9426758   .9997254
>>
>>
>>. replace bmi = . if uniform() < p
>>(332 real changes made, 332 to missing)
>>
>>. summarize
>>
>>    Variable |       Obs        Mean    Std. Dev.       Min        Max
>>-------------+--------------------------------------------------------
>>         sex |       332    .4849398     .5005275         0          1
>>        race |       332    .3253012     .4691944         0          1
>>         age |       332    52.06024      12.6857        28         82
>>        fhdm |       332    .3373494     .4735189         0          1
>>         bmi |         0
>>-------------+--------------------------------------------------------
>>       dmcat |       332    .2771084     .4482461         0          1
>>           p |       332    .9894261     .0121324  .9426758   .9997254
>>
>
>-----------------------------------------
>between 1/2/2006 and 31/3/2006 I will be
>visiting the UCLA, during this time the
>best way to reach me is by email
>
>Maarten L. Buis
>Department of Social Research Methodology
>Vrije Universiteit Amsterdam
>Boelelaan 1081
>1081 HV Amsterdam
>The Netherlands
>
>visiting address:
>Buitenveldertselaan 3 (Metropolitan), room Z214
>
>+31 20 5986715
>
>http://home.fsw.vu.nl/m.buis/
>-----------------------------------------
>
>*
>* For searches and help try:
>* http://www.stata.com/support/faqs/res/findit.html
>* http://www.stata.com/support/statalist/faq
>* http://www.ats.ucla.edu/stat/stata/
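The Min and Max of p quoted in this thread can be checked by hand, since -invlogit- is applied to a linear function of age alone and age runs from 28 to 82 in Suzy's data. A small sketch, assuming only the coefficients that appear in the messages (.1 for age, and a constant of either 0 or -8):

```stata
* No constant: .1*age is already between 2.8 and 8.2, so p is close to 1
* These should match the Min and Max of p in the first attempt
display invlogit(.1*28)
display invlogit(.1*82)

* Constant of -8: the linear predictor now runs from -5.2 to 0.2
* These should match the Min (.0054863) and Max (.549834) in the follow-up
display invlogit(-8 + .1*28)
display invlogit(-8 + .1*82)
```

This shows why the first attempt deleted every BMI value: even the youngest respondent had a deletion probability above .94. It also shows where the maximum of about .55 in the follow-up comes from: it is the deletion probability of the oldest respondent, age 82.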


