### Friday, February 24, 2006

## st: Re: simple way to create missing data that is "missing at random" from a small dataset

Suzy:

No problem, but if you find my reply puzzling, then chances are that someone else on statalist might find it puzzling too, so I have also sent my reply (with your full question underneath) to statalist.

The variable p is the probability of missingness, so the mean of p should be about .1 if you want approximately 10% missingness. Your mean is .99, so almost everybody will be made missing (in your case all 332 observations were). -invlogit- transforms a linear function of "explanatory variables" (in your case .1*age) to lie between zero and one according to 1/(1+exp{-xb}), so the values you plug in (in your case .1 for age and 0 for the constant) are "logistic regression coefficients". I would play around with values of the constant until you get a mean p of about .1 (the more negative the constant, the lower the probability). For instance, look at the mean of p if you do -gen p = invlogit(-10 + .1*age)-.
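A quick worked check, added here for illustration (β0 and β1 are just labels for the constant and the age coefficient, not notation from the original reply): plugging the youngest and oldest ages from the quoted summary (28 and 82) into the formula with β0 = 0 and β1 = .1 reproduces the Min and Max of p in the output below, and setting the linear predictor at the mean age (52.06) equal to logit(.1) gives a rough starting value for the constant.

```latex
\begin{gathered}
p_i = \operatorname{invlogit}(\beta_0 + \beta_1\,\mathrm{age}_i)
    = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1\,\mathrm{age}_i)\}} \\[4pt]
\beta_0 = 0,\ \beta_1 = .1:\quad
\frac{1}{1 + e^{-2.8}} \approx .9427 \ (\mathrm{age} = 28), \qquad
\frac{1}{1 + e^{-8.2}} \approx .9997 \ (\mathrm{age} = 82) \\[4pt]
\operatorname{logit}(.1) = \ln\frac{.1}{.9} \approx -2.197
\ \Rightarrow\
\beta_0 \approx -2.197 - .1 \times 52.06 \approx -7.4
\end{gathered}
```

Because invlogit is nonlinear, the mean of p is not the same as invlogit evaluated at the mean age, so -7.4 is only a starting point to be checked with -summarize p- and adjusted. By the same arithmetic, a constant of -10 gives invlogit(-4.79) ≈ .008 at the mean age, so it errs on the low side.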

Afterwards I would check whether there is enough variation in the values of p. If p is approximately constant, then the influence of age on the probability of missingness is probably not strong enough to show up in your simulations, and you should increase the parameter of age. This might then mess up the mean probability of missingness a bit, so it would be good to check again that the mean probability of missingness is still close to .1.
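The two steps above (tune the constant, then check the spread of p and the resulting share of missing values) can be sketched as a do-file. This is a minimal sketch under stated assumptions, not code from the original thread: the toy ages and BMI values are hypothetical, generated only to roughly resemble the quoted summary; the grid of constants (-10 to -5 in steps of .25) and the names `b_age`, `target`, `best_c`, `bmi_mar` and `miss` are my own choices; and `runiform()` is the name Stata 10.1 and later use for what older versions called `uniform()`. It also keeps the complete `bmi` and puts the missing values in a copy, so the two can be compared.

```stata
* Sketch only: toy data standing in for the real dataset (hypothetical values)
clear
set seed 2006
set obs 332
generate age = round(min(max(rnormal(52, 12.7), 28), 82))
generate bmi = round(rnormal(31, 6.2))

local b_age = .1      // age coefficient; raise it if p barely varies
local target = .1     // desired share of missing bmi values

* Try constants -10, -9.75, ..., -5 and keep the one whose mean p
* is closest to the target
local best_c = .
local best_gap = 1
forvalues k = 0/20 {
    local c = -10 + .25*`k'
    capture drop p
    quietly generate p = invlogit(`c' + `b_age'*age)
    quietly summarize p
    if abs(r(mean) - `target') < `best_gap' {
        local best_gap = abs(r(mean) - `target')
        local best_c = `c'
    }
}
display "chosen constant: `best_c'"

* Rebuild p with the chosen constant and inspect its spread:
* if Min and Max are close together, age has little influence
capture drop p
generate p = invlogit(`best_c' + `b_age'*age)
summarize p

* Keep the complete bmi; make the copy missing with probability p
generate bmi_mar = bmi
replace bmi_mar = . if runiform() < p

* The realised share of missing values should be close to the target
generate byte miss = missing(bmi_mar)
summarize miss
```

To use this on real data, drop the `clear` and toy-data lines at the top (otherwise they wipe the data in memory) and run the rest. If `summarize p` shows a narrow range, increase `b_age` and rerun the whole loop, since a new age coefficient also shifts the best constant.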

HTH, Maarten

--- Suzy <scott_788@wowway.com> wrote:

> Dear Maarten:
>
> Hope you don't mind the direct e-mail. I tried your code based on my dataset and what I thought I should do and all of my BMI observations went missing rather than say 5-10%. I have obviously done something wrong with it. I'm hoping you can help. I would like about 10% of the BMI variable to be missing. I want the missingness to be associated with older age, but not dependent on the value of BMI - thus hopefully satisfying the MAR assumption.
>
> I've included the summary stats of the variables, the code you provided (I modified it somewhat) and the result... can you see what I did wrong??
>
>     summarize
>
>         Variable |       Obs        Mean    Std. Dev.       Min        Max
>     -------------+--------------------------------------------------------
>              sex |       332    .4849398    .5005275          0          1
>             race |       332    .3253012    .4691944          0          1
>              age |       332    52.06024     12.6857         28         82
>             fhdm |       332    .3373494    .4735189          0          1
>              bmi |       332    30.98795     6.18837         18         48
>     -------------+--------------------------------------------------------
>            dmcat |       332    .2771084    .4482461          0          1
>
>     . gen p = invlogit(.1*age)
>
>     . sum p
>
>         Variable |       Obs        Mean    Std. Dev.       Min        Max
>     -------------+--------------------------------------------------------
>                p |       332    .9894261    .0121324   .9426758   .9997254
>
>     . replace bmi = . if uniform() < p
>     (332 real changes made, 332 to missing)
>
>     . summarize
>
>         Variable |       Obs        Mean    Std. Dev.       Min        Max
>     -------------+--------------------------------------------------------
>              sex |       332    .4849398    .5005275          0          1
>             race |       332    .3253012    .4691944          0          1
>              age |       332    52.06024     12.6857         28         82
>             fhdm |       332    .3373494    .4735189          0          1
>              bmi |         0
>     -------------+--------------------------------------------------------
>            dmcat |       332    .2771084    .4482461          0          1
>                p |       332    .9894261    .0121324   .9426758   .9997254

----------------------------------------- between 1/2/2006 and 31/3/2006 I will be visiting UCLA; during this time the best way to reach me is by email

Maarten L. Buis Department of Social Research Methodology Vrije Universiteit Amsterdam Boelelaan 1081 1081 HV Amsterdam The Netherlands

visiting address: Buitenveldertselaan 3 (Metropolitan), room Z214

+31 20 5986715

http://home.fsw.vu.nl/m.buis/ -----------------------------------------

