Monday, February 27, 2006

st: too many duplicates with bsample, weight()?

I've been experimenting a bit with the bootstrap commands and there seems to be something wrong with the bsample command when the weight option is used. As I understand it, the weight option is supposed to draw a sample with replacement without deleting the non-selected observations from memory. Instead, it fills a frequency variable that indicates which observations have been sampled and how many times each was sampled (since the sampling is with replacement, an observation can be included in the sample more than once). When I tested this option, though, it seemed to create duplicate observations in the sample far too often. I repeatedly took a sample of 10 from 3000 observations and never got a sample consisting of all unique values! The bsample command without the weight option seems to work much better. I'm pasting an example below.

On a related note, is there an equivalent of the weight option for the bootstrap command, that is, a way to leave the full dataset in memory? I saw the -nodrop- option but it's not completely clear to me what it does.

Here's the example. Basically it takes several bootstrap samples of 10 observations out of 1000 and counts the number of non-duplicated observations in each sample.

. clear

. set seed 12345

. forvalues i=1/10 {
        quietly set obs 1000
        gen id=_n
        bsample 10
        gen n=1
        collapse (count) n, by(id)
        count if n==1
        clear
  }
   10
   10
   10
   10
   10
    8
   10
   10
   10
   10

. clear

. quietly set obs 1000

. gen id=_n

. quietly gen freq=.

. forvalues i=1/10 {
        bsample 10, weight(freq)
        count if freq==1
  }
    3
    4
    4
    4
    2
    4
    4
    4
    1
    2
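A further check that might help narrow this down, offered only as a sketch and not something from the run above: if freq really holds the number of times each observation was drawn (which is how I read the documentation), the frequencies should sum to 10 after each call. The lines below assume the same 1000-observation dataset with id and freq already defined, and are meant to be run from a do-file.

```stata
* Diagnostic sketch; assumes id and freq exist as in the log above
bsample 10, weight(freq)

* Total number of draws recorded in freq (expected to be 10 if freq is a frequency count)
quietly summarize freq
display "total draws = " r(sum)

* Number of distinct observations that were selected at least once
count if freq > 0 & !missing(freq)

* Distribution of the sampling frequencies
tabulate freq
```

If the total is not 10, or most observations have a nonzero frequency, then count if freq==1 is not measuring what the first loop measures.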

As you can see, the weight option produces samples that have many more duplicates. Any idea what's going on? I'm running Stata 9, but I just ran the same do-file on Stata 8 with the same results.
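For scale, here is a rough calculation of what to expect, assuming the draws are independent and uniform with replacement (an assumption about what bsample is meant to do, not something taken from its source). Under that assumption, the chance that 10 draws from N observations are all distinct is the product of (1 - k/N) for k = 0 to 9.

```stata
* Probability that 10 draws with replacement from N observations are all distinct,
* assuming independent uniform draws
foreach N in 1000 3000 {
    local p = 1
    forvalues k = 0/9 {
        local p = `p' * (1 - `k'/`N')
    }
    display "N = `N': P(all 10 distinct) = " %6.4f `p'
}
```

This should come out to roughly 0.96 for N = 1000 and roughly 0.985 for N = 3000, so a sample of 10 distinct observations should be the typical result. That matches the plain bsample run above but not the weight() run.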

Matissa
