### Tuesday, February 28, 2006

## Re: st: too many duplicates with bsample, weight()?

Matissa Hollister <m73hollis@yahoo.com>

> I've been experimenting a bit with the bootstrap commands and there seems to
> be something wrong with the bsample command when the weight option is used.
> As I understand it, ...

Matissa has found a problem in -bsample- when used with the -weight()- option and an expression that results in a resample size that is less than the sample size. While

. bsample, weight(w)

is returning the correct frequency weights for a simple random sample with replacement of the _N observations,

. bsample 10, weight(w)

is not when _N >> 10 (for example).

We have fixed the problem, and the updated -bsample- will be available in the next ado-file update.
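As a rough sanity check of an installed -bsample-, the frequencies stored by -weight()- should add up to the requested resample size. The sketch below is only an illustration: the auto dataset, the seed, and the variable name w are arbitrary choices, and a matching total does not by itself show that the sampling distribution is correct.

```stata
sysuse auto, clear
set seed 12345                  // arbitrary seed, only for reproducibility
generate w = .                  // weight() requires an existing variable
bsample 10, weight(w)           // data stay in memory; w holds the frequencies
summarize w, meanonly
display "total draws = " r(sum) // should equal the requested size of 10
```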

> On a related note, is there the equivalent of the
> weight option for the bootstrap command? A way to
> leave the full dataset in memory? I saw the -nodrop-
> option but it's not completely clear to me what it
> does.

In short, no. The -nodrop- option prevents -bootstrap- from dropping out-of-sample observations specified in the -if- and -in- conditions. This option is mostly useful for something like

    program myboot, rclass
            args y group
            // intercept-only fit: _b[_cons] is the mean of `y' in each group
            reg `y' if `group' == 0
            local m0 = _b[_cons]
            reg `y' if `group' == 1
            return scalar diff = _b[_cons] - `m0'
    end

. sysuse auto

. bootstrap diff=r(diff), nodrop reps(100) : myboot mpg for

Without the -nodrop- option, -bootstrap- would drop all the domestic cars. By default, -bootstrap- assumes that the -e(sample)- left behind by the first call to the prefixed command identifies the estimation sample, and it drops the observations outside it. Here the last -reg- in -myboot- is fit only on the group == 1 observations, so -e(sample)- marks just the foreign cars.
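To see which observations -e(sample)- marks in a case like this, one can run the two regressions by hand and tabulate the grouping variable within -e(sample)-. This is only a sketch using the auto dataset, with foreign as the grouping variable; it mimics the order of the estimation commands in the program above rather than calling -bootstrap- itself.

```stata
sysuse auto, clear
quietly regress mpg if foreign == 0   // first fit: domestic cars
quietly regress mpg if foreign == 1   // last fit: foreign cars
tabulate foreign if e(sample)         // e(sample) reflects only the last fit
```

Only the foreign cars appear in the table, which is why the domestic cars would be dropped without -nodrop-.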

