## Re: st: random pick from a custom distribution

I have generated an empiric frequency distribution, using -contract-, such that I have two variables-- the variable of interest (hour of the day that a patient is discharged from the hospital), and the cumulative frequency of from a set of empiric observations.

How can I generate a function which picks random hours in a frequency proportional to this distribution?

--------------------------------------------------------------------------------

You can test whether a uniform random variate is at least as great as each level of the cumulative relative frequency and increment the hour variable if it is.

I've illustrated it below, first creating a fictional hospital-discharge dataset to use as the starting point. Following your path, it first uses -contract- to get the empirical cumulative percent distribution.

To create a random dataset that follows the distribution, first use -levelsof- to put the cumulative percentage values into a macro. Then, generate a random uniform variate for as large a dataset as you wish (the illustration uses 100 000). To fill in the hour variable, test whether the uniform random variate exceeds the respective cumulative percentage point stored in the macro, and increment the hour variable whenever it does.

Joseph Coveney

clear set memory 50M set more off * Creating fictional ca. 10000-patient * hospital discharge dataset * with peak discharge rate in midafternoon set obs 14 generate byte hour = 7 + _n // Discharges during hour ending at . . . generate count = floor( `=1e4'* (cos(2 * _pi * (_n) / _N - _pi) + 1 ) / _N) graph7 count hour, xlabel(8(1)21) // No discharges after 8:00 p.m. * * BEGIN HERE: * 1. Get empirical cumulative distribution function contract hour [fweight=count], cpercent(cumulative_distribution) float levelsof cumulative_distribution, local(cut_points) * * 2. Create random variate ( drop _all set seed `=date("2006-02-17", "ymd")' set obs `=10e5' generate randu = uniform() * * 3. Wrap-up: populate with randomly drawn hours * that follow empirical distribution function generate byte hour = 8 foreach cut_point of local cut_points { replace hour = hour + (randu > `cut_point' / 100) } graph7 hour, bin(14) xlabel(8(1)20) tabulate hour exit

* * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

Tag: