Thursday, March 09, 2006
Re: st: memory required for -merge-
Danielle H Ferry wrote:
>Is there a rule for the amount of memory required for -merge-? I keep
>getting a "no room to add more variables due to width" error on the
>same merge. I am attempting to -merge- on only one variable
>(placefip), and 67% of memory is free before I attempt the -merge-
>(i.e., while the "master" dataset is loaded). The "using" dataset is
>small, and I've got the memory set quite high (900m). So, I can't
>understand why I am running into memory problems.
>[...]
When you do a match-merge, Stata must prepare for the maximal possible resulting dataset, which would be the result of a total mismatch between the two existing datasets. That result would be a dataset with n1 + n2 observations, where n1 = number of observations in the master set, and n2 = number of observations in the using set. The width of this set, which is independent of the degree of matching, is (width of master) + (width of using) - (width of common variables). So you can get a very large potential set, even if the actual set is much smaller (due to matches). (The actual number of observations formed is n1 + n2 - (number of matched observations).)
If it happens that you don't want any unmatched observations from the using set, then use the -nokeep- option. This discards unused observations from the using set before they get a chance to be joined on to the master. And Stata adjusts its expected maximal size accordingly; that would be just n1 observations (though, of course, the width will usually increase). In other words...

    merge varlist using usingset
    drop if _merge==2

and

    merge varlist using usingset, nokeep

are equivalent in the end result, but the latter demands less memory. The former grows a big set and then trims it; the latter grows a leaner set to begin with.
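
To put rough numbers on the size calculation, here is a worked example. All figures are hypothetical and chosen only for illustration: n1 = 1,000,000 and n2 = 5,000 observations, a master width w1 = 300 bytes, a using width w2 = 60 bytes, and a common key variable of width wc = 4 bytes. The symbols w1, w2, and wc are my own shorthand for the three widths named above, and the arithmetic ignores any overhead Stata itself adds.

```latex
\begin{align*}
w_{\max} &= w_1 + w_2 - w_c = 300 + 60 - 4 = 356 \text{ bytes per observation} \\
n_{\max} &= n_1 + n_2 = 1{,}000{,}000 + 5{,}000 = 1{,}005{,}000 \text{ observations} \\
n_{\max}\, w_{\max} &= 1{,}005{,}000 \times 356 = 357{,}780{,}000 \text{ bytes (plain merge)} \\
n_1\, w_{\max} &= 1{,}000{,}000 \times 356 = 356{,}000{,}000 \text{ bytes (with -nokeep-)}
\end{align*}
```

For comparison, the master alone occupies n1 x w1 = 300,000,000 bytes in this example. With a small using set, most of the extra demand comes from the added width applied to every master observation, not from the extra n2 observations.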