Tuesday, February 28, 2006

Re: RE: st: behavior of -areg-

My response to the statalist requires a rebuttal on my part. The promotion from single to double precision is done in binary so my demonstration is misleading in that I listed the promoted variables in base 10. If you view the variables in hexidecimal you can see that variable fvalue generated by

. gen double fvalue = mvalue

is really padded in zeros, but the dvalue generated by

. gen double dvalue = round(mvalue,10^(digits+ceil(log10(abs(mvalue)))))

is not.

. format %21x mvalue fvalue dvalue

. list fvalue mvalue dvalue in 1/20

+-----------------------------------------------------------------------+ | fvalue mvalue dvalue | |-----------------------------------------------------------------------| 1. | +1.80d0000000000X+00b +1.80d0000000000X+00b +1.80d0000000001X+00b | 2. | +1.235b340000000X+00c +1.235b340000000X+00c +1.235b333333334X+00c | 3. | +1.50b19a0000000X+00c +1.50b19a0000000X+00c +1.50b199999999bX+00c | 4. | +1.5d06660000000X+00b +1.5d06660000000X+00b +1.5d06666666668X+00b | 5. | +1.0d93340000000X+00c +1.0d93340000000X+00c +1.0d93333333334X+00c | |-----------------------------------------------------------------------| 6. | +1.223e660000000X+00c +1.223e660000000X+00c +1.223e666666667X+00c | 7. | +1.1c73340000000X+00c +1.1c73340000000X+00c +1.1c73333333334X+00c | 8. | +1.9583340000000X+00b +1.9583340000000X+00b +1.9583333333335X+00b | 9. | +1.fab6660000000X+00b +1.fab6660000000X+00b +1.fab6666666668X+00b | 10. | +1.11b4cc0000000X+00c +1.11b4cc0000000X+00c +1.11b4cccccccceX+00c | |-----------------------------------------------------------------------| 11. | +1.2e8e660000000X+00c +1.2e8e660000000X+00c +1.2e8e666666667X+00c | 12. | +1.324e660000000X+00c +1.324e660000000X+00c +1.324e666666667X+00c | 13. | +1.b8d0000000000X+00b +1.b8d0000000000X+00b +1.b8d0000000002X+00b | 14. | +1.96d6660000000X+00b +1.96d6660000000X+00b +1.96d6666666668X+00b | 15. | +1.ce86660000000X+00b +1.ce86660000000X+00b +1.ce86666666668X+00b | |-----------------------------------------------------------------------| 16. | +1.d573340000000X+00b +1.d573340000000X+00b +1.d573333333335X+00b | 17. | +1.2e10000000000X+00c +1.2e10000000000X+00c +1.2e10000000001X+00c | 18. | +1.33ce660000000X+00c +1.33ce660000000X+00c +1.33ce666666667X+00c | 19. | +1.861b340000000X+00c +1.861b340000000X+00c +1.861b333333335X+00c | 20. | +1.5d999a0000000X+00c +1.5d999a0000000X+00c +1.5d9999999999bX+00c | +-----------------------------------------------------------------------+

> I would like to add to this demonstration where I show the > dangers of having single precision variables (float) and not > promoting them to double precision with caution. > > Note that single precision will have only about 7 digits > of precision while double has about 16 corresponding to > 24 and 53 bits for the mantissia: > > . di c(epsfloat) > 1.192e-07 > > . di c(epsdouble) > 2.220e-16 > > . di 2^(1-24) > 1.192e-07 > > . di 2^(1-53) > 2.220e-16 > > machine epsilon is 2^(1-p) where p is the number of bits > reserved for the mantissia of floating point numbers. > > Now take a look at the storage type for each of the variables > in the grunfeld dataset. > > . webuse grunfeld,clear > > . describe > > Contains data from http://www.stata-press.com/data/r9/grunfeld.dta > obs: 200 > vars: 6 3 Mar 2005 20:27 > size: 5,600 (99.5% of memory free) > ------------------------------------------------------------------------------- > storage display value > variable name type format label variable label > ------------------------------------------------------------------------------- > company float %9.0g > year float %9.0g > invest float %9.0g > mvalue float %9.0g > kstock float %9.0g > time float %9.0g > > . list in 1/20 > > +--------------------------------------------------+ > | company year invest mvalue kstock time | > |--------------------------------------------------| > 1. | 1 1935 317.6 3078.5 2.8 1 | > 2. | 1 1936 391.8 4661.7 52.6 2 | > 3. | 1 1937 410.6 5387.1 156.9 3 | > 4. | 1 1938 257.7 2792.2 209.2 4 | > 5. | 1 1939 330.8 4313.2 203.4 5 | > |--------------------------------------------------| > 6. | 1 1940 461.2 4643.9 207.2 6 | > 7. | 1 1941 512 4551.2 255.2 7 | > 8. | 1 1942 448 3244.1 303.7 8 | > 9. | 1 1943 499.6 4053.7 264.1 9 | > 10. | 1 1944 547.5 4379.3 201.6 10 | > |--------------------------------------------------| > 11. | 1 1945 561.2 4840.9 265 11 | > 12. | 1 1946 688.1 4900.9 402.2 12 | > 13. | 1 1947 568.9 3526.5 761.5 13 | > 14. | 1 1948 529.2 3254.7 922.4 14 | > 15. | 1 1949 555.1 3700.2 1020.1 15 | > |--------------------------------------------------| > 16. | 1 1950 642.9 3755.6 1099 16 | > 17. | 1 1951 755.9 4833 1207.7 17 | > 18. | 1 1952 891.2 4924.9 1430.5 18 | > 19. | 1 1953 1304.4 6241.7 1777.3 19 | > 20. | 1 1954 1486.7 5593.6 2226.3 20 | > +--------------------------------------------------+ > > We can extend the display format of mvalue and kstock and you will see > what you are going to get beyond the 7-th digit when you promote these > variables to double precision. > > . format %20.15g kstock mvalue > > . gen double fvalue = mvalue > > . gen double fstock = kstock > > . format %20.15g fvalue fstock > > . list fvalue mvalue fstock kstock in 1/20 > > +---------------------------------------------------------------------------+ > | fvalue mvalue fstock kstock | |---------------------------------------------------------------------------| > 1.| 3078.5 3078.5 2.79999995231628 2.79999995231628 | > 2.| 4661.7001953125 4661.7001953125 52.5999984741211 52.5999984741211 | > 3.| 5387.10009765625 5387.10009765625 156.899993896484 156.899993896484 | > 4.| 2792.19995117188 2792.19995117188 209.199996948242 209.199996948242 | > 5.| 4313.2001953125 4313.2001953125 203.399993896484 203.399993896484 | > |---------------------------------------------------------------------------| > 6.| 4643.89990234375 4643.89990234375 207.199996948242 207.199996948242 | > 7.| 4551.2001953125 4551.2001953125 255.199996948242 255.199996948242 | > 8.| 3244.10009765625 3244.10009765625 303.700012207031 303.700012207031 | > 9.| 4053.69995117188 4053.69995117188 264.100006103516 264.100006103516 | > 10.| 4379.2998046875 4379.2998046875 201.600006103516 201.600006103516 | > |---------------------------------------------------------------------------| > 11 | 4840.89990234375 4840.89990234375 265 265 | > 12.| 4900.89990234375 4900.89990234375 402.200012207031 402.200012207031 | > 13.| 3526.5 3526.5 761.5 761.5 | > 14.| 3254.69995117188 3254.69995117188 922.400024414063 922.400024414063 | > 15.| 3700.19995117188 3700.19995117188 1020.09997558594 1020.09997558594 | > |---------------------------------------------------------------------------| > 16.| 3755.60009765625 3755.60009765625 1099 1099 | > 17.| 4833 4833 1207.69995117188 1207.69995117188 | > 18.| 4924.89990234375 4924.89990234375 1430.5 1430.5 | > 19.| 6241.7001953125 6241.7001953125 1777.30004882813 1777.30004882813 | > 20.| 5593.60009765625 5593.60009765625 2226.30004882813 2226.30004882813 | > +---------------------------------------------------------------------------+ > > > . egen double t1 = total(mva), by(com) > > . egen double t2 = total(kstock), by(com) > > . format %20.15g t1 t2 > > . list t1 t2 in 1/20 > > +------------------------------------+ > | t1 t2 | > |------------------------------------| > 1. | 86676.900390625 12968.7000625134 | > 2. | 86676.900390625 12968.7000625134 | > 3. | 86676.900390625 12968.7000625134 | > 4. | 86676.900390625 12968.7000625134 | > 5. | 86676.900390625 12968.7000625134 | > |------------------------------------| > 6. | 86676.900390625 12968.7000625134 | > 7. | 86676.900390625 12968.7000625134 | > 8. | 86676.900390625 12968.7000625134 | > 9. | 86676.900390625 12968.7000625134 | > 10. | 86676.900390625 12968.7000625134 | > |------------------------------------| > 11. | 86676.900390625 12968.7000625134 | > 12. | 86676.900390625 12968.7000625134 | > 13. | 86676.900390625 12968.7000625134 | > 14. | 86676.900390625 12968.7000625134 | > 15. | 86676.900390625 12968.7000625134 | > |------------------------------------| > 16. | 86676.900390625 12968.7000625134 | > 17. | 86676.900390625 12968.7000625134 | > 18. | 86676.900390625 12968.7000625134 | > 19. | 86676.900390625 12968.7000625134 | > 20. | 86676.900390625 12968.7000625134 | > +------------------------------------+ > > You can see here that even though we have made the variables > t1 and t2 double precision, we really only have done single > precision numerics (actually a little better than single > precision) and the variables have only 9 digits that > are any good. We gained some precsion, but we did not > do as well as we would have if we started with full double > precision. This garbage beyond the 9-th digit can be > enough for -_rmcoll- to not detect variable collinearity. > > Now if we promote mvalue and kstock to double carefully > we can get full precision. > > . scalar digits = round(log10(abs(c(epsfloat))),1)+1 > > . gen double dvalue = round(mvalue,10^(digits+ceil(log10(abs(mvalue))))) > > . gen double dstock = round(kstock,10^(digits+ceil(log10(abs(kstock))))) > > . format %20.15g dvalue dstock > > . list dvalue mvalue dstock kstock in 1/20 > > +-------------------------------------------------------+ > | dvalue mvalue dstock kstock | > |-------------------------------------------------------| > 1. | 3078.5 3078.5 2.8 2.79999995231628 | > 2. | 4661.7 4661.7001953125 52.6 52.5999984741211 | > 3. | 5387.1 5387.10009765625 156.9 156.899993896484 | > 4. | 2792.2 2792.19995117188 209.2 209.199996948242 | > 5. | 4313.2 4313.2001953125 203.4 203.399993896484 | > |-------------------------------------------------------| > 6. | 4643.9 4643.89990234375 207.2 207.199996948242 | > 7. | 4551.2 4551.2001953125 255.2 255.199996948242 | > 8. | 3244.1 3244.10009765625 303.7 303.700012207031 | > 9. | 4053.7 4053.69995117188 264.1 264.100006103516 | > 10. | 4379.3 4379.2998046875 201.6 201.600006103516 | > |-------------------------------------------------------| > 11. | 4840.9 4840.89990234375 265 265 | > 12. | 4900.9 4900.89990234375 402.2 402.200012207031 | > 13. | 3526.5 3526.5 761.5 761.5 | > 14. | 3254.7 3254.69995117188 922.4 922.400024414063 | > 15. | 3700.2 3700.19995117188 1020.1 1020.09997558594 | > |-------------------------------------------------------| > 16. | 3755.6 3755.60009765625 1099 1099 | > 17. | 4833 4833 1207.7 1207.69995117188 | > 18. | 4924.9 4924.89990234375 1430.5 1430.5 | > 19. | 6241.7 6241.7001953125 1777.3 1777.30004882813 | > 20. | 5593.6 5593.60009765625 2226.3 2226.30004882813 | > +-------------------------------------------------------+ > > . egen double d1 = total(dvalue), by(com) > > . egen double d2 = total(dstock), by(com) > > . format %20.15g d1 d2 > > . list d1 t1 d2 t2 in 1/20 > > +--------------------------------------------------------+ > | d1 t1 d2 t2 | > |--------------------------------------------------------| > 1. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 2. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 3. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 4. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 5. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > |--------------------------------------------------------| > 6. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 7. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 8. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 9. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 10. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > |--------------------------------------------------------| > 11. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 12. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 13. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 14. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 15. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > |--------------------------------------------------------| > 16. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 17. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 18. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 19. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > 20. | 86676.9 86676.900390625 12968.7 12968.7000625134 | > +--------------------------------------------------------+ > > Now we do Scott's experiment in double precision: > > . gen double dinvest = round(invest,10^(digits+ceil(log10(abs(invest))))) > > . recast long company > > . areg dinvest dstock d1, ab(com) > > Linear regression, absorbing indicators Number of obs = 200 > F( 1, 189) = 366.45 > Prob > F = 0.0000 > R-squared = 0.9184 > Adj R-squared = 0.9141 > Root MSE = 63.566 > > ------------------------------------------------------------------------------ > dinvest | Coef. Std. Err. t P>|t| [95% Conf. Interval] > -------------+---------------------------------------------------------------- > dstock | .3707496 .0193676 19.14 0.000 .3325452 .4089541 > d1 | (dropped) > _cons | 43.62499 6.984315 6.25 0.000 29.84777 57.40222 > -------------+---------------------------------------------------------------- > company | F(9, 189) = 36.975 0.000 (10 categories) > > . areg dinvest dvalue d2, ab(com) > > Linear regression, absorbing indicators Number of obs = 200 > F( 1, 189) = 111.35 > Prob > F = 0.0000 > R-squared = 0.8491 > Adj R-squared = 0.8411 > Root MSE = 86.444 > > ------------------------------------------------------------------------------ > dinvest | Coef. Std. Err. t P>|t| [95% Conf. Interval] > -------------+---------------------------------------------------------------- > dvalue | .1898776 .0179944 10.55 0.000 .1543819 .2253733 > d2 | (dropped) > _cons | -59.42872 20.40144 -2.91 0.004 -99.6725 -19.18494 > -------------+---------------------------------------------------------------- > company | F(9, 189) = 15.973 0.000 (10 categories) > > > > So my advice is to be very wary of single precision floating point variable. > > > -Rich > * > * For searches and help try: > * http://www.stata.com/support/faqs/res/findit.html > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/


Tag:


Links to this post:

Create a Link



<< Home

This page is powered by Blogger. Isn't yours?