Subject: Responses to supposed "bug" in Excel |

From: "Christopher J. Lacke" |

Date: Fri, 17 Oct 2003 10:12:16 -0400 |

To: |

Hi all: Wow! Thanks for all of the responses. It's been 14 years since I took a class that involved floating point issues, and since many of us in the statistics community (myself included) trash Excel's stat package, I was guilty of (conciously/subconciously) wanting to trash Excel instead of remembering what I learned in my Fortran 77 class (boy, am I feeling middle-aged). Anyway, a number of you responded with very nice descriptions of the issues in floating point mathematics. Here are the responses that I received. By the way, a few of you noted that you tried to access my web page and received a notice that the page could not be located. Thanks to all of the viruses that have been floating around, attempting to access our web servers this semester can be modelled as a Bernoulli trial with p=0.03. Thanks again, Chris this is not a "bug" but a "feature" of floating point arithmetic. To avoid seeing it, you need to use Excel's "ROUND" function to limit your numbers to some significant number of digits (say 10 or 12). The other solution is to use something, like APL or J (see www.jsoftware.com) with the notion of comparison tolerance built in. In Excel, for example, instead of the entering the function "=(A2+4)*(A2-2)" (for (X+4)(X-2)), instead enter "=ROUND((A2+4)*(A2-2),10)" to round to 10 places past the decimal point. The problem is unavoidable in the sense that there are decimal numbers which have no exact, finite representation in binary but they have to be stored with a limited number of digits. Devon McCormick ^me^ at acm.org Unless I am misreading this, it sounds like all you need to do is format the cells. Highlight the column in which you plan to type the x-values and right click. Click on Format Cells and select the Number Tab and then highlight Number, type in 1 for the number of decimal places and then click Okay. Ann Kirkpatrick The problem is that the binary representation of the decimal 0.1 is 0.0001100110011... where the 0011 repeats forever. The floating point representation of 0.1 is inherently inaccurate. In a more familiar setting, if we had a machine that could store a two decimal digit mantissa, then 1/3 would be stored as .33 and when added to itself two times would yield .99 rather than 1.0. Displayed values are often rounded so that the last couple of digits are not displayed so that the underlying representation is again different from the displayed value. The solution to your original problem would be to create the integers -60 to 60, which can be created and stored exactly in binary. And then divide all of these numbers by 10. The underlying representations of some of the numbers will not exactly match the numbers intended, but the errors should be a single round-off rather than an accumulation of round-off errors. David Housman There's no bug. That's normal roundoff error. 0.1 is only approximated in floating point. Only rational numbers that can be expressed with a power of 2 in the denominator can be represented exactly in any binary floating point format. Excel's strategy for formatting numbers for output (e.g. if it's close enough to x.xx, it shows only x.xx, not x.xx00000000002) might be doing some strange stuff there, but if I do one of your sequences such that 0 ought to be there, it almost always isn't, and I'm not seeing evidence that anything's wrong with the internal floating point values. Michael Fry I haven't been able to replicate the problem on my Mac G4. Maybe the work-around is to get a Macintosh. -- Paul Velleman I'm running Excel 2002 on Windows XP and looked into this a bit. Personally, I'm a fan of using Excel in Intro Stats AND understanding it's shortcomings as is the case here. I was able to get "rounding" error with a bunch of different increments. For example, starting with -4 and incrementing by 0.5, after line 27 rounding was occurring. The funny thing is that in some cases, the rounding error is "corrected". For example, starting at -4 and using 0.1 as the increment, gets off after line 27, but Excel corrects itself starting at line 51. The number of lines before rounding error occurs differs if you use a formula like $A$1 - 0.1 versus getting a series started with -4, -3.9, -3.8, and then copying down. My empirical evidence suggests getting a series started and copying it down is better in terms of controlling the rounding error. Probably most distributing is, if you get a series started and copy it down you don't necessarily get the same results as if you start copying a series, stop, and then restart copying again. YIKES. The "only" good news is that the rounding errors appear to be in digits well beyond the limits that I care about. But, it is still a problem with Excel. Good luck on finding a "fix", Chris Malone Hi. Here are two things to note. First of all, you may not be seeing any "problem" with Excel. You may just be exhibiting rounding error. You see, computers store floating point numbers (non-integers) using a binary rather than a decimal scheme. In this scheme, numbers such as .5, .25 and .125 would be exact, but normal increments of 0.1 would be rounded. Adding 0.1 multiple times will yield errors at about the level of 10^(-15) which is the limit of floating point accuracy. In short, it may not be a "bug". If you'd like to remove the problem "cosmetically", try changing the cell information so that it just shows fewer decimal digits. Dov Chelst Associate Professor DeVry University P.S. I can generate similar problems using Matlab (math software). This rounding issue is even more obvious when you use a programming language. I see it commonly in Python. My point is that this facet of floating point arithmetic arises in advanced mathematics software and in common programming languages. I guess I don't understand why we think we have found a bug, rather than just the ever-present effects of finite precision. I replicated the calculation I think you were doing (repeatedly adding .1 to -6) in R on a unix workstation, with the follwoing results... it computes zero as -4.5796699765787707e-15. Hmmm, that looks a lot like the computed value of zero in the first message. [One might think the computations were exact for a while if the number of digits printed is too few.]

dx <- rep(.1,120) dx[1] <- -6 x <- cumsum(dx)

options(digits=18) x

[1] -6.0000000000000000e+00 -5.9000000000000004e+00 -5.8000000000000007e+00 [4] -5.7000000000000011e+00 -5.6000000000000014e+00 -5.5000000000000018e+00 [7] -5.4000000000000021e+00 -5.3000000000000025e+00 -5.2000000000000028e+00 [10] -5.1000000000000032e+00 -5.0000000000000036e+00 -4.9000000000000039e+00 [13] -4.8000000000000043e+00 -4.7000000000000046e+00 -4.6000000000000050e+00 [16] -4.5000000000000053e+00 -4.4000000000000057e+00 -4.3000000000000060e+00 [19] -4.2000000000000064e+00 -4.1000000000000068e+00 -4.0000000000000071e+00 [22] -3.9000000000000070e+00 -3.8000000000000069e+00 -3.7000000000000068e+00 [25] -3.6000000000000068e+00 -3.5000000000000067e+00 -3.4000000000000066e+00 [28] -3.3000000000000065e+00 -3.2000000000000064e+00 -3.1000000000000063e+00 [31] -3.0000000000000062e+00 -2.9000000000000061e+00 -2.8000000000000060e+00 [34] -2.7000000000000060e+00 -2.6000000000000059e+00 -2.5000000000000058e+00 [37] -2.4000000000000057e+00 -2.3000000000000056e+00 -2.2000000000000055e+00 [40] -2.1000000000000054e+00 -2.0000000000000053e+00 -1.9000000000000052e+00 [43] -1.8000000000000052e+00 -1.7000000000000051e+00 -1.6000000000000050e+00 [46] -1.5000000000000049e+00 -1.4000000000000048e+00 -1.3000000000000047e+00 [49] -1.2000000000000046e+00 -1.1000000000000045e+00 -1.0000000000000044e+00 [52] -9.0000000000000446e-01 -8.0000000000000449e-01 -7.0000000000000451e-01 [55] -6.0000000000000453e-01 -5.0000000000000455e-01 -4.0000000000000457e-01 [58] -3.0000000000000460e-01 -2.0000000000000459e-01 -1.0000000000000459e-01 [61] -4.5796699765787707e-15 9.9999999999995426e-02 1.9999999999999543e-01 [64] 2.9999999999999544e-01 3.9999999999999547e-01 4.9999999999999545e-01 [67] 5.9999999999999543e-01 6.9999999999999540e-01 7.9999999999999538e-01 [70] 8.9999999999999536e-01 9.9999999999999534e-01 1.0999999999999954e+00 [73] 1.1999999999999955e+00 1.2999999999999956e+00 1.3999999999999957e+00 [76] 1.4999999999999958e+00 1.5999999999999959e+00 1.6999999999999960e+00 [79] 1.7999999999999960e+00 1.8999999999999961e+00 1.9999999999999962e+00 [82] 2.0999999999999961e+00 2.1999999999999962e+00 2.2999999999999963e+00 [85] 2.3999999999999964e+00 2.4999999999999964e+00 2.5999999999999965e+00 [88] 2.6999999999999966e+00 2.7999999999999967e+00 2.8999999999999968e+00 [91] 2.9999999999999969e+00 3.0999999999999970e+00 3.1999999999999971e+00 [94] 3.2999999999999972e+00 3.3999999999999972e+00 3.4999999999999973e+00 [97] 3.5999999999999974e+00 3.6999999999999975e+00 3.7999999999999976e+00 [100] 3.8999999999999977e+00 3.9999999999999978e+00 4.0999999999999979e+00 [103] 4.1999999999999975e+00 4.2999999999999972e+00 4.3999999999999968e+00 [106] 4.4999999999999964e+00 4.5999999999999961e+00 4.6999999999999957e+00 [109] 4.7999999999999954e+00 4.8999999999999950e+00 4.9999999999999947e+00 [112] 5.0999999999999943e+00 5.1999999999999940e+00 5.2999999999999936e+00 [115] 5.3999999999999932e+00 5.4999999999999929e+00 5.5999999999999925e+00 [118] 5.6999999999999922e+00 5.7999999999999918e+00 5.8999999999999915e+00 Albyn Jones I was able to replicate the problem in part: I got the same values at x=0 (-2.0428E-14 and -4.57967E-15). I have a mac G4, running OSX (10.2.6) and Excel:mac v. X. Jeff Witmer How about this... If I simply make a column of numbers 10 9.9 9.8... and copy the pattern, by the time I get to 8.5,which just happens to be 16 rows later, I then have 8.5000000000000100. Now, when computers do subtraction they usually to 2's complement and addition. The problem may arise when we try to represent 0.10 in binary. We have to represent it as a sum of bits whose values are 1/(2^1), 1/(2^2) + 1/(2^3)...etc. No matter how many bits we have (23 in IEEE Floating point format) that number looks like 11111111111.... Its two's complement looks like 000000000000...1. We simply cannot represent it accurately. Maybe Jeff, our resident CS grad student, can shed additional light on this? -- Alan Scrivner through Charles Antoniak Christopher Jay Lacke Mathematics Department Rowan University Glassboro, NJ 08028 (856) 256-4500 x3897 (856) 256-4816 (fax) http://www.rowan.edu/mars/depts/math/lacke/lacke1.html -------------------------------------------------------------------------------- "There are two types of people in the world - those who think that there are two types and those who don't" - author unknown --------------------------------------------------------------------------------