Discussion:
This calculation is just wrong / computer can't count!
GT
2007-10-04 14:24:51 UTC
I have been debugging something for ages now. I have a method that does some
complex maths, but right at the beginning it works out a proportion and a
few ratios and the maths is simply wrong. In my code I (obviously) use
variables and the values vary each time the method is called, but there
seems to be a problem with the maths. I have narrowed the problem down to the
following. Can someone else please try this simple calculation and see what
their computer gets.

Code Line 1:
double effortChangeProportion = (55.0 - 30.0) / 30.0;


This first line does 55-30 and divides the result by 30. In other words
25/30, which is 0.8333 (recurring 3s).

The computer manages to give the answer 0.83333333333333337 !!

Code Line 2:
effortChangeProportion++;

or

effortChangeProportion = effortChangeProportion + 1.0;

The second line of code (both alternatives give the same result) builds on
the first by simply adding 1 (so I can then multiply other numbers by this
proportion).

In this case 0.8333 becomes 1.8333 , but again the computer gets this wrong.
It tries to add 1 to 0.83333333333333337 and gets 1.8333333333333335.

Obviously this can easily be done in 1 line of code, but it is broken down
to demonstrate the maths going wrong twice!

Can anyone shed some light on this for me please?

GT
.rhavin grobert
2007-10-04 14:36:32 UTC
Post by GT
I have been debugging something for ages now. I have a method that does some
complex maths, but right at the beginning it works out a proportion and a
few ratios and the maths is simply wrong. In my code I (obviously) use
variables and the values vary each time the method is called, but there
seems to be a problem with the maths. I have narrowed the problem down to the
following. Can someone else please try this simple calculation and see what
their computer gets.
double effortChangeProportion = (55.0 - 30.0) / 30.0;
This first line does 55-30 and divides the result by 30. In other words
25/30, which is 0.8333 (recurring 3s).
The computer manages to give the answer 0.83333333333333337 !!
effortChangeProportion++;
or
effortChangeProportion = effortChangeProportion + 1.0;
The second line of code (both alternatives give the same result) builds on
the first by simply adding 1 (so I can then multiply other numbers by this
proportion).
In this case 0.8333 becomes 1.8333 , but again the computer gets this wrong.
It tries to add 1 to 0.83333333333333337 and gets 1.8333333333333335.
Obviously this can easily be done in 1 line of code, but it is broken down
to demonstrate the maths going wrong twice!
Can anyone shed some light on this for me please?
GT
what happens if you do...

double test = 0.833333333333333333;

and print its value? is it 0.83333333333333337?

perhaps the difference between 0.83333333333333333 and
0.83333333333333337 is just below the double's granularity on your machine?
GT
2007-10-04 15:00:22 UTC
Post by .rhavin grobert
Post by GT
I have been debugging something for ages now. I have a method that does some
complex maths, but right at the beginning it works out a proportion and a
few ratios and the maths is simply wrong. In my code I (obviously) use
variables and the values vary each time the method is called, but there
seems to be a problem with the maths. I have narrowed the problem down to the
following. Can someone else please try this simple calculation and see what
their computer gets.
double effortChangeProportion = (55.0 - 30.0) / 30.0;
This first line does 55-30 and divides the result by 30. In other words
25/30, which is 0.8333 (recurring 3s).
The computer manages to give the answer 0.83333333333333337 !!
effortChangeProportion++;
or
effortChangeProportion = effortChangeProportion + 1.0;
The second line of code (both alternatives give the same result) builds on
the first by simply adding 1 (so I can then multiply other numbers by this
proportion).
In this case 0.8333 becomes 1.8333 , but again the computer gets this wrong.
It tries to add 1 to 0.83333333333333337 and gets 1.8333333333333335.
Obviously this can easily be done in 1 line of code, but it is broken down
to demonstrate the maths going wrong twice!
Can anyone shed some light on this for me please?
GT
what happens if you do...
double test = 0.833333333333333333;
and print its value? is it 0.83333333333333337?
perhaps the difference between 0.83333333333333333 and
0.83333333333333337 is just below the double's granularity on your machine?
That simple line also fails. What can I do about this? I tried multiplying
the number by a million, storing in an int, then dividing by a million
(truncating the number), but the divide by 1 million introduces extra digits
at the end of the number too. I don't understand why the computer would
generate a number that it can't handle - in my original calculation I just
do 25.0/30.0 and store the result in a double - the computer decides for
itself how many decimal places to use! Why would the computer calculate
numbers to a larger number of decimal places than it can handle! I'm using
an Intel CoreDuo T2500 laptop. Now 1 year old, but last year it was a high
spec Dell.

What do you mean by the number might be below double granularity on the
computer? What can I do about this?
.rhavin grobert
2007-10-04 15:19:48 UTC
Post by GT
What do you mean by the number might be below double granularity on the
computer? What can I do about this?
your computer stores floating-point numbers (float, double, long double) in a
format like

[sign][exponent][fraction]. that means that it can distinguish between
0.0000003 and 0.0000004 but perhaps not between 700000.0000003 and
700000.0000004. on the other hand, it can handle numbers up to
3.4028235*(10^38) for a 32-bit floating point variable.
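
a quick sketch (assuming IEEE 754 doubles and any standard C++ compiler)
that prints the stored value at full precision:

#include <cstdio>

int main()
{
    double d = (55.0 - 30.0) / 30.0; // 25/30 has no exact binary representation
    printf("%.17g\n", d);            // prints 0.83333333333333337 here
    return 0;
}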
GT
2007-10-04 15:28:32 UTC
Post by .rhavin grobert
Post by GT
What do you mean by the number might be below double granularity on the
computer? What can I do about this?
your computer stores floating-point numbers (float, double, long double) in a
format like
[sign][exponent][fraction]. that means that it can distinguish between
0.0000003 and 0.0000004 but perhaps not between 700000.0000003 and
700000.0000004. on the other hand, it can handle numbers up to
3.4028235*(10^38) for a 32-bit floating point variable.
But if the computer stores numbers in this format, then why does it add an
extra digit on the end that it can't handle and more importantly - how do I
stop it from adding the extra digit on the end?
.rhavin grobert
2007-10-04 15:40:13 UTC
Post by GT
Post by .rhavin grobert
Post by GT
What do you mean by the number might be below double granularity on the
computer? What can I do about this?
your computer stores floating-point numbers (float, double, long double) in a
format like
[sign][exponent][fraction]. that means that it can distinguish between
0.0000003 and 0.0000004 but perhaps not between 700000.0000003 and
700000.0000004. on the other hand, it can handle numbers up to
3.4028235*(10^38) for a 32-bit floating point variable.
But if the computer stores numbers in this format, then why does it add an
extra digit on the end that it can't handle and more importantly - how do I
stop it from adding the extra digit on the end?
it doesn't "add" it.

If you do...
int a = 1.34;

and then look into the memory at &a you'll find a "1", because the
compiler does a

int a = (int) 1.34; => int a = 1;

because the constant is first cast into the appropriate value (1 for
int, because the granularity of int is 1) and then moved into memory at
&a. For doubles, that's the same. But doubles have granularity
inversely proportional to absolute (unsigned) size. You can imagine it as a
"floating point" in decimals: say you have (for example) 10 digits,
a sign (on = + / off = -) and something that states: put the decimal point
here...

if you do a = 0.123456789, then it would be printed as 0.123456789; if
you do a = 0.1234567895 then it would also be printed as 0.123456789
because you're below granularity.

if you do a = 1234567890 then it will be printed as 1234567890; if
you subtract 0.4 then you're below granularity and may get 1234567890
or 1234567889 depending on your system.
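
the large-number case is easy to watch in a few lines (a sketch, again
assuming IEEE 754 floats):

#include <cstdio>

int main()
{
    float big = 1234567890.0f; // actually stored as 1234567936: the ULP here is 128
    float less = big - 0.4f;   // 0.4 is far below the granularity at this magnitude
    printf("%.1f\n%.1f\n", big, less); // both lines print 1234567936.0
    return 0;
}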
GT
2007-10-04 16:10:28 UTC
Post by .rhavin grobert
Post by GT
Post by .rhavin grobert
Post by GT
What do you mean by the number might be below double granularity on the
computer? What can I do about this?
your computer stores floating-point numbers (float, double, long double) in a
format like
[sign][exponent][fraction]. that means that it can distinguish between
0.0000003 and 0.0000004 but perhaps not between 700000.0000003 and
700000.0000004. on the other hand, it can handle numbers up to
3.4028235*(10^38) for a 32-bit floating point variable.
But if the computer stores numbers in this format, then why does it add an
extra digit on the end that it can't handle and more importantly - how do I
stop it from adding the extra digit on the end?
it doesn't "add" it.
If you do...
int a = 1.34;
and then look into the memory at &a you'll find a "1", because the
compiler does a
int a = (int) 1.34; => int a = 1;
because the constant is first cast into the appropriate value (1 for
int, because the granularity of int is 1) and then moved into memory at
&a. For doubles, that's the same. But doubles have granularity
inversely proportional to absolute (unsigned) size. You can imagine it as a
"floating point" in decimals: say you have (for example) 10 digits,
a sign (on = + / off = -) and something that states: put the decimal point
here...
if you do a = 0.123456789, then it would be printed as 0.123456789; if
you do a = 0.1234567895 then it would also be printed as 0.123456789
because you're below granularity.
if you do a = 1234567890 then it will be printed as 1234567890; if
you subtract 0.4 then you're below granularity and may get 1234567890
or 1234567889 depending on your system.
I understand, but how do I stop the computer from working outwith its
granularity? I understand that there are a finite number of digits in a
double, plus a sign, plus an exponent, but why does the computer display an
extra digit? I asked the computer to do 25/30 and store the answer as a
double. The answer is 0.8333333. The computer should surely only store as
many of the trailing 3s as it has space for. If it has 8 digits, then it
should store 0.83333333, but instead it shows a 7 as the last digit. This
isn't a rounding error, this is just wrong. How do I stop it from
showing/using the extra digits?
.rhavin grobert
2007-10-04 16:22:34 UTC
Post by GT
Post by .rhavin grobert
Post by GT
Post by .rhavin grobert
Post by GT
What do you mean by the number might be below double granularity on the
computer? What can I do about this?
your computer stores floating-point numbers (float, double, long double) in a
format like
[sign][exponent][fraction]. that means that it can distinguish between
0.0000003 and 0.0000004 but perhaps not between 700000.0000003 and
700000.0000004. on the other hand, it can handle numbers up to
3.4028235*(10^38) for a 32-bit floating point variable.
But if the computer stores numbers in this format, then why does it add an
extra digit on the end that it can't handle and more importantly - how do I
stop it from adding the extra digit on the end?
it doesn't "add" it.
If you do...
int a = 1.34;
and then look into the memory at &a you'll find a "1", because the
compiler does a
int a = (int) 1.34; => int a = 1;
because the constant is first cast into the appropriate value (1 for
int, because the granularity of int is 1) and then moved into memory at
&a. For doubles, that's the same. But doubles have granularity
inversely proportional to absolute (unsigned) size. You can imagine it as a
"floating point" in decimals: say you have (for example) 10 digits,
a sign (on = + / off = -) and something that states: put the decimal point
here...
if you do a = 0.123456789, then it would be printed as 0.123456789; if
you do a = 0.1234567895 then it would also be printed as 0.123456789
because you're below granularity.
if you do a = 1234567890 then it will be printed as 1234567890; if
you subtract 0.4 then you're below granularity and may get 1234567890
or 1234567889 depending on your system.
I understand, but how do I stop the computer from working outwith its
granularity? I understand that there are a finite number of digits in a
double, plus a sign, plus an exponent, but why does the computer display an
extra digit? I asked the computer to do 25/30 and store the answer as a
double. The answer is 0.8333333. The computer should surely only store as
many of the trailing 3s as it has space for. If it has 8 digits, then it
should store 0.83333333, but instead it shows a 7 as the last digit. This
isn't a rounding error, this is just wrong. How do I stop it from
showing/using the extra digits?
my example was in the decimal system, but computers store binary. your
"extra digit" comes from the translation your debugger does. use the
type with the smallest granularity (long double or some math libs)
and truncate/round the result to the value you want to have/show/use.
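
for example (a sketch; note the snapped value is itself still only the
nearest representable double):

#include <cmath>
#include <cstdio>

int main()
{
    double x = 25.0 / 30.0;               // 0.83333333333333337
    double r = std::round(x * 1e6) / 1e6; // snap to 6 decimal places
    printf("%.6f\n", r);                  // 0.833333
    return 0;
}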
Joseph M. Newcomer
2007-10-07 02:19:14 UTC
See below...
Post by GT
Post by .rhavin grobert
Post by GT
Post by .rhavin grobert
Post by GT
What do you mean by the number might be below double granularity on the
computer? What can I do about this?
your computer stores floating-point numbers (float, double, long double) in a
format like
[sign][exponent][fraction]. that means that it can distinguish between
0.0000003 and 0.0000004 but perhaps not between 700000.0000003 and
700000.0000004. on the other hand, it can handle numbers up to
3.4028235*(10^38) for a 32-bit floating point variable.
But if the computer stores numbers in this format, then why does it add an
extra digit on the end that it can't handle and more importantly - how do I
stop it from adding the extra digit on the end?
it doesn't "add" it.
If you do...
int a = 1.34;
and then look into the memory at &a you'll find a "1", because the
compiler does a
int a = (int) 1.34; => int a = 1;
because the constant is first cast into the appropriate value (1 for
int, because the granularity of int is 1) and then moved into memory at
&a. For doubles, that's the same. But doubles have granularity
inversely proportional to absolute (unsigned) size. You can imagine it as a
"floating point" in decimals: say you have (for example) 10 digits,
a sign (on = + / off = -) and something that states: put the decimal point
here...
if you do a = 0.123456789, then it would be printed as 0.123456789; if
you do a = 0.1234567895 then it would also be printed as 0.123456789
because you're below granularity.
if you do a = 1234567890 then it will be printed as 1234567890; if
you subtract 0.4 then you're below granularity and may get 1234567890
or 1234567889 depending on your system.
I understand, but how do I stop the computer from working outwith its
granularity? I understand that there are a finite number of digits in a
double, plus a sign, plus an exponent, but why does the computer display an
extra digit? I asked the computer to do 25/30 and store the answer as a
double. The answer is 0.8333333. The computer should surely only store as
many of the trailing 3s as it has space for. If it has 8 digits, then it
should store 0.83333333, but instead it shows a 7 as the last digit. This
isn't a rounding error, this is just wrong. How do I stop it from
showing/using the extra digits?
This doesn't even make sense. BY DEFINITION, the computer is forced to work within the
limits of its physical design. There is no "extra digit" being added; I have no idea why
you think it is an "extra" digit. It is NOT an "extra" digit, IT IS THE VALUE THE
COMPUTER HAS! If you ask the computer to print out a floating point number to maximum
precision, you are going to see the floating point number to its maximum precision, and
the number you are seeing is the ACTUAL, PRECISE VALUE THAT IS STORED IN THE COMPUTER.
That value is NOT 0.833,333,333,333,333,3 and it is not 0.833,333,333,333,333,33, it is
0.833,333,333,333,333,37. There is *NO* "extra" digit. This IS the value that was
computed. (Note that I put , separators in so you can count the digits!). There is
absolutely NO way you can represent 0.833,333,333,333,333,33 in 64-bit IEEE 754 floating
point.

You can represent
0.833,333,333,333,333,26
which is represented as the 64-bit hexadecimal number
0x3FEAAAAAAAAAAAAA
and you can represent
0.833,333,333,333,333,37
which is represented as the 64-bit hexadecimal number
0x3FEAAAAAAAAAAAAB

Note that these differ by 1 bit in the low-order position. There are no other
representations possible, and in particular, you cannot possibly find a bit pattern that
represents
0.833,333,333,333,333,33
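
One can check the two representable neighbours with std::nextafter (a
minimal sketch, assuming IEEE 754 doubles):

#include <cmath>
#include <cstdio>

int main()
{
    double x = 25.0 / 30.0;
    printf("%.17f\n", x);                      // 0.83333333333333337
    printf("%.17f\n", std::nextafter(x, 0.0)); // 0.83333333333333326, one bit lower
    return 0;
}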

You obviously asked to display the value to maximum decimal places. If you want to see
fewer decimal places, you would ask it to show the value to fewer decimal places. You
obviously did not do this, so you get to see the entire value, AS IT IS REPRESENTED IN THE
COMPUTER. I cannot emphasize enough that THERE IS NO EXTRA DIGIT BEING ADDED!

Do not think you are doing decimal arithmetic. You are not doing decimal arithmetic, and
having expectations that you are doing decimal arithmetic is why you are having
unrealistic expectations about what value you are seeing.

It is not wrong. It is absolutely correct, to within the limits of the design of IEEE 754
floating point arithmetic. There is no way you are going to get any other value, unless
you choose a completely different floating point chip on a completely different platform.
Oh yes, and if you do, you will STILL see phenomena like this. They will just be slightly
different due to different choices of how that particular platform's floating point
arithmetic works.
joe
****
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
GT
2007-10-08 09:57:54 UTC
Post by Joseph M. Newcomer
See below...
Post by GT
Post by .rhavin grobert
Post by GT
Post by .rhavin grobert
Post by GT
What do you mean by the number might be below double granularity on the
computer? What can I do about this?
your computer stores floating-point numbers (float, double, long double) in a
format like
[sign][exponent][fraction]. that means that it can distinguish between
0.0000003 and 0.0000004 but perhaps not between 700000.0000003 and
700000.0000004. on the other hand, it can handle numbers up to
3.4028235*(10^38) for a 32-bit floating point variable.
But if the computer stores numbers in this format, then why does it add an
extra digit on the end that it can't handle and more importantly - how
do
I
stop it from adding the extra digit on the end?
it doesn't "add" it.
If you do...
int a = 1.34;
and then look into the memory at &a you'll find a "1", because the
compiler does a
int a = (int) 1.34; => int a = 1;
because the constant is first cast into the appropriate value (1 for
int, because the granularity of int is 1) and then moved into memory at
&a. For doubles, that's the same. But doubles have granularity
inversely proportional to absolute (unsigned) size. You can imagine it as a
"floating point" in decimals: say you have (for example) 10 digits,
a sign (on = + / off = -) and something that states: put the decimal point
here...
if you do a = 0.123456789, then it would be printed as 0.123456789; if
you do a = 0.1234567895 then it would also be printed as 0.123456789
because you're below granularity.
if you do a = 1234567890 then it will be printed as 1234567890; if
you subtract 0.4 then you're below granularity and may get 1234567890
or 1234567889 depending on your system.
I understand, but how do I stop the computer from working outwith its
granularity? I understand that there are a finite number of digits in a
double, plus a sign, plus an exponent, but why does the computer display an
extra digit? I asked the computer to do 25/30 and store the answer as a
double. The answer is 0.8333333. The computer should surely only store as
many of the trailing 3s as it has space for. If it has 8 digits, then it
should store 0.83333333, but instead it shows a 7 as the last digit. This
isn't a rounding error, this is just wrong. How do I stop it from
showing/using the extra digits?
This doesn't even make sense. BY DEFINITION, the computer is forced to work within the
limits of its physical design. There is no "extra digit" being added; I have no idea why
you think it is an "extra" digit. It is NOT an "extra" digit, IT IS THE VALUE THE
COMPUTER HAS! If you ask the computer to print out a floating point number to maximum
precision, you are going to see the floating point number to its maximum precision, and
the number you are seeing is the ACTUAL, PRECISE VALUE THAT IS STORED IN THE COMPUTER.
That value is NOT 0.833,333,333,333,333,3 and it is not
0.833,333,333,333,333,33, it is
0.833,333,333,333,333,37. There is *NO* "extra" digit. This IS the value that was
computed. (Note that I put , separators in so you can count the digits!).
There is
absolutely NO way you can represent 0.833,333,333,333,333,33 in 64-bit IEEE 754 floating
point.
You can represent
0.833,333,333,333,333,26
which is represented as the 64-bit hexadecimal number
0x3FEAAAAAAAAAAAAA
and you can represent
0.833,333,333,333,333,37
which is represented as the 64-bit hexadecimal number
0x3FEAAAAAAAAAAAAB
Note that these differ by 1 bit in the low-order position. There are no other
representations possible, and in particular, you cannot possibly find a bit pattern that
represents
0.833,333,333,333,333,33
You obviously asked to display the value to maximum decimal places. If you want to see
fewer decimal places, you would ask it to show the value to fewer decimal places. You
obviously did not do this, so you get to see the entire value
[snip]

Please read the rest of the thread before jumping on the 'attack the OP'
bandwagon. This 0.83(r) number is not being displayed. I was always taught
to do my calculations to as many decimal places as possible, then only do
rounding at the last (display) stage, so my final calculations are rounded
to 0, 1 or 2 decimal places.

This 0.83(r) number is the first step in a series of basic mathematics
calculations. The accuracy (not precision) of this number is relied on in
later calculations and in one example one number was subtracted from another
number and the result should have been zero. In actual fact, the PC
stored -0.00000000000007 (with the appropriate number of zeros).

My question was simple and everyone has jumped on my back with
binary/decimal conversion explanations, quotes about IEEE and standards, I
simply want to know how to get the computer to do 25 / 30 and store the
answer to a reasonable number of decimal places (at least 6). I don't want
it to store so many decimal places that it can't handle the number and
actually ends up storing a number that is mathematically incorrect (25/30 !=
0.833,333,333,333,333,37)! I understand the binary conversion and why
'double' can't store 0.833,333,333,333,333,33 and the last digit is
therefore random/jumbled/wrong/whatever you want to call it. My question is
simply this: given that the last digit can never be relied on, why not just
drop the last digit when performing maths calculations?

A primary school child could see that 25 / 30, then multiplied by 30, should be 25,
but the computer gets it wrong because of this incorrect final digit! How do
I get around this problem using the basic, built-in C++ data types?
Les
2007-10-08 10:15:00 UTC
I give up.

Les
Post by GT
Post by Joseph M. Newcomer
See below...
Post by GT
Post by .rhavin grobert
Post by GT
Post by .rhavin grobert
Post by GT
What do you mean by the number might be below double granularity on the
computer? What can I do about this?
your computer stores floating-point numbers (float, double, long double) in a
format like
[sign][exponent][fraction]. that means that it can distinguish between
0.0000003 and 0.0000004 but perhaps not between 700000.0000003 and
700000.0000004. on the other hand, it can handle numbers up to
3.4028235*(10^38) for a 32-bit floating point variable.
But if the computer stores numbers in this format, then why does it
add
an
extra digit on the end that it can't handle and more importantly - how
do
I
stop it from adding the extra digit on the end?
it doesn't "add" it.
If you do...
int a = 1.34;
and then look into the memory at &a you'll find a "1", because the
compiler does a
int a = (int) 1.34; => int a = 1;
because the constant is first cast into the appropriate value (1 for
int, because the granularity of int is 1) and then moved into memory at
&a. For doubles, that's the same. But doubles have granularity
inversely proportional to absolute (unsigned) size. You can imagine it as a
"floating point" in decimals: say you have (for example) 10 digits,
a sign (on = + / off = -) and something that states: put the decimal point
here...
if you do a = 0.123456789, then it would be printed as 0.123456789; if
you do a = 0.1234567895 then it would also be printed as 0.123456789
because you're below granularity.
if you do a = 1234567890 then it will be printed as 1234567890; if
you subtract 0.4 then you're below granularity and may get 1234567890
or 1234567889 depending on your system.
I understand, but how do I stop the computer from working outwith its
granularity? I understand that there are a finite number of digits in a
double, plus a sign, plus an exponent, but why does the computer display an
extra digit? I asked the computer to do 25/30 and store the answer as a
double. The answer is 0.8333333. The computer should surely only store as
many of the trailing 3s as it has space for. If it has 8 digits, then it
should store 0.83333333, but instead it shows a 7 as the last digit. This
isn't a rounding error, this is just wrong. How do I stop it from
showing/using the extra digits?
This doesn't even make sense. BY DEFINITION, the computer is forced to work within the
limits of its physical design. There is no "extra digit" being added; I have no idea why
you think it is an "extra" digit. It is NOT an "extra" digit, IT IS THE VALUE THE
COMPUTER HAS! If you ask the computer to print out a floating point number to maximum
precision, you are going to see the floating point number to its maximum precision, and
the number you are seeing is the ACTUAL, PRECISE VALUE THAT IS STORED IN THE COMPUTER.
That value is NOT 0.833,333,333,333,333,3 and it is not
0.833,333,333,333,333,33, it is
0.833,333,333,333,333,37. There is *NO* "extra" digit. This IS the value that was
computed. (Note that I put , separators in so you can count the digits!).
There is
absolutely NO way you can represent 0.833,333,333,333,333,33 in 64-bit IEEE 754 floating
point.
You can represent
0.833,333,333,333,333,26
which is represented as the 64-bit hexadecimal number
0x3FEAAAAAAAAAAAAA
and you can represent
0.833,333,333,333,333,37
which is represented as the 64-bit hexadecimal number
0x3FEAAAAAAAAAAAAB
Note that these differ by 1 bit in the low-order position. There are no other
representations possible, and in particular, you cannot possibly find a bit pattern that
represents
0.833,333,333,333,333,33
You obviously asked to display the value to maximum decimal places. If you want to see
fewer decimal places, you would ask it to show the value to fewer decimal places. You
obviously did not do this, so you get to see the entire value
[snip]
Please read the rest of the thread before jumping on the 'attack the OP'
bandwagon. This 0.83(r) number is not being displayed. I was always taught
to do my calculations to as many decimal places as possible, then only do
rounding at the last (display) stage, so my final calculations are rounded
to 0, 1 or 2 decimal places.
This 0.83(r) number is the first step in a series of basic mathematics
calculations. The accuracy (not precision) of this number is relied on in
later calculations and in one example one number was subtracted from
another number and the result should have been zero. In actual fact, the
PC stored -0.00000000000007 (with the appropriate number of zeros).
My question was simple and everyone has jumped on my back with
binary/decimal conversion explanations, quotes about IEEE and standards, I
simply want to know how to get the computer to do 25 / 30 and store the
answer to a reasonable number of decimal places (at least 6). I don't want
it to store so many decimal places that it can't handle the number and
actually ends up storing a number that is mathematically incorrect (25/30
!= 0.833,333,333,333,333,37)! I understand the binary conversion and why
'double' can't store 0.833,333,333,333,333,33 and the last digit is
therefore random/jumbled/wrong/whatever you want to call it. My question
is simply this: given that the last digit can never be relied on, why not
just drop the last digit when performing maths calculations?
A primary school child could see that 25 / 30, then multiplied by 30, should be
25, but the computer gets it wrong because of this incorrect final digit!
How do I get around this problem using the basic, built-in C++ data types?
GT
2007-10-08 10:21:34 UTC
Post by Les
I give up.
Me too - there are only 2 people in this entire group that have actually
read and understood the question! Everyone else simply did not understand
the question and replied with clever-sounding explanations of IEEE
standards and binary/decimal conversion!
Joseph M. Newcomer
2007-10-07 01:59:21 UTC
What are you talking about, "adding a digit it can't handle"? There's no digit "added",
and it is representing a value that it BY DEFINITION can handle. You are just expressing
totally false expectations based on an erroneous world view.
joe
Post by GT
Post by .rhavin grobert
Post by GT
What do you mean by the number might be below double granularity on the
computer? What can I do about this?
your computer stores floating-point numbers (float, double, long double) in a
format like
[sign][exponent][fraction]. that means that it can distinguish between
0.0000003 and 0.0000004 but perhaps not between 700000.0000003 and
700000.0000004. on the other hand, it can handle numbers up to
3.4028235*(10^38) for a 32-bit floating point variable.
But if the computer stores numbers in this format, then why does it add an
extra digit on the end that it can't handle and more importantly - how do I
stop it from adding the extra digit on the end?
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph M. Newcomer
2007-10-07 01:57:16 UTC
No, the computer "doesn't decide for itself". I have no idea why you would think it
decides for itself. It can't "decide for itself". The engineers who designed the IEEE
floating point unit decided that the precision of floating point is 80 bits, and you have
no control over this, nor does "the computer". That's how the floating point unit (FPU)
works, and you can't change it.

When you print something out, you give a formatting code. You have not shown the
formatting code you use, and the floating point formatting subroutine does not "decide for
itself"; the programmers who wrote the floating point formatting subroutine decided, long
ago, what floating point formatting meant, and they format the number according to a very
precisely defined algorithm, which in fact is specified by the ISO/ANSI C standards.
Combined with the specifications of IEEE 754 floating-point arithmetic, the results are
quite deterministic, are absolutely reproducible, and require no "decision" on the part of
"the computer". There's no spontaneous magic here; everything is working as it is
supposed to work.
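
For instance (an illustrative sketch; the default precision of 6 is part
of that same specification):

#include <iostream>
#include <iomanip>

int main()
{
    double x = 25.0 / 30.0;
    std::cout << x << '\n';                          // default precision: 0.833333
    std::cout << std::setprecision(17) << x << '\n'; // 0.83333333333333337
    return 0;
}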
joe
Post by GT
Post by .rhavin grobert
Post by GT
I have been debugging something for ages now. I have a method that does some
complex maths, but right at the beginning it works out a proportion and a
few ratios and the maths is simply wrong. In my code I (obviously) use
variables and the values vary each time the method is called, but there
seems to be a problem with the maths. I have narrowed the problem down to the
following. Can someone else please try this simple calculation and see what
their computer gets.
double effortChangeProportion = (55.0 - 30.0) / 30.0;
This first line does 55-30 and divides the result by 30. In other words
25/30, which is 0.8333 (recurring 3s).
The computer manages to give the answer 0.83333333333333337 !!
effortChangeProportion++;
or
effortChangeProportion = effortChangeProportion + 1.0;
The second line of code (both alternatives give the same result) builds on
the first by simply adding 1 (so I can then multiply other numbers by this
proportion).
In this case 0.8333 becomes 1.8333 , but again the computer gets this wrong.
It tries to add 1 to 0.83333333333333337 and gets 1.8333333333333335.
Obviously this can easily be done in 1 line of code, but it is broken down
to demonstrate the maths going wrong twice!
Can anyone shed some light on this for me please?
GT
what happens if you do...
double test = 0.833333333333333333;
and print its value? is it 0.83333333333333337?
perhaps the difference between 0.83333333333333333 and
0.83333333333333337 is just below the double's granularity on your machine?
That simple line also fails. What can I do about this? I tried multiplying
the number by a million, storing in an int, then dividing by a million
(truncating the number), but the divide by 1 million introduces extra digits
at the end of the number too. I don't understand why the computer would
generate a number that it can't handle - in my original calculation I just
do 25.0/30.0 and store the result in a double - the computer decides for
itself how many decimal places to use! Why would the computer calculate
numbers to a larger number of decimal places than it can handle! I'm using
an Intel CoreDuo T2500 laptop. Now 1 year old, but last year it was a high
spec Dell.
What do you mean by the number might be below double granularity on the
computer? What can I do about this?
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
GT
2007-10-08 09:35:04 UTC
Post by Joseph M. Newcomer
No, the computer "doesn't decide for itself". I have no idea why you would think it
decides for itself. It can't "decide for itself". The engineers who designed the IEEE
floating point unit decided that the precision of floating point is 80 bits, and you have
no control over this, nor does "the computer". That's how the floating point unit (FPU)
works, and you can't change it.
When you print something out, you give a formatting code. You have not shown the
formatting code you use [snip]
Yes I have - read the rest of the thread before jumping in with both feet.
GT
2007-10-04 15:02:14 UTC
Post by .rhavin grobert
Post by GT
I have been debugging something for ages now. I have a method that does some
complex maths, but right at the beginning it works out a proportion and a
few ratios and the maths is simply wrong. In my code I (obviously) use
variables and the values vary each time the method is called, but there
seems to be a problem with the maths. I have narrowed the problem down to the
following. Can someone else please try this simple calculation and see what
their computer gets.
double effortChangeProportion = (55.0 - 30.0) / 30.0;
This first line does 55-30 and divides the result by 30. In other words
25/30, which is 0.8333 (recurring 3s).
The computer manages to give the answer 0.83333333333333337 !!
effortChangeProportion++;
or
effortChangeProportion = effortChangeProportion + 1.0;
The second line of code (both alternatives give the same result) builds on
the first by simply adding 1 (so I can then multiply other numbers by this
proportion).
In this case 0.8333 becomes 1.8333 , but again the computer gets this wrong.
It tries to add 1 to 0.83333333333333337 and gets 1.8333333333333335.
Obviously this can easily be done in 1 line of code, but it is broken down
to demonstrate the maths going wrong twice!
Can anyone shed some light on this for me please?
GT
what happens if you do...
double test = 0.833333333333333333;
and print its value? is it 0.83333333333333337?
perhaps the difference between 0.83333333333333333 and
0.83333333333333337 is just below the double's granularity on your machine?
If it can't handle doubles, you would think it could handle floating point:

float test = 0.833333333333333333;

results in a number 0.83333331.

No need to printf the result, I can step into the code and watch this
happening!
AliR (VC++ MVP)
2007-10-04 15:56:09 UTC
float has a lot fewer significant digits than a double.
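
std::numeric_limits will tell you exactly how many (a minimal sketch):

#include <cstdio>
#include <limits>

int main()
{
    // decimal digits each IEEE 754 type can round-trip reliably
    printf("float:  %d\n", std::numeric_limits<float>::digits10);  // 6
    printf("double: %d\n", std::numeric_limits<double>::digits10); // 15
    return 0;
}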

AliR.
Post by GT
Post by .rhavin grobert
Post by GT
I have been debugging something for ages now. I have a method that does some
complex maths, but right at the beginning it works out a proportion and a
few ratios and the maths is simply wrong. In my code I (obviously) use
variables and the values vary each time the method is called, but there
seems to be a problem with the maths. I have narrowed the problem down to the
following. Can someone else please try this simple calculation and see what
their computer gets.
double effortChangeProportion = (55.0 - 30.0) / 30.0;
This first line does 55-30 and divides the result by 30. In other words
25/30, which is 0.8333 (recurring 3s).
The computer manages to give the answer 0.83333333333333337 !!
effortChangeProportion++;
or
effortChangeProportion = effortChangeProportion + 1.0;
The second line of code (both alternatives give the same result) builds on
the first by simply adding 1 (so I can then multiply other numbers by this
proportion).
In this case 0.8333 becomes 1.8333 , but again the computer gets this wrong.
It tries to add 1 to 0.83333333333333337 and gets 1.8333333333333335.
Obviously this can easily be done in 1 line of code, but it is broken down
to demonstrate the maths going wrong twice!
Can anyone shed some light on this for me please?
GT
what happens if you do...
double test = 0.833333333333333333;
and print its value? is it 0.83333333333333337?
perhaps the difference between 0.83333333333333333 and
0.83333333333333337 is just below the double's granularity on your machine?
float test = 0.833333333333333333;
results in a number 0.83333331.
No need to printf the result, I can step into the code and watch this
happening!
GT
2007-10-04 14:43:50 UTC
It just got worse. Try this single line of code:

double answer = 183333333 / 100000000.0;

The answer should be 1.83333333. The computer gets 1.8333333300000001


...And make things even simpler by using floating point:

float answer = 183333333 / 100000000.0;

The answer should be 1.83333333. The computer gets 1.8333334


Arrrrrrrrggggghhhhhhhhhh. How can such basic maths be wrong?!? The
20-year-old Casio calculator with worn-off numbers can get it right!
.rhavin grobert
2007-10-04 15:08:23 UTC
Post by GT
double answer = 183333333 / 100000000.0;
The answer should be 1.83333333. The computer gets 1.8333333300000001
and that suggests that on your machine

((double) 1.83333333 == (double) 1.8333333300000001) evaluates to true.
Post by GT
float answer = 183333333 / 100000000.0;
The answer should be 1.83333333. The computer gets 1.8333334
as i said, it's below granularity. Try "long double" if you need
more precision.
Joseph M. Newcomer
2007-10-07 02:23:53 UTC
The last I looked, 'long double' was not supported, even in VS2005. From the VS2005
documentation:
"the long double type is identical to the double type"
joe
Post by .rhavin grobert
Post by GT
double answer = 183333333 / 100000000.0;
The answer should be 1.83333333. The computer gets 1.8333333300000001
and that suggests that on your machine
((double) 1.83333333 == (double) 1.8333333300000001) evaluates to true.
Post by GT
float answer = 183333333 / 100000000.0;
The answer should be 1.83333333. The computer gets 1.8333334
as i said, it's below granularity. Try "long double" if you need
more precision.
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph M. Newcomer
2007-10-07 02:21:33 UTC
I have no idea why you are insisting that you are getting the wrong answers. You are
getting the correct answers. The only problem you have is unrealistic expectations. You
have confused decimal arithmetic and binary arithmetic. EVERY VALUE YOU HAVE SHOWN HERE
IS ABSOLUTELY CORRECT. Only your expectations are wrong.

THE ANSWERS ARE NOT WRONG. YOUR EXPECTATIONS ARE WRONG.
joe
Post by GT
double answer = 183333333 / 100000000.0;
The answer should be 1.83333333. The computer gets 1.8333333300000001
float answer = 183333333 / 100000000.0;
The answer should be 1.83333333. The computer gets 1.8333334
Arrrrrrrrggggghhhhhhhhhh. How can such basic maths be wrong?!? The
20-year-old Casio calculator with worn-off numbers can get it right!
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
GT
2007-10-08 09:37:22 UTC
Post by Joseph M. Newcomer
I have no idea why you are insisting that you are getting the wrong
answers. [snip]
Then you need to go back to primary school mathematics classes!

What is 25/30? What is the result of that multiplied by 30? And now subtract
that from 25 and display the result to 1 decimal place? The computer gets the
answer wrong - it says "-0.0". The answer should be "0.0"!
Les
2007-10-08 10:30:28 UTC
Post by GT
Post by Joseph M. Newcomer
I have no idea why you are insisting that you are getting the wrong
answers. [snip]
Then you need to go back to primary school mathematics classes!
OK.
So hundreds of thousands of us programmers need to go back to school.
Millions of PCs need to be recalled because "the math is wrong".

Alternatively you could find a class or textbook on Numerical Methods and
computing.
Post by GT
What is 25/30?
In integer math it is 0
In floating point math 0.833333 approximately
Post by GT
What is the result of that multiplied by 30?
integer math answer 0
floating point math answer 24.999999 approx
(no matter how many recurring 3s short of infinity you have, the answer is
*always* slightly less than 25: 24.999...9.)
Post by GT
And now subtract that from 25 and display the result to 1 decimal place?
integer answer 25.0
in floating point math approx 0.000001
which according to you would be "wrong because the computer has added an
extra "1" from somewhere"
Post by GT
The computer gets the answer wrong - it says "-0.0". The answer should be
"0.0"!
I and others have tried, given your obvious inexperience, to help you
understand, with references and examples. I am obviously not a very good
teacher so I will leave it to others from now on.

Les
GT
2007-10-08 12:13:24 UTC
Post by Les
Post by GT
Post by Joseph M. Newcomer
I have no idea why you are insisting that you are getting the wrong
answers. [snip]
Then you need to go back to primary school mathematics classes!
OK.
So hundreds of thousands of us programmers need to go back to school.
Millions of PCs need to be recalled because "the math is wrong".
Alternatively you could find a class or textbook on Numerical Methods and
computing.
Post by GT
What is 25/30?
In integer math it is 0
In floating point math 0.833333 approximately
This is my point exactly - it is NOT 0.833337. That number is WRONG.
David Wilkinson
2007-10-08 12:32:04 UTC
Post by GT
Post by Les
Post by GT
What is 25/30?
In integer math it is 0
In floating point math 0.833333 approximately
This is my point exactly - it is NOT 0.833337. That number is WRONG.
GT:

No, your expectations are wrong. When the computer displays 0.833337, it
is showing the decimal conversion of the binary number that it stores
internally.

25/30 is a number that cannot be represented exactly in either decimal
or binary, and the two roundings are "incommensurate" with each other.
--
David Wilkinson
Visual C++ MVP
AliR (VC++ MVP)
2007-10-04 15:10:28 UTC
Maybe this will clear some of it up:
http://www.google.com/search?q=Significant+digits+double&hl=en

AliR.
Post by GT
I have been debugging something for ages now. I have a method that does
some complex maths, but right at the beginning it works out a proportion
and a few ratios and the maths is simply wrong. In my code I (obviously)
use variables and the values vary each time the method is called, but there
seems to a problem with the maths. I have narrowed the problem down to the
following. Can someone else please try this simply calculation and see what
their computer gets.
double effortChangeProportion = (55.0 - 30.0) / 30.0;
This first line does 55-30 and divides the result by 30. In other words
25/30, which is 0.8333 (recurring 3s).
The computer manages to give the answer 0.83333333333333337 !!
effortChangeProportion++;
or
effortChangeProportion = effortChangeProportion + 1.0;
The second line of code (both alternatives give the same result) builds on
the first by simply adding 1 (so I can then multiply other numbers by this
proportion).
In this case 0.8333 becomes 1.8333 , but again the computer gets this
wrong. It tries to add 1 to 0.83333333333333337 and gets
1.8333333333333335.
Obviously this can easily be done in 1 line of code, but it is broken down
to demonstrate the maths going wrong twice!
Can anyone shed some light on this for me please?
GT
GT
2007-10-04 15:27:28 UTC
Permalink
Post by AliR (VC++ MVP)
http://www.google.com/search?q=Significant+digits+double&hl=en
That info on significant digits was all very interesting, but how do I get
my code to work properly?!? I want to force the computer to calculate and
store as many digits as it can handle and no extra spurious digits, so that
my calculations get nice accurate results. At the moment part of my
calculation ends up with -0.00000000000007 (haven't counted the zeros here,
but you get the idea!), where it should be zero.
Les
2007-10-04 15:57:23 UTC
Post by GT
Post by AliR (VC++ MVP)
http://www.google.com/search?q=Significant+digits+double&hl=en
That info on significant digits was all very interesting, but how do I get
my code to work properly?!? I want to force the computer to calculate and
store as many digits as it can handle and no extra spurious digits, so
that my calculations get nice accurate results. At the moment part of my
calculation ends up with -0.00000000000007 (haven't counted the zeros
here, but you get the idea!), where it should be zero.
If your algorithm is susceptible to differences at the 17th decimal place
(as in your original post) then your algorithm is wrong.
There are an infinite number of numbers between 0.830 and 0.840 (ie between
*any* two numbers); unfortunately computers only have a limited number of
bytes in which to store these numbers, and a "standard-defined" way of
storing them if they use, for example, the IEEE format.
Many numbers cannot be represented exactly in binary and so a compromise is
assumed.
If you were to try your calculation "by hand", then the recurring 3s of
25.0/30.0 would cause you to run out of paper before you could do the actual
math. So you mentally adjust the *correct* number to (an incorrect) one
which you can handle.

Les
GT
2007-10-04 16:14:48 UTC
Post by Les
Post by GT
Post by AliR (VC++ MVP)
http://www.google.com/search?q=Significant+digits+double&hl=en
That info on significant digits was all very interesting, but how do I
get my code to work properly?!? I want to force the computer to calculate
and store as many digits as it can handle and no extra spurious digits,
so that my calculations get nice accurate results. At the moment part of
my calculation ends up with -0.00000000000007 (haven't counted the zeros
here, but you get the idea!), where it should be zero.
If your algorithm is susceptible to differences at the 17th decimal place
(as in your original post) then your algorithm is wrong.
There are an infinite number of numbers between 0.830 and 0.840 (ie
between *any* two numbers); unfortunately computers only have a limited
number of bytes in which to store these numbers, and a "standard-defined"
way of storing them if they use, for example, the IEEE format.
Many numbers cannot be represented exactly in binary and so a compromise
is assumed.
If you were to try your calculation "by hand", then the recurring 3s of
25.0/30.0 would cause you to run out of paper before you could do the
actual math. So you mentally adjust the *correct* number to (an incorrect)
one which you can handle.
Almost - your last part is wrong. You don't *adjust* the number, you
*truncate* the number to a certain level of precision. Even if you truncate
the number to 2 decimal places it is 0.83 not 0.87 !!

In the case of double the level of precision seems to be 16 digits, but this
is actually irrelevant. Whatever the number of digits used, the last digit
should be a 3 not a 7 !! My algorithm is fine, the calculation is producing
an incorrect answer. At a point later in my calculation I end up subtracting
one number from another and the answer should be zero, but instead (due to
the above problem) the answer is not == 0.
Les
2007-10-04 16:39:59 UTC
Post by GT
Post by Les
If you were to try your calculation "by hand", then the recurring 3s of
25.0/30.0 would cause you to run out of paper before you could do the
actual math. So you mentally adjust the *correct* number to (an
incorrect) one which you can handle.
Almost - your last part is wrong. You don't *adjust* the number, you
*truncate* the number to a certain level of precision. Even if you
truncate the number to 2 decimal places it is 0.83 not 0.87 !!
Semantics: by truncating the number you are adjusting the precision.
And the truncated number is still "incorrect" it is an approximation to the
actual number.

Can 0.83recurring (to however many decimal places) be represented *exactly*
IN BINARY?
In fact can any of the numbers you use be represented *exactly* IN BINARY?
THAT IS THE WHOLE POINT.

eg 0.1 cannot be represented exactly in binary.
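
You can see this for yourself (a two-line sketch, assuming IEEE 754
doubles):

#include <cstdio>

int main()
{
    printf("%.20f\n", 0.1); // 0.10000000000000000555
    return 0;
}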

Most computers do not use decimal arithmetic (there certainly have been some
in the past that did)
Post by GT
In the case of double the level of precision seems to be 16 digits, but
this is actually irrelevant. Whatever the number of digits used, the last
digit should be a 3 not a 7 !! My algorithm is fine, the calculation is
producing an incorrect answer. At a point later in my calculation I end up
subtracting one number from another and the answer should be zero, but
instead (due to the above problem) the answer is not == 0.
repeat
Most computers do not use decimal arithmetic.

Les
GT
2007-10-04 21:54:23 UTC
Post by Les
Post by GT
Post by Les
If you were to try your calculation "by hand", then the recurring 3s of
25.0/30.0 would cause you to run out of paper before you could do the
actual math. So you mentally adjust the *correct* number to (an
incorrect) one which you can handle.
Almost - your last part is wrong. You don't *adjust* the number, you
*truncate* the number to a certain level of precision. Even if you
truncate the number to 2 decimal places it is 0.83 not 0.87 !!
Semantics: by truncating the number you are adjusting the precision.
And the truncated number is still "incorrect" it is an approximation to
the actual number.
Can 0.83recurring (to however many decimal places) be represented
*exactly* IN BINARY?
In fact can any of the numbers you use be represented *exactly* IN BINARY?
THAT IS THE WHOLE POINT.
eg 0.1 cannot be represented exactly in binary.
Most computers do not use decimal arithmetic (there certainly have been
some in the past that did)
Post by GT
In the case of double the level of precision seems to be 16 digits, but
this is actually irrelevant. Whatever the number of digits used, the last
digit should be a 3 not a 7 !! My algorithm is fine, the calculation is
producing an incorrect answer. At a point later in my calculation I end
up subtracting one number from another and the answer should be zero, but
instead (due to the above problem) the answer is not == 0.
repeat
Most computers do not use decimal arithmetic.
We all know computers are binary machines, not decimal, but why can't the
computer do the same basic maths as a pocket calculator?!? 25/30 is
0.833recurring and it should calculate, store and display as many digits as
can be stored at the desired level of granularity, not randomly invent an
extra digit on the end! I repeat my question:

How do I get the computer to calculate and store the number of digits that
it can handle and no more? I don't care if it calculates 6, 10, 15, 27
digits, but I do *require* a correct result: 25/30 is *NOT* 0.83337 or
0.8333337 or 0.833333333333333333337, and 0.8333337 + 1 is *NOT* 1.8333335
!!

If the C++ basic data types can't handle this basic mathematics, then how
can we even trust them to do 1+1 etc.?
Ashot Geodakov
2007-10-04 23:01:12 UTC
Post by GT
We all know computers are binary machines, not decimal, but why can't the
computer do the same basic maths as a pocket calculator?!? 25/30 is
0.833recurring and it should calculate, store and display as many digits
as can be stored at the desired level of granularity, not randomly invent
How do I get the computer to calculate and store the number of digits that
it can handle and no more? I don't care if it calculates 6, 10, 15, 27
digits, but I do *require* a correct result: 25/30 is *NOT* 0.83337 or
0.8333337 or 0.833333333333333333337, and 0.8333337 + 1 is *NOT* 1.8333335
!!
If the C++ basic data types can't handle this basic mathematics, then how
can we even trust them to do 1+1 etc.?
I always wondered myself why the diagonal lines that I draw in PaintBrush
do not look like true lines. Instead they are these choppy pixels that jump
all over the place and try to convince me they are indeed lines.
Scott McPhillips [MVP]
2007-10-04 22:44:53 UTC
Post by GT
How do I get the computer to calculate and store the number of digits that
it can handle and no more?
You have received a lot of good answers and advice here, but you are not
absorbing it.

First, get a lot more educated about the underlying issue (the referenced
paper will help). Then optimize your calculations and determine the
worst-case error propagation. Once you know that, round the result to
the number of digits that you have determined will be reliable.
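
For example (a sketch of the idea; the 1e-9 tolerance stands in for a
bound you would derive from your own error analysis):

#include <cmath>
#include <cstdio>

int main()
{
    double a = (25.0 / 30.0) * 30.0;
    if (std::fabs(a - 25.0) < 1e-9) // never compare floating point with ==
        printf("%.2f\n", a);        // round only at the display stage: 25.00
    return 0;
}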
--
Scott McPhillips [VC++ MVP]
Alexander Grigoriev
2007-10-05 03:10:56 UTC
1. Take computational math 101.
2. Adjust your expectations.
3. If you calculate the "same" number using a different sequence of
operations, you'll get a different result (see the sketch below).
4. There is no exact result with floating point. Live with it.
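
A quick illustration of point 3 (a sketch, assuming strict IEEE 754
double evaluation):

#include <cstdio>

int main()
{
    printf("%.17g\n", (0.1 + 0.2) + 0.3); // 0.60000000000000009
    printf("%.17g\n", 0.1 + (0.2 + 0.3)); // 0.59999999999999998
    return 0;
}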
Post by GT
We all know computers are binary machines, not decimal, but why can't the
computer do the same basic maths as a pocket calculator?!? 25/30 is
0.833recurring and it should calculate, store and display as many digits
as can be stored at the desired level of granularity, not randomly invent
How do I get the computer to calculate and store the number of digits that
it can handle and no more? I don't care if it calculates 6, 10, 15, 27
digits, but I do *require* a correct result: 25/30 is *NOT* 0.83337 or
0.8333337 or 0.833333333333333333337, and 0.8333337 + 1 is *NOT* 1.8333335
!!
If the C++ basic data types can't handle this basic mathematics, then how
can we even trust them to do 1+1 etc.?
Les
2007-10-05 08:19:51 UTC
Post by GT
We all know computers are binary machines, not decimal, but why can't the
computer do the same basic maths as a pocket calculator?!?
Hand calculators (and I believe the calculator in Windows) use internal
decimal arithmetic.

0.833333 is not representable in binary.

If you need your program to work with full accuracy and precision to 17
decimal places then you will need to write your own package working in
decimal math, or as Luke said

"find an arbitrary precision math library on the web and use it."
Post by GT
25/30 is 0.833recurring and it should calculate, store and display as many
digits as can be stored at the desired level of granularity, not randomly
invent an extra digit on the end!
The extra digit is not _random_; it is caused by the binary value being the
closest it can get to the number you specify / calculate.
Then we can only repeat the same answers. You really need to know about
floating point math and "internal representation".
Post by GT
How do I get the computer to calculate and store the number of digits that
it can handle and no more. I don't care if it calculates 6, 10, 15, 27
digits, but I do *require* a correct result 25/30 is *NOT* 0.83337 or
0.8333337 or 0.833333333333333333337. and 0.8333337 + 1 is *NOT* 1.8333335
!!
(repeat, sorry)
If you need your program to work with full accuracy and precision to 17
decimal places then you will need to write your own package working in
decimal math, or as Luke said

"find an arbitrary precision math library on the web and use it."


Les
GT
2007-10-05 09:38:10 UTC
Permalink
Post by Les
Post by GT
We all know computers are binary machines, not decimal, but why can't the
computer do the same basic maths as a pocket calculator?!?
Hand calculators (and I believe the calculator in Windows) use internal
decimal arithmetic.
0.833333 is not representable in binary.
If you need your program to work with full accuracy and precision to 17
decimal places then you will need to write your own package working in
decimal math, or as Luke said
"find an arbitrary precision math library on the web and use it."
As I have already said, I don't need precision to 17 decimal places - 5 or 6
would be sufficient. I am just doing a series of basic calculations. Using
floating point doesn't help - it just brings the erroneous digit closer to
the decimal point.

I would like to know how to use the basic C++ datatypes and where the last
digit is 'wrong' or 'unreliable', just don't use it. Everyone keeps saying
'just use the digits you can rely on', but how? I have a double
effortChangeProportion which has 15 or 16 decimal places. HOW do I tell it
to just use the first 10 decimal places, thereby dropping the final digit
which is knocking out my calculations? I can't find any methods on a double
which tell it to truncate digits.

In order to drop the last digit, I have tried multiplying my number by 1
million, storing it in an int, then dividing by 1 million in order to remove
any trailing digits, but of course, the divide by 1 million still introduces
an unwanted digit at the end. I understand the theory, but how do I get
around it simply without having to find maths libraries. You shouldn't have
to use maths libraries for some basic multiplication and division! I can
understand maths libraries for Sin curves, bezier plotting, complex numbers
etc, but double effortChangeProportion = 25/30; is about as basic as maths
can get - if the language's basic data types can't handle x digits, then it
should use, store, manipulate and display x-1 digits instead!
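For reference, the multiply/truncate/divide experiment described above comes
out something like this sketch; the final division necessarily lands on the
nearest representable double again, which is exactly the effect being
described:

#include <cstdio>

int main()
{
    double p = (55.0 - 30.0) / 30.0;               // nearest double to 5/6
    long long scaled = (long long)(p * 1000000.0); // truncates to 833333
    double truncated = scaled / 1000000.0;         // nearest double to 0.833333
    std::printf("%.17g\n", truncated);             // trailing digits reappear
}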
Les
2007-10-05 11:18:52 UTC
Permalink
"GT" <***@hotmail.com> wrote in message news:47060578$0$29264$***@news.astraweb.com...

<snip>
Post by GT
As I have already said, I don't need precision to 17 decimal places - 5 or
6 would be sufficient. I am just doing a series of basic calculations.
Using floating point doesn't help - it just brings the erroneous digit
closer to the decimal point.
yes float is less precise than double.

It appears from what you say that somewhere along the way in your algorithm
the end digit becomes significant.
Perhaps if you posted your code someone might help identify ways in which
the "correct" answer might be achieved.

Les
GT
2007-10-05 12:30:31 UTC
Permalink
Post by Les
<snip>
Post by GT
As I have already said, I don't need precision to 17 decimal places - 5
or 6 would be sufficient. I am just doing a series of basic calculations.
Using floating point doesn't help - it just brings the erroneous digit
closer to the decimal point.
yes float is less precise than double.
It appears from what you say that somewhere along the way in your
algorithm the end digit becomes significant.
Perhaps if you posted your code someone might help identify ways in which
the "correct" answer might be achieved.
You wanted to see the code. I have cut out the irrelevant stuff and replaced
variable names with example numbers, but this is an example set of figures
that we saw:

double effortChangeProportion = 1.0 + ((55.0 - 30.0) / 30.0); // should =
1.833 recurring, but it adds a 5 on the end!

double answer = 30 * effortChangeProportion; // should be 55, but the extra
digit causes this to be 55.00000something

double answer2 = 55 - answer ; // should be 55 - 55 = 0, but the variable
answer has slightly more than 55, so answer isn't 0

CString words;
words.Format(_T("%.1f"), answer2);
m_efResultBox3.SetWindowText(words);

answer2 should appear in my dialog as 0.0, but is actually shown as -0.0.
This looks very untidy and mathematically means something else to our
application!

The reason it shows -0.0 is because answer2 == -0.00000000000005 (number of
zeros == precision of storage)


The problem arises at the very first step, but my question is actually
simple. Using this basic C++ data type, how can I prevent the computer from
using the last, incorrect digit?
Les
2007-10-05 16:04:28 UTC
Permalink
Post by GT
You wanted to see the code. I have cut out the irrelevant stuff and
replaced variable names with example numbers, but this is an example set
double effortChangeProportion = 1.0 + ((55.0 - 30.0) / 30.0); // should =
1.833 recurring, but it adds a 5 on the end!
If at this point you assign
float fudge = effortChangeProportion ;

this will truncate the offending "extras".
Post by GT
double answer = 30 * effortChangeProportion; // should be 55, but the
extra digit causes this to be 55.00000something
double answer = 30.0 * fudge;
Post by GT
double answer2 = 55 - answer ; // should be 55 - 55 = 0, but the variable
answer has slightly more than 55, so answer isn't 0
now this should give 0.0
(it did on my little test program)
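Put together, the suggestion looks roughly like the sketch below. Be aware
that the outcome depends on how the compiler handles the intermediate float
(stored single precision versus an x87 register), so treat it as a
demonstration rather than a general fix:

#include <cstdio>

int main()
{
    double effortChangeProportion = 1.0 + ((55.0 - 30.0) / 30.0);
    float fudge = (float)effortChangeProportion; // drops the low-order bits
    double answer  = 30.0 * fudge;
    double answer2 = 55.0 - answer;
    // With strictly stored single precision this prints about -1.19e-06;
    // a build that keeps 'fudge' in an extended-precision register can print 0.
    std::printf("%.17g\n", answer2);
}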

Les


BTW.
Looking at the memory while running my test program.

0.833333... when stored in a double is ab aa aa aa aa aa ea 3f in hex
(remember endian-ness)
This as you have found out equates to 0.83.....337
Changing the "ab" to "aa" (ie the final bit from "1" to "0")
equates to 0.83.....326

Now you want ...333, which the computer can't produce.
The difference between 333 and 326 is 7; the difference between 333 and 337
is 4. So 337 is "nearer" to 333 and thus is chosen as the more accurate
approximation to 333.

so changing just one final bit makes a difference of .....11 between those
two numbers that can be stored in 8 bytes.
Any number greater than 0.83....326 and less than 0.83....337 will have
one of these approximate values.
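If you want to inspect those bytes yourself, a sketch along these lines reads
the bit pattern out as a single 64-bit value, so byte order in the memory
window doesn't matter:

#include <cstdio>
#include <cstring>

int main()
{
    double p = 25.0 / 30.0;
    unsigned long long bits = 0;
    std::memcpy(&bits, &p, sizeof bits); // copy the raw representation
    std::printf("%016llX\n", bits);      // 3FEAAAAAAAAAAAAB
}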
Michael K. O'Neill
2007-10-05 16:36:25 UTC
Permalink
Post by GT
<snip>
Even if floating point math worked the way you think it should work, you
still wouldn't get the answer you want. You stated that single precision
would be enough if only the last digit wasn't "dodgy" (your word). Let's
see what would happen:

double effortChangeProportion = 1.0 + ((55.0 - 30.0) / 30.0); // = 1.833333,
i.e., perfect last digit

double answer = 30 * effortChangeProportion; // = 54.99999, again, perfect
last digit

double answer2 = 55 - answer ; // = 0.00001, i.e., not zero

So, even if floating point math worked the way you think it should, your
algorithm still gets the wrong (non-zero) answer. Therefore, the fact that
you get a non-zero answer when floating point math works the way it's
designed to, should be totally unsurprising.

As others have mentioned, you need to re-design the algorithm, taking
account of the fact that floating point math is a discrete real number
system, not a continuous real number system.

If you tell us what you are trying to accomplish, maybe others here can
help. Right now, with the examples you have given, it seems that you are
trying to show that zero = zero, which doesn't really make much sense.

Mike
Joseph M. Newcomer
2007-10-07 03:04:40 UTC
Permalink
You are seeing EXACTLY WHAT YOU SHOULD SEE. I have no idea why you have such erroneous
expectations, especially after everything people have been trying to explain to you. Your
fundamental question is nonsensical as stated, and consequently pointless to ask and
impossible to answer.
joe
Post by GT
<snip>
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
.rhavin grobert
2007-10-05 15:48:12 UTC
Permalink
Post by GT
I would like to know how to use the basic C++ datatypes and where the last
digit is 'wrong' or 'unreliable', just don't use it. Everyone keeps saying
'just use the digits you can rely on', but how? I have a double
effortChangeProportion which has 15 or 16 decimal places. HOW do I tell it
to just use the first 10 decimal places, thereby dropping the final digit
which is knocking out my calculations? I can't find any methods on a double
which tell it to truncate digits.
You can't tell your computer to "drop the last x digits in the decimal system" when your number is stored in BINARY.
if you're interested in the last 5 digits of 0.444444443094876 _as a
number_ then you should do a

int iResult = (0.444444443094876 * 10000);
then do the rest of your calculation and divide by 10000 at the end
(according to your function you of course may do more than this).

if you're interested in the last 5 digits of 0.444444443094876 _as a
string_ then you should simply do a

CString str;
str.Format("%.5f", 0.444444443094876 );

rhavin;)
GT
2007-10-05 16:01:49 UTC
Permalink
Post by .rhavin grobert
Post by GT
I would like to know how to use the basic C++ datatypes and where the last
digit is 'wrong' or 'unreliable', just don't use it. Everyone keeps saying
'just use the digits you can rely on', but how? I have a double
effortChangeProportion which has 15 or 16 decimal places. HOW do I tell it
to just use the first 10 decimal places, thereby dropping the final digit
which is knocking out my calculations? I can't find any methods on a double
which tell it to truncate digits.
You can't tell your computer to "drop the last x digits in the decimal system"
when your number is stored in BINARY.
if you're interested in the last 5 digits of 0.444444443094876 _as a
number_ then you should do a
int iResult = (0.444444443094876 * 10000);
then do the rest of your calculation and divide by 10000 at the end
(according to your function you of course may do more than this).
Thanks, but I have tried this, but the divide by 10000 then re-introduces an
erroneous digit on the end of the number, so gains me nothing!
Post by .rhavin grobert
if you're interested in the last 5 digits of 0.444444443094876 _as a
string_ then you should simply do a
CString str;
str.Format("%.5f", 0.444444443094876 );
rhavin;)
.rhavin grobert
2007-10-05 16:24:37 UTC
Permalink
Post by GT
Post by .rhavin grobert
Post by GT
I would like to know how to use the basic C++ datatypes and where the last
digit is 'wrong' or 'unreliable', just don't use it. Everyone keeps saying
'just use the digits you can rely on', but how? I have a double
effortChangeProportion which has 15 or 16 decimal places. HOW do I tell it
to just use the first 10 decimal places, thereby dropping the final digit
which is knocking out my calculations? I can't find any methods on a double
which tell it to truncate digits.
You can't tell your computer to "drop the last x digits in the decimal system"
when your number is stored in BINARY.
if you're interested in the last 5 digits of 0.444444443094876 _as a
number_ then you should do a
int iResult = (0.444444443094876 * 10000);
then do the rest of your calculation and divide by 10000 at the end
(according to your function you of course may do more than this).
Thanks, but I have tried this, but the divide by 10000 then re-introduces an
erroneous digit on the end of the number, so gains me nothing!
it cant "introduce an erroneous digit". If you do a 1/3 then stating
"1/3" or "3^(-1)" are the only exact results.
stating "0.33333333" is not wrong, it's just not exact. the same
applies to "0.33333334" or even
"0.33333333333333333333333333333333333333333333333333333333333333333"

if you want a "user friendly" output, then you have to format a
string, if you want a result your programm can live with, you have to
code your programm according to your needs. If you need to know wether
toe result comes near zero, you can do something like

bool NearZero(double db)
{
return ((db*db) < 0.00001);
}

rhavin;)
GT
2007-10-08 09:43:21 UTC
Permalink
Post by .rhavin grobert
Post by GT
Post by .rhavin grobert
Post by GT
I would like to know how to use the basic C++ datatypes and where the last
digit is 'wrong' or 'unreliable', just don't use it. Everyone keeps saying
'just use the digits you can rely on', but how? I have a double
effortChangeProportion which has 15 or 16 decimal places. HOW do I
tell
it
to just use the first 10 decimal places, thereby dropping the final digit
which is knocking out my calculations? I can't find any methods on a double
which tell it to truncate digits.
You can't tell your computer to "drop the last x digits in the decimal system"
when your number is stored in BINARY.
if you're interested in the last 5 digits of 0.444444443094876 _as a
number_ then you should do a
int iResult = (0.444444443094876 * 10000);
then do the rest of your calculation and divide by 10000 at the end
(according to your function you of course may do more than this).
Thanks, but I have tried this, but the divide by 10000 then re-introduces an
erroneous digit on the end of the number, so gains me nothing!
it cant "introduce an erroneous digit". If you do a 1/3 then stating
"1/3" or "3^(-1)" are the only exact results.
stating "0.33333333" is not wrong, it's just not exact. the same
applies to "0.33333334" or even
I disagree:
0.3 is accurate (correct), but not very precise - only 1 decimal place
0.333 is accurate (correct) and precise to 3 decimal places
0.33333333 is accurate (correct) and precise to 8 decimal places
0.33333334 is inaccurate (*WRONG*) and precise to 8 decimal places
Joseph M. Newcomer
2007-10-07 03:06:42 UTC
Permalink
If you stop saying "introduces an erroneous digit" and replace it with "I am incapable of
understanding reality" you will be closer to the correct answer to your problem, and that
tells you what you need to fix.
joe
Post by GT
<snip>
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
GT
2007-10-08 09:44:07 UTC
Permalink
Post by Joseph M. Newcomer
If you stop saying "introduces an erroneous digit" and replace it with "I am incapable of
understanding reality" you will be closer to the correct answer to your problem, and that
tells you what you need to fix.
And if you take your feet out of your mouth long enough, you can get back to
school and learn some basic maths!
Norbert Unterberg
2007-10-05 22:50:04 UTC
Permalink
Post by GT
Post by Les
If you need your program to work with full accuracy and precision to 17
decimal places then you will need to write your own package working in
decimal math, or as Luke said
"find an arbitrary precision math library on the web and use it."
As I have already said, I don't need precision to 17 decimal places - 5 or 6
would be sufficient. I am just doing a series of basic calculations. Using
floating point doesn't help - it just brings the erroneous digit closer to
the decimal point.
GT,
you seem not to understand the nature of the IEEE floating point representation.

Just to clarify one thing: The two "basic C++ data types" as you name them,
double and float, both represent floating point types, float with 32 bits,
double with 64 bits (double precision). So using float instead of double does
not mean "using floating point", but "using floating point with less precision
than double".

When moving from integer or fixed-point arithmetic to floating point, you leave
the path of exact representation of numbers, and that in two senses. The
first error is the limited number of binary digits that are available to store
the numeric value as a double data type. The second error comes in when you print
the numbers on the screen (debugger or printf output).

Example: Let's imagine that the floating point type just has 4 significant bits
to store the mantissa. You could only store numbers in units of 1/(2^4)
(0.0625). Now you want to store the value 2/3. Problem is, you can't. The
closest you get is 11*0.0625 = 0.6875. If you want to see the number in the
debug window, how many digits will you see? You have four bits, which gives
you a precision of log10(16), which is about 1.2. Using one digit would lose
precision, so the debugger prints two digits: 0.69.

Do you see the two problems?
0.69 is not the real value that you have in your variable.
The variable has the value 0.6875.
0.6875 is still not the correct result, that would be 0.6666666666...

What you are asking in your mails is how to strip the "9" from 0.69 (or make it 0.67)
to move the result nearer to 0.666666. You can't. You fail to see that 0.69 is
not what you have, even if the debugger shows it. The value in your variable
already is as precise as you can get, so there is no need to shift precision or
digits around.

What do we learn from this:

1. Learn to trust the floating point math. It is usually better than the values
you see displayed on the screen.

2. Never compare floating point numbers for equality. Floating point variables
contain values that are quite close to the real values, but not exactly. Code like:
double a, b;
a = 1.0/3.0;
b = a * 3.0;   // mathematically b is 1.0
if (b == 1.0) { do_something(); }
is just wrong, buggy code.
If you need to compare floating point numbers, compare to a very small value
that is good enough for your application. In the example above, you could write
something like this:
const double EPSILON = 0.00000001;
if (fabs(b-1.0) < EPSILON)
{
// b is very close to 1
}

3. Avoid adding/subtracting numbers of very different magnitude. While 1/10000000 can
be represented quite well, 1/10000000 + 1 cannot. In some cases this means that
you have to redesign algorithms to avoid these situations.
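A minimal illustration of point 3: the moment the small term is added to 1.0,
its low-order bits are gone.

#include <cstdio>

int main()
{
    double tiny = 1.0 / 10000000.0;    // about 1e-7, good to ~16 digits on its own
    double diff = (tiny + 1.0) - 1.0;  // cancellation: low-order bits of tiny lost
    std::printf("%.17g\n%.17g\n", tiny, diff); // the two printed values differ
}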
Post by GT
I would like to know how to use the basic C++ datatypes and where the last
digit is 'wrong' or 'unreliable', just don't use it. Everyone keeps saying
'just use the digits you can rely on', but how?
As I described above. Check your algorithms for #3. And then select an EPSILON
like in #2 to use the significant digits you really need.
Post by GT
I have a double
effortChangeProportion which has 15 or 16 decimal places. HOW do I tell it
to just use the first 10 decimal places, thereby dropping the final digit
which is knocking out my calculations? I can't find any methods on a double
which tell it to truncate digits.
You do it implicitly by deciding what EPSILON you need when comparing your results.
Post by GT
In order to drop the last digit, I have tried multiplying my number by 1
million, storing it in an int, then dividing by 1 million in order to remove
any trailing digits, but of course, the divide by 1 million still introduces
an unwanted digit at the end.
Your mistake is that you only see the decimal digits that the debugger shows you.
Computers think in binary digits, not decimal digits, so the decimal digits do
not exist in the floating point math. These displayed numbers are not the values
that your variables really contain. By trying to remove non-existent decimal
digits you only make the result worse, not better.

Norbert
Joseph M. Newcomer
2007-10-07 03:02:25 UTC
Permalink
I have no idea whatsoever why you are using the word "wrong". Now you've introduced the
word "unreliable". WHAT PART OF WHAT WE HAVE ALL BEEN TELLING YOU ARE YOU MISSING?

The answer is correct. The answer is UTTERLY DETERMINISTIC and COMPLETELY RELIABLE. The
fact that it doesn't conform to your delusional system of decimal arithmetic is your
problem, not the computer's.

You cannot tell it to use 10 decimal places. You always get the precision of the
underlying floating point representation. Yes, you can play some games that will
perpetrate an illusion of less precision, but note that these introduce NEW artifacts of
errors, which will typically be substantially WORSE roundoff errors than what you are
currently seeing.

Please, stop whining and get in touch with reality.
joe
Post by GT
<snip>
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
GT
2007-10-08 09:45:46 UTC
Permalink
Post by Joseph M. Newcomer
I have no idea whatsoever why you are using the word "wrong". Now you've introduced the
word "unreliable". WHAT PART OF WHAT WE HAVE ALL BEEN TELLING YOU ARE YOU MISSING?
I have no problem understanding binary to decimal conversion and why the
final digit is unreliable in on-going calculations, but what we are missing
here is an answer to my question - how do I get the computer to work to a
number of decimal places that are mathematically reliable! Please re-read
the thread and reply when you have something useful to add.
Stuart Redmann
2007-10-05 08:44:53 UTC
Permalink
GT wrote:

[snip]
Post by GT
How do I get the computer to calculate and store the number of digits that
it can handle and no more. I don't care if it calculates 6, 10, 15, 27
digits, but I do *require* a correct result 25/30 is *NOT* 0.83337 or
0.8333337 or 0.833333333333333333337. and 0.8333337 + 1 is *NOT* 1.8333335
!!
If the C++ basic data types can't handle this basic mathematics, then how
can we even trust it to do 1+1 etc.
There is little to add to what Scott and Alexander have already said. Working
with floating point numbers _always_ involves deep knowledge of their internal
representation. That's why there are a lot of math courses (numerics is the word
for the whole branch of mathematics) which deal solely with those matters.

C++ (and no other programming language I know of) offers you IEEE floating point
arithmetic. This arithmetic can only try to give you the numbers that are
closest (due to the limited granularity) to the real result of the computation.
If this doesn't suffice, it is your responsibility as a programmer to keep track
of how many digits can actually be trusted (after all, that is what makes a
computer scientist so well-paid).

To give you a bit of theoretical background: If you are dealing with any system
that stores real numbers by simply storing their decimal digits, you will
_always_ have a lot of numbers that cannot be stored exactly. Which numbers can
be stored exactly depends on the base that is used for storing numbers. On a
binary system you cannot store fractions exactly whose denominator contains
other prime factors than two. Thus you cannot store say 1/3 exactly on your PC
(this is just impossible).

If you want to work with fractions and need the results with perfect accuracy,
you can use for example Mathematica. In contrast to C++, Mathematica doesn't use
IEEE numbers for representing fractions but stores numerator and denominator as
integers (no loss of information there). Mathematica can also handle
computations that involve Pi or 'e'. Mathematica will perform only symbolic
computations, so if you want to know the value of two times Pi, Mathematica will
tell you that it is '2 * Pi', which is in every respect the most accurate
answer. If you want to know the decimal places of this value, you'll have to
force Mathematica to perform a numerical evaluation by requesting 'N[2 Pi]'.

Regards,
Stuart
Joseph M. Newcomer
2007-10-07 03:09:02 UTC
Permalink
Actually, C#, VB, Pascal, FORTRAN, and essentially EVERY language that runs on an x86 uses
IEEE 754 floating point, so it isn't limited to C or C++.
joe
Post by Stuart Redmann
<snip>
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph M. Newcomer
2007-10-07 02:56:36 UTC
Permalink
Why do you think pocket calculators do not have these errors? Why do you think pocket
calculators and computers are using the same kind of arithmetic? In either case, your
expectations are erroneous and you are not taking reality into consideration. You can
wish all you want for infinite-precision arithmetic, but it isn't going to happen.

Have you looked into the issues of "rational arithmetic"? This was a short-lived
experiment in the late 1960s and early 1970s to provide the concept of high precision in
the presence of division, and the trick was to not actually do the division, but
essentially carry integer numerators and denominators symbolically through the
computation. You can read up on this in old LISP manuals.

For example, there is no value 0.1 in rational arithmetic. There is a value "1 / 10". The
numerators and denominators are carried through and no actual conversion is done until
final printout, and then it is done to possibly hundreds of digits of precision. So you
would never have 0.83333333 as any possible representation of 25/30; you would actually
keep the symbolic numerator 25 and the symbolic denominator 30 as a rational pair. If you
later added 1 to it, you would get the symbolic value 55/30 as the rational pair (that is,
you would add 30/30 to 25/30 and get 55/30 as the result, and that's what you would store:
the tuple <55, 30>). If you multiplied this by 1/10, you would get 55/300. If you
divided it by 7, you would get 55/2100 (1/7 * 55/300). If you want precision, you pay a
high cost for it. Of course, you end up having to deal with printout. Note that 1/3 * 3
is not necessarily 1.0; so if you printed out the result 1/3 you would get 0.3333333333333
until you ran out of digits. If you put this back in, you would get a different rational
number to represent it, which would not be precisely 1/3, so if you multiplied it by 3 you
would get 0.999999999999999 until you ran out of digits. But if you displayed it as 1/3
and put it in, then multiplying 1/3 * 3 would give you 3/3, which ultimately would display
as 1. But if you are going to use float or double, you are going to deal with the
realities of finite-precision floating-point arithmetic, and you will have to adjust your
expectations to encompass reality, not some abstract mathematics.
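A toy sketch of the idea (the names are invented here); the numerator and
denominator are carried along and the division is never performed:

#include <cstdio>

// Toy rational: value is num/den; the division is never performed.
struct Rational { long long num, den; };

Rational Add(Rational a, Rational b)
{   // a/b + c/d = (a*d + c*b) / (b*d) -- no rounding anywhere
    Rational r = { a.num * b.den + b.num * a.den, a.den * b.den };
    return r;
}

int main()
{
    Rational p = { 25, 30 };      // the exact proportion
    Rational one = { 1, 1 };
    Rational q = Add(p, one);     // 25/30 + 30/30 = 55/30, still exact
    std::printf("%lld/%lld\n", q.num, q.den);
}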

By the way, I really did try to add 1 to 2147483647 and I did NOT get 2147483648. I got
-2147483648. How can I add 1 and get a negative number? Can you explain to me what is
wrong with my program? If you can, then you should be able to understand what is wrong
with yours.
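The integer analogue as a sketch; note that signed overflow is formally
undefined behaviour in C++, and the wrap-around shown in the comments is
merely what two's-complement hardware typically produces:

#include <cstdio>
#include <climits>

int main()
{
    int x = INT_MAX;         // 2147483647
    int y = x + 1;           // undefined behaviour; typically wraps around
    std::printf("%d\n", y);  // typically -2147483648
}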
joe
Post by GT
Post by Les
Post by GT
Post by Les
If you were to try "by hand" your calculation, then the recurring 3's of
25.0/30.0 would cause you to run out of paper before you could do the
actual math. So you mentally adjust the *correct* number to (an
incorrect) one which you can handle.
Almost - your last part is wrong. You don't *adjust* the number, you
*truncate* the number to a certain level of precision. Even if you
truncate the number to 2 decimal places it is 0.83 not 0.87 !!
Semantics : by truncating the number you are adjusting the precision.
And the truncated number is still "incorrect" it is an approximation to
the actual number.
Can 0.83recurring (to however many decimal places) be represented
*exactly* IN BINARY?
In fact can any of the numbers you use be represented *exactly* IN BINARY?
THAT IS THE WHOLE POINT.
eg 0.1 cannot be represented exactly in binary.
Most computer do not use decimal arithmetic (there certainly have been
some in the past that did)
Post by GT
In the case of double the level of precision seems to be 16 digits, but
this is actually irrelevant. Whatever the number of digits used, the last
digit should be a 3 not a 7 !! My algorithm is fine, the calculation is
producing an incorrect answer. At a point later in my calculation I end
up subtracting one number from another and the answer should be zero, but
instead (due to the above problem) the answer is not == 0.
repeat
Most computer do not use decimal arithmetic.
We all know computers are binary machines, not decimal, but why can't the
computer do the same basic maths as a pocket calculator?!? 25/30 is
0.833recurring and it should calculate, store and display as many digits as
can be stored at the desired level of granularity, not randomly invent an
extra digit on the end!
How do I get the computer to calculate and store the number of digits that
it can handle and no more. I don't care if it calculates 6, 10, 15, 27
digits, but I do *require* a correct result 25/30 is *NOT* 0.83337 or
0.8333337 or 0.833333333333333333337. and 0.8333337 + 1 is *NOT* 1.8333335
!!
If the C++ basic data types can't handle this basic mathematics, then how
can we even trust it to do 1+1 etc.
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
GT
2007-10-08 09:48:00 UTC
Permalink
Post by Joseph M. Newcomer
Why do you think pocket calculators do not have these errors? Why do you think pocket
calculators and computers are using the same kind of arithmetic? In either case, your
expectations are erroneous and you are not taking reality into
consideration. You can
wish all you want for infinite-precision arithmetic, but it isn't going to happen.
Have you looked into the issues of "rational arithmetic"? This was a short-lived
experiment in the late 1960s and early 1970s to provide the concept of high precision in
the presence of division, and the trick was to not actually do the division, but
essentially carry integer numerators and denominators symbolically through the
computation. You can read up on this in old LISP manuals.
For example, there is no value 0.1 in rational arithmetic. There is a value "1 / 10". The
numerators and denominators are carried through and no actual conversion is done until
final printout, and then it is done to possibly hundreds of digits of precision. So you
would never have 0.83333333 as any possible representation of 25/30; you would actually
keep the symbolic numerator 25 and the symbolic denominator 30 as a rational pair. If you
later added 1 to it, you would get the symbolic value 55/30 as the rational pair (that is,
the tuple <55, 30>). If you multiplied this by 1/10, you would get 55/300. If you
divided it by 7, you would get 55/2100 (1/7 * 55/300). If you want precision, you pay a
high cost for it. Of course, you end up having to deal with printout.
Note that 1/3 * 3
is not necessarily 1.0; so if you printed out the result 1/3 you would get 0.3333333333333
until you ran out of digits. If you put this back in, you would get a different rational
number to represent it, which would not be precisely 1/3, so if you multiplied it by 3 you
would get 0.999999999999999 until you ran out of digits. [snip]
... and when you format that answer for the screen, it would be rounded to
1.0. So NO PROBLEM. Do the maths to as many decimal places as you can
accurately handle, then do all rounding and display formatting in the last
stage.
Alexander Grigoriev
2007-10-05 03:05:04 UTC
Permalink
The computer "truncates" bits, not decimal digits. It doesn't care that the
actual number is 0.8(3). It just rounds the result to the closest bit in the
double FP precision, which happens to be "round up", then you get last 7.
Post by GT
Almost - your last part is wrong. You don't *adjust* the number, you
*truncate* the number to a certain level of precision. Even if you
truncate the number to 2 decimal places it is 0.83 not 0.87 !!
In the case of double the level of precision seems to be 16 digits, but
this is actually irrelevant. Whatever the number of digits used, the last
digit should be a 3 not a 7 !! My algorithm is fine, the calculation is
producing an incorrect answer. At a point later in my calculation I end up
subtracting one number from another and the answer should be zero, but
instead (due to the above problem) the answer is not == 0.
Joseph M. Newcomer
2007-10-07 02:40:37 UTC
Permalink
See all my earlier posts. Your expectations are erroneous, and all your problems stem
from a failure to understand reality.
joe
Post by GT
Post by Les
Post by GT
Post by AliR (VC++ MVP)
http://www.google.com/search?q=Significant+digits+double&hl=en
That info on significant digits was all very interesting, but how do I
get my code to work properly?!? I want to force the computer to calculate
and store as many digits as it can handle and no extra spurious digits,
so that my calculations get nice accurate results. At the moment part of
my calculation ends up with -0.00000000000007 (haven't counted the zeros
here, but you get the idea!), where it should be zero.
If your algorithm is susceptible to differences at the 17th decimal place
(as in your original post) then your algorithm is wrong.
There are an infinite number of numbers between 0.830 and 0.840 (ie
between *any* two numbers) unfortunately computers only have a limited
number of bytes in which to store these numbers, and a "standard defined"
way of storing them if it uses for example IEEE format.
Many numbers cannot be represented exactly in binary and so a compromise
is assumed.
If you were to try "by hand" your calculation, then the recurring 3's of
25.0/30.0 would cause you to run out of paper before you could do the
actual math. So you mentally adjust the *correct* number to (an incorrect)
one which you can handle.
Almost - your last part is wrong. You don't *adjust* the number, you
*truncate* the number to a certain level of precision. Even if you truncate
the number to 2 decimal places it is 0.83 not 0.87 !!
In the case of double the level of precision seems to be 16 digits, but this
is actually irrelevant. Whatever the number of digits used, the last digit
should be a 3 not a 7 !! My algorithm is fine, the calculation is producing
an incorrect answer. At a point later in my calculation I end up subtracting
one number from another and the answer should be zero, but instead (due to
the above problem) the answer is not == 0.
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
GT
2007-10-08 09:48:59 UTC
Permalink
Post by Joseph M. Newcomer
See all my earlier posts. Your expectations are erroneous, and all your problems stem
from a failure to understand reality.
Odd - reality states that 25/30, then * 30 is 25! My grasp on reality is
sound, yours seems to be away with the fairies!
AliR (VC++ MVP)
2007-10-04 15:55:29 UTC
Permalink
I'm not sure yet. I'm searching on google for it. I see that the windows
calculator program does exactly that. It only shows the significant digits
of a double. So it has to be doable.

AliR.
Post by GT
Post by AliR (VC++ MVP)
http://www.google.com/search?q=Significant+digits+double&hl=en
That info on significant digits was all very interesting, but how do I get
my code to work properly?!? I want to force the computer to calculate and
store as many digits as it can handle and no extra spurious digits, so
that my calculations get nice accurate results. At the moment part of my
calculation ends up with -0.00000000000007 (haven't counted the zeros
here, but you get the idea!), where it should be zero.
Alexander Grigoriev
2007-10-05 03:01:42 UTC
Permalink
Windows calculator is not using 'double'. Try and see how big a factorial it
allows you to calculate. It's beyond FP math.
Post by AliR (VC++ MVP)
I'm not sure yet. I'm search on google for it. I see that windows
calculator program does exactly that. It only shows the significant
digits of a double. So it has to be doable.
AliR.
Post by GT
Post by AliR (VC++ MVP)
http://www.google.com/search?q=Significant+digits+double&hl=en
That info on significant digits was all very interesting, but how do I
get my code to work properly?!? I want to force the computer to calculate
and store as many digits as it can handle and no extra spurious digits,
so that my calculations get nice accurate results. At the moment part of
my calculation ends up with -0.00000000000007 (haven't counted the zeros
here, but you get the idea!), where it should be zero.
Luke alcatel
2007-10-04 16:02:07 UTC
Permalink
I'll risk criticism by saying this thread is silly. A middle school child
learns not to express calculation results with more significant digits than
the data or instruments provide. A double provides 14-15 significant digits
so why are you surprised that the result is not what you theoretically
expect at the 15th digit?

Clue #1 adjust your print formats for fewer significant digits.
Clue #2 don't try to land on Jupiter if your flight control software uses
doubles but your navigation system requires 18 significant digits.
Clue #3 find an arbitrary precision math library on the web and use it.

LA
Post by GT
Post by AliR (VC++ MVP)
http://www.google.com/search?q=Significant+digits+double&hl=en
That info on significant digits was all very interesting, but how do I get
my code to work properly?!? I want to force the computer to calculate and
store as many digits as it can handle and no extra spurious digits, so
that my calculations get nice accurate results. At the moment part of my
calculation ends up with -0.00000000000007 (haven't counted the zeros
here, but you get the idea!), where it should be zero.
GT
2007-10-04 21:44:14 UTC
Permalink
Post by Luke alcatel
I'll risk criticism by saying this thread is silly. A middle school child
learns not to express calculation results with more significant digits
than the data or instruments provide. A double provides 14-15 significant
digits so why are you surprised that the result is not what you
theoretically expect at the 15th digit?
I am surprised because I am using a C++ basic data type and it shouldn't
stumble over basic mathematics!
Post by Luke alcatel
Clue #1 adjust your print formats for fewer significant digits.
This number is the first stage in some complex calculations and the problem
is compounded, resulting in a final number that is just wrong.
Post by Luke alcatel
Clue #2 don't try to land on Jupiter if your flight control software uses
doubles but your navigation system requires 18 significant digits.
I only require about 5 or 6 decimal places, but float and double both see
the same problem, so what am I supposed to do?
Post by Luke alcatel
Clue #3 find an arbitrary precision math library on the web and use it.
I found one - it's called the basic C++ data types: int, float, double, etc.


Why is everyone having a go at me? I asked a perfectly simple question that
is confusing 3 of us here. In front of me, I have a casio calculator, a
pencil and paper and a PC. I type some numbers into the calculator and
scribble on the paper and derive 5 numbers. The first stage in the
calculation is 25/30. The calculator gives us range of 5 number that
match/confirm what we worked out on paper, but the PC gives us something
else that is just wrong. We are not working on a low level C system here, we
are writing an MFC application with dialog boxes, menus and the likes. There
is not a printf in sight.

One of the results should be exactly 0, but the computer gives the
result -0.00000000007, which is then displayed on the screen as -0.0. This
result is displayed on the screen (in an edit box) along with all the
positive numbers and is just wrong!

25/30 = 0.833 recurring. 0.833337 is not *inaccurate*, it is *WRONG*. End of
story.

Even if this 0.83333..3337 were true, then adding 1.0 to it CANNOT BE
1.8333...335 - it has changed the last digit!!! If it can't handle the last
digit, then it shouldn't display/use it.

If the computer can work to 8, 12, 15, 16, 18 digits, then why does it get
the last digit wrong?

I understand all this significant digit stuff, my problem is - Why does the
computer store and use the 'dodgy' last digit if it is actually WRONG and
causes incorrect results?
Michael K. O'Neill
2007-10-05 01:54:03 UTC
Permalink
< snipped content>
Why is everyone having a go at me? I asked a perfectly simple question that
is confusing 3 of us here. In front of me, I have a casio calculator, a
pencil and paper and a PC. I type some numbers into the calculator and
scribble on the paper and derive 5 numbers. The first stage in the
calculation is 25/30. The calculator gives us range of 5 number that
match/confirm what we worked out on paper, but the PC gives us something
else that is just wrong. We are not working on a low level C system here, we
are writing an MFC application with dialog boxes, menus and the likes. There
is not a printf in sight.
It doesn't look like everyone is having a go at you. They are only trying to
explain why you are seeing the results you get. Don't pay any attention to
the "3" that are also confused; pay attention to the ones who are not.

Calculators, pencil and paper, and computers, all work with numbers
differently, and you get different results because of that. Calculators
often have internal precision of 16 or more decimal digits, but display only
the first 12 or so, and thus give the impression that they are more accurate
than they are. Pencil and paper allows you to work in the infinitely
divisible real number system, and thus end up with no numerical errors at
all. Computers work with the IEEE-754 standard for floating point
arithmetic, and thus have 15-16 decimal digits of accuracy for doubles, and
7-8 for floats. See "IEEE 754" at http://en.wikipedia.org/wiki/IEEE_754
One of the results should be exactly 0, but the computer gives the
result -0.00000000007. Which is then displayed on the screen as -0.0. This
result is displayed on the screen (in an edit boxes) along with all the
positive numbers and is just wrong!
25/30 = 0.833 recurring. 0.833337 is not *inaccurate*, it is *WRONG*. End of
story.
The number you get is not "wrong"; it's the correct number according to the
IEEE-754 standard, and it is what it is. Simply because it's not the number
you expect does not make it wrong. As mentioned in my other post, the last
digit that you see is beyond the precision guarantees of the standard, and
thus is essentially a random number.
Even if this 0.83333..3337 were true, then adding 1.0 to it CANNOT BE
1.8333...335 - it has changed the last digit!!! If it can't handle the last
digit, then it shouldn't display/use it.
Yes, of course it's possible that 0.83333..3337 + 1.0 = 1.8333...335. By
adding one, you have used up one additional decimal digit of accuracy (at
the front of the number) and thus the last digit (which is beyond the
guarantee of precision anyway) must almost certainly change.
If the computer can work to 8, 12, 15, 16, 18 digits, then why does it get
the last digit wrong?
I understand all this significant digit stuff, my problem is - Why does the
computer store and use the 'dodgy' last digit if it is actually WRONG and
causes incorrect results?
Just because the last digit is "dodgy" doesn't make it irrelevant. Many
algorithms are implemented specifically to take advantage of the random
nature of the last digit, to give improved results over a large number of
operations (think of the "central limit theorem").

To recap the main points being made here: floating point arithmetic on a
computer is necessarily imbued with a certain amount of quantization
inaccuracy in the last decimal digit beyond the precision guarantee, and
that last decimal digit therefore behaves essentially as a source of
randomness. An algorithm that relies on the last decimal
digit for proper operation is doomed to failure, and must be re-designed.

As an example of a decent re-design, if your algorithm calls for (a-b)/b+1,
then re-write the equation as (a-b)/b+1=a/b-1+1=a/b simply. That will give
better results than the multi-part equations in the code that you showed us.
It will still have a random digit in the last digit beyond the precision
guarantee.
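A sketch of the difference, using the thread's own numbers; the multi-step
form leaves a residue of about -7e-15, while the simplified a/b form happens
to cancel exactly for these values:

#include <cstdio>

int main()
{
    // multi-step form: each operation rounds, and the errors accumulate
    double p1 = 1.0 + ((55.0 - 30.0) / 30.0);
    double r1 = 55.0 - 30.0 * p1;   // about -7.1e-15

    // simplified form a/b: one rounding before the multiply
    double p2 = 55.0 / 30.0;
    double r2 = 55.0 - 30.0 * p2;   // exactly 0 for these values

    std::printf("%.17g\n%.17g\n", r1, r2);
}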

Mike
Les
2007-10-05 08:43:48 UTC
Permalink
Post by GT
Why is everyone having a go at me?
I apologise if you felt that I was having a go at you.
As I said, many people (myself included in the past) have struggled to
understand the cause of this problem which is due to the way floating point
numbers are stored in their internal binary representation.

This is the concept I and others have been trying to get across.

If I may show you an example from Another Language (but the principle is the
same)
(From Dave Eklund, Compaq Fortran Engineering)

"Consider 5.0 divided by powers of 10.:
do i = 1,10
x = 5.0/(10.**i)
type 1, x, x
1 format (1x, f40.30, 1x, b)
enddo
which produces in decimal and binary :

0.500000000000000000000000000000 111111000000000000000000000000
0.050000000745058059692382812500 111101010011001100110011001101
0.004999999888241291046142578125 111011101000111101011100001010
0.000500000023748725652694702148 111010000000110001001001101111
0.000049999998736893758177757263 111000010100011011011100010111
0.000004999999873689375817775726 110110101001111100010110101100
0.000000499999998737621353939176 110101000001100011011110111101
0.000000050000000584304871154018 110011010101101011111110010101
0.000000004999999969612645145389 110001101010111100110001110111
0.000000000499999985859034268287 110000000010010111000001011111
Only that first one is EXACT! Notice that the others,
while "close" to .05, .005, .0005. etc. are not EXACTLY .05, .005, .0005
etc. Some are a little bigger, some smaller (popularly called "nines
disease"). In fact, with the exception of 0.500, all the others CANNOT be
exactly represented as sums of powers of 2!"

"SUMS OF POWERS OF 2" That is the key to understanding computer floating
point math.
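The same experiment rewritten as a C++ sketch (double precision here, rather
than the single precision of the Fortran original):

#include <cstdio>
#include <cmath>

int main()
{
    for (int i = 1; i <= 10; ++i)
    {
        double x = 5.0 / std::pow(10.0, i);
        std::printf("%.30f\n", x);  // only 5.0/10 = 0.5 prints exactly
    }
}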

Les
Joseph M. Newcomer
2007-10-07 03:41:29 UTC
Permalink
See below...
Post by GT
Post by Luke alcatel
I'll risk criticism by saying this thread is silly. A middle school child
learns not to express calculation results with more significant digits
than the data or instruments provide. A double provides 14-15 significant
digits so why are you surprised that the result is not what you
theoretically expect at the 15th digit?
I am surprised because I am using a C++ basic data type and it shouldn't
stumble over basic mathematics!
***
May I say at this point, after seeing all the explanations, that this now constitutes a
stupid remark. In case you missed the reality check, YOU ARE GETTING THE CORRECT ANSWER
AND YOUR EXPECTATIONS OF MATHEMATICS ARE ERRONEOUS. Fix your expectations. Stop whining.
*****
Post by GT
Post by Luke alcatel
Clue #1 adjust your print formats for fewer significant digits.
This number is the first stage in some complex calculations and the problem
is compounded resulting in a final number that is just wrong.
****
When I was working on my PhD in 1974, and in case you need help in arithmetic, that was 33
years ago, not counting the roundoff errors intrinsic in the computation, I went to Joe
Traub, then one of the world's experts on floating point computations on computers, and
asked him about issues of optimizing floating point computations. His answer: "don't.
During some computations, we end up introducing errors that are actually orders of
magnitude higher than the final result, but we carefully do other computations that cancel
them out, so we get the correct result at the end. If you change the order of computation
in any way at all, you change the computation we have carefully designed". So the people
who ARE in contact with reality, which is apparently everyone who does serious floating
point, have understood this problem for decades. Just because you don't understand the
problem, and have no contact with reality, don't complain to us that you are getting
"incorrect" or "undependable" results, or even that it doesn't correspond to your
delusional system of mathematics. It is a completely self-consistent form of arithmetic,
and if it displeases you, then you will have to expend massive effort and computational
cost to produce something that actually implements your delusional system. The correct
solution is to do what has been done for the last 60 years of numerical computation using
computers, and come to terms with the concept of roundoff error, which is intrinsic to
finite-precision arithmetic. You will have to understand that you cannot EVER compare any
floating point result to any other known value, whether it is 0 or any other value, and
expect to get equality. You will ALWAYS compare to a "fuzz factor". If your algorithm is
sensitive to the fact that you get a very tiny negative value, then your algorithm is
WRONG, and you will need to fix it. If you print out a very tiny negative number to fewer
digits of precision and get a minus sign, then your printout is wrong. You will have to
do what tens of thousands of programmers have done for decades, and actually learn what
floating point arithmetic is all about. If you persist in your delusional system, you
will have no success, because you are clearly not in touch with any form of reality that
programmers actually deal with on a daily basis, and your program will never work, because
it will always be based on erroneous assumptions.
****
Post by GT
Post by Luke alcatel
Clue #2 don't try to land on Jupiter if your flight control software uses
doubles but your navigation system requires 18 significant digits.
I only require about 5 or 6 decimal places, but float and double both see
the same problem, so what am I supposed to do?
****
LEARN HOW FLOATING POINT WORKS! ADJUST YOUR ATTITUDE SO YOU TAKE REALITY INTO
CONSIDERATION AS PART OF YOUR PROGRAMMING.
*****
Post by GT
Post by Luke alcatel
Clue #3 find an arbitrary precision math library on the web and use it.
I found one - its called the basic C++ data types. int, float, double etc.
****
No, you have not. The fact that you think that basic C++ data types such as int, float,
double etc have arbitrary precision shows that you are totally clueless. Note that there
has been a serious effort to educate you, and you are rejecting reality because it doesn't
conform to your delusions. Well, that loses. And you will continue to lose as long as
you fail to understand reality. Your expectations are entirely and completely delusional.
No form of reality in the history of computing has conformed to your delusions; although
there have been sincere attempts to do so over the years, none has ever proven to be
effective in practical computations. Sure, you can use infinite-precision arithmetic
packages to compute the value of pi to three billion digits if you want, but you know,
there really isn't very much call for three billion digit arithmetic otherwise.
****
Post by GT
Why is everyone having a go at me? I asked a perfectly simple question that
is confusing 3 of us here. In front of me, I have a casio calculator, a
pencil and paper and a PC. I type some numbers into the calculator and
scribble on the paper and derive 5 numbers. The first stage in the
calculation is 25/30. The calculator gives us a range of 5 numbers that
match/confirm what we worked out on paper, but the PC gives us something
else that is just wrong. We are not working on a low level C system here, we
are writing an MFC application with dialog boxes, menus and the likes. There
is not a printf in sight.
****
And we gave you a perfectly simple answer. Your view of reality is wrong, and until you
change your view of reality, you will never find a computer that matches your delusions.
Perhaps what you need to do is build a robotic arm that punches buttons on your
calculator, and a video camera to read the result, and OCR recognition to handle it, but
you know, as arithmetic units go, that's going to be rather slow. Or, perhaps you could
create a robotic system to use a slide rule. Ultimately, as long as you want to use float
or double, you are going to have to understand the limits of binary floating point
representations, and your continuing refusal to do so shows that you have unreasonable
expectations that can never be met by any floating point unit on any computer in history,
so all we see now is someone whining that reality doesn't match your delusions, and
refusing to accept what everyone has been telling you.

****
Post by GT
One of the results should be exactly 0, but the computer gives the
result -0.00000000007, which is then displayed on the screen as -0.0. This
result is displayed on the screen (in an edit box) along with all the
positive numbers and is just wrong!
****
See my earlier comment. This is a correct result. Your expectations are wrong.
****
Post by GT
25/30 = 0.833 recurring. 0.8333...37 is not *inaccurate*, it is *WRONG*. End of
story.
****
25/30 =
0.833333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333333
is not only *inaccurate*, it is *wrong*. Why do you think the value you are talking about
is any less inaccurate or any more wrong? Given any finite representation of digits,
it is IMPOSSIBLE to represent 25/30 correctly. Why is one inaccurate representation good
while another inaccurate representation is bad? Simple: YOU ARE CLUELESS. And you are
absolutely refusing to understand reality.
*****
Post by GT
Even if this 0.83333..3337 were true, then adding 1.0 to it CANNOT BE
1.8333...335 - it has changed the last digit!!! If it can't handle the last
digit, then it shouldn't display/use it.
****
I actually explained that in an earlier message. It boils down to the fact that you are
persisting in believing that computers do decimal arithmetic in spite of massive evidence
to the contrary, and in spite of the fact that many people have attempted to explain to
you WHY you are wrong. The answers you are getting are CORRECT.

So stop whining that binary floating point doesn't work like a piece of paper and pencil.
It doesn't, it never did, it never will, and your belief that it should is completely
inconsistent with reality. So what needs to change is your attitude.

Every other programmer who has done floating point has understood reality, so why do you
think that you are so privileged that the computer has to conform to your reality?
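Mechanically, what happens in your two lines is an exponent change. A minimal sketch (plain
printf here, though an MFC formatting call displays the same double the same way):

#include <stdio.h>

int main()
{
    double p = (55.0 - 30.0) / 30.0;   // 0.83333333333333337, binary exponent -1
    double q = p + 1.0;                // 1.8333333333333335, binary exponent 0:
                                       // the mantissa shifts right one bit, so the
                                       // lowest bit of p is rounded away
    printf("%.17g\n%.17g\n", p, q);
    return 0;
}

That is why adding 1.0 "changed the last digit": a bit fell off the bottom of the mantissa,
exactly as the standard requires.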
*****
Post by GT
If the computer can work to 8, 12, 15, 16, 18 digits, then why does it get
the last digit wrong?
*****
Why do I get a negative value when I do

2147483647 + 1

If I expect an int to hold more than the value 2147483647, then my expectations are wrong,
and my grasp of reality is wrong. It is no more valid to expect that
2147483647 + 1 = 2147483648
than it is to expect 25/30 = 0.83333333333333 exactly.
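A minimal sketch of the analogy (assuming a 32-bit int, as on the VC++ compilers discussed
here; note that signed overflow is formally undefined behavior, shown for illustration only):

#include <climits>
#include <cstdio>

int main()
{
    int i = INT_MAX;              // 2147483647
    printf("%d\n", i + 1);        // typically prints -2147483648 (wraparound);
                                  // undefined behavior in standard C++
    double d = 25.0 / 30.0;
    printf("%.17f\n", d);         // prints 0.83333333333333337, not ...333
    return 0;
}

Both printed values are the arithmetic the hardware actually implements, not the arithmetic
of unbounded integers or unbounded decimals.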
****
Post by GT
I understand all this significant digit stuff, my problem is - Why does the
computer store and use the 'dodgy' last digit if it is actually WRONG and
causes incorrect results?
****
Your whole concept of WRONG is wrong. Your whole concept of "dodgy" is wrong. Your
concept of arithmetic is wrong. We are trying, seriously trying, to educate you that what
you are seeing is RIGHT and only your expectations are wrong, and you keep missing the
point.
joe
****
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
David Ching
2007-10-07 04:00:02 UTC
Permalink
I want to ask: can anyone recommend a library that produces results like a
calculator (I believe it is called fixed precision decimal)? I understand
that the floating point types in C++ are not meant for this, but what search
terms do I google to find a library that offers this functionality?

I had thought BCD (binary coded decimal) would be a good start, but there
isn't anything on sourceforge, for example that has this....

Thanks,
David
BobF
2007-10-07 10:44:51 UTC
Permalink
David - You might try "guard bits". That's a method that's been used in the
past. It's the old trick where you divide 10 by 3, see 3.33333333, then
multiply by 3. Sometimes you would see 9.99999999 and sometimes you would see
10. Those that displayed 10 were said to employ guard bits.

The methods may have changed over the years, but it should get you on the
trail.
Post by David Ching
I want to ask: can anyone recommend a library that produces results like a
calculator (I believe it is called fixed precision decimal)? I understand
that the floating point types in C++ are not meant for this, but what
search terms do I google to find a library that offers this functionality?
I had thought BCD (binary coded decimal) would be a good start, but there
isn't anything on sourceforge, for example that has this....
Thanks,
David
David Ching
2007-10-07 15:09:55 UTC
Permalink
Post by BobF
David - You might try "guard bits". That's a method that's been used in
the past. It's the old trick where you divide 10 by 3, see .333333333,
then multiply by 3. Sometimes you would see .999999 and sometimes you
would see 10. Those that displayed 10 were said to employ guard bits.
The methods may have changed over the years, but it should get you on the
trail.
Thanks Bob. It seems guard bits are used to implement rounding of a result
to the precision supported by the hardware. I'm not sure it corrects the
fundamental problem of representing results exactly for a given number of
digits, like a calculator does, which seems to be what GT wants.

I have never had a problem with the float/double C++ types, so I've not
looked into the floating point inaccuracies in much detail. I've not even
followed this thread very closely. But it seems that what he is asking for
is how to emulate a $1 calculator in terms of accuracy. No more and no
less, and I've yet to see a simple answer like, "use this library and your
problem will be solved." Since I may well have this problem some day, I
would like to see such an answer.

Thanks,
David
Giovanni Dicanio
2007-10-07 15:42:34 UTC
Permalink
But it seems that what he is asking for is how to emulate a $1 calculator
in terms of accuracy. No more and no less, and I've yet to see a simple
answer like, "use this library and your problem will be solved." Since I
may well have this problem some day, I would like to see such an answer.
Hi David,

I think that the OP might find the "decimal" C# data type to be useful.

http://msdn2.microsoft.com/en-us/library/364x0z75(VS.80).aspx

I'm not an expert in C++/CLI, C# and .NET, but my understanding is that decimal
is a .NET type, so maybe the OP could use the C++/CLI extensions and use
decimal in C++, too.

The Decimal type has higher precision than float or double.

Giovanni
Joseph M. Newcomer
2007-10-07 19:27:20 UTC
Permalink
One of the things that IEEE-754 supports is various user-selectable forms of roundoff
management, which can be set in the FPU control register. If you are really desperate to
have total control over the FPU, you can actually look up what all these bits are. They
were the "committee" approach because there were solid arguments on both sides for doing
either round-up, round-down, round-to-closest (taking sign into effect) and so on, so
instead of annoying one group or the other by choosing a single method, the standard
requires that a conforming implementation support ALL of them.
joe
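For instance, a minimal sketch of flipping the rounding-control bits with the Microsoft
CRT's _controlfp from <float.h> (this assumes the VC++ runtime; C99 compilers expose the
same idea through fesetround in <fenv.h>):

#include <float.h>
#include <stdio.h>

int main()
{
    volatile double num = 25.0, den = 30.0;  // volatile keeps the compiler from
                                             // folding the division at compile time
    unsigned int old = _controlfp(0, 0);     // read the current control word
    _controlfp(_RC_UP, _MCW_RC);             // round toward +infinity
    double up = num / den;
    _controlfp(_RC_DOWN, _MCW_RC);           // round toward -infinity
    double down = num / den;
    _controlfp(old, _MCW_RC);                // restore the previous rounding mode
    printf("%.17f\n%.17f\n", up, down);      // two adjacent doubles bracketing 25/30
    return 0;
}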
Post by David Ching
Post by BobF
David - You might try "guard bits". That's a method that's been used in
the past. It's the old trick where you divide 10 by 3, see .333333333,
then multiply by 3. Sometimes you would see .999999 and sometimes you
would see 10. Those that displayed 10 were said to employ guard bits.
The methods may have changed over the years, but it should get you on the
trail.
Thanks Bob. It seems guard bits are used to implement rounding of a result
to the precision supported by the hardware. I'm not sure it corrects the
fundamental problem of representing results exactly for a given number of
digits, like a calculator does, which seems to be what GT wants.
I have never had a problem with the float/double C++ types, so I've not
looked into the floating point inaccuracies in much detail. I've not even
followed this thread very closely. But it seems that what he is asking for
is how to emulate a $1 calculator in terms of accuracy. No more and no
less, and I've yet to see a simple answer like, "use this library and your
problem will be solved." Since I may well have this problem some day, I
would like to see such an answer.
Thanks,
David
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
GT
2007-10-08 10:02:55 UTC
Permalink
Post by David Ching
Post by BobF
David - You might try "guard bits". That's a method that's been used in
the past. It's the old trick where you divide 10 by 3, see .333333333,
then multiply by 3. Sometimes you would see .999999 and sometimes you
would see 10. Those that displayed 10 were said to employ guard bits.
The methods may have changed over the years, but it should get you on the
trail.
Thanks Bob. It seems guard bits are used to implement rounding of a
result to the precision supported by the hardware. I'm not sure it
corrects the fundamental problem of representing results exactly for a
given number of digits, like a calculator does, which seems to be what GT
wants.
I have never had a problem with the float/double C++ types, so I've not
looked into the floating point inaccuracies in much detail. I've not even
followed this thread very closely. But it seems that what he is asking
for is how to emulate a $1 calculator in terms of accuracy. No more and
no less, and I've yet to see a simple answer like, "use this library and
your problem will be solved." Since I may well have this problem some
day, I would like to see such an answer.
David and Bob, thank you. I thought I was going insane. Somebody finally
understood my very simple question. "guard bits" sound like exactly what I
need to solve this very basic and simple question!
David Ching
2007-10-08 10:45:00 UTC
Permalink
Post by GT
David and Bob, thank you. I thought I was going insane. Somebody finally
understood my very simple question. "guard bits" sound like exactly what I
need to solve this very basic and simple question!
It did sound like they weren't hearing you. I felt the same way when I
asked for a simple switch to make the default char type wchar_t and got all
manner of reasons why it was not possible (to preserve the functionality of some
8-bit embedded microprocessor that no one uses anymore), when clearly
it WAS possible. My only take away from all this is that some C++ wizards
tend to be eccentric and don't care much for pragmatic answers mere mortals
can use. Don't be discouraged! :-)

BTW, guard bits are something implemented in hardware, so you have no
control over that. I have never run into a problem like (1/3) * 3 != 1. So
perhaps the problems you are seeing really won't make a difference in actual
usage.

-- David
David Wilkinson
2007-10-08 12:19:56 UTC
Permalink
Post by GT
David and Bob, thank you. I thought I was going insane. Somebody finally
understood my very simple question. "guard bits" sound like exactly what I
need to solve this very basic and simple question!
GT:

Maybe I paid too much money for my old HP scientific calculator, but if I do

(1/3)*3 - 1

then it gives the answer -1.0e-12.

I think you are still clinging to an incorrect set of assumptions about
how floating point (float or double) works. It is almost always wrong to
compare two floating point numbers for equality, and that's just the way
it is.
--
David Wilkinson
Visual C++ MVP
Joseph M. Newcomer
2007-10-07 19:25:17 UTC
Permalink
The FPU keeps 80-bit values internally. The 32- and 64-bit formats also use an implicit
"hidden" bit: the high-order bit of a normalized mantissa is always 1, so it is redundant
and never stored, which gives an additional bit of mantissa. There's a lot of detail
described in the FPU manual, which you can actually download from the Intel Web site (I
did this a few years ago, so I don't remember the link, but I recall it took very little
searching to actually find the download once I got to the Intel site).
joe
Post by BobF
David - You might try "guard bits". That's a method that's been used in the
past. It's the old trick where you divide 10 by 3, see .333333333, then
multiply by 3. Sometimes you would see .999999 and sometimes you would see
10. Those that displayed 10 were said to employ guard bits.
The methods may have changed over the years, but it should get you on the
trail.
Post by David Ching
I want to ask: can anyone recommend a library that produces results like a
calculator (I believe it is called fixed precision decimal)? I understand
that the floating point types in C++ are not meant for this, but what
search terms do I google to find a library that offers this functionality?
I had thought BCD (binary coded decimal) would be a good start, but there
isn't anything on sourceforge, for example that has this....
Thanks,
David
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Alexander Grigoriev
2007-10-08 02:47:21 UTC
Permalink
By default, in Windows the FPU is set for 64-bit internal precision (a 53-bit mantissa,
the double format). This way, the FP result won't depend on optimizations (of loading and
storing the intermediate results vs. keeping them in the registers). You can set it
explicitly to the 80-bit (64-bit mantissa) precision, though.
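A sketch of flipping that setting with the same CRT call (this assumes the x86 VC++
runtime; the x64 CRT ignores the precision-control mask, since SSE2 arithmetic has no such
mode):

#include <float.h>

void WithExtendedPrecision()
{
    unsigned int old = _controlfp(0, 0);   // read the current control word
    _controlfp(_PC_64, _MCW_PC);           // 64-bit mantissa, i.e. the 80-bit format
    /* ... floating point work ... */
    _controlfp(old, _MCW_PC);              // restore the previous precision bits
}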
Post by Joseph M. Newcomer
The FPU keeps 80-bit values. It also uses an implicit "hidden" bit, since the high-order
bit of a floating point value is always redundant, since it is always 1.
It is implied
but never present, which gives an additional bit of mantissa. There's a lot of detail
described in the FPU manual, which you can actually download from the Intel Web site (I
did this a few years ago, so I don't remember the link, but I recall it took very little
searching to actually find the download once I got to the Intel site)
joe
Post by BobF
David - You might try "guard bits". That's a method that's been used in the
past. It's the old trick where you divide 10 by 3, see .333333333, then
multiply by 3. Sometimes you would see .999999 and sometimes you would see
10. Those that displayed 10 were said to employ guard bits.
The methods may have changed over the years, but it should get you on the
trail.
Post by David Ching
I want to ask: can anyone recommend a library that produces results like a
calculator (I believe it is called fixed precision decimal)? I understand
that the floating point types in C++ are not meant for this, but what
search terms do I google to find a library that offers this
functionality?
I had thought BCD (binary coded decimal) would be a good start, but there
isn't anything on sourceforge, for example that has this....
Thanks,
David
Joseph M. Newcomer [MVP]
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Geoff
2007-10-07 14:59:49 UTC
Permalink
On Sat, 6 Oct 2007 21:00:02 -0700, "David Ching"
Post by David Ching
I want to ask: can anyone recommend a library that produces results like a
calculator (I believe it is called fixed precision decimal)? I understand
that the floating point types in C++ are not meant for this, but what search
terms do I google to find a library that offers this functionality?
I had thought BCD (binary coded decimal) would be a good start, but there
isn't anything on sourceforge, for example that has this....
Thanks,
David
"fixed point math library"
David Ching
2007-10-07 15:44:45 UTC
Permalink
Post by Geoff
"fixed point math library"
Thanks Geoff. It seems the primary advantage of fixed point is speed on
some processors, but it is not used much on modern PC's because the floating
point processor negates the speed advantage. The accuracy is not improved.

Well, after re-reading some posts here, it seems that GT can just use
float/double and display the result to the desired number of digits (which
is far fewer than the accurate digits in these types), and that's that.

Now I can see why alternative libraries are not very common. Float/Double
works just fine (even for financial applications) and you just show the
desired number of decimal places and don't worry whether the 17th digit is
right or not.

Thanks,
David
Giovanni Dicanio
2007-10-07 15:57:42 UTC
Permalink
Float/Double works just fine (even for financial applications) and you
just show the desired number of decimal places and not worry if the 17th
digit is right or not.
I think that floating point is *not* OK for financial applications, where
it is very important, especially in sums, not to throw away the cents and
not to have rounding errors.
I think that decimal is OK for financial work (IIRC, VB6 and OLE had a decimal
type, too); instead, floating point (like double) is OK for graphics or
physical simulations (where little rounding errors are OK).

Giovanni
David Ching
2007-10-07 16:13:34 UTC
Permalink
Post by Giovanni Dicanio
I think that floating points are *not* OK for financial applications,
where is very important exspecially in sums to not throw away also the
cents, and not having rounding errors.
I think that decimal is OK for financial (IIRC, VB6 and OLE had decimal
type, too); instead floating points (like double) are OK for graphics, or
physical simulations (where little rounding errors are OK).
This is the confusing part for me. Raymond Chen (OldNewThing) explains that
float/double supports n "significant digits" but it doesn't mean that... it
means there is at most 1 / 2^n error. So when you're talking about
finances, where the numbers only go to the second decimal place, at what
point do the errors become significant? Since they define the accuracy as 1
/ 2^n error, how does that map to how many dollars and cents you can
calculate before getting worried about losing cents?

The .NET Decimal class has the same issue, I think, except that "n" is
higher than for double. They don't guarantee that dollars and cents remain
accurate up to so many quad-zillion dollars, which is the metric of
importance in financial calculations (I would think).

-- David
BobF
2007-10-07 18:12:17 UTC
Permalink
Post by David Ching
Post by Giovanni Dicanio
I think that floating points are *not* OK for financial applications,
where is very important exspecially in sums to not throw away also the
cents, and not having rounding errors.
I think that decimal is OK for financial (IIRC, VB6 and OLE had decimal
type, too); instead floating points (like double) are OK for graphics, or
physical simulations (where little rounding errors are OK).
This is the confusing part for me. Raymond Chen (OldNewThing) explains
that float/double supports n "significant digits" but it doesn't mean
that... it means there is at most 1 / 2^n error. So when you're talking
about finances, where the numbers only go to the second decimal place, at
what point do the errors become significant? Since they define the
accuracy as 1 / 2^n error, how does that map to how many dollars and cents
you can calculate before getting worried about losing cents?
The .NET Decimal class has the same issue, I think, except that "n" is
higher than for double. They don't guarantee that dollars and cents
remain accurate up to so many quad-zillion dollars, which is the metric of
importance in financial calculations (I would think).
Accuracy is important for financials beyond whole cents when calculating
interest and such. IIRC, the institutions round in their own favor such
that they *expect* a certain amount over when they run balances.
Geoff
2007-10-07 19:12:01 UTC
Permalink
Post by BobF
Post by David Ching
Post by Giovanni Dicanio
I think that floating points are *not* OK for financial applications,
where is very important exspecially in sums to not throw away also the
cents, and not having rounding errors.
I think that decimal is OK for financial (IIRC, VB6 and OLE had decimal
type, too); instead floating points (like double) are OK for graphics, or
physical simulations (where little rounding errors are OK).
This is the confusing part for me. Raymond Chen (OldNewThing) explains
that float/double supports n "significant digits" but it doesn't mean
that... it means there is at most 1 / 2^n error. So when you're talking
about finances, where the numbers only go to the second decimal place, at
what point do the errors become significant? Since they define the
accuracy as 1 / 2^n error, how does that map to how many dollars and cents
you can calculate before getting worried about losing cents?
The .NET Decimal class has the same issue, I think, except that "n" is
higher than for double. They don't guarantee that dollars and cents
remain accurate up to so many quad-zillion dollars, which is the metric of
importance in financial calculations (I would think).
Accuracy is important for financials beyond whole cents when calculating
interest and such. IIRC, the institutions round in their own favor such
that they *expect* a certain amount over when they run balances.
FWIW, in Excel 2000 the equation =25/30 expressed to 30 requested
decimal places: 0.833333333333333000000000000000

Same thing goes for Excel 2003 on another PC.

Conclusion: Excel uses IEEE floating point and truncates the displayed value
at 15 significant digits.
Joseph M. Newcomer
2007-10-07 23:46:45 UTC
Permalink
When I was in the financial programming business, my boss wrote a program for a local
bank. It was some complex investment prediction thing. He spent days trying to track
down the $50,000 error in the computations, which he ws certain was due entirely to
roundoff. When the bank called to ask for the progress, he explained that he still had a
$50,000 inconsistency in the data. Their response: "Wow! Only $50,000 differential?
That's wonderful! Send us the results!" They were happy to have a computation that had
an error that small.
joe
Post by BobF
Post by David Ching
Post by Giovanni Dicanio
I think that floating points are *not* OK for financial applications,
where is very important exspecially in sums to not throw away also the
cents, and not having rounding errors.
I think that decimal is OK for financial (IIRC, VB6 and OLE had decimal
type, too); instead floating points (like double) are OK for graphics, or
physical simulations (where little rounding errors are OK).
This is the confusing part for me. Raymond Chen (OldNewThing) explains
that float/double supports n "significant digits" but it doesn't mean
that... it means there is at most 1 / 2^n error. So when you're talking
about finances, where the numbers only go to the second decimal place, at
what point do the errors become significant? Since they define the
accuracy as 1 / 2^n error, how does that map to how many dollars and cents
you can calculate before getting worried about losing cents?
The .NET Decimal class has the same issue, I think, except that "n" is
higher than for double. They don't guarantee that dollars and cents
remain accurate up to so many quad-zillion dollars, which is the metric of
importance in financial calculations (I would think).
Accuracy is important for financials beyond whole cents when calculating
interest and such. IIRC, the institutions round in their own favor such
that they *expect* a certain amount over when they run balances.
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph M. Newcomer
2007-10-07 19:59:15 UTC
Permalink
We normally work assuming ±1/2 LSB error, on the average. This becomes confusing because
there are several more bits in the 80-bit FPU format, so the ±1/2 LSB actually applies to
the extended precision, but it is truncated/rounded when a 64-bit FP result is stored.

The arguments about decimal vs. binary floating point are very hard to sustain these days.
In the old days, it was often faster to use decimal arithmetic, because it was treated as
fixed-integer arithmetic with just a funny carry-logic, e.g., 0x9+0x1=0x0+carry and the
carry was carried into the next 4-bit decimal digit. Machines like the 8088 had decimal
arithmetic instructions because, without an 8087 FPU chip, floating point was dead slow,
comparable in performance to the 1620 arithmetic, but decimal add/subtract was quite fast,
and multiply wasn't totally unacceptable. Since you typically didn't need 17 digits of
precision, 64-bit floating point wasn't needed, but the 8 digits of 32-bit floating point
wouldn't handle large dollar amounts ($9,999,999.99 + 0.01 = $9,999,999.99 or the binary
equivalent thereof), and 64-bit floating point was FAR too slow to be realistic on such
machines.

Now that floating point add, subtract, and multiply are as fast as integer add (and in a
pipelined superscalar architecture like the Pentium, it can issue two integer and one
floating point operations per clock cycle, so the floating point is nearly "free"), the
arguments are far less persuasive.
joe
Post by David Ching
Post by Giovanni Dicanio
I think that floating points are *not* OK for financial applications,
where is very important exspecially in sums to not throw away also the
cents, and not having rounding errors.
I think that decimal is OK for financial (IIRC, VB6 and OLE had decimal
type, too); instead floating points (like double) are OK for graphics, or
physical simulations (where little rounding errors are OK).
This is the confusing part for me. Raymond Chen (OldNewThing) explains that
float/double supports n "significant digits" but it doesn't mean that... it
means there is at most 1 / 2^n error. So when you're talking about
finances, where the numbers only go to the second decimal place, at what
point do the errors become significant? Since they define the accuracy as 1
/ 2^n error, how does that map to how many dollars and cents you can
calculate before getting worried about losing cents?
The .NET Decimal class has the same issue, I think, except that "n" is
higher than for double. They don't guarantee that dollars and cents remain
accurate up to so many quad-zillion dollars, which is the metric of
importance in financial calculations (I would think).
-- David
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph M. Newcomer
2007-10-07 19:51:26 UTC
Permalink
Actually, as someone who started out doing financial calculations, the decimal form is
equally bad, and any belief that decimal is "better" than binary floating point is usually
based on historical artifacts; for example, most early "commercial" machines had no FPU at
all (IBM 1620, 1401/1440/1460, Honeywell 200, GE 300-series); some did not even have
multiply or divide instructions. For the 1620, the FPU was an added-cost option, as was
the multiply/divide instruction, and the integer-add-subtract instruction (it was known as
the CADET - Can't Add, Doesn't Even Try - because it had no integer ALU at all; it did
arithmetic by doing table lookup in memory). A 20us memory cycle meant that to add two
values and store the result took

120us to fetch the instruction
for(i = 0; i < number of digits of operand; i++)
{
20us to fetch the first operand digit of the computation
20us to fetch the second operand digit of the computation
20us to look up the result in the table
20us to store the result
}

so a 10-digit add (required to deal with values to $9,999,999.99 with an extra digit to
handle roundoff) took 920us; that is, it could compute only about 1100 results/sec
assuming every instruction was a 10-digit add. Put tests, branches, etc. in this mix and
you were doing less than 1000 results/sec. In those days we really DID care about
optimizing every line of assembly code!

But the bottom line was that we still had cumulative error, even keeping an extra digit of
precision, so it was no different than the issue that has been beaten to death here about
binary floating point. So while you get, for certain PARTICULAR values, a slightly
more-consistent-with-decimal-arithmetic result by using decimal or fixed integer
representations, ultimately, for OTHER values, you get the same kinds of errors that have
been described for floating point, and the result is that in fact there is no particular
advantage of one form over the other, except that decimal arithmetic is several orders of
magnitude slower than fixed point integer or binary floating point.

True story from the 1960s: I was a grad student with a much older student, who was an
ordained minister as well. He told the story about a time, around 1964, when a coworker
came to him and inquired if his brand of religion had the equivalent of the "seal of
confession" (in Roman Catholicism, everything said in confession is private; not even
courts can compel a priest to discuss these issues). Bill assured him that he would treat
it as such. The coworker explained that he was working on the payroll program, and had
decided that he could take advantage of the roundoff error issues we've been discussing by
rounding intermediate results to the nearest penny and adding the fractions of a cent thus
freed up to his own paycheck. The problem, he explained, was that when he went to the
place in the code where he would do this, it had already been done and was looking for
somebody else's Social Security Number. "My problem," he explained, "is what to do. If I
go to my boss, he'll ask why I was even looking at that place in the code, since it had
nothing to do with the fix. And it might be HIS SSN that's there!" Bill looked at him
and said "You're a programmer; generalize it to a table of SSNs and add a comment about
how to add your own SSN to the list, then divide the booty among everyone!" His coworker
was somewhat taken aback, and Bill explained that he had been pulling his leg. "So what
was the outcome?" I asked. "Now that," Bill said, "I will treat as being under the Seal
of Confession". So I have no idea what the ultimate result was.

So the roundoff problem, which was old in 1964, already had exploits being done with it!
joe
Post by Giovanni Dicanio
Float/Double works just fine (even for financial applications) and you
just show the desired number of decimal places and not worry if the 17th
digit is right or not.
I think that floating points are *not* OK for financial applications, where
is very important exspecially in sums to not throw away also the cents, and
not having rounding errors.
I think that decimal is OK for financial (IIRC, VB6 and OLE had decimal
type, too); instead floating points (like double) are OK for graphics, or
physical simulations (where little rounding errors are OK).
Giovanni
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Giovanni Dicanio
2007-10-08 08:06:12 UTC
Permalink
Post by Joseph M. Newcomer
Actually, as someone who started out doing financial calculations, the decimal form is
equally bad, and any belief that decimal is "better" than binary floating point is usually
based on historical artifacts
Hi Joe,

I don't quite agree with that.

I still think that decimal is better when you need precision like in
financial calculations:

<quote url="http://msdn2.microsoft.com/en-us/library/364x0z75(VS.80).aspx">
The decimal keyword denotes a 128-bit data type. Compared to floating-point
types, the decimal type has a greater precision and a smaller range, which
makes it suitable for financial and monetary calculations. The approximate
range and precision for the decimal type are shown in the following table.
</quote>

Moreover, decimal is 128 bits, while double is 64 bits (8 bytes * 8
bits/byte = 64 bits), so "intuitively" decimal can store more
information than double; but the difference is also in *how* this
information is encoded in decimal vs. double.

If in a particular problem you value *range* over precision, then you choose
double.
In fact, double range is 10^(-324) to 10^308. But double precision is 15-16
digits.

Instead, if for a given problem you value *precision* over range, then you
can use decimal.
In fact, decimal range is 10^(-28) to 10^28 (a lot less than double), but the
precision is much higher than double, in fact it is 28-29 significant
digits.

Also, languages like Visual Basic, which are oriented toward business and
financial applications, had a Decimal type, too.

And decimal has been introduced in .NET and C#. If double could
substitute for decimal, I don't think Microsoft would have introduced decimal
in a great new technology like .NET, IMHO...
Post by Joseph M. Newcomer
True story from the 1960s: [...]
Thanks for sharing that :)
I very much like reading your stories and historical perspectives.
(And you are right about seal of Confession for Roman Catholicism [which is
the main religion here in Italy], and the fact that not even courts can
compel a priest to discuss what is under that seal!)

I'm looking forward to reading your reply, so you can clarify some points about
decimal and double that are unclear to me or that I misunderstood.

Thanks,
Giovanni
Joseph M. Newcomer
2007-10-07 19:04:05 UTC
Permalink
Most BCD has precision limits, such as 15 digits. The only problem is that precision of
more than 15 digits, which is required for such computations as 25/30, won't work in BCD
fixed point, integer fixed point, or any other form of non-symbolic arithmetic that exists
on computers.

Generally, BCD offers no advantages over fixed-point binary arithmetic. FPB was used for
years to do graphics work; for example, I represent my coordinates as 24.8 bits of
precision. That way, cumulative roundoff won't distort images nearly as much, because I
use the upper 24 bits of the value as my coordinate, and the low-order 8 bits just "hang
onto" the roundoff error. But if I rotate something often enough using this technique,
the cumulative errors because I only have an 8-bit fraction will begin to distort it.
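A minimal sketch of that 24.8 representation (the names here are illustrative, not from
any particular library):

#include <cstdio>

typedef int fixed24_8;                           // 24 integer bits, 8 fraction bits

inline fixed24_8 ToFixed(double v)     { return (fixed24_8)(v * 256.0); }
inline double    ToDouble(fixed24_8 f) { return f / 256.0; }
inline fixed24_8 FixedMul(fixed24_8 a, fixed24_8 b)
{
    return (fixed24_8)(((long long)a * b) >> 8); // rescale after the multiply
}

int main()
{
    fixed24_8 x = ToFixed(12.5), y = ToFixed(3.25);
    printf("%f\n", ToDouble(FixedMul(x, y)));    // prints 40.625000
    return 0;
}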

The LISP folks did some long-precision arithmetic in the 70s; the google search key would
be BIGNUM (when Guy Steele was in California during one summer, the length of the famous
El Camino Real ["The royal road"] was explained; he said that it was too long to be a Real
and dubbed it El Camino BIGNUM).

I once worked on a machine which had no multiply instruction; instead, multiplication was
done by a subroutine call. The machine was a decimal machine. The call was something
like

MULTIPLY(result, decimal, multiplier, decimal, multiplicand, decimal)

and the "decimal" values were the number of decimal points. For financial calculations,
we kept 3 decimal digits of precision and only printed two. That way we had all the
roundoffs balance out and the totals were approximately correct. In 1964 it was
well-known that you could not divide 25/30 and get a precise number, and that was 43 years
ago. There were even a couple of paragraphs in the manual describing the effects of decimal
precision and roundoff. The FORTRAN floating-point library I used in 1963 kept the values
in decimal (an IBM 1620) and devoted a chapter of the manual to floating-point roundoff
issues. They were no different then than they are today.

The only systems that I know of that actually preserve "indefinite" accuracy are the LISP
libraries that did "rational arithmetic" by keeping symbolic numerator/denominator pairs.
They understood how to add, subtract, multiply and divide rational numbers, and could even
compute sin, cos, etc. symbolically. I think MATLAB or one of those systems may still
support rational arithmetic.

The reason that the 7090/7094, PDP-6/10/20, and many other machines of the era had 36-bit
words (not 32-bit) was that John von Neumann had computed that for all real physical
problems, 36 bits was sufficient to get accurate answers, taking into account the roundoff
errors in the least significant bit. By using 9 bits of exponent and 26 bits of mantissa
(plus 1 bit of sign), he computed that this had sufficient dynamic range (exponent) and
sufficient precision (mantissa) for all realistic computations that were possible in the
foreseeable future. Like most of his work, he was absolutely right.

People like Alderson, the wizard of JPL (Niven's "Alderson Drive" was named in his honor)
understood floating-point roundoff, and he was in great demand for writing navigation code
for onboard computers for projects like interplanetary space missions, because he
understood how to cause the errors to be irrelevant instead of cumulative, so the
spacecraft arrived where it was supposed to, and not ten million kilometers off-course.

The ultimate issue here is that unlimited decimal precision rarely buys anything (except
in very esoteric cases like the Windows Calculator program), and in real situations,
binary floating point (note that a floating point multiply takes one (1) CPU clock cycle
on a Pentium, that is, about 0.36ns on a 2.8GHz machine) always does the right thing; it is only
the responsibility of the programmer to not let errors become cumulative. And that's just
basic programming skill. By the way, I don't do floating point where these issues matter,
so I'm not an expert on how to do it; but I do know that the best efforts of a host of
people to educate the OP on this matter seem to be wasted effort.
joe
Post by David Ching
I want to ask: can anyone recommend a library that produces results like a
calculator (I believe it is called fixed precision decimal)? I understand
that the floating point types in C++ are not meant for this, but what search
terms do I google to find a library that offers this functionality?
I had thought BCD (binary coded decimal) would be a good start, but there
isn't anything on sourceforge, for example that has this....
Thanks,
David
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Geoff
2007-10-07 04:19:07 UTC
Permalink
On Thu, 4 Oct 2007 22:44:14 +0100, "GT"
Post by GT
I understand all this significant digit stuff, my problem is - Why does the
computer store and use the 'dodgy' last digit if it is actually WRONG and
causes incorrect results?
What you are encountering is a fundamental limitation in the IEEE 754
64-bit binary representation of floating point numbers used to
represent those numbers in what is essentially an integer process
inside the CPU. This is not an error, this is a fundamental design
compromise made when the IEEE 754 standard was designed.

If you use Newcomer's floating point explorer you will see in the bit
pattern of the mantissa that there is no distinction between the bit
pattern for

0.83333333333333337 (0xAAAAAAAAAAAAB)
and
0.83333333333333333 (0xAAAAAAAAAAAAB)

Likewise for
0.83333333333333333
0.83333333333333334
0.83333333333333335
0.83333333333333336
0.83333333333333337
0.83333333333333338
0.83333333333333339

(The full 64-bit representation of your number is 0x3FEAAAAAAAAAAAAB
but Newcomer chooses not to express the full hexadecimal
representation in his program, we must forgive him.) He does provide
source, so this is easily fixed.

Now, when you ask the C++ libraries to "print" the floating point
result as a string, the printing routines must convert the bit pattern
back to decimal, and the authors of the library print the 7 because, of
all the decimal strings ending in 3 through 9 that map to the bit
pattern 0xAAAAAAAAAAAAB, the one ending in 7 is closest to the value
the bits actually represent.

In other words, if you want to compute floating point to 17 digits of
precision, expect to encounter these kinds of errors in those last 2
digits. If you don't want to see them, then format your output to 15
digits of precision. In this case you are using printf("%.17f", num)
and it is your program that is wrong, not the math. If you use
printf("%.15f\n", num), then 0.83333333333333337 becomes 0.833333333333333
and the output is correct for all floating point results representable
in 64 bits.

You should also evaluate the macro DBL_DIG in <float.h> for your
compiler and environment when engaging in floating point operations so
you understand the fundamental limitations of your compiler and
libraries. If you had evaluated DBL_DIG you probably would have found
the Microsoft compiler supports DBL_DIG = 15 or 15 digits of
precision. The IEEE 754 FPU carries it out to 17 digits in order to
guarantee accuracy to 15 digits of precision.

Another problem of floating point math is adding or subtracting
numbers that are close in value, as you have done to obtain your
-0.00000000000000007 result. You should structure your code to avoid
this.

See P. J. Plauger's "The Standard C Library" for an understanding of
how difficult it is to design and test floating point operations
inside integer computers.

If 15 decimal digits of precision is not enough for your application,
I suggest you design your own floating point library and do not depend
on the floating point processors in your computers.

You could also ask Microsoft how they get the Calculator application
to yield 0.83333333333333333333333333333333 when doing 25/30.

Furthermore, since 25/30 is a repeating, non-terminating decimal that
cannot be represented by any finite set of digits, you cannot expect to
take this calculation to arbitrary decimal places on any
finite computer. If 25 and 30 represent measurable quantities they
cannot be known (measured) to arbitrary precision. In other words, if
you are building a real device you can measure 25.0 or 25.0000 plus or
minus the least significant digit, plus or minus the error in your
measuring device. As finite reals, measured with a given uncertainty,
you cannot expect your results to exceed the precision with which the
operands are known. So in your example, if you know 25.0 and 30.0 to
three significant figures you cannot express 25.0/30.0 any more
precisely than 0.833 anyway and all this fuss about 17 digits of
precision is for nothing.

http://en.wikipedia.org/wiki/Significant_figures
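Putting DBL_DIG to work, a minimal sketch (plain printf here; an MFC Format call with the
same format string behaves identically):

#include <float.h>
#include <stdio.h>

int main()
{
    double num = 25.0 / 30.0;
    printf("DBL_DIG = %d\n", DBL_DIG);   // 15 on the Microsoft compiler
    printf("%.17f\n", num);              // 0.83333333333333337 - noise in the last 2 digits
    printf("%.*f\n", DBL_DIG, num);      // 0.833333333333333 - only the trustworthy digits
    return 0;
}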
Joseph M. Newcomer
2007-10-07 20:11:30 UTC
Permalink
See below..
Post by Geoff
On Thu, 4 Oct 2007 22:44:14 +0100, "GT"
Post by GT
I understand all this significant digit stuff, my problem is - Why does the
computer store and use the 'dodgy' last digit if it is actually WRONG and
causes incorrect results?
What you are encountering is a fundamental limitation in the IEEE 754
64-bit binary representation of floating point numbers used to
represent those numbers in what is essentially an integer process
inside the CPU. This is not an error, this is a fundamental design
compromise made when the IEEE 754 standard was introduced and
designed.
If you use Newcomer's floating point explorer you will see in the bit
pattern of the mantissa that there is no distinction between the bit
pattern for
0.83333333333333337 (0xAAAAAAAAAAAAB)
and
0.83333333333333333 (0xAAAAAAAAAAAAB)
Likewise for
0.83333333333333333
0.83333333333333334
0.83333333333333335
0.83333333333333336
0.83333333333333337
0.83333333333333338
0.83333333333333339
(The full 64-bit representation of your number is 0x3FEAAAAAAAAAAAAB
but Newcomer chooses not to express the full hexadecimal
representation in his program, we must forgive him.)
****
Yes, I only show the mantissa precision. Counting the sign and the exponent the full
64-bit value is

0x3FEAAAAAAAAAAAAB
****
Post by Geoff
He does provide
source, so this is easily fixed.
Now, when you ask the C++ libraries to "print" the floating point
result as a string the printing routines must print the representation
of the bit pattern and the authors of the library decided to print the
7 rather than the 3 through 9 that interpreting (0xAAAAAAAAAAAAB) as a
float would bring.
In other words, if you want to compute floating point to 17 digits of
precision, expect to encounter these kinds of errors in those last 2
digits. If you don't want to see them, then format your output to 15
digits of precision. In this case you are using printf("%.17f", num)
and it is your program that is wrong, not the math. If you use
printf("%.15f\n", num) (0.83333333333333337 becomes 0.833333333333333)
and the output is correct for all floating point results representable
by 64 bits.
You should also evaluate the macro DBL_DIG in <float.h> for your
compiler and environment when engaging in floating point operations so
you understand the fundamental limitations of your compiler and
libraries. If you had evaluated DBL_DIG you probably would have found
the Microsoft compiler supports DBL_DIG = 15 or 15 digits of
precision. The IEEE 754 FPU carries it out to 17 digits in order to
guaranty accuracy to 15 digits of precision.
Another problem of floating point math is adding or subtracting
numbers that are close in value, as you have done to obtain your
-0.00000000000000007 result. You should structure your code to avoid
this.
See P. J. Plaugher's "The Standard C Library" for an understanding of
how difficult it is to design and test floating point operations
inside integer computers.
If 15 decimal points of precision is not enough for your application,
I suggest you design your own floating point library and do not depend
on the floating point processors in your computers.
You could also ask Microsoft how they get the Calculator application
to yield 0.83333333333333333333333333333333 when doing 25/30.
Furthermore, since 25/30 is a repeating-decimal, irrational number and
cannot be represented by any finite set of digits you cannot expect to
take this calculation seriously to arbitrary decimal points on any
finite computer. If 25 and 30 represent measurable quantities they
cannot be known (measured) to arbitrary precision. In other words, if
you are building a real device you can measure 25.0 or 25.0000 plus or
minus the least significant digit, plus or minus the error in your
measuring device. As finite reals, measured with a given uncertainty,
you cannot expect your results to exceed the precision with which the
operands are known. So in your example, if you know 25.0 and 30.0 to
three significant figures you cannot express 25.0/30.0 any more
precisely than 0.833 anyway and all this fuss about 17 digits of
precision is for nothing.
http://en.wikipedia.org/wiki/Significant_figures
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Geoff
2007-10-07 04:34:14 UTC
Permalink
On Thu, 4 Oct 2007 22:44:14 +0100, "GT"
Post by GT
I only require about 5 or 6 decimal places, but float and double both see
the same problem, so what am I supposed to do?
Compute using double operands and double as the result in all your
computations, and use %.6f when formatting your output.
Geoff
2007-10-07 04:44:45 UTC
Permalink
On Thu, 4 Oct 2007 22:44:14 +0100, "GT"
Post by GT
One of the results should be exactly 0, but the computer gives the
result -0.00000000007. Which is then displayed on the screen as -0.0. This
result is displayed on the screen (in an edit boxes) along with all the
positive numbers and is just wrong!
-0.0 = 0.0 for all intents and purposes. The only difference is the
sign bit in the internal representation.

You cannot perform floating point math on a computer and compare the
result to integer 0 or even a float 0. You must compare it to what
your application will allow to be close enough to zero to be
considered zero. Therefore your statements for comparing a floating
point value to zero to six figures should be the equivalent of:

if (n < 0.000001 && n > -0.000001) then treat n as 0, where n is type double
and the bracketing values are whatever your application will accept as zero.
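In C++ that clamp might look like this minimal sketch (the 1.0e-6 tolerance is
only a placeholder for whatever your application accepts as zero):

#include <stdio.h>

int main()
{
    // "should" be 0, but carries a one-ulp residue of roughly 2.2e-16
    double n = ((55.0 - 30.0) / 30.0 + 1.0) - 55.0 / 30.0;
    const double tol = 1.0e-6;
    if (n < tol && n > -tol)
        n = 0.0;               // treat the residue as exactly 0
    printf("%f\n", n);
    return 0;
}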
Geoff
2007-10-07 04:47:01 UTC
Permalink
On Thu, 4 Oct 2007 22:44:14 +0100, "GT"
Post by GT
We are not working on a low level C system here, we
are writing an MFC application with dialog boxes, menus and the likes. There
is not a printf in sight.
No, the printf's are inside the MFC that prints the dialogs. The fact
that they are hidden from you doesn't mean they don't express
themselves in your application.
Michael K. O'Neill
2007-10-04 17:55:14 UTC
Permalink
Post by GT
Post by AliR (VC++ MVP)
http://www.google.com/search?q=Significant+digits+double&hl=en
That info on significant digits was all very interesting, but how do I get
my code to work properly?!?
Your code **is** working properly. It's your algorithm that's wrong.

Your algorithm assumes that computers have infinite precision. They don't.
With doubles, you get 15-16 decimal digits of precision. With floats you
get 7-8 decimal digits of precision. Anything after that is essentially
random.

Think of it this way (using floats, for simplification): There are an
infinite number of numbers between 1.2345678 and 1.2345679. But with a
float, there are only a limited number of bits (32 bits) with which to
represent them. So the last decimal digit is essentially random, owing to
the quantizing effect of individual bits.

Those tiny amounts of imprecision accumulate quickly, especially if the
algorithm isn't designed to expect them. You wrote, for example, that the
following code:

double effortChangeProportion = (55.0 - 30.0) / 30.0;

yields 0.83333333333333337, and you were surprised by that since you
expected 0.83333333333333333 (i.e., the final digit should be truncated or
rounded or something to be a 3, not a 7). In fact the subtraction, 55.0-30.0=25.0,
is exact - 55.0, 30.0 and 25.0 are all exactly representable - so the rounding
happens in the division: the true quotient falls between two adjacent doubles
and is rounded to the nearer one, which prints as ...37.

Your code might perform better if you re-wrote it as

double effortChangeProportion = (55.0/30.0) - 1.0;

which is equivalent mathematically, but YMMV depending on your expectations
for numerical values of your two numbers, since for some numerical values,
your initial code might actually be better.
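A quick check of the two formulations (a sketch; the exact last digits can vary with
compiler and FPU settings):

#include <stdio.h>

int main()
{
    double p1 = (55.0 - 30.0) / 30.0 + 1.0;   // the original formulation
    double p2 = 55.0 / 30.0;                  // the algebraically identical rewrite
    printf("%.17g\n%.17g\n", p1, p2);         // the two can differ by one unit in
                                              // the last place
    return 0;
}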

See this page (one of many available by Googling): "IEEE-754 Floating-Point
Conversion From Decimal Floating-Point To 32-bit and 64-bit Hexadecimal
Representations Along with Their Binary Equivalents" at
http://babbage.cs.qc.edu/IEEE-754/Decimal.html

As a thought experiment, imagine that (with floats) you loop one billion
times and in each iteration you add one one-billionth to an initial starting
value of exactly 1.0. Is the result 2.0? Of course not. The result is
still 1.0, because of loss of significant digits.
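That thought experiment compiles directly (a sketch; the loop takes a few seconds to run):

#include <stdio.h>

int main()
{
    float sum = 1.0f;
    for (long i = 0; i < 1000000000L; ++i)
        sum += 1.0e-9f;        // each addend is below float's resolution near 1.0,
                               // so every addition rounds back to 1.0
    printf("%f\n", sum);       // prints 1.000000, not 2.000000
    return 0;
}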

You need to re-design your algorithm to accommodate the expected amount of
precision. Read the link that Les posted below, "What every computer
scientist should know about floating point arithmetic." at
http://docs.sun.com/source/806-3568/ncg_goldberg.html

Mike
Joseph M. Newcomer
2007-10-07 02:39:12 UTC
Permalink
Your code IS working properly. Your expectations are not working properly.

The result of a floating point computation that is "supposed" to be zero is only ever
"approximately" zero, and your result looks very much like the result of such a
computation.

Note that this phenomenon has been known and understood since the invention of binary
floating point arithmetic. There are probably fifty years of computer literature dealing
with this phenomenon, and the fact that it is new to you doesn't change the nature of the
problem or any of the issues. Bottom line: your expectations are wrong. You will have to
change your expectations to conform to reality.

I should point out that when I started using floating point in 1963, this problem was old,
and when we learned to program in FORTRAN (FORTRAN II, actually), our instructors spent
time explaining all about floating-point precision. We knew that you could not compare
results to 0, we knew that there would be roundoff error, and we learned this somewhere in
the first couple hours of training. The manuals all explained it. It was taken for
granted that these issues would arise. John Von Neumann knew about this problem in the
mid-1940s, which is why he thought floating-point arithmetic units would never catch
on...because he believed the programmer should use fixed point integer arithmetic so the
programmer would always be aware of the roundoff issues. So the problem is now well over
60 years old in terms of understanding that it is a problem in computers. I'm sure that
the old astronomers such as Johannes Kepler, who worked out orbital mechanics and proved
the planets move in elliptical orbits, also understood roundoff error. In that sense, the
understanding of roundoff error is centuries old. His work was published in 1611. So
roundoff error should not surprise you, it isn't anything new.
joe
Post by GT
Post by AliR (VC++ MVP)
http://www.google.com/search?q=Significant+digits+double&hl=en
That info on significant digits was all very interesting, but how do I get
my code to work properly?!? I want to force the computer to calculate and
store as many digits as it can handle and no extra spurious digits, so that
my calculations get nice accurate results. At the moment part of my
calculation ends up with -0.00000000000007 (haven't counted the zeros here,
but you get the idea!), where it should be zero.
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
GT
2007-10-08 09:53:26 UTC
Permalink
Post by Joseph M. Newcomer
Your code IS working properly. Your expectations are not working properly.
I expected the C++ basic data types to be able to handle simple maths
properly. Clearly they can only store digits to a certain number of binary
places; therefore the trailing digit in a decimal conversion is unreliable
and 9 times out of 10 mathematically *WRONG*. I expected there to be some logic
built in to the data type to take care of this. Would you buy a calculator
that gave wrong results? That is reality!
Les
2007-10-04 15:37:07 UTC
Permalink
Please read :

"What every computer scientist should know about floating point arithmetic."

http://docs.sun.com/source/806-3568/ncg_goldberg.html

many people have stumbled over this point.
Les
Giovanni Dicanio
2007-10-04 17:55:31 UTC
Permalink
Post by Les
"What every computer scientist should know about floating point arithmetic."
http://docs.sun.com/source/806-3568/ncg_goldberg.html
I completely agree with this suggestion.

Moreover, forget about using operator== (like x == y) on floating point
numbers.

If you want to compare floating point numbers, you should use a kind of
"fuzzy" compare, i.e.

// x == y becomes (fabs comes from <cmath>):
if ( fabs( x - y ) < tolerance )
    ... // treat x and y as equal

(tolerance depends on your problem and on the context; it may be e.g. 1.0e-5...
it depends...)

Giovanni
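For values far from 1.0 a single absolute tolerance misbehaves, so a common refinement (a
sketch of the general idea, not code from this thread) combines an absolute floor with a
relative tolerance:

#include <cmath>
#include <cstdio>

// Treat a and b as equal if they differ by less than absTol,
// or by less than relTol scaled by the larger magnitude.
bool NearlyEqual(double a, double b, double absTol, double relTol)
{
    double diff = fabs(a - b);
    if (diff < absTol)
        return true;
    double larger = fabs(a) > fabs(b) ? fabs(a) : fabs(b);
    return diff < relTol * larger;
}

int main()
{
    double x = (55.0 - 30.0) / 30.0 + 1.0;              // 1.8333...35
    double y = 55.0 / 30.0;                             // 1.8333...33
    printf("%d\n", NearlyEqual(x, y, 1.0e-12, 1.0e-9)); // prints 1
    return 0;
}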
xrxst32
2007-10-05 05:54:33 UTC
Permalink
Post by GT
[... original message snipped ...]
Try the DECIMAL datatype of OLE, which uses decimal arithmetic
(instead of the binary arithmetic of the IEEE float/double types).
It is much slower, because there is no hardware support, but the
accuracy is much better.
There are some nice wrappers on CodeProject, like
http://www.codeproject.com/com/decimalwrap.asp
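
For the curious, a minimal sketch of that route (Windows-only; HRESULT
checks omitted; link with oleaut32.lib). Note that 25/30 still cannot be
represented exactly -- DECIMAL just rounds at about 28 decimal digits
instead of 16:

#include <windows.h>
#include <oleauto.h>
#include <cstdio>

int main()
{
    DECIMAL a, b, diff, ratio;
    VarDecFromI4(55, &a);
    VarDecFromI4(30, &b);
    VarDecSub(&a, &b, &diff);      // 55 - 30
    VarDecDiv(&diff, &b, &ratio);  // 25 / 30

    BSTR s = NULL;
    VarBstrFromDec(&ratio, LOCALE_USER_DEFAULT, 0, &s);
    printf("%ls\n", s);            // 0.8333... to roughly 28 digits
    SysFreeString(s);
    return 0;
}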
Steve Kelley
2007-10-05 17:54:12 UTC
Permalink
I'm going to go out on a limb here and say that the whole issue in this
thread is irrelevant. Digits in calculation results are not considered
significant beyond the number of significant digits of the least precise
value in the calculation. Therefore, anything beyond the significant
digits of the least precise value does not matter, and the computer is
counting just fine. The last digit in your example simply does not
matter.
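
In concrete terms: a double carries roughly 15-16 reliable decimal
digits, so if you display no more than that, the noise digit never shows
up. A minimal sketch:

#include <cstdio>

int main()
{
    double p = (55.0 - 30.0) / 30.0;
    printf("%.17g\n", p);  // 0.83333333333333337 -- exposes the noise digit
    printf("%.15g\n", p);  // 0.833333333333333   -- only reliable digits
    return 0;
}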
Post by GT
[... original message snipped ...]
--
Steve Kelley
Joseph M. Newcomer
2007-10-07 01:48:56 UTC
Permalink
Oh, my goodness! ANOTHER "floating point doesn't work" post. Why do you think that
decimal arithmetic rules apply to floating point?

NOTE: FLOATING POINT WORKS AS IT IS SUPPOSED TO. YOUR EXPECTATIONS ARE WRONG.

There are two possible answers to your arithmetic:

0.83333333333333326
0.83333333333333337

Note that
0.83333333333333333

is NOT one of the possible answers. This is because THIS IS HOW FLOATING POINT ARITHMETIC
WORKS. These are the only two possible answers, based on the low-order bit of a 64-bit
floating point value. Download my Floating Point Explorer and see for yourself.

http://www.flounder.com/floating_point_explorer.htm

Read any book on how floating point works. Look up floating point on Wikipedia. Google
for IEEE floating point. There are tons of references explaining exactly why you got the
correct result for your computation. All that's missing is the fact that you think the
answer should be something else, and it can't be.

Relative to the exact value, the ...37 version is off by only about +...04 in the last
digits, while the ...26 version is off by about -...07, so the ...37 version is closer,
and that is the way the rounding goes.
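
You can also verify this without downloading anything. A minimal sketch in
standard C++ (nextafter is C99/C++11; older MSVC compilers spell it
_nextafter):

#include <cmath>   // nextafter
#include <cstdio>

int main()
{
    double v = 25.0 / 30.0;
    // %.17g prints enough digits to distinguish any two distinct doubles.
    printf("stored   : %.17g\n", v);                       // 0.83333333333333337
    printf("next down: %.17g\n", std::nextafter(v, 0.0));  // 0.83333333333333326
    // The mathematically exact 0.8333... lies strictly between these two
    // neighbouring doubles, so neither one -- and nothing else -- can hit it.
    return 0;
}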

Note that your complaint would be just like complaining that, given an int,

2147483647 + 1

does not equal

2147483648

but this can't happen using 32-bit signed integer arithmetic. For the same reasons,
25.0/30.0 cannot under any circumstances on an x86 equal 0.83333333333333333 because there
is no way to represent this number in an IEEE 754 64-bit floating point number, which is
what the x86 chip uses. There are only two representations that are close, and I just
gave them to you. And the answer you got was the closest answer.
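
The same point in code, for the integer case (a trivial sketch; note that
actually evaluating INT_MAX + 1 in int would be undefined behaviour, so the
sum is done in 64 bits):

#include <climits>
#include <cstdio>

int main()
{
    printf("INT_MAX             = %d\n", INT_MAX);  // 2147483647
    // 2147483648 has no 32-bit signed representation at all --
    // just as 5/6 has no exact 64-bit binary representation.
    long long sum = (long long)INT_MAX + 1;         // safe in 64 bits
    printf("INT_MAX + 1 (64-bit)= %lld\n", sum);    // 2147483648
    return 0;
}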

joe
Post by GT
[...]
double effortChangeProportion = (55.0 - 30.0) / 30.0;
This first line does 55-30 and divides the result by 30. In other words
25/30, which is 0.8333 (recurring 3s).
The computer manages to give the answer 0.83333333333333337 !!
effortChangeProportion++;
or
effortChangeProportion = effortChangeProportion + 1.0;
The second line of code (both alternatives give the same result) builds on
the first by simply adding 1 (so I can then multiply other numbers by this
proportion).
In this case 0.8333 becomes 1.8333 , but again the computer gets this wrong.
It tries to add 1 to 0.83333333333333337 and gets 1.8333333333333335.
****
The only two possible answers to this are

1.8333333333333335
1.8333333333333337

As it turns out, due to the way floating point works, 1.8333333333333335 is the better
answer: the x87 FPU actually carries more precision than the 64-bit format internally
(80-bit extended precision), so the intermediate value may still be held at that higher
precision. You are living in some dream world where you think that floating point is done
in decimal; it is not. It is done in binary, and the rules of BINARY floating point are
what apply. You are thinking the rules of DECIMAL arithmetic apply, and they do not, and
never have.
joe
****
Post by GT
Obviously this can easily be done in 1 line of code, but it is broken down
to demonstrate the maths going wrong twice!
Can anyone shed some light on this for me please?
GT
Joseph M. Newcomer [MVP]
email: ***@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
GT
2007-10-08 09:56:30 UTC
Permalink
Post by Joseph M. Newcomer
Oh, my goodness! ANOTHER "floating point doesn't work" post. Why do you think that
decimal arithmetic rules apply to floating point?
NOTE: FLOATING POINT WORKS AS IT IS SUPPOSED TO. YOUR EXPECTATIONS ARE WRONG.
0.83333333333333326
0.83333333333333337
***
Both of which are mathematically incorrect and unreliable in future
calculations!
***
Post by Joseph M. Newcomer
Note that
0.83333333333333333
***
Is mathematically correct and reliable, but not storable, so my question
(for the 27th time) is: why doesn't the basic data type ignore the last,
unreliable digit and use 0.8333333333333333 (one digit fewer than it can
store)?
***
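
For what it's worth, application code can do that trimming itself at the
point of use; the language will not. A hypothetical helper sketched below
(RoundToSigDigits is my name, not a library function; the decimal
round-trip is slow, and MSVC of that era spells snprintf as _snprintf) --
it also shows why this doesn't really solve anything, since the rounded
value is still stored as a nearby binary approximation:

#include <cstdio>
#include <cstdlib>

// Round v to 'digits' significant decimal digits by formatting and
// re-parsing. The returned double is still binary underneath.
double RoundToSigDigits(double v, int digits)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.*e", digits - 1, v);
    return atof(buf);
}

int main()
{
    double p = (55.0 - 30.0) / 30.0;
    printf("%.15g\n", RoundToSigDigits(p, 15));  // 0.833333333333333
    return 0;
}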
David Wilkinson
2007-10-08 12:50:11 UTC
Permalink
Post by GT
Post by Joseph M. Newcomer
0.83333333333333326
0.83333333333333337
Both of which are mathematically incorrect and unreliable in future
calculations!
Post by Joseph M. Newcomer
Note that
0.83333333333333333
Is mathematically correct and reliable, but not storable, so my question
(for the 27th time) is why doesn't the basic data type ignore the last,
unreliable digit and use 0.8333333333333333 (1 digit less than it can
store)?
GT:

If you multiply each of the above numbers by 30, none of the results
gives you exactly 25.

When C or C++ displays a floating point (float or double) number in
decimal form, it has no idea where that number came from, or how many
valid digits (binary or decimal) it has. Depending on how the number was
generated, a double might have 16 valid decimal digits, or it might have
none.

Floating point arithmetic buys you a lot, in particular the ability to
perform many complex calculations without worrying about overflow.

One downside is that you typically cannot compare two numbers for
equality, but you must learn to deal with that.

Another is that some formally correct computational algorithms are
unstable in finite precision arithmetic.
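
A classic illustration of that last point (my example, not David's): the
textbook quadratic formula is formally correct but loses almost all of its
digits to cancellation when b*b >> 4ac, while an algebraically equivalent
rearrangement stays accurate:

#include <cmath>
#include <cstdio>

int main()
{
    // Solve x^2 + 1e8*x + 1 = 0; the small root is approximately -1e-8.
    double a = 1.0, b = 1.0e8, c = 1.0;
    double d = std::sqrt(b * b - 4.0 * a * c);
    double naive  = (-b + d) / (2.0 * a);  // catastrophic cancellation
    double stable = (2.0 * c) / (-b - d);  // algebraically identical
    printf("naive : %.17g\n", naive);      // only a digit or two correct
    printf("stable: %.17g\n", stable);     // close to -1e-8, nearly full precision
    return 0;
}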
--
David Wilkinson
Visual C++ MVP