Floats Are Weird (exozy.me)
154 points by soopurman on Feb 19, 2024 | hide | past | favorite | 115 comments


That's a neat trick, "cancelling" out catastrophic cancellation. Of course it doesn't work anymore once x is smaller than machine epsilon, whereas expm1(x) / x will continue to work.
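
To make that threshold concrete, here's a quick sketch (function names are mine) comparing the naive form against expm1 well below machine epsilon:

```python
import math

def naive(x):
    # direct evaluation: exp(x) - 1 cancels catastrophically near 0
    return (math.exp(x) - 1) / x

def stable(x):
    # expm1 computes exp(x) - 1 without forming exp(x) first
    return math.expm1(x) / x

# Below machine epsilon (~2.2e-16), exp(x) rounds to exactly 1.0,
# so the naive form collapses to 0 while expm1 keeps working.
print(naive(1e-20))   # 0.0
print(stable(1e-20))  # 1.0
```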

In general herbie is pretty good at suggesting the right functions / rearrangements. For this example, it finds expm1: https://herbie.uwplse.org/demo/378a0682ec4d3e735c790a58fa97e...


Gotta love that Herbie also provides just `1` as alternative. "66.8% accurate, 305.0× speedup"

But the expression you get if you disable "numerics" ruleset (which includes expm1) is pretty wild also:

(if (<= (exp x) 4504410275303423/4503599627370496) (+ 1 (+ (* 1/24 (pow x 3)) (* x (+ 1/2 (* x 1/6))))) (/ (+ (exp x) -1) (log (exp x))))


That doesn't surprise me, as before floats are involved you need to transform the series into something you can compute (doing this by hand is a good way to understand what's going on underneath).

https://developers.redhat.com/blog/2015/01/02/improving-math... gives a basic intro to how these functions are actually implemented, and we can see that if you compute exp(x) - 1 near 0, then 1 + (small stuff) - 1 is always going to be a problem (it'd be nice if we had a sufficiently smart compiler that could understand how to inline these things, but then it'd probably do surprising things as well).


I guess "floats are weird" is a catchier title than "numerical computing is an acquired skill based in part on understanding the various consequences of value representation density being inversely proportional to the absolute value".


It's at least nice to see somebody reasoning through where error comes from instead of just throwing up their hands as if the error was inexplicable. I agree that the title is unfortunate though.


I think the title is alright. The curious thing is that both `f` and `g` have catastrophic cancellation, so you expect both to be inaccurate, but magically `g` recovers from it.

Alternative title: catastrophic cancellation cancels out
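
The article's two functions are easy to compare directly (a sketch using the definitions from the post):

```python
import math

def f(x):
    # noisy numerator over an exact denominator: the noise survives
    return (math.exp(x) - 1) / x

def g(x):
    # log(exp(x)) carries the same rounding error as exp(x) - 1,
    # so the two errors largely cancel
    return (math.exp(x) - 1) / math.log(math.exp(x))

x = 1e-15
print(f(x))  # ~1.11, about 11% off
print(g(x))  # ~1.0000000000000004, within ~1 ulp
```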


> I think the title is alright

I disagree (for the title I see now, which is “Floats are weird”). The problem isn’t about floats, but about inexact computation and, as you say, catastrophic cancellation.

If, for example, you do all your computations in 7.6 digit base 11 numbers (seven digits before the undecimal point, six after it), you get the same problem.


Ages ago, when I was in college, a "numerical methods" course was more or less obligatory... has that changed?

Anyway, toying with a slide rule is a fun way of making these things very obvious, because you are essentially working with something like 3-digit floating point. When you find yourself subtracting 1.234 from 1.235, but you know that both the "4" and the "5" are really fuzzy around the edges, it becomes obvious that the resulting 0.001 does not really mean much. Similarly, when you take 1.234e+6 (again with a really fuzzy "4") and add 2, you realize that 1.234002e+6 is not really a sane answer.
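
The slide-rule analogy can be simulated by rounding every value to a few significant figures (a toy sketch; `sig` is mine):

```python
def sig(x, digits=4):
    # a toy "slide rule": round x to `digits` significant figures
    return float(f"%.{digits - 1}e" % x)

# subtracting nearly equal 4-digit readings: only the fuzzy last
# digit survives, so the result carries almost no information
print(sig(1.235) - sig(1.234))   # ~0.001

# adding a tiny number to a big one: it vanishes on re-reading
print(sig(sig(1.234e6) + 2))     # 1234000.0
```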


I assume it depends on the department. CS students are afraid of non-integer values, engineering students are afraid of codes longer than 200 lines. Just kidding, mostly.

I do somewhat wonder if numerical methods have suffered from their success; everybody uses LAPACK via NumPy, but understanding it under the hood, that job always belongs to the next department over, wherever you are, haha.


How floats work is a non-trivial part of my 3rd semester CS course (Computer Architecture) in the US, I'd imagine the same applies to every other good college.


AFAIK in Europe numerical methods is mandatory in most Telco/EE degrees, but not in CS/SWE. The opposite happens with discrete maths btw


Here in Spain, if you can't grasp numerical methods in CS you can't even think of earning a degree.


Here in Canada, CS is part of the Math Department at the University of Waterloo, but part of the Philosophy Department at University of Toronto.

Waterloo graduates take a Numerical Methods course in second year. I fear for the philosophers.


For this specific case, use Python's expm1, see https://docs.python.org/3/library/math.html#math.expm1 and history of expm1 at https://en.wikipedia.org/wiki/Exponential_function#expm1

  def f(x):
    return math.expm1(x)/x
The expm1(x) means "exp(x) minus 1".

  >>> f(1e-15)
  1.0000000000000007
The technique described gives

  1.0000000000000004
which is 1 step smaller than the value computed via expm1():

  >>> math.nextafter(f(1e-15), 0)
  1.0000000000000004
Both are within 1 ulp of the more precise value from WolframAlpha:

  1.0000000000000005000000000000001666666666666667083333333333333416...


To be precise, it’s C’s `expm1`. Like other things in math.h, it’s just adopted as-is into most popular languages, even including PHP and JS.


To be precise, it's most likely Kahan's `expm1`, since Wikipedia attributes it to him. It adds that expm1 was first implemented on an HP calculator in the 1970s.

It became part of BSD 4.3 due to Kahan and one of his students. The documentation at https://archive.org/details/unix-programmers-reference-manua... mentions "expm1" was available in HP BASIC, and mentions the Apple C compiler at the time used "exp1", which you can verify from the Apple Numerics Manual Second Edition 1988 at https://archive.org/details/apple-numerics-manual-second-edi...

That makes it BASIC's expm1 before it was C's. ;)

While yes, Python's expm1 uses the platform libm's expm1 through C, Python's math module will avoid using the vendor libm for cases where the vendor implementation is known to have numerical issues.

For one such example, on macOS the following:

  #include <math.h>
  #include <stdio.h>

  int main(void) {
    double x = -1.0;
    x = nextafter(x, 0.0); // -0.9999999999999999
    printf("lgamma(%.19f) -> %.17f\n", x, lgamma(x));
    return 0;
  }
reports

  lgamma(-0.9999999999999998890) -> 36.25168935623486988
while Python, using its own lgamma() implementation:

  import math
  x = math.nextafter(-1.0, 0.0)
  print(f"lgamma({x}) -> {math.lgamma(x)}")
reports:

  lgamma(-0.9999999999999999) -> 36.7368005696771
showing that Python's lgamma is not a pass-through to C's.

FWIW, WolframAlpha says the value is 36.73680056... so Python's is more accurate.

EDIT: Python's math.expm1 was added in version 3.2 on February 20th, 2011, and did not depend on C's expm1. Instead, it used Kahan's method directly:

  /* Mathematically, expm1(x) = exp(x) - 1.  The expm1 function is designed
     to avoid the significant loss of precision that arises from direct
     evaluation of the expression exp(x) - 1, for x near 0. */

  double
  _Py_expm1(double x)
  {
    /* For abs(x) >= log(2), it's safe to evaluate exp(x) - 1 directly; this
       also works fine for infinities and nans.

       For smaller x, we can use a method due to Kahan that achieves close to
       full accuracy.
    */

    if (fabs(x) < 0.7) {
        double u;
        u = exp(x);
        if (u == 1.0)
            return x;
        else
            return (u - 1.0) * x / log(u);
    }
    else
        return exp(x) - 1.0;
  }
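
A direct Python transcription of that C routine (mine, for checking) tracks math.expm1 closely:

```python
import math

def expm1_kahan(x):
    # Python transcription of the _Py_expm1 C routine above
    if abs(x) < 0.7:
        u = math.exp(x)
        if u == 1.0:
            # x so small that exp(x) rounded to 1: expm1(x) ~ x
            return x
        # Kahan's trick: (u - 1) and log(u) carry the same error
        return (u - 1.0) * x / math.log(u)
    # for larger |x|, direct evaluation is safe
    return math.exp(x) - 1.0

for x in (1e-20, 1e-15, 1e-9, 0.5, 2.0):
    assert math.isclose(expm1_kahan(x), math.expm1(x), rel_tol=1e-14)
```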


expm1 is specified in IEEE 754-2008 (didn't bother to check old revisions)


Officially arrived in C99 if my memory serves ...


Re money systems: I have found that problems arise no matter whether floats are used or something else: The system can be too exact.

My current approach is that I try to guess (or ask) how a customer understands or checks a statement/invoice. The customer usually takes a calculator and enters the rounded numbers they see. I then try to do all programmatic calculations exactly the same way: operation, round, operation, round, etc., with rounding steps at exactly the same places where these values will be visible to the customer, and with the same number of decimal places (which can vary). Sometimes that is not possible of course, which is usually solved with a final row called 'rounding errors in the customer's favor'.
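
A hypothetical sketch of that approach (the invoice amounts and tax rate here are made up): round with Decimal exactly where a value becomes visible to the customer, then compute the next step from the rounded value.

```python
from decimal import Decimal, ROUND_HALF_UP

def r2(x):
    # round to 2 decimals, exactly where the customer sees a value
    return x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# hypothetical invoice: 3 units at 19.99, then 7% tax
net = r2(Decimal("19.99") * 3)     # printed line total: 59.97
tax = r2(net * Decimal("0.07"))    # tax computed from the *rounded*
                                   # net, as a customer would: 4.20
total = r2(net + tax)              # 64.17, matching the customer's
                                   # own calculator
print(net, tax, total)
```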

For my (smaller) systems I find floats sufficient and easier to work with compared to the alternatives, which tend to make code very verbose and annoying to read.


I thought "money systems" always worked in whole units of some fraction of your currency, e.g. "millicents" or similar, and shunned floating point arithmetic like the plague.

> The system can be too exact

Indeed, high precision, low accuracy! Fixed-point arithmetic can be both precise and accurate if everyone agrees on the same system. Maybe we don't, and that's the problem?
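
The integer-cents scheme is easy to sketch (amounts are hypothetical):

```python
# Fixed-point money: everything is an integer count of cents.
price_cents = 1999                   # $19.99
subtotal = price_cents * 3           # 5997 cents, exact
tax = (subtotal * 7 + 50) // 100     # 7% tax, rounded half-up to a cent
print(subtotal + tax)                # 6417 cents == $64.17
```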


IBM partially has a hold on banking because they have supported decimal floating point. Switching floating point from radix 2 to radix 10 (decimal significands and exponents) gets rid of some of the problems.

IEEE has had these Decimal types for a while, but IBM is the only company shipping CPUs with them.

Fixed decimal is still used where appropriate, but Decimal floating point is a critical feature for some processes.

While personal taxes typically drop the cents, tariffs and other taxes, for example, are often small percentages.

There are software implementations if you don't need performance.

But Decimal64 has a lot more representable values, and it is easier to trap on some edge cases.

As Decimal floating point types were added into C23 we will see if hardware support follows.

Note this is also why HP calculators were more accurate for a given bit size, because the engineering ones used radix 10 floats.


What does radix 10 floating point bring to the table in terms of precision other than a whole new set of rounding problems?

As far as I know, all accounting systems use 'round to nearest cent' integers (or whatever the lowest denomination is in a given currency). Is there a case where calculations using integer cents is in any way inferior to radix-ten fixed point?


In Europe, after MiFID II, equities are priced with tick sizes (i.e. the minimum whole unit) that depend on the magnitude of the price, i.e. they are de jure (decimal) floating points that behave weirdly.


money systems work with whatever tool is convenient to the operations staff, which these days is typically Microsoft Excel and Visual Basic, neither of which comes with a built-in Decimal type.

this will typically get transmuted through multiple layers of javascript, xml, back end java, cobol, SQL, python, some domain specific language, and back again.


That's called the stability of the algorithm. The first algorithm is not stable whereas the second with the logarithm is.

Another example: compare the two algorithms, that are actually pure-mathematically equal:

  from math import sqrt

  def fc(x):
      return sqrt(x+1) - sqrt(x)
  def fd(x):
      return 1/(sqrt(x+1) + sqrt(x))

For large values of x (1e16 for example) plot the results and see the difference.

  xs = np.linspace(10**14, 10**16, 10000)
  plt.figure(figsize=(8, 6), dpi=120)
  plt.plot(xs, [fc(x) for x in xs])
  plt.plot(xs, [fd(x) for x in xs])
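
No plot is needed to see it; a single large input already shows the collapse (values assume IEEE doubles):

```python
import math

x = 1e16
# direct form: sqrt(1e16 + 1) rounds to exactly sqrt(1e16) = 1e8,
# so the subtraction cancels to nothing
print(math.sqrt(x + 1) - math.sqrt(x))        # 0.0

# rationalized form: algebraically identical, numerically stable
print(1 / (math.sqrt(x + 1) + math.sqrt(x)))  # 5e-09
```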


i think it is our educational system that is weird.

we teach students from arithmetic to differential equations assuming they are working with Real Numbers and never really talking about what that means or what a Field is. Then we present computation as something that can easily be done with Real numbers. Which is a lie because we don't even know if Pi plus E is rational or not.

Then we present computers as tools of computation, handwaving away the fact that Computer floating point math is not using real numbers, and is not a Field nor anything even remotely like a Field. Floats are not even closed under basic arithmetic operations. And god help someone who learned Geometry on graph paper then tries to apply it to a computer with Floats because Floats cannot even represent a uniform grid in space, they vary in resolution by definition depending on distance from origin.

So in the end by trying to gloss over the details of how Real Numbers work, and how Computers work, we have actually misrepresented both Real Numbers and Computers to generations of people, for going on 80 years now. And we keep having to have these articles, or "your calculator is wrong" videos go viral every few months/years because people were not taught basic facts about Reals and Computers.


Great points, let me riff on that though -- what's really "weird" (but totally understandable in the historical context of underpowered computers until recently) is that we treat floats as the default representation of a real at all. More reasonable modern defaults, for all purposes outside of high-performance applications, would be a rational composed of bigints, or a bigdecimal, or maybe a generating function for the coefficients of a Taylor series expansion if you really want to get nuts. In any case you can have as much precision as you can afford, although for anything but rationals it will always be an approximation, and the question is just how many digits you care for. There are a lot of choices, but the only one that's really absurd is the actual default of floats, which are terrible for every purpose except their original raison d'être: performance, in a time when computers were so slow that buying a floating point unit to put in your computer was a normal, desirable upgrade for a PC owner (and performance is still crucial for high-throughput computing, video games, graphics rendering, ML, etc.)

I do remember being taught the difference between rationals and irrationals in high school.

Another aside... The really crazy difference is between the definable and non-definable numbers. The first, which includes everything from pi to every busy beaver number, is countably infinite, just like the integers; only the nondefinables (numbers whose shortest mathematical definition would not be finite, or does not exist) give the reals their uncountably infinite nature.


> when more reasonable modern default choices for all purposes outside of high performance applications would either be a rational composed of bigints, or a bigdecimal, or maybe a generating function for the coefficients of a Taylor series expansion or something like that if you really want to get nuts.

Rationals of bigints doesn't work in general. First of all, you still have to do approximations if you take e.g. a square root or a sine. And even if you have an algorithm that is closed under rational numbers (like the Simplex method for solving linear programs), the number of digits you need in your integers blows up unless you have a tiny problem.

My rough sketch is that inv(A) involves det(A) in the denominator, and the number of digits in det(A) is exponential in nature (n! if A is n-by-n).


Well, yeah, nothing works in general, which is why I vacillated amongst several ideas -- but the key point is that if something has to be a default, floats work worst of all if accuracy or precision are your goals (but again, they make sense for performance, and performance alone -- we really should be making accuracy the default and performance something you have to opt in to these days)

(Rational of bigints also has the advantage of being simple to understand, but I'd happily endorse pretty much anything but floats, even bigdecimals.)



This is a whole scientific field, called numerical analysis: https://en.wikipedia.org/wiki/Numerical_analysis


It’s not so much that floats are weird, but that they are not reals. Most of the problems happen due to confusion about this fact.

Reals are much weirder/more interesting than floats, imo.


the axiomatization of floating point arithmetic is much larger than the axiomatization of real arithmetic, so it is reasonable to describe the former as more complicated.


Complexity of the definition isn’t particularly interesting, the properties of the object being defined can be.

The reals are stranger than most people realize.


Well, not to be too glib, but I don't think they're too much stranger than I realize, and I think floats are much harder to reason about.


Well I didn’t count anyone who has taken a proper intro analysis course , but that’s safely within “most”.

Floats are a bit of a pain in the ass but there is nothing particularly strange about them.

The reals contain a lot of interesting structure, and things that are still being studied and understood.


What's a strange thing about reals?


to start, there are an uncountable infinity of them, so 100% of them are not representable. Also, in the usual notation, lots of numbers have 2 representations (0.999... = 1, for example).


"Why should I believe in a real number if I can’t calculate it, if I can’t prove what its bits are, and if I can’t even refer to it? And each of these things happens with probability one! The real line from 0 to 1 looks more and more like a Swiss cheese, more and more like a stunningly black high-mountain sky studded with pin-pricks of light."

...see chapter 5 of Meta Math! by Gregory Chaitin: https://arxiv.org/pdf/math/0404335.pdf


This is an interesting question to answer in a way that works regardless of mathematical background; let me try something a bit handwavy.

The way people are taught about the real numbers is typically a progression, you work with integers as a child, get your head around negative numbers etc., eventually you are shown "algebra" and equation solving, and you will be introduced to "roots" as solutions. You'll learn about e.g. sqrt(2) as the solution to 2=x^2 and probably have some discussion of irrational numbers then, rather than rationals, but it may miss most details. If you study in university at all you'll probably get some sort of lecture on countable vs. uncountable infinities and probably not look too closely unless it's a pretty mathematically oriented class.

Along the way you'll also be introduced to "special" numbers, pi and e at the minimum, usually motivated from somewhere else (e.g. we "found" pi due to geometry, e comes from logs, etc.).

So the picture you get is that you have all the "normal" every day numbers, then some more a little bit weird like sqrt(2), and a few special ones like "pi" and "e" that are useful.

Truth is, from the point of view of the reals, this is all backwards.

Numbers like 1, 1/2, sqrt(2), etc. are called "algebraic" because you can find a polynomial equation with integer coefficients that has them as a solution (so x^2 = 2 means sqrt(2) is algebraic, x = 7 means 7 is, etc.). Anything non-algebraic is called transcendental. Pi is transcendental, as is e, which means no matter how hard you try you can't find a polynomial of that type such that a*pi^n + b*pi^(n-1) + ... = 0. It turns out to be hard to prove that something is transcendental; we have only proven it for a double handful or so.

So the weird part is that transcendental numbers are "almost all" of them. In a technical sense: the set of transcendental numbers has full measure in the real numbers (or the complex ones, for that matter).

One way to think about it is if I gave you a bucket containing the real numbers between [0,100] and you randomly picked numbers out for the rest of your life, you would expect to never pick an algebraic number.

So all the numbers all humans use for day-to-day math and accounting, as well as most scientific math, etc. (obviously pi, e and friends contribute)... all those numbers come from a subset of the reals so small as to be negligible in the grand scheme of things. To a first approximation, the reals consist of numbers nobody ever uses :)


yeah, i have never really seen someone break down the axioms of floating point mathematics, like how do you define addition in a system that is not closed under "normal" addition.


mathematicians have spent a lot of time investigating objects that lack parts of the normal algebraic structure. In a way it's fields like the real numbers that are the unusual (rather than "normal").


Then what's the best way to handle these cases? Is there any set of rules we should use while implementing mathematical equations dealing with limits in floating point?


Floating point numbers have the highest precision when they're near 0. So if you know a number is going to be near 1, like exp(x) when x is small, then it's better to calculate the offset from 1 rather than the number itself. Programming languages make 'expm1' available for this purpose, which computes exp(x)-1.


Basically

- don't subtract almost equal numbers

- don't add numbers of vastly different magnitudes
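
Both rules in a couple of lines (using the same values as elsewhere in the thread):

```python
import math

# adding vastly different magnitudes: the small term is absorbed
print(1e16 + 1.0 == 1e16)  # True

# subtracting nearly equal numbers: the leading digits cancel,
# leaving only rounding noise (the true value is 1.000...e-15)
print(math.exp(1e-15) - 1.0)  # 1.1102230246251565e-15
```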



I love that that page is nearly the same as when I saw it ~20 years ago. Some things are still done the right way.


Other commenters have pointed to tricks that are undoubtedly useful, but the real answer imo is to understand where rounding error will be introduced in your problem, and where it can become catastrophic so that you can use careful alternatives in those cases.


I'll second this. Implementing tricks can improve some use cases but make others worse.


For a summation, add the smaller numbers first. Smaller as in 0.0000000000053, not like -5172365126.


For summation just go with Kahan summation or some improved variant of it https://en.wikipedia.org/wiki/Kahan_summation_algorithm
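
A minimal Kahan summation sketch; the compensation variable recovers the low-order bits that plain accumulation drops:

```python
def kahan_sum(xs):
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for x in xs:
        y = x - c            # apply the previous correction
        t = total + y        # big + small: low bits of y are lost...
        c = (t - total) - y  # ...recover them algebraically
        total = t
    return total

xs = [1.0] + [1e-16] * 1000
naive = 0.0
for x in xs:
    naive += x
print(naive)          # 1.0 -- every 1e-16 was absorbed
print(kahan_sum(xs))  # close to the true 1 + 1e-13
```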


And if you can't remember that, you can use a pairwise summation like (((a+b)+(c+d))+((e+f)+(g+h)))+... which gives a decent error reduction in return.
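
A pairwise summation sketch; splitting recursively keeps the error growth around O(log n) instead of O(n) for left-to-right accumulation:

```python
def pairwise_sum(xs):
    # recursively sum halves so no partial sum gets far larger
    # than the terms being added into it
    if len(xs) <= 2:
        s = 0.0
        for x in xs:
            s += x
        return s
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

xs = [0.1] * 1000
naive = 0.0
for x in xs:
    naive += x
# pairwise lands measurably closer to 100 than left-to-right
print(abs(naive - 100.0), abs(pairwise_sum(xs) - 100.0))
```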


how about small as in ²³⁷


A number of years back I was programming in Ada, a language I generally liked, and had a case where the compiled version of a floating point constant differed from the value generated when the same string was entered at run time. Evidently, the compiler and the run time used different routines to convert strings to floats.

The difference wasn't much, but it was there.


Nothing weird about it. It should be obvious that subtracting two floats that are very close to each other results in a loss of numerical precision:

1.000000003456e0 - 1.000000002345e0 = 0.000000001111e0 = 1.111numericalnoise e-9

It's exactly the same issue here. `math.exp(1e-15)` is `1.000000000000001`. If you subtract 1, you get 1 significant digit and numerical noise.


The weird thing here is that the only change is making the denominator ln(exp(x)) instead of x. Catastrophic cancellation is still happening in the numerator (it’s still exp(x)-1), and the denominator winds up being some really tiny number.

It’s just that, due to the quirks of the floating point calculations involved, the numerator and denominator wind up being nearly the same noisy approximation to x, whereas in the original calculation that wasn’t true.


That's not what the post says if I understand correctly - the post explains why in certain situations the "noise" disappears, and in other cases it doesn't.

See comparison between f and g functions.


I see! yes, the magic is you can cancel the noise by repeating it twice:

  In [1]: math.exp(1e-15)-1
  Out[1]: 1.1102230246251565e-15

  In [2]: math.log(math.exp(1e-15))
  Out[2]: 1.110223024625156e-15

risky business though, I imagine it's implementation dependent


Agreed, this is risky business. The intermediate values still need to fit into floats and are still losing precision.

From the article:

    g(1e-9)  returns 1.0000000005,
    g(1e-12) returns 1.0000000000005,
    g(1e-15) returns 1.0000000000000004
but... g(1e-16) throws ZeroDivisionError: float division by zero.


It’s not (or shouldn’t be); it’s simply a result of math, as the article explains at length.


Many libm implementations don't have an accurate `log` or `exp` routine, so there does exist a risk. (Of course, it's also true that many of them also special-case `log(x) ~= x - 1` and `exp(x) ~= x + 1` for small enough `x`.)


The math hinges on exp(x) carrying the same error in both places. So as long as exp(x) is deterministic, this should be alright.


I don't know of any libm that has log or exp sufficiently inaccurate for this to break. Do you?


Indeed, any well-known enough libm wouldn't do that. But I can imagine some less-known libms with wild error bounds.


Kind of OT: Why do so many people say “phenomena” when they want to use the singular case?


I decided years ago that the next time I hear someone suggesting we use floats / doubles to represent money amounts, I am going to punch them in the face.


This gets repeated a lot, and I don't disagree. But I find it odd that doubles would be so unsuitable for monetary (and other similar) arithmetic; in principle you have 15 significant digits, which should be more than enough, plus precise control over how the results are rounded, and all the basic arithmetic returns correctly rounded values to the last ULP. So it is weird that those tools are still not good enough, and it is also difficult (at least for me) to fully characterize why exactly they are not suitable.

Part of me wonders if this (justified!) fear of floats stems in part from a history of bad implementations (looking at x87) and difficulties in controlling the floating-point env (looking at libs randomly poking fpenv), and less from floats intrinsically being bad.


The problems of floats are not the number of significant digits, it’s the imprecision of the representation (floats don’t just cut off at the end), that these imprecisions compound, and that float operations are not commutative. At the end of the day, 0.1 + 0.2 != 0.3 is a fact you have to live with.

x87 does not really factor into it; if anything, in your view of the world x87 floats would be better, since x87 extended precision is 80 bits. Except its involvement now leads to intermediate-precision-driven inconsistencies.

Control (which you mention) and consistency are the issues, as well as the interaction between that and comparators.

Guarding against floating-point issues or considering precision errors is neither part of school-learned arithmetic nor of most CS programs, so almost every developer just flings around floats like they’re genuine reals. When problems start surfacing, floats are so threaded through without consideration that it becomes very hard to untangle, which leads to local patch jobs that make the problem worse.


> At the end of the day, 0.1 + 0.2 != 0.3 is a fact you have to live with

That is the one example that floats around a lot, but it's also imho not a very good one. '0.1', '0.2', and '0.3' are not floating point values, so the premise is flawed.

       0.1000000000000000055511151231257827021181583404541015625
     + 0.200000000000000011102230246251565404236316680908203125
    != 0.299999999999999988897769753748434595763683319091796875
is far less surprising.

Also `round(0.1 + 0.2, 15) == 0.3` is true (in Python), so being conscious about rounding things appropriately goes a long way. And I imagine that correct rounding is relevant in monetary calculations no matter what sort of numbers you are using, so while with floats the situation might be more pronounced, I don't see it being such a fundamental problem.


> That is the one example that floats around a lot, but its also imho not very good one. '0.1', '0.2', and '0.3' are not floating point values, so the premise is flawed.

No, it’s the entire point. None of the values we deal with day to day are binary floating point, and certainly not currencies. So this sort of representational approximation is a major and constant issue of using floats.

> Also `round(0.1 + 0.2, 15) == 0.3` is true (in python), so being conscious about rounding things appropriately goes long way.

See above, rounding off and collecting error after every arithmetic operation is not the expected norm and what developers are taught.

> And I imagine that correct rounding is relevant in monetary calculations no matter what sort of numbers you are using

While that is true, it does not normally need to be done after every arithmetic operation, especially not after additions of already rounded off values.


> See above, rounding off and collecting error after every arithmetic operation is not the expected norm and what developers are taught.

Question is, is that a problem with developers or floats? :)

Ecosystem and tooling might help here; iirc that is something Kahan himself has been complaining about a lot. For example, hypothetically you could have something like FP contexts or specialized high-level types where you can easily express how many digits you are expecting, and the runtime/compiler would manage rounding etc. so that you'd more often get correct results. Tbh that's just off the top of my head, and I didn't think too much about it.

But I think the question remains: how much of the problem is actually intrinsic to FP, and if you did actually cross t's and dot i's, would there still be some intractable problems in using FP with money? So is the problem "just" that FP can easily be misused, or is it impossible to use correctly?

I want to emphasize that I do not recommend anyone go using FP for money now. I'm just curious because its something I don't fully understand and well, HN has smart people that can hopefully help me there.


> Question is, is that a problem with developers or floats?

Of course with floats. Requirements come from decimal-expecting people and developers have to convert requirements into an algorithm. If there’s a fundamental semantic or at least syntactic obstacle, it’s not a problem with developers.

Iow, if a language/system only has floats as “numbers”, it sucks for most business-level calculations.


> None of the values we deal with day to day

'0' is defined twice by IEEE 754 (as +0 and -0), together with ±inf and nan. They aren't very interesting values in terms of monetary transactions, though.


> and that float operations are not commutative. At the end of the day, 0.1 + 0.2 != 0.3 is a fact you have to live with.

Addition and multiplication of floating point numbers is commutative (sole theoretical exception: there principally exist multiple representations of NaNs, even though in practice processors do not make use of this freedom, i.e. in practice these floating point operations are completely commutative).


Yes, but not associative. I guess this is what GP was trying to say (and which actually can cause issues).


> At the end of the day, 0.1 + 0.2 != 0.3 is a fact you have to live with.

But 0.1 + 0.2 == 0.3 — if you round both sides to 2 (or however many you need) decimal points before comparing them.


And that’s a problem, now you have to round-off defensively which complexifies the code and you have to decide how often and how defensively you round things.

Plus you’ve got the added fun that fp rounding routines don’t necessarily take a rounding mode.


> Guarding against floating-point issues or considering precision errors is neither part of school-learned arithmetics, nor of most CS programs

What the fuck? I didn't even take a CS major and we got the imprecision of floats bashed into our heads. What are CS majors being taught?


Floats vs money is like `ascii` vs `utf-8`.

You can get a bunch of bits, but how you handle them makes a ton of difference.

Plus, so few languages care about those of us living in the business world. You can count on the fingers of one hand the languages that are meant for business apps.

All the others are bad languages (and libraries, frameworks, data stores), ill-suited for the job. And so, all of them need to reimplement (badly, ad hoc, bug-ridden) versions of money and friends.

And weirdly, there's no hardware support at all, so our friends in C say: "Look, handling money isn't important, bye", and nobody else does it either.


The problem is intrinsic to floats, not just due to bad implementations.

Some values like 0.3 simply cannot be represented precisely with floats.
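
Concretely, here's what the literal 0.3 actually stores as a float, versus decimal arithmetic:

```python
from decimal import Decimal

print(0.1 + 0.2 == 0.3)  # False: three representation errors collide
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# Decimal(float) exposes the exact value the float holds:
print(Decimal(0.3))
# 0.299999999999999988897769753748434595763683319091796875
```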


The rule as stated is way too strong, for example option prices are money amounts but floats are unavoidable in calculating them.

For a less exotic example, consider that Excel, widely used by actual accountants, uses floats throughout to represent numbers.


lol…

If you’d ever had to bill millions of customers for precise amounts of electricity and gas at precise prices… you would hate floats and you’d hate that any idiot will act as though excel is gospel truth.


That's also the case with integer or fixed-point calculations. You generally don't care about the accuracy of a specific calculation (unless it results in edge cases, like a catastrophic cancellation in floats), but you do make sure that the resulting invoice is free from any sort of numeric artifacts, like a sum of ratios equaling 99.9% or 100.1%.


> for example option prices are money amounts but floats are unavoidable in calculating them.

How so? Can't you just use long integers with cents or 1/1000th of a cent?


Why would you? For most forward-looking calculations, the uncertainty of the future completely swamps any cent-rounding.

Even for plain-vanilla bond price calculations, floats are the right tool for the job. Say you have a bond that pays $5 every year for 10 years, then $100. What's that worth today?

Well, you have a forecast yield curve of interest rates. Say it's quoted as continuously compounded rates, so then you get something like price = sum_{t=1..10}($5*exp(-r(t)*t)) + $100*exp(-r(10)*10).
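As a sketch in Python (with a hypothetical flat 4% curve standing in for the forecast r(t)):

```python
import math

def r(t):
    # Hypothetical flat yield curve: 4% continuously compounded at every tenor.
    return 0.04

# Price of the bond from the comment: $5 coupons for 10 years plus $100 at maturity.
price = sum(5 * math.exp(-r(t) * t) for t in range(1, 11)) + 100 * math.exp(-r(10) * 10)
print(f"{price:.2f}")  # about 107.42 with this curve
```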

But wait, say you actually have 1000 different potential paths of interest rates, and you want to average over all of them.

Oh, and there's a 1% chance of default every year.

Oh, and actually these are mortgages, so there's a path-dependent chance of them refinancing every year, if the rates get low enough.

And then there's an overall economic forecast, so if you have a bunch of mortgages, there's a bigger chance they'll all default at the same time.

And so on. Rounding the cents isn't really worth the worry, once you're putting noisy forecasts through `exp` (or worse special functions).

This applies for vanilla bond valuation, any option, any future. More so if you want risk measures (what if rates go up 0.10%? volatility increases?), and so on.

Floats work just fine for this.


This is true in complex scenarios but not true in other finance scenarios. For example, there is a reason why any electronic exchange will use integers with implied decimal precision as the wire format and will continue to use such representations before and after encoding/decoding. We do not need to do hugely complex operations; it is mostly simple comparisons and some simple maths operations. We absolutely need exact precision and speed, and it is difficult to get that when using doubles.

In parts of the stack where things are more complex and outside of the critical path, then yes, you use floating point.

Also, it isn't just that rounding the cents isn't worth it; it is that if you work with integers with an implied 2 decimal places, then you're going to end up with massively inaccurate results after a few operations.


I heard that countless times, and I understand the reason (mainly: floats are binary, money amounts are decimal), but then, how do you split a $10 bill between 3 people?

$3.33 for each is not good, because in the end you have to pay $10, not $9.99. You can use a double, which is more precise, but in a less predictable way. You can use fractions, which is exact, but it may become unmanageable after some time. Or you can have one of the three pay $3.34, but which one? I guess there are rules for that, probably a lot more complicated than "use an integer number of cents".


Accountants avoid academic penny drama.

If you have a few grown-up adult parties, e.g. cofounders, partners, then split like [(n-1) x round(total/n), whats_left]. The last one is how SQL SELECT sees it.
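A minimal sketch of that first rule in Python, working in integer cents (the unit is my assumption; the comment doesn't fix one):

```python
def split(total_cents, n):
    # n-1 parties get the rounded even share; the last one absorbs what's left.
    share = round(total_cents / n)
    return [share] * (n - 1) + [total_cents - share * (n - 1)]

parts = split(1000, 3)  # $10.00 among 3 people
print(parts)            # [333, 333, 334]
```

The parts always sum back to the original total, so the pennies reconcile by construction.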

If you have potentially penny-hysterical kinds (taxes, anonymous group customers), round in their favor and throw pennies into your own expenses.

If n ≈ total, e.g. $100.00 over 700 people, don’t do that; it’s bad accounting.

I worked with finance and accounting half my life. They just don’t fall into these philosophical dilemmas.


We use long double to represent financial money amounts, with a little safety on top of it, before it’s consumed by whatever JavaScript (TypeScript really) frontend it heads to. Works fine. Outside of the need for speed, it’s one of the few areas we use C in our backend services.

We don’t store the data in floats or anything resembling it, however.


Are you also using doubles for calculations, or just presentation? Either way doesn't pass the smell test for me.


It depends; we sometimes do, since quadruple precision with checks tends to be safe, but for the most part we don’t, as most things are basically transactions unless you need to display something.


Is it fraud to willingly/knowingly use floats for money?


Sometimes (for example in models of finance or insurance markets) there do exist good reasons to use floating-point numbers for money.


Please stop repeating this nonsense. It's purely based on ignorance and only pushes more people into it.

This is universally repeated by people that have not written any modern financial software and who don't understand floats.

If all you do is add US dollars and pennies, then maybe you can get by with integers. Once you do anything else in modern finance integers puke completely. There's a reason scientific computing doesn't use integers, but prefers floats/double - they are much easier to use to get the best answers per compute.

For example, if you need to do anything with interest, which is fundamental, you'll soon find doubles are vastly better than bitsize-equivalent integers, no matter what scaling/fixed-point/other tricks you employ. Add in currency differences (Yen to USD is a large multiplier, etc), any longer-range calculations (50-100 year loans or flows), aggregation of varying items, derivatives, and on and on. Telling anyone in the field you're going to use integers will get you laughed at - the problem is not floats, the problem is the programmer hasn't taken a single class on numerical analysis and doesn't have the skill to do financial software.

For example, one of the simplest things one needs to do is compute compound interest, say computing mortgage tables, with say principal P (left), annual (or weekly, or daily..) rate R, for N years, periodic payment m, and you want to compute payments and schedule.

In reals, this is simple: each cycle you do something like

    r = R/12
    P -= m
    P = (1+r)*P
Then you write out values you need.

Trying to write this software with integers, no matter what scaling, fixed-point, shifting, and other tricks you employ, is going to be vastly more complex and error prone than simply using doubles. If you don't think so, pick a bit budget, say 32, or 64, for your base number type, and show me your code. Then I'll show you the naive one with doubles vastly outperforms your code.

This continues through all of modern finance.

So stop repeating this ignorance that one should use integers for money - that is purely a result of being ignorant about how to write robust numerical code, and only pushes people down a far worse rabbit hole when they hit issues with ints and try to patch them one at a time via ad-hoc hacks.

There's a reason scientists use floating point, not ints, to do real numerical work.


violence typically does not solve issues of rounding


I mean, it depends on what you're doing.


Assuming you want your money to actually add up correctly, then floats are always the wrong choice.

If you’re not interested in accurate accounting, then sure, floats are fine, but when you’re working with money, accurate accounting tends to be the expectation.


Addition is one of those things that does work pretty predictably and error free with floats. The problem with 0.30000000000000004 etc is usually that the things you are adding are not what you expect (float(0.1) != 0.1), i.e. the difficulty usually is string<->float conversions rather than float arithmetic itself.
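A Python illustration of that point (Decimal(0.1) shows the exact binary value the literal was rounded to at parse time):

```python
from decimal import Decimal

# The literal 0.1 is rounded to the nearest binary fraction before any arithmetic:
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625
# The addition itself is then correctly rounded; the inputs were already off:
print(0.1 + 0.2)     # 0.30000000000000004
```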


Even that error is a problem. Doing a single addition might be fine, but do thousands or millions of additions and those tiny errors add up to something appreciable.

If you’re doing money operations at a scale where the computational difference between using true number type with infinite precision vs floats is worth thinking about, then you’re also in the territory where tiny floating point errors really stack up into a problem.

As a consequence, there’s very few scenarios where using floats for money actually makes sense. Either your computation is so simple there’s no real benefit to using floats rather than infinite precision number types, or your computation is so large, that floating point errors sum to meaningful amounts.

The only scenario I can think of where floats might be the right choice is on a tiny embedded system where computational power and memory are very limited, and working with infinite-precision types is a real problem. But in that scenario, you probably don’t need to be educated on the issues with floats. If you’re the kind of person who is unsure whether you should use floats for money, then the answer is almost certainly a resounding “No. Do not use floats for money values”.


I would argue the more correct answer is if you don't know what maths to use for money (including e.g. any legal rules about how you do it), you shouldn't be doing maths on money.


That strikes me as an unnecessarily elitist answer that holds us back.

I'm hardly a mathematician, or even college educated, but AFAICT this all boils down to the fact that I can type a number into the computer, and it can't represent it exactly internally, so it misrepresents it, silently.

Where I come from, that's called "a bug", regardless of cause.

Non-mathematicians (and even non-accountants and non-financiers and the like) have to math money all the time. They do it in daily life. Some of them even write programs to do it, because they know enough about computers to do that.

They don't expect that their expensive smartphone is going to screw up the calculation due to some esoteric representational reason that they need four or eight years of college to be aware of, let alone to understand or explain.

And they shouldn't need to!

I would argue that if computers can't do the job correctly in every case without the user jumping through hoops, then we should be continuing to develop methods to make it better.


That's not what I'm talking about. There are legal frameworks about how to compute something (which I only vaguely know exist, so I know enough that I personally should not be implementing any of this without significant outside input and expertise), and if you do it wrong we get deaths. This has been a major issue in at least two countries (the UK with the Post Office and Australia with Robodebt). I'd argue its more elitist thinking "I can program numbers into a computer so I can do anything involving numbers", rather than referring to non-software-developer experts.

On needing years in college to learn this: the "weirdness" of floats should be covered (it was for me) in high school science, and is also drummed in during first-year science labs. Any time you actually need to work with numbers (rather than, say, checking whether a group is abelian), you're dealing with how to compute, and these rules predate electronics.


That’s because you haven’t looked at what floats are designed for. They were created for the purpose of high performance scientific computing, so they quite deliberately, and explicitly, trade off perfect accuracy for much greater performance.

Computers are perfectly good at providing infinite precision numbers and doing arithmetic on them, and most languages expose explicit number types for that purpose. But there’s a reason why floats are called floats, and not numbers. It’s because floats aren’t numbers (at least not base 10 numbers)! They’re a pretty accurate, but highly performant, approximation of base 10 numbers.


string->float conversion is conversion from the accounting realm to the computer abstraction. If the conversion is not accurate, and the results cannot be converted back into the accounting realm without artificial artifacts, then the use of such computation is problematic.


The conversion is accurate up to 15 digits. Round-tripping through floats should not cause artifacts if you know what you are doing and remain within that 15-digit bound at all times, and I believe the same applies for all basic arithmetic operations. So the question is, what actually are the cases where floats introduce "artifacts"?
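A quick check of that claim in Python (`%.15g` prints 15 significant digits; the round-trip guarantee for IEEE doubles does hold up to 15):

```python
# Any decimal string of <= 15 significant digits survives string -> double -> string:
for s in ("0.1", "0.25", "0.299999999999999", "1234.56789012345"):
    assert f"{float(s):.15g}" == s
# And the infamous artifact vanishes once you stay within the 15-digit bound:
print(f"{0.1 + 0.2:.15g}")  # 0.3
```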


If you're the one settling the books at your bank, sure. If you just need to display the price of something, a float+money formatter is mostly fine.


Mostly. You might be surprised how much of a headache it creates when things are off by a little in ways customers don't understand, even when you're not settling transactions like that. Customers get confused when they get receipts or invoices where things don't add up, even when it saves them a penny! I've seen rounding down make people mad because it didn't add up when discounts were applied, even when they were the beneficiaries of the extra cent. Obviously rounding up makes people mad out of principle, because it's adding cost that wasn't agreed upon.


I have left another comment. A trillion times this. Floats anywhere give two outcomes: customer accuses you of salami slicing from them or you give away millions.


Every so often, someone shares a picture where NaN got printed onto a price sticker.


I’ve had weird float bugs that I didn’t have time for and fixed by operating on them as strings. Nowadays I’ve found pretty good libraries in common languages designed to handle the weird edges.


What libraries can you recommend?


For JS for example https://github.com/MikeMcl/decimal.js

The trick is 90% just realizing these types of libraries exist so if and when you run into these requirements you’re not reinventing the wheel on some of these issues.


Hotter take: Floats are bad and shouldn't be used.

It's clear that even experienced programmers do NOT understand how they work, how they don't work, and the pile of edge cases.

Use something saner, like binary coded decimal, larger types, and use floats as a last and very approximate resort.


No, it's clear people don't know how to compute (because they're not taught how to). The vast majority of issues that people encounter with floating point aren't floating-point issues; they're "how do I perform this calculation without infinite space and time", and those issues occur whether you do it by hand or use a machine to do it for you. This becomes really obvious when you teach early-undergrad science labs, because people conflate the number of decimal places used with the number of significant figures used. Compare the speed of light (which has in effect infinite precision, because we defined what it is), 299792458 m/s, with the gravitational constant, 0.000 000 000 066 74 m^3 /kg /s^2, which is generally regarded as the least accurate and hardest to measure physical constant (usually G * M can be measured more accurately, so you'd rather use that).


As if even experienced programmers understand anything about numerics, regardless of the number format used. All number formats have pitfalls: reals are not representable on computers, and not all reals are even computable. Floats at least are well-documented and predictable, and there is 30+ years of literature on how to do error analysis for them.



