[Not] Floating Point Math

Floating point math is something I always took for granted in software development, but I’ve recently been exposed to some of its nuances and how to better calculate values in some cases.

Crash course: floating point is effectively just math with decimal places. Different languages use different terminology for the specific data types (“float” for single precision and “double” for double precision are common), but they’re all pretty much the same in concept. In decimal (base 10), it’s really not a big deal. Positive exponents go left of the decimal point toward more significant digits, while negative exponents go right toward less significant digits. Multiply or divide by 10 and you move up or down in significance. This is all pretty elementary for most of us.

The problem is that computers don’t operate in decimal; they use binary (base 2), zeroes and ones. So numbers with digits after the decimal point, fractions of a whole number, aren’t easily represented in binary. Instead, floating point data types store a limited number of significant binary digits plus an exponent, and use that to make a best guess at the real value. If you go to a very high magnitude or a very fine level of precision with a floating point value, you lose some of the accuracy. This is where the term “floating point” comes from: the computer can move that point around depending on the particular number, trading range for precision as needed. The exact details vary by language and architecture, but anything that supports floating point is basically doing the same thing.
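A concrete way to see that trade-off (Python here, but any language using standard 64-bit IEEE 754 doubles behaves the same way): a double only carries roughly 15–17 significant decimal digits, so once the whole-number part is big enough, there’s nothing left over for the fraction.

big = 9007199254740992.0      # 2 to the 53rd power, the point where doubles start counting by 2
print(big + 1.0 == big)       # True: adding 1 literally changes nothing
print(0.5 + 1.0 == 0.5)       # False: small numbers still have precision to spare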

Quick little example:
0.1 + 0.1 + 0.1 = 0.3

Anybody with even a rudimentary understanding of mathematics can confirm that. In decimal, it’s trivial. With binary floating point math, though, things are more complicated. In fact, most languages will not evaluate that equation as true, because 0.1 in binary isn’t EXACTLY 0.1, nor is 0.3 exactly 0.3, and the tiny rounding errors don’t cancel out. The result also depends on how each side of the comparison was computed, so the underlying bits just aren’t the same. It checks out mathematically, but computers have a hard time with that decimal point.
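You can check this yourself in just about any language that uses standard IEEE 754 doubles; here it is in Python:

print(0.1 + 0.1 + 0.1)         # 0.30000000000000004
print(0.1 + 0.1 + 0.1 == 0.3)  # False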

SO, you have the dilemma. Floating point math is funky. Typically, it’s safer to avoid it when possible. Occasionally, you can’t. Specifically, when measuring for manufacturing applications, you need to be accurate and precise. If we can’t know that the floating point number is going to be exactly what we’re expecting, how can we use it for equations with any certainty?

One answer that I discovered, and the reason for this discussion, is using integers and multipliers. I encountered a system that was using an integer field to store what was effectively a decimal value, in conjunction with a multiplier to standardize it. Think scientific notation, but reversed in a sense. Imagine how Avogadro’s number is often displayed as 6.022 x 10^23. This system took decimal values and rendered them exclusively as whole numbers. An integer data type simply cannot hold a fractional value; that’s how it’s designed. So a number that might be measured as 0.0053 would be stored as 53 with a multiplier of 10000, meaning the real value is the stored integer divided by the multiplier.
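As a rough sketch of how that storage scheme works (Python, with hypothetical helper names; the actual system’s implementation was different and not something I can reproduce here):

MULTIPLIER = 10000                     # four decimal places of resolution

def to_stored(measurement):
    # Convert a measured value into a whole number of 1/10000 units.
    # round() protects against the incoming float sitting a hair above
    # or below the true value.
    return round(measurement * MULTIPLIER)

def to_display(stored):
    # Only convert back to a decimal at the edges, e.g. for display.
    return stored / MULTIPLIER

print(to_stored(0.0053))               # 53
print(to_display(53))                  # 0.0053

The key detail is that the multiplier is agreed on up front, so every field that shares it can be compared and added directly as plain integers.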

At first, this seemed absolutely wasteful to me. Why add extra calculations when you could just store the value as observed? However, a colleague pointed out how imprecise floating point numbers are, and noted that storing two integers removes any level of uncertainty. An integer, is an integer, is an integer. 53 is going to be 53, no matter how you calculate it. So any checks against that number can be exact, knowing that idiosyncrasies in how the value was calculated aren’t going to affect whether the computer sees it as an equal number.
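That’s exactly where floats fall down: the same mathematical value, computed two different ways, can compare as unequal. Integers don’t have that problem (Python again):

tenths = [0.1] * 10
print(sum(tenths) == 0.1 * 10)      # False: 0.9999999999999999 vs 1.0
print(sum([1] * 10) == 1 * 10)      # True: 10 is 10, however you add it up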

With the example from earlier, this system would evaluate like so:
1 + 1 + 1 = 3 (with a multiplier of 10)

Using integers, this math will always check out. The multiplier can be used to reconcile the value with other fields, or to scale it back down when displaying to a user, but the underlying calculations will work as intended every time.
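Continuing the sketch from above, the scaled version of the earlier example looks something like this (the variable names are mine, just for illustration):

MULTIPLIER = 10

a = b = c = 1                  # each 1 represents 0.1, stored in tenths
total = a + b + c              # plain integer addition: exactly 3

print(total == 3)              # True, every single time
print(total / MULTIPLIER)      # 0.3 -- only convert when it's time to display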

The main limitation of this design decision is knowing how precise a value needs to be. Fortunately, most systems can only measure to a certain precision anyway, and often won’t need anything more specific even if they had the capability of measuring it. As long as you pick a multiplier that allows a sufficient number of decimal places, any data you lose beyond that is trivial and unnecessary. My one gripe with this solution is that it works in decimal places instead of significant figures, but that’s another rant for another day.
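In practice, picking the multiplier just means matching the instrument. With the hypothetical multiplier of 10000 from earlier (four decimal places), anything past the fourth decimal place simply rounds away, which is fine if the gauge couldn’t resolve it in the first place:

print(round(0.00534999 * 10000))    # 53 -- the extra digits were below the gauge's resolution anyway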

In addition to the certainty of knowing exactly what a value is, integer math is much easier for computers to execute than floating point math. There’s a reason floating point calculations are so often used in processor benchmarks: floating point math is hard, so computers need more time and power to evaluate it. Even with the second integer in the mix, in the form of the multiplier, calculations on these numbers are most likely still going to be significantly faster.

So, while it may seem like a trivial difference in the context of mathematics, this shift in mentality is worlds apart from a programming perspective. Even though I knew the limitations of floating point math, I never would have considered using an integer pair of value and multiplier to circumvent those limitations. It just goes to show: one plus one is always two, as long as you aren’t doing anything with floating point calculations.
