Wednesday, January 20, 2021

The Problem of Rounding Halves

While the vast majority of posts on this blog are about computer science topics, it was created as a technical blog, not specifically a programming blog.  As such, I may, from time to time, write posts on other technical topics.  For example, the topic of this post is math theory.


I have a friend who questions the common practice, taught in public schools in most, if not all, Western countries, of rounding up halves.  For example, say you are rounding to the nearest whole number, and you are presented with 0.5.  You round up.  Why?  Is it because 0.5 is closer to 1 than it is to 0?  No, because it isn't.  It is exactly equidistant from both.  The bias toward rounding down is exactly equal to the bias toward rounding up, so why do we round up on halves?  The answer is that someone decided it should be so.

In theory, math is based on pure, logical rules, but in the case of rounding, the boundary rule is purely arbitrary.  If you are exactly equidistant between your rounding options, you always round up, not because of a logical rule defined by math itself, but because someone decided it should be done that way, probably to avoid answering what turns out to be a pretty hard question.  But it doesn't really matter, does it?  The direction we round on the boundary doesn't need to be logical, does it?  Well, let's find out.

Say we have a data set with 10 samples, each between 0 and 1.  We want to take an average of the data set, so we can use it to make a decision, and we are treating the samples as votes.  The samples were generated by rating like/dislike for a proposal on a scale of 0 to 10, and the results are stored as decimal fractions (a 1 is stored as 0.1, a 2 as 0.2, and so on).  Because we are treating them as votes, we round them to the nearest whole number: any rating below 0.5 counts as a vote against the proposal, and any rating of 0.5 or higher counts as a vote in favor.  This makes sense, right?  It's almost exactly how we vote in government elections.  Votes are recorded as whole numbers, even though actual voters are almost never 100% for or against any given candidate.  The vote has to fit into a discrete value.

Now, let's generate a sample set: 0.2, 0.5, 0.3, 0.5, 0.1, 0.5, 0.5, 0.7, 0.2, 0.5.  Reducing these to votes, we get 0, 1, 0, 1, 0, 1, 1, 1, 0, 1.  That's 6 votes for the proposal versus 4 against, so we would say the proposal is supported by a 60% majority.  But the raw data actually shows a very different result.  Four responses oppose the proposal, one favors it, and five are exactly balanced.  The average of the raw data is 0.4, or 40% support.  So wait: when we round the data to turn the responses into votes, using the arbitrary round-up-halves rule, we get 60% for, but when we average the unrounded data, we get only 40% support, a 60% majority against?  (This is not contrived.  I just picked values where half are on the boundary and the other half are mostly against, and this exact juxtaposition happened on its own.)  The rounding rule actually flipped the result into the opposite of the true outcome.
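To make the flip concrete, here is a short Python sketch of the example above.  Note that Python's built-in round uses round-half-to-even rather than the schoolbook rule, so the half-up rule is implemented directly:

```python
import math

# Ratings from the example; each 0.5 is an exactly balanced response.
ratings = [0.2, 0.5, 0.3, 0.5, 0.1, 0.5, 0.5, 0.7, 0.2, 0.5]

def round_half_up(x):
    # The schoolbook rule: halves always go up.
    return math.floor(x + 0.5)

votes = [round_half_up(r) for r in ratings]

print(votes)                          # [0, 1, 0, 1, 0, 1, 1, 1, 0, 1]
print(sum(votes) / len(votes))        # 0.6 -- a 60% majority in favor
print(sum(ratings) / len(ratings))    # roughly 0.4 -- only 40% raw support
```

The rounded tally and the raw average land on opposite sides of 0.5, which is exactly the flip described above.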

So, is there a way to overcome this issue?  Is there a mathematically logical solution for rounding halves?  There is, but you aren't going to like it.  The answer is: half of the time round down, and the other half round up.  You might be tempted to say we should just throw out the halves, but when we do that, we get 4 to 1 against (80% against, rather than the 60% against shown by the raw data).  Sure, rounding will never perfectly match the raw data, and at least this way makes the democratically correct decision, but 20% is a huge deviation from the raw data.  It looks like an overwhelming majority opposes the proposal, rather than a moderate majority.  On top of that, there are cases where throwing out the halves will make the wrong decision outright (for example, in decisions with more than two options).
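The throw-out-the-halves variant is a one-line change on the same example:

```python
ratings = [0.2, 0.5, 0.3, 0.5, 0.1, 0.5, 0.5, 0.7, 0.2, 0.5]

# Discard the exact halves, then round what remains.  None of the
# survivors sit on a boundary, so the rounding rule no longer matters.
kept = [r for r in ratings if r != 0.5]
votes = [round(r) for r in kept]

against = 1 - sum(votes) / len(votes)
print(against)   # 0.8 -- 80% against, versus 60% against in the raw data
```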

We have a new problem now though: How do we decide which halves to round which way, and what happens when we have an odd number of halves?  We can't just alternate, because when we do have an odd number of halves, we run into the same problem.  If we always round the first one down, we are favoring rounding down (because on odd counts, one more half rounds down than up), and if we always round the first one up, we are favoring rounding up.  Sure, the problem only arises when there is an odd number of halves, and even then the vast majority (all but the final one) round out evenly, which significantly reduces the problem, but it doesn't solve it.
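Alternation is easy to sketch, and the sketch makes the residual bias visible.  Here alternate_round is a hypothetical helper written for this post, not an established function:

```python
import math

def alternate_round(values, start_up=True):
    """Round halves alternately up and down; round everything else normally."""
    round_up = start_up
    result = []
    for v in values:
        if v - math.floor(v) == 0.5:   # exactly on the boundary
            result.append(math.ceil(v) if round_up else math.floor(v))
            round_up = not round_up    # alternate direction for the next half
        else:
            result.append(math.floor(v + 0.5))
    return result

# With an even number of halves the rounding is balanced...
print(alternate_round([0.5, 0.5, 0.5, 0.5]))        # [1, 0, 1, 0]
# ...but with an odd number, whichever direction goes first gets one extra.
print(alternate_round([0.5, 0.5, 0.5, 0.5, 0.5]))   # [1, 0, 1, 0, 1]
```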

There is only one solution: non-determinism.  Wait, that's just randomness, and randomness isn't logical, is it?  Professional mathematicians may be unaware of this or just plain reject it, but at the quantum level, the universe runs on randomness, and thus if randomness isn't logical, the universe isn't logical, and math is an abstraction that doesn't even apply to the universe.  In other words, if math can describe the universe, then yes, randomness is a logical part of math.  The correct answer to the problem of rounding halves is that the direction of rounding should be random, with equal probability for each direction.  In education, we could simulate this with die rolls, and on very simple rounding problems, students could be expected to provide all possible solutions.  In well-crafted simulations, we already do this, often without even realizing it, by using integer data types or abstracting numerical properties in ways that eliminate the rounding problem.  In places where the results are critical, we would want to use true quantum randomness when rounding halves.  And yeah, sometimes we would still get the wrong outcome, but this is the best math can do when it comes to rounding halves, and in the long run, at least it will all average out.
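Here is a sketch of random tie-breaking (random_round is a name made up for this post), using Python's pseudorandom generator as a stand-in for a true quantum source.  Repeating the vote many times shows the averaging-out: each half contributes 0.5 on average, so the expected rounded mean of the example set is (1 + 5 * 0.5) / 10 = 0.35, on the "against" side of 0.5, agreeing with the raw data's 40% support:

```python
import math
import random

def random_round(x, rng=random):
    """Round to the nearest integer; break exact ties with a fair coin flip."""
    lower = math.floor(x)
    if x - lower == 0.5:
        return lower + rng.randint(0, 1)   # 0 or 1 with equal probability
    return math.floor(x + 0.5)

rng = random.Random(42)   # seeded only so the demonstration is repeatable
ratings = [0.2, 0.5, 0.3, 0.5, 0.1, 0.5, 0.5, 0.7, 0.2, 0.5]
trials = 10_000
total = sum(random_round(r, rng) for r in ratings for _ in range(trials))
print(total / (len(ratings) * trials))   # close to the expected 0.35
```

Any single run can still land on the wrong side, exactly as the post says; only the long-run average is guaranteed to be unbiased.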
