Numerical strings comparison

  1. Inevitable casting
  2. Scientific notation
  3. Conclusion

Why does PHP sometimes consider two apparently different strings equal? For example, '8000058E-1345823534' == '8000079E-1468962218' is true.

Questions of this kind often come up on PHP forums, so I decided to explain the matter in detail.

There are two reasons for this.

Inevitable casting

First, when two strings are compared and both look like numbers, they are converted to numbers for the comparison.

But why does PHP do that? The reason is simple, though it took me quite a while to realize it. PHP is a loosely typed language that allows variables of different types to be mixed in a single expression. In HTTP, every data value is a string, and so are all PHP input variables. Old database drivers also returned all data types as strings. So if a user enters 100 in a form field and 20 comes from the database, we cannot compare these values as they are to tell which is bigger: both are strings, and compared as strings, "100" turns out to be less than "20".

Therefore, whenever it deals with numbers, PHP has to "detect" them in order to compare them properly. For an equality test this is not that important, but to tell whether one number is less than or greater than another, it is critical to compare them as numbers rather than as strings, because the string "2" is greater than the string "1000". This is why a string value that looks like a number is converted to a number for the comparison.

Since a single function handles all comparisons, the same magic is involved in "equal to" comparisons as well.
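
To make the difference concrete, here is a minimal sketch. The variable names and values are hypothetical, standing in for a form field and a database result that both arrive as strings. It contrasts the numeric comparison PHP performs on numeric strings with a plain byte-by-byte string comparison done by strcmp().

```php
<?php
// Hypothetical values: form input and an old database driver both deliver strings.
$fromForm = "100";
$fromDb   = "20";

// Both strings look like numbers, so PHP compares them as numbers: 100 > 20.
$numericResult = $fromForm > $fromDb;

// strcmp() compares byte by byte, so "100" sorts before "20" because "1" < "2".
$stringResult = strcmp($fromForm, $fromDb) > 0;

var_dump($numericResult); // bool(true)
var_dump($stringResult);  // bool(false)
```

Without the conversion, the second result is what every comparison of user input would produce.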

Scientific notation

Second, a string such as '8000079E-1468962218' looks like a floating point number in scientific notation, which PHP understands. And as explained above, PHP tries to convert it to a number. So, for example, "1000" == "1e3" is true.
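
A short sketch of this conversion, using only the "1e3" example from the text (the variable names are illustrative):

```php
<?php
// "1e3" is scientific notation for 1 * 10^3, that is, 1000.
$asFloat = (float) "1e3";

// Both operands are numeric strings, so they are compared as numbers.
$looseEqual = ("1000" == "1e3");

var_dump($asFloat);    // float(1000)
var_dump($looseEqual); // bool(true)
```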

But '8000058E-1345823534' and '8000079E-1468962218' look entirely different even as numbers. Why are they considered equal anyway?

Because these numbers are extremely small: each has more than a billion zeroes after the decimal point! That is far below the smallest value a floating point number can represent, so the conversion yields zero.

It follows that there is an even more confusing statement, '8000058E-1345823534' == '0', which is also true.

In the end, '8000058E-1345823534' == '8000079E-1468962218' turns out to be 0 == 0, which perfectly explains the result.
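
The chain of reasoning can be checked with a few lines. This is a sketch using the two strings from the question; the variable names are illustrative, and the explicit (float) casts are there only to make visible the conversion that == performs implicitly.

```php
<?php
$a = '8000058E-1345823534';
$b = '8000079E-1468962218';

// The exponents are far outside the range of a float, so both values become zero.
$floatA = (float) $a;
$floatB = (float) $b;

var_dump($floatA);   // float(0)
var_dump($floatB);   // float(0)

// The loose comparison therefore boils down to 0 == 0.
var_dump($a == $b);  // bool(true)
var_dump($a == '0'); // bool(true)
```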

Conclusion

This is why you are encouraged to use the strict comparison operator, ===, instead of ==: with strict comparison no such magic is ever involved, and the values are compared as is.
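
For comparison, a brief sketch of the same strings checked with both operators (variable names are illustrative):

```php
<?php
$a = '8000058E-1345823534';
$b = '8000079E-1468962218';

// Loose comparison converts both numeric strings to numbers first.
$loose = ($a == $b);

// Strict comparison keeps them as strings and compares them as is.
$strict = ($a === $b);

var_dump($loose);            // bool(true)
var_dump($strict);           // bool(false)
var_dump("1000" === "1e3");  // bool(false)
```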

