What's wrong with popular articles telling you that foo is faster than bar?
- The main problem
- Single vs. double quotes
- Logical inconsistencies
- The tests quality
- The real performance improvements
There are many articles, and even whole sites, dedicated to running tests that compare the performance of different syntax constructs and claim that one is faster than another.
The main problem
Such tests are wrong on many levels, from the way the question is posed to errors in the implementation. But most importantly, these tests are pointless and harmful in the first place.
- pointless because they have no practical value. Not a single real-life project has ever been optimized using a suggestion from such a test, simply because it is not the syntax that matters but the data manipulation.
- harmful because they give rise to the wildest superstitions and, worse yet, lead unsuspecting users to write bad code under the impression that they are "optimizing" it.
This should be enough to close the question. But even if we accept the rules of the game and pretend that such tests have any reason to be run, the results will show nothing but the tester's ignorance and lack of experience.
Single vs. double quotes
Take the notorious "single vs. double quotes" case. Of course, neither is "faster". First, there is a thing called the opcode cache, which stores the already parsed PHP script. The code is kept there in the form of opcodes, where identical string literals are stored as identical entities, no matter which quotes were used in the script. This means there is not even a theoretical difference in execution time.
But even if we don't use the opcode cache (though we must, if real performance is our concern), we will only find that the difference in the parsing code is so small (a few conditionals comparing single-byte characters, literally a few CPU instructions) that it is totally undetectable. This means that any results the tester gets will only show problems in the testing environment. There is a very thorough article, Disproving the Single Quotes Performance Myth, by PHP core contributor Nikita Popov, that explains the matter in detail. Nevertheless, a new eager tester appears almost every month to reveal the fictional "difference".
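To make the point concrete, here is a minimal sketch (the variable names are made up for illustration): the same text written with either kind of quotes yields the same string value at run time, so there is nothing left for the executing code to tell apart.

```php
<?php
// The same literal written with single and with double quotes.
// No variables or escape sequences are involved, so the parser
// produces an identical string value for both.
$single = 'Hello, world';
$double = "Hello, world";

// Strict comparison: same type and same contents.
$same = ($single === $double);
```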
Logical inconsistencies
Some tests are illogical in the very way the question is posed. For example, the question "Are throws super expensive?" is essentially "Is handling an error slower than not handling it?". But come on! Of course adding an essential feature to the code makes it "slower". That doesn't mean the feature shouldn't be added at all under such a ridiculous pretext. Following this logic, the fastest program is one that does absolutely nothing. A program should be useful and correct in the first place, and only then, if it runs slow, should it be optimized. So if the question itself makes no sense, why test the performance? Ironically, the tester even failed to implement this test correctly, which will be shown in the next chapter.
Another example: the test titled "is $row[id] slower than $row['id']?" is essentially asking "Which code is faster, one that produces an error or one that doesn't?" (using id without quotes raised an error of the level E_NOTICE in older PHP versions, E_WARNING since PHP 7.2, and throws an Error since PHP 8.0). WTF? What's the point of measuring erroneous code at all? An error should be fixed simply because it's an error, not because it makes the code run slower. Ironically, the tester even failed to implement this test correctly, which will be shown in the next chapter.
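For reference, a short sketch of the correct forms, using a made-up $row array. The only place where the unquoted key is valid is simple interpolation inside a double-quoted string, which is a different syntax from the array access the test was about.

```php
<?php
// Hypothetical row, as it might come from a database fetch.
$row = ['id' => 42, 'name' => 'example'];

// Correct array access: the key is a quoted string literal.
$id = $row['id'];

// Inside a double-quoted string the unquoted key is valid
// (simple interpolation syntax), so this is not an error.
$message = "Row id is $row[id]";
```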
The tests quality
Again, even when running a completely useless test, one should make it consistent by measuring comparable things. But as a rule, such tests are a half-assed effort that produces weird and irrelevant results.
For example, our clueless tester claims to measure "using extensive amounts of try catch blocks", but the actual test measures not try catch but throw, throwing an Exception on each iteration. Such a test is simply incorrect, as in real life errors do not occur on every execution.
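The difference between the two measurements can be sketched as follows. This is an illustration only, not the tester's actual code: work(), withTryCatch() and withThrow() are hypothetical names. The first function is what "using try catch blocks" describes, the second is what was actually measured.

```php
<?php
// Hypothetical workload, a stand-in for real application code.
function work(int $i): int
{
    return $i * 2;
}

// Variant A: a try/catch block around code that does not fail.
// This is what "using try catch blocks" actually means.
function withTryCatch(int $iterations): int
{
    $sum = 0;
    for ($i = 0; $i < $iterations; $i++) {
        try {
            $sum += work($i);
        } catch (Exception $e) {
            $sum = -1; // never reached in this sketch
        }
    }
    return $sum;
}

// Variant B: an exception is thrown and caught on every iteration.
// This measures throw, not try/catch.
function withThrow(int $iterations): int
{
    $caught = 0;
    for ($i = 0; $i < $iterations; $i++) {
        try {
            throw new Exception('failure');
        } catch (Exception $e) {
            $caught++;
        }
    }
    return $caught;
}
```

A comparison that claims to be about try/catch would have to time Variant A against the same loop without the try block, not against Variant B.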
Of course, we shouldn't run any performance tests on a non-release PHP version (alpha or beta), and shouldn't compare a regular feature to an experimental one. If a tester claims to compare "json to xml parsing", they shouldn't use an experimental XML parsing function.
Some "tests" demonstrate that the tester simply has no idea what they are talking about. There is an example in one of the recent articles mentioned above. The author tried to test whether code that produces an error ("Use of undefined constant") is indeed slower than code whose syntax is correct (a properly quoted string literal), but failed to carry out even such a deliberately pointless test, comparing instead the performance of a quoted numeric literal to that of an unquoted one. Of course, the use of an unquoted numeric literal (as opposed to an unquoted string) is perfectly legitimate in PHP, so the author tested a completely different thing and got the wrong results.
There are other things to take into consideration, such as the testing environment. There are PHP extensions, such as Xdebug, that can affect the results dramatically. And there is the opcode cache, which should be turned on in order to get sensible results.
The way the test is performed also matters. As the whole PHP process dies on each request, it makes sense to test the whole life cycle, from initiating a connection to the web server to closing that connection. There are utilities, like Apache benchmark or Siege, that let you do so.
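A small sketch of why the numeric case is a different thing altogether (the $row array is made up for illustration): an unquoted number is an ordinary integer literal, and PHP casts a numeric string key to the same integer key, so both lines below are valid and read the same element.

```php
<?php
// Hypothetical array with an integer key.
$row = [1 => 'first'];

// An unquoted number is an integer literal, not an undefined constant.
$a = $row[1];

// A string key holding a decimal integer is cast to the integer key 1.
$b = $row['1'];
```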
The real performance improvements
Okay, what conclusion should one draw from this article? That performance tests are pointless by definition? Of course not. What is important is the reason why we run them. A test out of the blue is a waste of time. There should always be a reason to run a test, and that reason is called "profiling". When your application starts to run slow, you must profile it, which means measuring the performance of different parts of the code to find the slowest one. Once it is found, we must figure out the cause. Most of the time it is either an unnecessarily large amount of data to process or some external process to access. In the first case the optimization is to limit the amount of data, and in the second to cache the result locally.
For example, it doesn't matter whether an explicit loop or one of PHP's syntax-sugar array functions is "faster"; what matters is the amount of data we feed to them. If it is unreasonably big, we must cut it down or move the processing elsewhere (to a database). That will gain us an enormous performance improvement, a real one, whereas the difference between the ways of calling a loop to process that data is barely noticeable at all.
Only after making such obligatory performance improvements, or when there is no way to cut down the amount of data, can we move on to actual performance tests. But again, it shouldn't be out of the blue. To start a performance comparison between an explicit loop and a built-in function, we must be positively sure that the loop itself is the source of the problem, not its payload (spoiler: of course, it's the payload).
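As a rough illustration of what profiling means, here is a minimal hand-rolled sketch. In practice a dedicated profiler would do this job; the timed() helper and the two labeled steps are hypothetical. It assumes PHP 7.4 or later (hrtime() and arrow functions).

```php
<?php
// Hypothetical helper: runs a callable and records how long it took.
function timed(string $label, callable $fn, array &$timings)
{
    $start = hrtime(true);                            // nanoseconds
    $result = $fn();
    $timings[$label] = (hrtime(true) - $start) / 1e6; // milliseconds
    return $result;
}

$timings = [];

// Two made-up stages of an application, measured separately.
$rows  = timed('load', fn() => range(1, 1000), $timings);
$total = timed('process', fn() => array_sum($rows), $timings);

// Sort so that the slowest part comes first: that is the part to look at.
arsort($timings);
```

The numbers themselves depend entirely on the machine and the data. The point is only that the measurement is attached to real parts of a real application, not to a syntax construct picked at random.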
A recent example from my practice: there was a query built with Doctrine Query Builder that had to accept several thousand parameters. The query itself ran pretty fast, but it took Doctrine some time to digest several thousand parameters. So the query was rewritten as raw SQL, with the parameters sent directly to PDO's execute(), which takes practically no time to process such an amount.
Does it mean that I should never use the Doctrine Query Builder? Of course not. It is perfect 99% of the time and I continue to use it for all queries. It is only an exceptional case like this that makes one fall back to a less convenient but more performant method.
The query and the parameters were constructed in a loop. Had I had the weird idea of attacking the way the loop is called, I would have just lost a lot of time without any result. And this is the very essence of performance improvement: to optimize code that is actually slow in your particular case, not code that was slow a long time ago in a galaxy far, far away, or code that someone just took a fancy to calling slow based on some pointless measurements.
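The actual query is not shown here, so the following is only a sketch of the general approach, assuming the parameters were a list of values for an IN () clause. The table name, column name and both helper functions are hypothetical.

```php
<?php
// Hypothetical helper: builds a query with one positional
// placeholder per value. "items" and "id" are made-up names.
function buildInQuery(array $ids): string
{
    if ($ids === []) {
        throw new InvalidArgumentException('At least one id is required');
    }
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    return "SELECT * FROM items WHERE id IN ($placeholders)";
}

// Hypothetical helper: sends all values straight to PDO's execute().
function fetchByIds(PDO $pdo, array $ids): array
{
    $stmt = $pdo->prepare(buildInQuery($ids));
    $stmt->execute(array_values($ids)); // positional params need a 0-based list
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

With an existing connection, the call would look like fetchByIds($pdo, $ids). All values still go through a prepared statement, so nothing is given up in terms of safety.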