Joy: What is going on?
Sadness: We're abstracting! There are four stages. This is the first. Non-objective fragmentation!
Bing Bong: All right, don't panic. The important thing is that we all stay together. [suddenly his abstract arm falls off]
Joy: Oh! [Sadness and Joy start falling apart too]
Sadness: We're in the second stage. We're deconstructing! [as Bing Bong falls to pieces]
Bing Bong: I can't feel my legs! [picks one leg up] Oh, there they are.
© Inside Out (animated film)
Everyone loves writing beautiful code: abstractions, lambdas, SOLID, DRY, DI, and so on. In this article I want to explore what all of this costs in terms of performance, and why.
To do this, we will take a simple task, divorced from reality, and gradually bring beauty into it, measuring performance and looking under the hood along the way.
Disclaimer: this article should in no way be read as a call to write bad code. Ideally, set yourself up in advance to say after reading: “Cool! Now I know how it works inside. But of course I won't use it.” :)
The task:
- Given a text file.
- Split it into lines.
- Trim whitespace on the left and right.
- Discard all blank lines.
- Replace every run of multiple spaces with a single space (“A   B   C” -> “A B C”).
- In lines with more than 10 words, reverse the word order (“An Bn Cn” -> “Cn Bn An”).
- Count how many times each line occurs.
- Print all lines that occur more than N times.
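To pin the per-line rules down, here is a minimal sketch of the transformation applied to a single line. The helper name normalizeLine is hypothetical; it returns null for lines the task says to discard, and otherwise the trimmed, space-collapsed, possibly reversed line:

```php
<?php
// Hypothetical helper illustrating the per-line rules of the task.
// Returns null for lines that should be discarded.
function normalizeLine(string $line): ?string
{
    $line = trim($line);                 // trim whitespace on both ends
    if ($line === '') {
        return null;                     // discard blank lines
    }
    $words = preg_split('/\s+/', $line); // split on runs of whitespace
    if (count($words) > 10) {
        $words = array_reverse($words);  // reverse word order in long lines
    }
    return implode(' ', $words);         // re-join with single spaces
}

var_dump(normalizeLine("   A   B   C  ")); // string(5) "A B C"
var_dump(normalizeLine("  \t "));          // NULL
```

Counting occurrences is then just incrementing an array element keyed by the normalized line.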
As the input file, by tradition, we take php-src/Zend/zend_vm_execute.h, about 70 thousand lines long.
The runtime is PHP 7.3.6.
Compiled opcodes can be inspected at https://3v4l.org.
Measurements will be made as follows:
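One possible harness, as a sketch: it assumes wall-clock timing with microtime(true) (the same function the benchmarks later in the article use), averaged over several runs. The measure helper, the default of 10 runs and the stand-in workload are illustrative choices, not necessarily how the numbers below were obtained:

```php
<?php
// Minimal wall-clock timing harness (illustrative).
// Runs the code under test several times and returns the average duration in seconds.
function measure(callable $code, int $runs = 10): float
{
    $total = 0.0;
    for ($run = 0; $run < $runs; $run++) {
        $start = microtime(true);           // current time as a float, in seconds
        $code();                            // the code being measured
        $total += microtime(true) - $start;
    }
    return $total / $runs;
}

// Usage: wrap the code under test in a closure (here a stand-in workload).
printf("Runtime: ~%.3f s%s", measure(function () {
    $lines = explode("\n", str_repeat("some   text\n", 70000));
}), PHP_EOL);
```

Wrapping the code in a closure adds one call per run, which is negligible next to a workload of tens of thousands of lines.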
The first approach, naive
Let's write some simple imperative code:
// Read the file and split it into lines
$array = explode("\n", file_get_contents('/Users/rjhdby/CLionProjects/php-src/Zend/zend_vm_execute.h'));
$cache = [];
foreach ($array as $row) {
    if (empty($row)) continue;
    // Trim, then split on runs of whitespace
    $words = preg_split("/\s+/", trim($row));
    // Reverse the word order in long lines
    if (count($words) > 10) {
        $words = array_reverse($words);
    }
    // Re-join with single spaces and count occurrences
    $row = implode(" ", $words);
    if (isset($cache[$row])) {
        $cache[$row]++;
    } else {
        $cache[$row] = 1;
    }
}
// Print the lines that occur more than 1000 times
foreach ($cache as $key => $value) {
    if ($value > 1000) {
        echo "$key : $value" . PHP_EOL;
    }
}
Runtime: ~0.148 s.
Everything is simple, and there is nothing to discuss.
The second approach, procedural
We refactor the code and extract the elementary actions into functions.
We will try to follow the single responsibility principle.
function getContentFromFile(string $fileName): array {
    return explode("\n", file_get_contents($fileName));
}

function reverseWordsIfNeeded(array &$input) {
    if (count($input) > 10) {
        $input = array_reverse($input);
    }
}

function prepareString(string $input): string {
    $words = preg_split("/\s+/", trim($input));
    reverseWordsIfNeeded($words);
    return implode(" ", $words);
}

function printIfSuitable(array $input, int $threshold) {
    foreach ($input as $key => $value) {
        if ($value > $threshold) {
            echo "$key : $value" . PHP_EOL;
        }
    }
}

function addToCache(array &$cache, string $line) {
    if (isset($cache[$line])) {
        $cache[$line]++;
    } else {
        $cache[$line] = 1;
    }
}

function processContent(array $input): array {
    $cache = [];
    foreach ($input as $row) {
        if (empty($row)) continue;
        addToCache($cache, prepareString($row));
    }
    return $cache;
}

printIfSuitable(
    processContent(
        getContentFromFile('/Users/rjhdby/CLionProjects/php-src/Zend/zend_vm_execute.h')
    ),
    1000
);
Runtime: ~0.275 s... WTF!? That is almost twice as slow!
Let's see what a PHP function looks like from the virtual machine's point of view.
The code:
$a = 1;
$b = 2;
$c = $a + $b;
Compiles to:
[opcode listing]
Now let's move the addition into a function:
function sum($a, $b) {
    return $a + $b;
}
$a = 1;
$b = 1;
$c = sum($a, $b);
This code compiles into two sets of opcodes: one for the top-level (root) code and one for the function.
Root:
[opcode listing]
Function:
[opcode listing]
That is, even if you count opcodes alone, each function call adds 3 + 2N opcodes, where N is the number of arguments passed.
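Where does 3 + 2N come from? A sketch, assuming a call like sum($a, $b) to a function declared earlier in the same file, with every argument passed as a variable: the caller emits INIT_FCALL, one SEND_VAR per argument and DO_UCALL, and the callee adds one RECV per parameter plus RETURN. The helper below (callOverheadOpcodes is a hypothetical name) just spells out that list; literals or by-reference arguments compile to other opcodes such as SEND_VAL or SEND_REF:

```php
<?php
// Hypothetical helper: lists the extra opcodes a user-function call with
// $argCount variable arguments adds compared to inlining the body.
function callOverheadOpcodes(int $argCount): array
{
    // Caller side: set up the call frame, push each argument, perform the call.
    $caller = array_merge(['INIT_FCALL'], array_fill(0, $argCount, 'SEND_VAR'), ['DO_UCALL']);
    // Callee side: receive each argument, then return the result.
    $callee = array_merge(array_fill(0, $argCount, 'RECV'), ['RETURN']);
    return array_merge($caller, $callee);
}

echo implode(', ', callOverheadOpcodes(2)), PHP_EOL;
// INIT_FCALL, SEND_VAR, SEND_VAR, DO_UCALL, RECV, RECV, RETURN
echo count(callOverheadOpcodes(2)), PHP_EOL; // 7 = 3 + 2 * 2
```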
And if you dig a little deeper, each call also switches the execution context.
A rough estimate for our refactored code gives the following numbers (recall that there are about 70,000 iterations):
Number of "additional" executed opcodes: ~17,000,000.
Number of context switches: ~280,000.
The third approach, classic
Without much philosophizing, we wrap all these functions in a class.
class ProcessFile {
    private $content;
    private $cache = [];

    function __construct(string $fileName) {
        $this->content = explode("\n", file_get_contents($fileName));
    }

    private function reverseWordsIfNeeded(array &$input) {
        if (count($input) > 10) {
            $input = array_reverse($input);
        }
    }

    private function prepareString(string $input): string {
        $words = preg_split("/\s+/", trim($input));
        $this->reverseWordsIfNeeded($words);
        return implode(" ", $words);
    }

    function printIfSuitable(int $threshold) {
        foreach ($this->cache as $key => $value) {
            if ($value > $threshold) {
                echo "$key : $value" . PHP_EOL;
            }
        }
    }

    private function addToCache(string $line) {
        if (isset($this->cache[$line])) {
            $this->cache[$line]++;
        } else {
            $this->cache[$line] = 1;
        }
    }

    function processContent() {
        foreach ($this->content as $row) {
            if (empty($row)) continue;
            $this->addToCache($this->prepareString($row));
        }
    }
}

$processFile = new ProcessFile('/Users/rjhdby/CLionProjects/php-src/Zend/zend_vm_execute.h');
$processFile->processContent();
$processFile->printIfSuitable(1000);
Runtime: ~0.297 s. It got worse: not by much, but noticeably. Is creating an object (just once in our case) really that expensive? Well... not only that.
Let's see how the virtual machine handles a class.
class Adder {
    private $a;
    private $b;

    function __construct($a, $b) {
        $this->a = $a;
        $this->b = $b;
    }

    function sum() {
        return $this->a + $this->b;
    }
}

$a = 1;
$b = 1;
$adder = new Adder($a, $b);
$c = $adder->sum();
Logically enough, there are three sets of opcodes: the root code and two methods.
Root:
[opcode listing]
Constructor:
[opcode listing]
Sum method:
[opcode listing]
The new keyword is actually compiled into something like a function call (lines 3-6): it creates an instance of the class and then calls the constructor on it with the passed parameters.
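The two steps can be mimicked in userland, purely as an illustration and not as what the VM literally executes: ReflectionClass::newInstanceWithoutConstructor() allocates the object without running the constructor, and an explicit __construct() call then initializes it with the parameters:

```php
<?php
// The Adder class from the example above.
class Adder
{
    private $a;
    private $b;

    function __construct($a, $b)
    {
        $this->a = $a;
        $this->b = $b;
    }

    function sum()
    {
        return $this->a + $this->b;
    }
}

// Step 1: create an instance of the class without running the constructor.
$adder = (new ReflectionClass(Adder::class))->newInstanceWithoutConstructor();
// Step 2: call the constructor on that instance with the passed parameters.
$adder->__construct(1, 1);

echo $adder->sum(), PHP_EOL; // 2
```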
In the method code, we are interested in how class fields are handled. Note that while assigning to an ordinary variable takes a single simple ASSIGN opcode, things are somewhat different for class fields.
Assignment: 2 opcodes
ASSIGN_OBJ 'a'
OP_DATA !0
Read: 1 opcode
FETCH_OBJ_R ~1 'b'
Keep in mind that ASSIGN_OBJ and FETCH_OBJ_R are much more complex, and therefore more resource-intensive, than a simple ASSIGN, which, roughly speaking, just copies a zval from one piece of memory to another.
Obviously, such a comparison is far from rigorous, but it still gives some idea. I will take measurements a little further on.
Now let's see how expensive it is to create an object instance. We'll measure over one million iterations:
class ValueObject {
    private $a;
    function __construct($a) {
        $this->a = $a;
    }
}
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
Variable assignment: 0.092 s.
Object instantiation: 0.889 s.
Something like that. Not entirely free, especially when done many times.
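For anyone who wants to reproduce the comparison, a self-contained sketch follows. The loop bodies (a plain `$value = $i;` versus `new ValueObject($i)`) and the output formatting are assumptions, since only the opening of the benchmark is shown above; absolute timings will vary by machine and PHP build:

```php
<?php
class ValueObject
{
    private $a;

    public function __construct($a)
    {
        $this->a = $a;
    }
}

// Variant 1: one million plain variable assignments.
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $value = $i;
}
printf("Variable assignment: %.3f s%s", microtime(true) - $start, PHP_EOL);

// Variant 2: one million instantiations (allocation + constructor call + property write).
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $value = new ValueObject($i);
}
printf("Object instantiation: %.3f s%s", microtime(true) - $start, PHP_EOL);
```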
And while we're at it, let's measure the difference between working with properties and with local variables. To do this, change the code as follows:
class ValueObject {
    private $b;
    function try($a) {
Exchange through assignment: 0.830 s.
Exchange through property: 0.862 s.
Only slightly slower, but slower nonetheless: the same order of difference we got after wrapping the functions in a class.
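A self-contained sketch of this comparison. The method names viaLocal and viaProperty are hypothetical (the original method is called try), and their bodies are an assumption; both variants pay for the same method call, so the difference comes mostly from ASSIGN versus ASSIGN_OBJ/FETCH_OBJ_R:

```php
<?php
class ValueObject
{
    private $b;

    public function viaLocal($a)
    {
        $b = $a;         // ASSIGN to a local variable (without opcache optimizations)
        return $b;
    }

    public function viaProperty($a)
    {
        $this->b = $a;   // ASSIGN_OBJ + OP_DATA
        return $this->b; // FETCH_OBJ_R, then RETURN
    }
}

$object = new ValueObject();

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $object->viaLocal($i);
}
printf("Exchange through assignment: %.3f s%s", microtime(true) - $start, PHP_EOL);

$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $object->viaProperty($i);
}
printf("Exchange through property: %.3f s%s", microtime(true) - $start, PHP_EOL);
```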
Obvious conclusions
- The next time you want to instantiate a million objects, think about whether you really need it. Maybe just an array, huh?
- Writing spaghetti code to save one millisecond is not worth it. The gain is pennies, and your colleagues may well beat you up for it later.
- But to save 500 milliseconds, it may sometimes make sense. The main thing is not to go too far: remember that those 500 milliseconds will most likely be saved by a small section of very hot code, so don't turn the whole project into a vale of sorrow.
P.S. Lambdas next time. Things get interesting there. :)