PHP, how much abstraction for the people?



Joy: What is going on?

Sadness: We're abstracting! There are four stages. This is the first. Non-objective fragmentation!

Bing Bong: Alright, do not panic. What is important is that we all stay together. [suddenly his abstract arm falls off]

Joy: Oh! [Sadness and Joy start falling apart too]

Sadness: We're in the second stage. We're deconstructing! [as Bing Bong falls to pieces]

Bing Bong: I can't feel my legs! [picks one leg up] Oh, there they are.

© Inside Out (animated film)



Everyone loves writing beautiful code: abstractions, lambdas, SOLID, DRY, DI, and so on. In this article, I want to explore how much all of that costs in terms of performance, and why.



To do this, we will take a simple task, divorced from reality, and gradually bring beauty into it, measuring performance and looking under the hood along the way.



Disclaimer: This article should in no way be construed as a call to write bad code. It is best to prepare yourself in advance to say, after reading it: “Cool! Now I know how it works inside. But, of course, I will not use it.” :)



The task:



  1. We are given a text file.
  2. Split it into lines.
  3. Trim the spaces on the left and right of each line.
  4. Discard all blank lines.
  5. Replace every run of whitespace with a single space (“A   B   C” -> “A B C”).
  6. In lines with more than 10 words, reverse the word order (“An Bn Cn” -> “Cn Bn An”).
  7. Count how many times each line occurs.
  8. Print all the lines that occur more than N times.


As the input file, by tradition, we take php-src/Zend/zend_vm_execute.h, which is ~70 thousand lines long.



As the runtime, we take PHP 7.3.6.

We will look at the compiled opcodes at https://3v4l.org.



Measurements will be made as follows:



    // declare functions and classes here, if needed
    $start = microtime(true);
    ob_start();
    for ($i = 0; $i < 10; $i++) {
        // the code under test goes here
    }
    ob_clean();
    echo "Time: " . (microtime(true) - $start) / 10;
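For convenience, the same harness could be wrapped into a helper so that every variant is timed the same way. This is a minimal sketch, assuming a hypothetical `benchmark()` function (the name and signature are illustrative and are not part of the original measurements). It uses `ob_end_clean()` rather than `ob_clean()` so that the buffer is closed before the function returns.

```php
<?php
// Hypothetical helper: runs $code $runs times with output suppressed
// and returns the average wall-clock time of one run, in seconds.
function benchmark(callable $code, int $runs = 10): float
{
    $start = microtime(true);
    ob_start();                      // swallow anything the code prints
    for ($i = 0; $i < $runs; $i++) {
        $code();
    }
    ob_end_clean();                  // discard the captured output
    return (microtime(true) - $start) / $runs;
}

// Usage: time a trivial closure.
$time = benchmark(function () {
    echo str_repeat('x', 10);
});
echo "Time: " . $time . PHP_EOL;
```

Note that calling the closure adds one function call per run. At 10 runs that is negligible, but it is exactly the kind of overhead this article is about.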
      
      





First approach, naive



Let's write some simple imperative code:



    $array = explode("\n", file_get_contents('/Users/rjhdby/CLionProjects/php-src/Zend/zend_vm_execute.h'));
    $cache = [];
    foreach ($array as $row) {
        if (empty($row)) continue;
        $words = preg_split("/\s+/", trim($row));
        if (count($words) > 10) {
            $words = array_reverse($words);
        }
        $row = implode(" ", $words);
        if (isset($cache[$row])) {
            $cache[$row]++;
        } else {
            $cache[$row] = 1;
        }
    }
    foreach ($cache as $key => $value) {
        if ($value > 1000) {
            echo "$key : $value" . PHP_EOL;
        }
    }
      
      





Runtime ~ 0.148 s.



Everything is simple and there’s nothing to talk about.



The second approach, procedural



We refactor our code and extract the elementary actions into functions.

We will try to adhere to the single responsibility principle.



The full listing (it is a long one):
    function getContentFromFile(string $fileName): array {
        return explode("\n", file_get_contents($fileName));
    }

    function reverseWordsIfNeeded(array &$input) {
        if (count($input) > 10) {
            $input = array_reverse($input);
        }
    }

    function prepareString(string $input): string {
        $words = preg_split("/\s+/", trim($input));
        reverseWordsIfNeeded($words);
        return implode(" ", $words);
    }

    function printIfSuitable(array $input, int $threshold) {
        foreach ($input as $key => $value) {
            if ($value > $threshold) {
                echo "$key : $value" . PHP_EOL;
            }
        }
    }

    function addToCache(array &$cache, string $line) {
        if (isset($cache[$line])) {
            $cache[$line]++;
        } else {
            $cache[$line] = 1;
        }
    }

    function processContent(array $input): array {
        $cache = [];
        foreach ($input as $row) {
            if (empty($row)) continue;
            addToCache($cache, prepareString($row));
        }
        return $cache;
    }

    printIfSuitable(
        processContent(
            getContentFromFile('/Users/rjhdby/CLionProjects/php-src/Zend/zend_vm_execute.h')
        ),
        1000
    );
      
      







Runtime: ~0.275 s... WTF!? That is almost twice as slow!



Let's see what a PHP function looks like from the virtual machine's point of view.



The code:



    $a = 1;
    $b = 2;
    $c = $a + $b;
      
      





Compiles to:



    line  #*  E I O  op                fetch  ext  return  operands
    ---------------------------------------------------------------
       2   0  E >    ASSIGN                                !0, 1
       3   1         ASSIGN                                !1, 2
       4   2         ADD                           ~5      !0, !1
           3         ASSIGN                                !2, ~5
      
      





Let's put the addition into a function:



    function sum($a, $b){
        return $a + $b;
    }

    $a = 1;
    $b = 1;
    $c = sum($a, $b);
      
      





This code is compiled into two sets of opcodes: one for the top-level script (the root) and one for the function.



Root:



    line  #*  E I O  op                fetch  ext  return  operands
    ---------------------------------------------------------------
       2   0  E >    ASSIGN                                !0, 1
       3   1         ASSIGN                                !1, 1
       5   2         NOP
       9   3         INIT_FCALL                            'sum'
           4         SEND_VAR                              !0
           5         SEND_VAR                              !1
           6         DO_FCALL                   0  $5
           7         ASSIGN                                !2, $5
      
      





Function:



    line  #*  E I O  op                fetch  ext  return  operands
    ---------------------------------------------------------------
       5   0  E >    RECV                          !0
           1         RECV                          !1
       6   2         ADD                           ~2      !0, !1
           3      >  RETURN                                ~2
      
      





That is, even if we only count opcodes, each function call adds 3 + 2N opcodes, where N is the number of arguments passed: INIT_FCALL, DO_FCALL and RETURN, plus one SEND_VAR and one RECV per argument.
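To make the arithmetic concrete, here is a tiny sketch. The `extraOpcodesPerCall()` helper is hypothetical and simply encodes the count read off the dumps above: INIT_FCALL, DO_FCALL and RETURN, plus one SEND_VAR and one RECV for each argument.

```php
<?php
// Hypothetical helper encoding the 3 + 2N estimate from the opcode dumps:
// INIT_FCALL + DO_FCALL + RETURN, plus SEND_VAR and RECV for each argument.
function extraOpcodesPerCall(int $argumentCount): int
{
    return 3 + 2 * $argumentCount;
}

// sum($a, $b) takes two arguments.
echo extraOpcodesPerCall(2) . PHP_EOL; // prints 7
```

This matches the dumps: the inline version executes 4 opcodes, while the version with sum() executes 11 (not counting the one-off NOP for the declaration).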



And if you dig a little deeper, each call also switches the execution context.



A rough estimate for our refactored code gives the following numbers (remember, there are about 70,000 iterations).

Number of "additional" opcodes executed: ~17,000,000.

Number of context switches: ~280,000.
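The per-call overhead can also be observed directly. Below is a rough sketch of such a micro-benchmark. The `timeInline()` and `timeCall()` helpers are hypothetical names, and the absolute numbers depend on the machine and the PHP version, so none are quoted here.

```php
<?php
function sum($a, $b)
{
    return $a + $b;
}

// Hypothetical helper: time $iterations inline additions.
function timeInline(int $iterations): float
{
    $a = 1;
    $b = 2;
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $c = $a + $b;
    }
    return microtime(true) - $start;
}

// Hypothetical helper: time the same additions done through sum().
function timeCall(int $iterations): float
{
    $a = 1;
    $b = 2;
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $c = sum($a, $b);
    }
    return microtime(true) - $start;
}

echo "Inline: " . timeInline(1000000) . PHP_EOL;
echo "Call:   " . timeCall(1000000) . PHP_EOL;
```

Both loops do the same work. The only difference is whether the addition goes through a function call.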



The third approach, classic



Without much philosophizing, we wrap all these functions in a class.



Another long listing:
    class ProcessFile {
        private $content;
        private $cache = [];

        function __construct(string $fileName) {
            $this->content = explode("\n", file_get_contents($fileName));
        }

        private function reverseWordsIfNeeded(array &$input) {
            if (count($input) > 10) {
                $input = array_reverse($input);
            }
        }

        private function prepareString(string $input): string {
            $words = preg_split("/\s+/", trim($input));
            $this->reverseWordsIfNeeded($words);
            return implode(" ", $words);
        }

        function printIfSuitable(int $threshold) {
            foreach ($this->cache as $key => $value) {
                if ($value > $threshold) {
                    echo "$key : $value" . PHP_EOL;
                }
            }
        }

        private function addToCache(string $line) {
            if (isset($this->cache[$line])) {
                $this->cache[$line]++;
            } else {
                $this->cache[$line] = 1;
            }
        }

        function processContent() {
            foreach ($this->content as $row) {
                if (empty($row)) continue;
                $this->addToCache($this->prepareString($row));
            }
        }
    }

    $processFile = new ProcessFile('/Users/rjhdby/CLionProjects/php-src/Zend/zend_vm_execute.h');
    $processFile->processContent();
    $processFile->printIfSuitable(1000);
      
      







Runtime: 0.297 s. It got worse. Not by much, but noticeably. Is creating an object (10 times in our case) really that expensive? Well... it is not only that.



Let's see how a virtual machine works with a class.



    class Adder{
        private $a;
        private $b;

        function __construct($a, $b) {
            $this->a = $a;
            $this->b = $b;
        }

        function sum(){
            return $this->a + $this->b;
        }
    }

    $a = 1;
    $b = 1;
    $adder = new Adder($a, $b);
    $c = $adder->sum();
      
      





There will be three sets of opcodes, which is logical: the root and two methods.



Root:



    line  #*  E I O  op                fetch  ext  return  operands
    ---------------------------------------------------------------
       2   0  E >    NOP
      16   1         ASSIGN                                !0, 1
      17   2         ASSIGN                                !1, 1
      18   3         NEW                           $7      :15
           4         SEND_VAR_EX                           !0
           5         SEND_VAR_EX                           !1
           6         DO_FCALL                   0
           7         ASSIGN                                !2, $7
      19   8         INIT_METHOD_CALL                      !2, 'sum'
           9         DO_FCALL                   0  $10
          10         ASSIGN                                !3, $10
      
      





Constructor:



    line  #*  E I O  op                fetch  ext  return  operands
    ---------------------------------------------------------------
       6   0  E >    RECV                          !0
           1         RECV                          !1
       7   2         ASSIGN_OBJ                            'a'
           3         OP_DATA                               !0
       8   4         ASSIGN_OBJ                            'b'
           5         OP_DATA                               !1
       9   6      >  RETURN                                null
      
      





Sum method:



    line  #*  E I O  op                fetch  ext  return  operands
    ---------------------------------------------------------------
      11   0  E >    FETCH_OBJ_R                   ~0      'a'
           1         FETCH_OBJ_R                   ~1      'b'
           2         ADD                           ~2      ~0, ~1
           3      >  RETURN                                ~2
      
      





The new keyword is actually converted into a function call (opcodes #3-6 in the root dump).

It creates an instance of the class and calls the constructor on it with the passed parameters.
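These two steps can be made visible from userland. The sketch below is only an illustration of the idea, not a description of how the VM implements NEW: it uses `ReflectionClass::newInstanceWithoutConstructor()` to create the instance and then calls the constructor by hand.

```php
<?php
class Adder
{
    private $a;
    private $b;

    function __construct($a, $b)
    {
        $this->a = $a;
        $this->b = $b;
    }

    function sum()
    {
        return $this->a + $this->b;
    }
}

// Step 1: create the instance without running the constructor.
$adder = (new ReflectionClass(Adder::class))->newInstanceWithoutConstructor();

// Step 2: call the constructor on it, like any other method.
$adder->__construct(1, 1);

echo $adder->sum() . PHP_EOL; // prints 2
```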



In the method code, we are interested in how class fields are handled. Note that while assigning to an ordinary variable takes a single simple ASSIGN opcode, for class fields things are somewhat different.



Assignment - 2 opcodes



       7   2         ASSIGN_OBJ                            'a'
           3         OP_DATA                               !0
      
      





Read - 1 opcode



  1 FETCH_OBJ_R ~1 'b'
      
      





Here you should know that ASSIGN_OBJ and FETCH_OBJ_R are much more complicated and, accordingly, more resource-intensive than a simple ASSIGN, which, roughly speaking, just copies a zval from one piece of memory to another.



Opcode         Lines in the handler (C code)
ASSIGN_OBJ     149
OP_DATA        30
FETCH_OBJ_R    112
ASSIGN         26


Clearly, such a comparison is far from rigorous, but it still gives some idea. I will take actual measurements a little further on.



Now let's see how expensive it is to create an instance of an object. Let's measure on one million iterations:



    class ValueObject{
        private $a;
        function __construct($a) {
            $this->a = $a;
        }
    }

    $start = microtime(true);
    for($i = 0; $i < 1000000; $i++){
        // uncomment one of the two variants:
        // $a = $i;
        // $a = new ValueObject($i);
    }
    echo "Time: " . (microtime(true) - $start);
      
      





Variable assignment: 0.092 s.

Object instantiation: 0.889 s.



So there you go. Not exactly free, especially when done many times.



While we are at it, let's measure the difference between working with properties and with local variables. To do this, we change our code as follows:



    class ValueObject{
        private $b;
        function try($a) {
            // exchange through property
            // $this->b = $a;
            // $c = $this->b;
            // exchange through assignment
            // $b = $a;
            // $c = $b;
            return $c;
        }
    }

    $a = new ValueObject();
    $start = microtime(true);
    for($i = 0; $i < 1000000; $i++){
        $b = $a->try($i);
    }
    echo "Simple. Time: " . (microtime(true) - $start);
      
      







Exchange through assignment: 0.830 s.

Exchange through property: 0.862 s.



Only slightly, but longer. That is about the same order of difference we got after wrapping the functions in a class.



Banal conclusions



  1. The next time you want to instantiate a million objects, think about whether you really need to. Maybe just an array, huh?
  2. Writing spaghetti code for the sake of saving one millisecond is not worth it. The gain is tiny, and your colleagues may beat you up later.
  3. But for the sake of saving 500 milliseconds, maybe sometimes it makes sense. The main thing is not to go too far: remember that those 500 milliseconds will most likely be saved in a small section of very hot code, so there is no need to turn the entire project into a vale of sorrow.
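As a sketch of the first point, the object-creation benchmark from earlier can be repeated with a plain array in place of the value object. The `timeObjects()` and `timeArrays()` helpers are hypothetical names, and the result will vary by machine and PHP version, so measure before deciding.

```php
<?php
class ValueObject
{
    private $a;

    function __construct($a)
    {
        $this->a = $a;
    }
}

// Hypothetical helper: time creating $iterations value objects.
function timeObjects(int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $a = new ValueObject($i);
    }
    return microtime(true) - $start;
}

// Hypothetical helper: time creating $iterations one-element arrays.
function timeArrays(int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $a = ['a' => $i];
    }
    return microtime(true) - $start;
}

echo "Objects: " . timeObjects(1000000) . PHP_EOL;
echo "Arrays:  " . timeArrays(1000000) . PHP_EOL;
```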


P.S. Lambdas next time. Things get interesting there. :)


