Hello, Habr!
We often write and talk about PHP performance: how we approach it in general, how we saved $1 million by switching to PHP 7.0, and we also translate various materials on the topic. The reason is that the audience of our products keeps growing, and scaling the PHP backend with hardware is very costly: we have 600 servers running PHP-FPM. Investing time in optimization therefore pays off for us.
Until now we have mostly talked about the usual, well-established ways of improving performance. But the PHP community is not standing still! JIT will appear in PHP 8, preload in PHP 7.4, and outside the PHP core, frameworks are being developed that assume PHP runs as a daemon. It is time to experiment with something new and see what it can give us.
Since the release of PHP 8 is still a long way off, and asynchronous frameworks are poorly suited to our tasks (I will explain why below), today we will focus on preload, which will appear in PHP 7.4, and on RoadRunner, a framework for daemonizing PHP.
This is the text version of my talk at Badoo PHP Meetup #3. Videos of all the talks are collected in this post.
PHP-FPM, Apache mod_php and similar ways of running PHP scripts and processing requests (which power the vast majority of sites and services; for simplicity I will call them "classic" PHP) work on a shared-nothing basis in the broad sense of the term:
- state is not shared between PHP workers;
- state is not shared between requests.
Let's look at this using a simple script as an example:
// initialization
$app = \App::init();
$storage = $app->getCitiesStorage();

// logic
$name = $storage->getById($_COOKIE['city_id']);
echo "Your city: {$name}";
For each request, the script is executed from the first line to the last. Even though the initialization most likely does not differ from request to request and could be performed just once (saving resources), it still has to be repeated for every request. We cannot simply keep variables (for example, $app) between requests because of the way "classic" PHP works.
What would it look like if we went beyond "classic" PHP? For example, our script could start independently of any request, perform the initialization and then run a request loop in which it waits for the next request, processes it and repeats the loop without cleaning up the environment (from here on I will call this approach "PHP as a daemon").
// initialization
$app = \App::init();
$storage = $app->getCitiesStorage();
$cities = $storage->getAll();

// request loop
while ($req = getNextRequest()) {
    $name = $cities[$req->getCookie('city_id')];
    echo "Your city: {$name}";
}
We not only got rid of the initialization repeated for each request, but also stored the list of cities once in the $cities variable, so that every request can use it without going anywhere except memory (the fastest way to get any data).
The performance of such a solution is potentially much higher than that of "classic" PHP. But a performance gain rarely comes for free: there is a price to pay. Let's see what it might be in our case.
To do this, let's complicate our script a bit: instead of printing the $name variable, we will fill an array:
- $name = $cities[$req->getCookie('city_id')];
+ $names[] = $cities[$req->getCookie('city_id')];
In "classic" PHP no problems arise: at the end of the request the $names variable is destroyed, and each subsequent request works as expected. When PHP runs as a daemon, each request adds another city to this variable, so the array grows uncontrollably until the machine runs out of memory.
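The difference can be reproduced without any server. The following is a minimal sketch that simulates both models inside one process; `handleClassic`, `handleDaemon` and the sample city data are hypothetical names introduced only for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical sample data standing in for the cities storage.
$cities = [1 => 'Moscow', 2 => 'London'];

// "Classic" model: every request starts from a clean state,
// so the array is created anew and discarded afterwards.
function handleClassic(array $cities, int $cityId): int
{
    $names = [];
    $names[] = $cities[$cityId];
    return count($names);
}

// Daemon model: $names lives outside the request loop and is never reset.
function handleDaemon(array $cities, array $cityIds): int
{
    $names = [];
    foreach ($cityIds as $cityId) { // stands in for the request loop
        $names[] = $cities[$cityId];
    }
    return count($names);
}

$requests = [1, 2, 1, 2, 1];
$classicSizes = array_map(
    fn (int $id): int => handleClassic($cities, $id),
    $requests
);
$daemonSize = handleDaemon($cities, $requests);

echo 'classic: ', max($classicSizes), PHP_EOL; // prints "classic: 1"
echo 'daemon: ', $daemonSize, PHP_EOL;         // prints "daemon: 5"
```

In the daemon variant the array size equals the number of processed requests, which is exactly the unbounded growth described above.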
More generally, running out of memory is not the only risk: other errors can also kill the process. "Classic" PHP deals with such problems automatically. When PHP runs as a daemon, we need to monitor the daemon somehow and restart it if it crashes.
Errors of this kind are unpleasant, but effective solutions exist for them. It is much worse if, because of an error, the script does not crash but unpredictably changes the values of some variables (for example, clears the $cities array). In that case all subsequent requests will work with incorrect data.
To summarize, it is easier to write code for "classic" PHP (PHP-FPM, Apache mod_php and the like), because it frees us from a number of problems and errors. But we pay for this with performance.
From the examples above we can see that, when processing each request, "classic" PHP spends resources in places where they need not be spent at all (or could be spent only once). These are the following areas:
- including files (include, require, etc.);
- initialization (framework, libraries, DI container, etc.);
- requesting data from external storage (instead of keeping it in memory).
PHP has been around for many years and may even have become popular thanks to this model. Over that time, many methods of varying success have been developed to solve the problem described. I mentioned some of them in my previous article. Today we will dwell on two solutions that are fairly new to the community: preload and RoadRunner.
Preload
Of the three points listed above, preload is designed to deal with the first one: the overhead of including files. At first glance this may seem strange and pointless, because PHP already has OPcache, which was created for exactly this purpose. To understand the point, let's use perf to profile a real application running with OPcache enabled and a hit rate of 100%.
Despite OPcache, we see that persistent_compile_file takes 5.84% of the request execution time.
To understand why, we can look at the source of zend_accel_load_script. It shows that, despite OPcache, on every include/require call the signatures of classes and functions are copied from shared memory into the memory of the worker process, and various auxiliary work is done. This work has to be repeated for every request, because the worker process memory is cleared when the request ends.
This is compounded by the large number of include/require calls that we usually make within a single request. For example, Symfony 4 includes about 310 files before executing the first useful line of code. Sometimes this happens implicitly: to create an instance of class A below, PHP will autoload all the other classes (B, C, D, E, F, G). Composer dependencies that declare functions stand out in particular: to guarantee that these functions are available to user code, Composer must always include them whether or not they are used, since PHP has no function autoloading and they cannot be loaded at call time.
class A extends \B implements \C {
    use \D;
    const SOME_CONST = \E::E1;
    private static $someVar = \F::F1;
    private $anotherVar = \G::G1;
}
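The point that PHP autoloads classes but not functions can be checked directly. The sketch below registers an autoloader that only records what it is asked for; the class and function names are hypothetical and deliberately do not exist.

```php
<?php
declare(strict_types=1);

// Names the autoloader was asked to load.
$requested = [];

spl_autoload_register(function (string $name) use (&$requested): void {
    $requested[] = $name; // record the request, load nothing
});

// Looking up an unknown class triggers the autoloader.
$classFound = class_exists('HypotheticalMissingClass');

// Looking up an unknown function does not: there is no hook for functions.
$functionFound = function_exists('hypothetical_missing_function');

var_dump($requested); // contains only 'HypotheticalMissingClass'
```

This is why a file that declares functions has to be included eagerly on every request, whether or not the functions end up being called.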
How preload works
Preload has a single main setting, opcache.preload, which takes the path to a PHP script. This script is executed once when PHP-FPM, Apache, etc. starts, and all signatures of classes, methods and functions declared in it become available to every script that processes requests, from the first line of its execution (important note: this does not apply to variables and global constants, whose values are reset after the preload phase ends). You no longer need to make include/require calls and copy function and class signatures from shared memory into process memory: they are all declared immutable, so all processes can refer to the same memory location.
Usually the classes and functions we need live in different files, and combining them into one preload script is inconvenient. But there is no need to: since the preload script is a regular PHP script, we can simply call include/require or opcache_compile_file() from it for all the files we need. In addition, since all these files are loaded once, PHP can make additional optimizations that were impossible while we included the files separately at request time. Normally PHP optimizes only within each separate file; with preload it can optimize across all the code loaded in the preload phase.
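A preload script along these lines might look as follows. This is only a sketch under stated assumptions: the `collectPhpFiles` helper and the `src` directory are hypothetical, and the script relies only on `opcache_compile_file()`, which compiles a file without executing it.

```php
<?php
declare(strict_types=1);

/**
 * Collects all *.php files below $dir, sorted for a stable order.
 * (Hypothetical helper, not part of PHP or OPcache.)
 */
function collectPhpFiles(string $dir): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

// Hypothetical source directory; adjust to the real project layout.
$sourceDir = __DIR__ . '/src';

if (is_dir($sourceDir) && function_exists('opcache_compile_file')) {
    foreach (collectPhpFiles($sourceDir) as $file) {
        // Compile the file into OPcache without running its code.
        opcache_compile_file($file);
    }
}
```

The path to this script is what goes into the opcache.preload setting. Classes whose dependencies cannot be resolved at this point are not preloaded, which is covered in the nuances below.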
Preload benchmarks
To demonstrate the benefits of preload in practice, I took one CPU-bound Badoo endpoint. Our backend is generally characterized by CPU-bound load. This is also the answer to the question of why we did not consider asynchronous frameworks: they give no advantage under CPU-bound load, while making the code more complicated (it has to be written differently) and requiring special asynchronous drivers for working with the network, disk, etc.
To fully assess the benefits of preload, for the experiment I preloaded all the files the tested script needs at run time, and loaded it with a semblance of normal production load using wrk2, a more advanced analogue of Apache Benchmark that is just as simple.
To try preload, you first need to upgrade to PHP 7.4 (we are currently on PHP 7.2). I measured the performance of PHP 7.2, PHP 7.4 without preload and PHP 7.4 with preload.
Thus, moving from PHP 7.2 to PHP 7.4 gives +10% performance on our endpoint, and preload gives another 10% on top.
With preload, the results depend heavily on the number of included files and the complexity of the executed logic: if many files are included and the logic is simple, preload gives more than if there are few files and the logic is complex.
Nuances of preload
Anything that improves performance usually has a downside. Preload has many nuances, which I list below. All of them need to be taken into account, but only one (the first) may be a fundamental obstacle.
Changes require a restart
Since all preloaded files are compiled only at startup, marked as immutable and never recompiled afterwards, the only way to apply changes to these files is to restart (reload or restart) PHP-FPM, Apache, etc.
In the case of a reload, PHP tries to restart as gracefully as possible: user requests are not interrupted, but while the preload phase is in progress, all new requests wait for it to complete. If there is not much code in the preload, this may cause no problems, but if you try to preload the entire application, the response time may increase significantly during the restart.
Restarting (whether reload or restart) also has an important side effect: it clears OPcache. All requests after it will therefore run with a cold opcode cache, which can increase the response time even more.
Undefined symbols
For preload to load a class, everything it depends on must already be defined. For the class below, this means that all the other classes (B, C, D, E, F, G), the $someGlobalVar variable and the SOME_CONST constant must be available before this class is compiled. Since the preload script is just regular PHP code, we can define an autoloader, and everything related to other classes will then be loaded by it automatically. But this does not work for variables and constants: we ourselves must make sure they are defined by the time the class is declared.
class A extends \B implements \C {
    use \D;
    const SOME_CONST = \E::E1;
    private static $someVar = \F::F1;
    private $anotherVar = \G::G1;
    // illustrative only: PHP itself rejects a variable as a property default
    private $varLink = $someGlobalVar;
    private $constLink = SOME_CONST;
}
Fortunately, preload provides enough tools to understand whether something went wrong. Firstly, there are warning messages that say what failed to load and why:
PHP Warning: Can't preload class MyTestClass with unresolved initializer for constant RAND in /local/preload-internal.php on line 6
PHP Warning: Can't preload unlinked class MyTestClass: Unknown parent AnotherClass in /local/preload-internal.php on line 5
Secondly, preload adds a separate section to the result of the opcache_get_status() function, which shows what was successfully loaded in the preload phase.
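As an illustration, that section can be summarised with a few lines of PHP. This is a sketch that assumes the section is exposed under the `preload_statistics` key with `memory_consumption`, `functions`, `classes` and `scripts` entries, as in PHP 7.4 as far as I know; `summarizePreload` is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Summarises the preload section of an opcache_get_status() result.
 * Returns null when the section is absent (e.g. preload is not configured).
 * The key names are an assumption based on PHP 7.4 output.
 */
function summarizePreload($status): ?array
{
    if (!is_array($status) || !isset($status['preload_statistics'])) {
        return null;
    }
    $stats = $status['preload_statistics'];

    return [
        'memory'    => $stats['memory_consumption'] ?? 0,
        'functions' => count($stats['functions'] ?? []),
        'classes'   => count($stats['classes'] ?? []),
        'scripts'   => count($stats['scripts'] ?? []),
    ];
}

// opcache_get_status() returns false when OPcache is not active.
$status = function_exists('opcache_get_status')
    ? @opcache_get_status(false)
    : false;

var_dump(summarizePreload($status));
```

Comparing the class count with the number of classes you intended to preload is a quick way to notice that something was skipped.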
Optimization of class fields and constants
As I wrote above, preload resolves the values of class fields and constants and saves them. This makes the code faster: while a request is being processed, the data is ready and does not have to be derived from other data. But it can lead to non-obvious results, as the following example shows:
const.php:
<?php
define('MYTESTCONST', mt_rand(1, 1000));

preload.php:
<?php
include 'const.php';
class MyTestClass {
    const RAND = MYTESTCONST;
}

script.php:
<?php
include 'const.php';
echo MYTESTCONST, ', ', MyTestClass::RAND; // 32, 154
The result is counterintuitive: it would seem that the constants should be equal, since one was assigned the value of the other, but in reality they are not. The reason is that global constants, unlike class constants and fields, are forcibly cleared after the preload phase ends, while class constants and fields are resolved and saved. As a result, during the request we have to define the global constant again, and it can get a different value.
Cannot redeclare someFunc()
With classes the situation is simple: we usually do not include them explicitly but rely on an autoloader. So if a class is defined in the preload phase, the autoloader simply does not run during the request, and we do not try to include the class a second time.
With functions the situation is different: we have to include them explicitly. We may therefore include all the necessary files with functions in the preload script and then try to do it again during the request (a typical example is the Composer autoloader, which always tries to include all files with functions). In this case we get an error: the function has already been defined and cannot be redeclared.
There are several ways to solve this problem. With Composer, for example, you can include everything in the preload phase and include nothing Composer-related during requests. Another solution is not to include files with functions directly, but to go through a proxy file that checks function_exists(), as Guzzle HTTP does, for example.
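The function_exists() approach can be sketched as follows. The helper `requireFunctionsOnce`, the function `hypothetical_helper` and the temporary file are all made up for the demonstration; a real proxy file would simply wrap a `require` of the library's functions file in the same check.

```php
<?php
declare(strict_types=1);

/**
 * Includes $file only if $function is not defined yet.
 * Returns true when the file was actually included.
 */
function requireFunctionsOnce(string $function, string $file): bool
{
    if (function_exists($function)) {
        return false; // already defined, e.g. during the preload phase
    }
    require $file;
    return true;
}

// Demo with a throwaway file that declares a hypothetical helper.
$file = sys_get_temp_dir() . '/hypothetical_helpers_' . uniqid() . '.php';
file_put_contents(
    $file,
    '<?php function hypothetical_helper(): string { return "ok"; }'
);

var_dump(requireFunctionsOnce('hypothetical_helper', $file)); // bool(true)
var_dump(requireFunctionsOnce('hypothetical_helper', $file)); // bool(false)

unlink($file);
```

The second call is skipped instead of failing with "Cannot redeclare", which is the behaviour we need when the same functions may already have been preloaded.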
PHP 7.4 has not been officially released yet
This nuance will become irrelevant in time, but for now PHP 7.4 has not been officially released, and the PHP team explicitly writes in the release notes: "Please DO NOT use this version in production, it is an early test version." During our experiments with preload we came across several bugs, fixed them ourselves and even sent some fixes upstream. To avoid surprises, it is better to wait for the official release.
RoadRunner
RoadRunner is a daemon written in Go that, on the one hand, creates PHP workers and monitors them (starting, stopping and restarting them as necessary) and, on the other hand, accepts requests and passes them to these workers. In this sense its work is no different from that of PHP-FPM (which also has a master process monitoring the workers). But there are differences, and the key one is that RoadRunner does not reset the state of the script after a request completes.
Thus, if we recall our list of where resources are spent in "classic" PHP, RoadRunner lets us deal with all of the points (preload, as we remember, deals only with the first):
- including files (include, require, etc.);
- initialization (framework, libraries, DI container, etc.);
- requesting data from external storage (instead of keeping it in memory).
The RoadRunner Hello World example looks something like this:
$relay = new Spiral\Goridge\StreamRelay(STDIN, STDOUT);
$psr7 = new Spiral\RoadRunner\PSR7Client(new Spiral\RoadRunner\Worker($relay));

while ($req = $psr7->acceptRequest()) {
    $resp = new \Zend\Diactoros\Response();
    $resp->getBody()->write("hello world");
    $psr7->respond($resp);
}
We will try to run the same endpoint that we tested with preload on RoadRunner without modifications, load it and measure performance. Without modifications, because otherwise the benchmark would not be entirely fair.
Let's try to adapt the Hello World example for this.
Firstly, as I wrote above, we do not want the worker to crash when an error occurs, so we wrap everything in a global try..catch. Secondly, since our script knows nothing about Zend Diactoros, we need to convert its output into a response; for this we use the ob_ functions. Thirdly, our script knows nothing about the PSR-7 request object; the solution is to populate the standard PHP environment from it. And fourthly, our script expects the process to die after the request, clearing all state. With RoadRunner we have to do this cleanup ourselves.
Thus, the initial Hello World version turns into something like this:
while ($req = $psr7->acceptRequest()) {
    try {
        // populate the standard PHP environment from the PSR-7 request
        $uri = $req->getUri();
        $_COOKIE = $req->getCookieParams();
        $_POST = $req->getParsedBody();
        $_SERVER = [
            'REQUEST_METHOD' => $req->getMethod(),
            'HTTP_HOST' => $uri->getHost(),
            'DOCUMENT_URI' => $uri->getPath(),
            'SERVER_NAME' => $uri->getHost(),
            'QUERY_STRING' => $uri->getQuery(),
            // ...
        ];

        ob_start();
        // our logic here
        // read the buffer and close it, so buffers do not pile up across requests
        $output = ob_get_clean();

        $resp = new \Zend\Diactoros\Response();
        $resp->getBody()->write($output);
        $psr7->respond($resp);
    } catch (\Throwable $Throwable) {
        // some error handling logic here
    }

    // reset the state that "classic" PHP would have cleared for us
    \UDS\Event::flush();
    \PinbaClient::sendAll();
    \PinbaClient::flushAll();
    \HTTP::clear();
    \ViewFactory::clear();
    \Logger::clearCaches();
    // ...
}
RoadRunner benchmarks
Well, it's time to run the benchmarks.
The results do not meet expectations: RoadRunner eliminates more of the factors that cost performance than preload does, yet its results are worse. Let's find out why, as always, by running perf.
In the perf results we see phar_compile_file. This is because we include some files while the script runs, and since OPcache is not enabled (RoadRunner runs scripts via the CLI, where OPcache is off by default), these files are compiled again on every request.
Let's edit the RoadRunner configuration and enable OPcache.
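For CLI processes OPcache is controlled by the opcache.enable_cli ini directive; how it is passed to the workers depends on the setup. Since this is easy to get wrong, a worker can verify the setting at startup. The following is a small sketch; `isOpcacheActive` is a hypothetical helper, not part of RoadRunner.

```php
<?php
declare(strict_types=1);

/**
 * Reports whether OPcache is active for the current process.
 */
function isOpcacheActive(): bool
{
    if (!function_exists('opcache_get_status')) {
        return false; // the extension is not loaded at all
    }
    // Returns false when OPcache is loaded but not enabled for this SAPI.
    $status = @opcache_get_status(false);

    return is_array($status) && !empty($status['opcache_enabled']);
}

if (PHP_SAPI === 'cli' && !isOpcacheActive()) {
    // STDERR is used so the message does not mix with regular output.
    fwrite(STDERR, "OPcache is not active; check opcache.enable_cli\n");
}
```

A check like this would have pointed at the cause of the poor first result before any profiling.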
These results are closer to what we expected to see: RoadRunner now shows better performance than preload. But perhaps we can get even more!
perf shows nothing else unusual, so let's look at the PHP code. The easiest way to profile it is phpspy: it requires no modification of the PHP code, you just run it in the console. Let's do that and build a flame graph.
Since we agreed not to modify the logic of our application for the purity of the experiment, we are interested in the branch of the stack associated with RoadRunner itself.
Most of it comes down to a call to fread(), and there is hardly anything we can do about that. But besides fread itself we see some other branches in \Spiral\RoadRunner\PSR7Client::acceptRequest(). Their purpose becomes clear from the source code:
/**
 * @return ServerRequestInterface|null
 */
public function acceptRequest()
{
    $rawRequest = $this->httpClient->acceptRequest();
    if ($rawRequest === null) {
        return null;
    }

    $_SERVER = $this->configureServer($rawRequest['ctx']);

    $request = $this->requestFactory->createServerRequest(
        $rawRequest['ctx']['method'],
        $rawRequest['ctx']['uri'],
        $_SERVER
    );

    parse_str($rawRequest['ctx']['rawQuery'], $query);

    $request = $request
        ->withProtocolVersion(static::fetchProtocolVersion($rawRequest['ctx']['protocol']))
        ->withCookieParams($rawRequest['ctx']['cookies'])
        ->withQueryParams($query)
        ->withUploadedFiles($this->wrapUploads($rawRequest['ctx']['uploads']));
    // ...
It becomes clear that RoadRunner is building a PSR-7-compliant request object from a serialized array. If your framework works with PSR-7 request objects directly (Symfony, for example, does not), this is completely justified. Otherwise, PSR-7 becomes an extra link before the request is converted into something your application can work with. Let's remove this intermediate link and look at the results again.
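Removing the intermediate link could look roughly like this: the raw request context is mapped straight onto superglobal-like arrays. This is a sketch, not the code used in the benchmark. It relies only on the context keys visible in the excerpt above (`method`, `uri`, `rawQuery`, `protocol`, `cookies`), assumes that `uri` holds a full URL, and `globalsFromContext` is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Builds superglobal-like arrays straight from the raw request context,
 * skipping the PSR-7 object entirely.
 */
function globalsFromContext(array $ctx): array
{
    parse_str($ctx['rawQuery'] ?? '', $query);
    $uri = (string) ($ctx['uri'] ?? '/');

    return [
        'server' => [
            'REQUEST_METHOD'  => $ctx['method'] ?? 'GET',
            'REQUEST_URI'     => $uri,
            'HTTP_HOST'       => (string) parse_url($uri, PHP_URL_HOST),
            'DOCUMENT_URI'    => (string) parse_url($uri, PHP_URL_PATH),
            'QUERY_STRING'    => $ctx['rawQuery'] ?? '',
            'SERVER_PROTOCOL' => $ctx['protocol'] ?? 'HTTP/1.1',
        ],
        'get'    => $query,
        'cookie' => $ctx['cookies'] ?? [],
    ];
}

// Sample context with made-up values.
$globals = globalsFromContext([
    'method'   => 'GET',
    'uri'      => 'http://example.com/city?city_id=2',
    'rawQuery' => 'city_id=2',
    'protocol' => 'HTTP/1.1',
    'cookies'  => ['city_id' => '2'],
]);

print_r($globals);
```

In a worker loop the returned arrays would be assigned to $_SERVER, $_GET and $_COOKIE before the application logic runs, so no request object is ever constructed.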
The test script was quite light, so I managed to squeeze out a significant gain: +17% compared to pure PHP (as a reminder, preload gives +10% on the same script).
Nuances of RoadRunner
In general, adopting RoadRunner is a more serious change than simply enabling preload, so the nuances here are even more significant.
-, RoadRunner, , PHP- , , , : , , .
-, RoadRunner , «» — . / RoadRunner ; , , , , - .
-, endpoint', , , RoadRunner. .
Conclusion
So, we have looked at how "classic" PHP works, as well as at preload and RoadRunner.
PHP «» (PHP-FPM, Apache mod_php ) . - , . , , preload JIT.
, , , RoadRunner, .
Finally, the benchmark results:
- PHP 7.2: 845 RPS;
- PHP 7.4: 931 RPS;
- RoadRunner without optimizations: 987 RPS;
- PHP 7.4 + preload: 1030 RPS;
- RoadRunner after optimizations: 1089 RPS.
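The percentages quoted earlier follow directly from these RPS figures. Here is a short worked calculation; `gainPercent` is a hypothetical helper, and reading the lower RoadRunner figure as the run before optimization and the higher one as the run after it is my assumption.

```php
<?php
declare(strict_types=1);

/** Relative gain of $value over $baseline, in percent. */
function gainPercent(float $baseline, float $value): float
{
    return ($value / $baseline - 1.0) * 100.0;
}

// RPS figures from the list above.
printf("PHP 7.2 -> PHP 7.4:            %+.1f%%\n", gainPercent(845, 931));  // +10.2%
printf("PHP 7.4 -> PHP 7.4 + preload:  %+.1f%%\n", gainPercent(931, 1030)); // +10.6%
printf("PHP 7.4 -> RoadRunner (987):   %+.1f%%\n", gainPercent(931, 987));  // +6.0%
printf("PHP 7.4 -> RoadRunner (1089):  %+.1f%%\n", gainPercent(931, 1089)); // +17.0%
```

The results agree with the roughly +10%, +10% and +17% mentioned in the text.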
Badoo PHP 7.4 , ( ).
RoadRunner , , , , .
Thanks for your attention!