Parallel or distributed computing is nontrivial in itself: the development environment must support it, the DS specialist must have the skills to carry it out, and the task must be reducible to a form that can be split into parts — if such a form exists at all. Yet with a competent approach you can greatly accelerate a solution written in single-threaded R, provided you have at least a multi-core processor (and almost everyone does now), subject to the theoretical speedup limit given by Amdahl's law. In some cases, however, even that limit can be circumvented.
This is a continuation of previous publications.
As a rule, when an analyst (DS specialist, developer — pick whatever title suits you) tries to speed up a task within a single computer and starts moving from single-threaded to multi-threaded execution, he does it in a boilerplate fashion: parApply, foreach + %dopar%, and so on. A compact and intelligible summary can be found, for example, in "Parallelism in R". Three steps:

- create a cluster with cores - 1 workers;
- register it as the parallel backend;
- run the loop through foreach + %dopar%.
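The three steps above can be sketched with the standard parallel / doParallel / foreach stack (the loop body below is a placeholder of ours, not code from the article):

```r
library(parallel)
library(doParallel)
library(foreach)

# 1. create a cluster with cores - 1 workers
cl <- makeCluster(detectCores() - 1)

# 2. register it as the foreach parallel backend
registerDoParallel(cl)

# 3. run the loop in parallel via %dopar%
res <- foreach(i = 1:100, .combine = c) %dopar% {
  sqrt(i)  # placeholder for the real per-task computation
}

stopCluster(cl)
```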
For typical compute-bound tasks that load the CPU at 100% and do not require transferring large volumes of input data, this is the right approach. The main point that needs attention is logging inside the worker processes, so that the computation can be monitored. Without logging, the flight proceeds without instruments.
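One possible way to log from inside workers — the choice of futile.logger and the log file name are our assumptions, not prescribed by the article — is to have every worker append to a shared file:

```r
library(futile.logger)
library(doParallel)
library(foreach)

cl <- makeCluster(2)
registerDoParallel(cl)

res <- foreach(i = 1:10, .combine = c,
               .packages = "futile.logger") %dopar% {
  # each worker writes to the same log file (hypothetical name)
  flog.appender(appender.file("workers.log"))
  flog.info("task %d started in pid %d", i, Sys.getpid())
  out <- i^2          # placeholder for the real computation
  flog.info("task %d finished", i)
  out
}

stopCluster(cl)
```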
With "enterprise" tasks, parallelization brings many additional methodological difficulties that significantly reduce the effect of the straightforward approach above:
A completely typical scenario: as part of the process you receive a voluminous job as input, read data from disk, pull a large chunk from a database, query external systems and wait for their answers (the classic case — a REST API request), and then return N megabytes of results to the parent process.
Map-reduce partitioning by users, locations, documents, IP addresses, dates, ... (extend the list yourself) does not always save the day. In the saddest cases parallel execution can turn out slower than single-threaded, and out-of-memory problems can appear as well. Is everything lost? Not at all.
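Such partitioning follows the usual split-apply-combine pattern; a minimal base-R sketch (the data frame and column names are hypothetical):

```r
# hypothetical input: one row per event, keyed by user
df <- data.frame(user  = rep(c("a", "b", "c"), each = 4),
                 value = runif(12))

# map: split by key; each chunk can be processed independently
# (and therefore in parallel)
chunks  <- split(df, df$user)
partial <- lapply(chunks, function(chunk) sum(chunk$value))

# reduce: combine the partial results
total <- Reduce(`+`, partial)
```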
Let us outline, in thesis form, a way to radically improve the situation. At the same time we must not forget that we live in a complete zoo: the production circuit runs on *nix, while DS laptops run Windows, *nix or macOS — and the code has to work uniformly everywhere.
Even when the number of elementary subtasks runs to 10^6, the recipe is simple — switch to the future package family and its doFuture adapter:

- doFuture plugs into existing foreach code as a backend, so the body of the loop stays untouched;
- doFuture lets you switch the execution strategy (sequential, multisession, multicore, cluster) with a single plan() call;
- doFuture behaves uniformly across operating systems.
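A minimal sketch of this approach (the task body is a placeholder of ours; the package API — registerDoFuture(), plan() — is real):

```r
library(doFuture)     # attaches foreach and future as well
registerDoFuture()    # make future the foreach backend

plan(multisession)    # portable: works on Windows, *nix and macOS alike

res <- foreach(i = 1:100, .combine = c) %dopar% {
  sqrt(i)             # placeholder for the real task
}

# switching strategies requires no change to the loop, e.g.:
# plan(sequential)  — single-threaded, convenient for debugging
# plan(multicore)   — forked processes on *nix / macOS
# plan(cluster, workers = c("node1", "node2"))  — several machines
```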
On *nix the load on the worker processes is conveniently observed with htop.
The result: the original task runs many times faster. The speedup can even exceed the number of available cores — for example, when the workers spend most of their time waiting for I/O or external systems rather than burning CPU.
There is deliberately no code from the original task here, since the main goal of this publication is to share the approach and the excellent future family of packages.
There are a few small nuances that also need to be watched:

- with doFuture, workers are separate R processes with their own memory;
- remove large intermediate objects with rm() as soon as they are no longer needed;
- call gc() to force garbage collection of the freed memory;
- finish with plan(sequential) to shut down the background workers and return to single-threaded mode.
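The cleanup nuances above, sketched in one place (object names and sizes are hypothetical):

```r
library(doFuture)
registerDoFuture()
plan(multisession)

res <- foreach(i = 1:4, .combine = c) %dopar% {
  big_tmp <- rnorm(1e6)   # placeholder for heavy intermediate data
  out <- mean(big_tmp)
  rm(big_tmp); gc()       # free worker memory explicitly
  out
}

plan(sequential)          # shuts down the background R sessions
gc()                      # reclaim memory in the master session too
```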
Previous publication — "Business processes in enterprise companies: speculation and reality. Shedding light with R".