I think most developers use libdispatch inefficiently because of how it was introduced to the community, and because of confusing documentation and APIs. I came to this conclusion after reading the "concurrency" discussion on the Swift development mailing list (swift-evolution). The messages from Pierre Habouzit (who maintains libdispatch at Apple) are especially enlightening:
- lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170828/date.html
- lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170904/date.html
He also has many tweets on this topic:
A summary I put together:
- The program should have very few queues that use the global thread pool. If all of these queues become active at the same time, you get that many concurrently running threads. These queues should be treated as execution contexts in the program (GUI, storage, background work, ...) that benefit from executing in parallel. (A sketch of this setup appears after the list.)
- Start with serial execution. When you discover a performance problem, measure to find the cause, and if parallelism helps, apply it carefully. Always verify that the parallel code works under system pressure. Reuse queues by default; add new queues only when that brings a measurable benefit. Most applications should not use more than three or four queues.
- Queues that have another queue set as their target work well and scale.
(Translator's note: on setting one queue as the target of another, see, for example, here.)
- Do not use dispatch_get_global_queue(). It does not play well with quality-of-service classes and priorities and can lead to thread explosion. Run your code in one of your execution contexts instead.
- dispatch_async() is wasteful for small blocks (< 1 ms), because the call will most likely require spinning up a new thread due to libdispatch's overcommitting behavior. Instead of switching execution contexts to protect shared state, use locks for concurrent access to the shared state.
- Some classes/libraries are well designed in that they reuse the execution context the calling code hands them. This allows plain locking to provide thread safety. os_unfair_lock is usually the fastest lock in the system: it behaves better with respect to priorities and causes fewer context switches.
- With parallel execution, your tasks must not contend with each other, otherwise performance drops sharply. Contention takes many forms. The obvious case is contention for a lock, but in reality contention means nothing more than the use of a shared resource that becomes a bottleneck: IPC / OS daemons, malloc (which takes locks), shared memory, I/O.
- You do not need all of your code to run asynchronously to avoid thread explosion. It is far better to use a fixed, small number of bottom queues and to stop using dispatch_get_global_queue().
(Translator's note 1: apparently this refers to the thread explosion that occurs when a large number of parallel tasks block on synchronization: "If I have lots of blocks and they all want to wait, we can get what we call thread explosion.")
(Translator's note 2: from the discussion it appears Pierre Habouzit means queues "which are known to the kernel when they have work" — that is, the OS kernel.)
- Do not forget the complexity and the bugs that come with an architecture full of asynchronous execution and callbacks. Sequential code is still much easier to read, write, and maintain.
- Concurrent queues are less optimized than serial ones. Use them only when you have measured a performance gain; otherwise it is premature optimization.
- If you need to submit tasks to one queue both asynchronously and synchronously, use dispatch_async_and_wait() instead of dispatch_sync(). dispatch_async_and_wait() does not guarantee execution on the calling thread, which reduces context switches when the target queue is active. (It also appears in the sketch after this list.)
(Translator's note 1: actually dispatch_sync() does not guarantee this either; its documentation only says that it "executes a block on the current thread, whenever possible. With one exception: a block submitted to the main queue is always executed on the main thread.")
(Translator's note 2: on dispatch_async_and_wait(), see the documentation and the source code.)
- Using 3-4 cores correctly is not that simple. Most who try actually fail to scale and waste energy for a tiny gain in performance. The way processors deal with heat does not help either: for example, Intel disables Turbo Boost when enough cores are in use.
- Measure your product's performance in the real world to make sure you are making it faster, not slower. Be careful with microbenchmarks: they hide cache effects and keep the thread pool hot. You should always have a macro-level test to validate what you are doing.
- libdispatch is efficient, but there are no miracles. Resources are not endless. You cannot ignore the reality of the OS and the hardware your code runs on. Nor does all code parallelize well.
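To make several of these points concrete, here is a minimal sketch in C (the queue names and QoS choices are my own illustrative assumptions, not from the thread): a small, fixed set of execution contexts built from serial queues, an inner queue targeting one of them, and no dispatch_get_global_queue() anywhere.

```c
#include <dispatch/dispatch.h>
#include <stdio.h>

int main(void) {
    // A "bottom" serial queue per execution context, with an explicit QoS.
    // These are the only queues that draw threads from the global pool.
    dispatch_queue_t storage = dispatch_queue_create_with_target(
        "com.example.storage",
        dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL,
                                                QOS_CLASS_UTILITY, 0),
        NULL);

    // An "inner" queue targets the bottom queue instead of the global pool,
    // so its work runs on the storage context and inherits its QoS.
    dispatch_queue_t cache = dispatch_queue_create_with_target(
        "com.example.storage.cache", DISPATCH_QUEUE_SERIAL, storage);

    // Switching execution context is what dispatch_async() is for.
    dispatch_async(cache, ^{
        printf("running in the storage context\n");
    });

    // Submitting synchronously to a possibly-active context:
    // dispatch_async_and_wait() may run the block on the queue's current
    // thread rather than bouncing back to the caller (macOS 10.14+ / iOS 12+).
    dispatch_async_and_wait(storage, ^{
        printf("executed synchronously\n");
    });
    return 0;
}
```

Note that only `storage` draws on the thread pool here; `cache` borrows its thread and priority, so even when both are active they cost one thread.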
Take a look at every dispatch_async() call in your code and ask yourself: is the task you are submitting really worth a context switch? In most cases, a lock is probably the better choice.
As soon as you start using and reusing queues (execution contexts) from a pre-designed set, deadlocks become a danger. The danger arises when tasks are submitted to these queues with dispatch_sync(), which usually happens when queues are used for thread safety. So, once again: the solution is to use locks, and to reach for dispatch_async() only when you actually need to switch to another execution context.
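A minimal C sketch of that trade-off (the names are mine, for illustration only): the same shared counter protected by os_unfair_lock stays safe even in a situation where a queue-based dispatch_sync() guard would deadlock.

```c
#include <dispatch/dispatch.h>
#include <os/lock.h>
#include <stdio.h>

static os_unfair_lock counter_lock = OS_UNFAIR_LOCK_INIT;
static int counter = 0;

// A lock-based critical section: no context switch, and no risk of
// re-entrancy deadlock on a queue.
static int increment(void) {
    os_unfair_lock_lock(&counter_lock);
    int value = ++counter;
    os_unfair_lock_unlock(&counter_lock);
    return value;
}

int main(void) {
    dispatch_queue_t work = dispatch_queue_create("com.example.work", NULL);

    dispatch_sync(work, ^{
        // Queue-based "thread safety" deadlocks the moment code that is
        // already running on `work` tries to enter it synchronously again:
        //
        //     dispatch_sync(work, ^{ counter++; });  // deadlock here
        //
        // The lock has no such failure mode:
        printf("count = %d\n", increment());
    });
    return 0;
}
```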
I have personally seen huge performance improvements from following these guidelines (in heavily loaded programs). It is a new approach, but it is worth it.
More links
The program should have very few queues using the global pool
- lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170828/039368.html
- lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170828/039405.html
- lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170828/039410.html
- lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170828/039420.html
(Translator's note: while reading the last link I could not resist translating a piece from the middle of the correspondence between Pierre Habouzit and Chris Lattner. Below is one of Pierre Habouzit's replies from 039420.html.)
<...>
I understand that it is hard for me to get my point across, because I am not a language-architecture guy, I am a system-architecture guy. And I definitely do not understand Actors well enough to decide how to integrate them with the OS. But to me, going back to the database example, the database Actor, or the network-interface Actor from earlier in the correspondence, is different from, say, this one SQL query or this one network request. The former are entities the OS should know about in the kernel, while a SQL query or a network request is just an actor enqueued onto the former. In other words, these top-level actors are different because they are top-level, sitting directly on top of the kernel / low-level runtime, and they are the things the kernel should be able to reason about. That is what makes them special.
There are two kinds of queues, and a corresponding API level, in the dispatch library:
- the global queues, which are not queues like the others. In reality they are just an abstraction over the thread pool.
- all the other queues, which you can target at one another however you like.
Today it is clear that this was a mistake and that there should be three kinds of queues:
- the global queues, which are not real queues but merely describe which family of system attributes your execution context requires (mostly priorities). Submitting work directly to them should be disallowed.
- the bottom queues (which GCD has tracked in recent years and which the source code calls "bases"; translator's note: apparently the GCD source itself is meant). The bottom queues are known to the kernel when they have work.
- any other "internal" queues that the kernel does not know about at all.
On the dispatch team, we regret every passing day that the difference between the second and third groups of queues was not made clear in the API from the start.
I like to call the second group "execution contexts," but I can see why you would want to call them Actors. That is perhaps more consistent (and GCD made the same move by presenting both as queues). Such top-level "Actors" should be few in number, because if they all become active at the same time they need that many threads in the process, and that is not a resource that scales. That is why it is important to distinguish them. And, as we have discussed, they are also commonly used to protect shared state, a resource, or the like, which inner actors may not be able to do.
<...>
- lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170828/039429.html
- lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170904/039461.html
Start with serial execution
- twitter.com/pedantcoder/status/1081658384577835009
- twitter.com/pedantcoder/status/1081659784841969665
- twitter.com/pedantcoder/status/904839926180569089
- twitter.com/pedantcoder/status/904840156330344449
Do not use global queues
- twitter.com/pedantcoder/status/773903697474486273
- lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170828/039368.html
Beware of concurrent queues
Do not use async calls to protect shared state
- twitter.com/pedantcoder/status/820473404440489984
- twitter.com/pedantcoder/status/820473580819337219
- twitter.com/pedantcoder/status/820740434645221376
- twitter.com/pedantcoder/status/904467942208823296
- twitter.com/pedantcoder/status/904468363149099008
- twitter.com/pedantcoder/status/820473711606124544
- twitter.com/pedantcoder/status/820473923527589888
Do not use async calls for small tasks
- twitter.com/pedantcoder/status/1081657739451891713
- twitter.com/pedantcoder/status/1081642189048840192
- twitter.com/pedantcoder/status/1081642631732457472
- twitter.com/pedantcoder/status/1081648778975707136
Some classes / libraries should just be synchronous
Contention between parallel tasks is a performance killer
- twitter.com/pedantcoder/status/1081657739451891713
- twitter.com/pedantcoder/status/1081658172610293760
To avoid deadlocks, use locks when you need to protect shared state
Do not use semaphores to wait for an asynchronous task
The NSOperation API has some serious pitfalls that can lead to performance degradation.
- twitter.com/pedantcoder/status/1082097847653154817
- twitter.com/pedantcoder/status/1082111968700289026
- lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170828/039415.html
Avoid microbenchmarks
- twitter.com/pedantcoder/status/1081660679054999552
- twitter.com/Catfish_Man/status/1081673457182490624
Resources are not unlimited
- twitter.com/pedantcoder/status/1081661310771712001
- lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170828/039410.html
- lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170828/039429.html
About dispatch_async_and_wait()
Using 3-4 cores is not easy
Many of the performance improvements in iOS 12 were achieved by making daemons single-threaded