Concurrent collections in 10 minutes


Photo by Robert V. Ruggiero



The topic is not new. But when I ask "what are concurrent collections and when should we use them?" at an interview or a code review, I almost always get a one-sentence answer: "they completely protect us from race conditions" (which is impossible even in theory), or "they are like ordinary collections, but with locks around everything inside", which also does not quite match reality.



The purpose of this article is to cover the topic in about 10 minutes. It should be useful as a quick introduction to some of the subtleties, or as a refresher before an interview.



First, we will take a quick look at the contents of the System.Collections.Concurrent namespace. Then we will discuss the main differences between concurrent and classic collections and note a few non-obvious points. Finally, we will discuss possible pitfalls and which types of collections are worth using in which scenarios.



What is in System.Collections.Concurrent



IntelliSense already tells us a little:



[Screenshot: IntelliSense listing the contents of System.Collections.Concurrent]



Let's briefly discuss the purpose of each class.



ConcurrentDictionary: a thread-safe, general-purpose collection applicable to a wide range of scenarios.



ConcurrentBag, ConcurrentStack, ConcurrentQueue: special-purpose collections. Their "specialty" comes down to the following points:

  - ConcurrentStack is a LIFO collection: the last element added is extracted first.
  - ConcurrentQueue is a FIFO collection: elements are extracted in the order they were added.
  - ConcurrentBag is an unordered collection optimized for scenarios where the same thread both adds and takes elements. Each thread works with its own local storage, and when that storage is empty the thread "steals" elements from neighboring threads.
IProducerConsumerCollection<T>: the contract used by the BlockingCollection class (see below). It is implemented by the ConcurrentStack, ConcurrentQueue, and ConcurrentBag collections.
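
A minimal sketch of how this contract can be used; the choice of ConcurrentQueue and the values are purely illustrative:

using System;
using System.Collections.Concurrent;

// TryAdd attempts to insert an item, TryTake attempts to remove "some" item;
// the ordering depends on the concrete collection behind the interface.
IProducerConsumerCollection<int> collection = new ConcurrentQueue<int>();

collection.TryAdd(42);                    // returns true if the item was inserted

if (collection.TryTake(out int item))     // returns true if an item was removed
{
    Console.WriteLine(item);              // 42 for a queue (FIFO)
}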



BlockingCollection: used in scenarios where some threads fill the collection while others extract elements from it. A typical example is a continuously replenished task queue. If the collection is empty when the next element is requested, the reading thread waits until a new element appears. By calling the CompleteAdding() method we can signal that the collection will no longer be replenished, so readers will not wait on an empty collection. The state can be checked via the IsAddingCompleted property (true if no more data will be added) and the IsCompleted property (true if no more data will be added and the collection is empty).
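
A minimal producer-consumer sketch, assuming a BlockingCollection<string> with its default ConcurrentQueue backing; the item count and strings are arbitrary:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var queue = new BlockingCollection<string>();

var producer = Task.Run(() =>
{
    for (int i = 0; i < 10; i++)
    {
        queue.Add($"task {i}");
    }
    queue.CompleteAdding();              // no more items will be added
});

var consumer = Task.Run(() =>
{
    // GetConsumingEnumerable blocks while the collection is empty
    // and completes once IsCompleted becomes true.
    foreach (var item in queue.GetConsumingEnumerable())
    {
        Console.WriteLine(item);
    }
});

Task.WaitAll(producer, consumer);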



Partitioner, OrderablePartitioner, EnumerablePartitionerOptions: the basic building blocks for partitioning collections. They are used by the Parallel.ForEach method to control how items are distributed across processing threads.
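
For example, range partitioning lets Parallel.ForEach hand each thread a whole chunk of indices instead of a single element; the array and the workload below are just placeholders:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var data = new double[1_000_000];

// Partitioner.Create(from, to) produces index ranges; each thread processes a whole range,
// which reduces scheduling overhead compared to per-element work items.
Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        data[i] = Math.Sqrt(i);
    }
});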



In the rest of the article we will focus on ConcurrentDictionary and ConcurrentBag / Stack / Queue.



Differences between concurrent and classic collections



Protection of the internal state



Classic collections are designed with maximum performance in mind, so their instance methods do not guarantee thread safety.



For example, take a look at the source code for the Dictionary.Add method.

We can see the following lines (the code is simplified for readability):



if (this._buckets == null)
{
    int prime = HashHelpers.GetPrime(capacity);
    this._buckets = new int[prime];
    this._entries = new Dictionary<TKey, TValue>.Entry[prime];
}





As we can see, the internal state of the dictionary is not protected. When adding items from multiple threads, the following scenario is possible:



  1. Thread 1 called Add and was preempted immediately after entering the if block.
  2. Thread 2 called Add, initialized the collection, and added an item.
  3. Thread 1 resumed and re-initialized the collection, destroying the data added by thread 2.


In other words, classic collections are unsuitable for writing from multiple threads.
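
A rough sketch of the difference (the element count is arbitrary): a classic Dictionary needs an external lock for parallel writes, while a ConcurrentDictionary protects its own state:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

var classic = new Dictionary<int, int>();
var gate = new object();

Parallel.For(0, 1000, i =>
{
    lock (gate)              // without this lock the dictionary state can be corrupted
    {
        classic[i] = i;
    }
});

var concurrent = new ConcurrentDictionary<int, int>();

Parallel.For(0, 1000, i =>
{
    concurrent[i] = i;       // internal state is protected by the collection itself
});

Console.WriteLine($"{classic.Count} {concurrent.Count}");   // 1000 1000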



An API tolerant of the current state of the collection



As we know, you cannot add duplicate keys to a Dictionary. If we call Add twice with the same key, the second call will throw an ArgumentException.



This protection is useful in single-threaded scenarios. But with multiple threads we cannot be sure of the current state of the collection, so checks like the following only help if every access is wrapped in a lock:



if (!dictionary.ContainsKey(key))
{
    dictionary.Add(key, "Hello");
}





An exception-based API is a poor fit here: it does not allow stable, predictable behavior in multi-threaded scenarios. Instead, we need an API that makes no assumptions about the current state of the collection, does not throw exceptions, and leaves the decision about whether a particular state is valid to the caller.



In concurrent collections, the API is built around the TryXXX pattern. Instead of the usual Add, Get, and Remove we use TryAdd, TryGetValue, and TryRemove. If these methods return false, it is up to us to decide whether that is an exceptional situation.
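
A minimal sketch of the pattern; the key and values are illustrative:

using System;
using System.Collections.Concurrent;

var dictionary = new ConcurrentDictionary<string, string>();

if (!dictionary.TryAdd("key", "Hello"))
{
    // the key was already added by another thread; the caller decides whether that is an error
}

if (dictionary.TryGetValue("key", out var value))
{
    Console.WriteLine(value);
}

if (!dictionary.TryRemove("key", out _))
{
    // the key was already removed (or never existed); again, the caller decides
}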



It is worth noting that classic collections also have some state-tolerant methods (Dictionary, for example, has TryGetValue). But in classic collections such an API is a nice addition, while in concurrent collections it is a necessity.



An API that minimizes race conditions



Consider the simplest element update operation:



 dictionary[key] += 1;
      
      





For all its simplicity, this code performs three separate actions: it reads the value from the collection, adds 1, and writes the new value back. With multi-threaded execution it is quite possible that our code reads a value, performs the increment, and then overwrites a value that another thread wrote while the increment was running.



To solve such problems, the concurrent collections API contains a number of helper methods. For example, the TryUpdate method takes three parameters: the key, the new value, and the expected current value. If the value in the collection does not match the expected one, the update is not performed and the method returns false.
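
A sketch of an optimistic increment built on TryUpdate, assuming dictionary and key are the ConcurrentDictionary (with int values) and key from the snippets above:

// Optimistic concurrency: re-read and retry until no other thread
// changed the value between our read and our update.
while (true)
{
    if (!dictionary.TryGetValue(key, out var current))
    {
        break;    // the key is gone; the caller decides what that means
    }

    if (dictionary.TryUpdate(key, current + 1, current))
    {
        break;    // the update succeeded against the expected value
    }

    // otherwise another thread won the race; loop and try again
}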



Consider another example. Literally every line of the following code (including Console.WriteLine) can cause problems under multi-threaded execution:



if (dictionary.ContainsKey(key))
{
    dictionary[key] += 1;
}
else
{
    dictionary.Add(key, 1);
}

Console.WriteLine(dictionary[key]);





Adding or updating a value and then doing something with the result is a fairly typical task. That is why the concurrent dictionary has the AddOrUpdate method, which performs the whole sequence in a single, thread-safe call:



var result = dictionary.AddOrUpdate(key, 1, (itemKey, itemValue) => itemValue + 1);
Console.WriteLine(result);





There is one point worth knowing about.



Internally, AddOrUpdate calls the TryUpdate method described above, passing it the current value from the collection. If the update fails (a neighboring thread has already changed the value), the attempt is repeated and the supplied update delegate is called again with the new current value. In other words, the update delegate can be called multiple times, so it must not contain any side effects.
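
A sketch of what this means in practice; the Console.WriteLine calls stand in for any side effect (logging, counters, I/O):

// Bad: the delegate may run several times per AddOrUpdate call,
// so the side effect may also happen several times.
dictionary.AddOrUpdate(key, 1, (k, v) =>
{
    Console.WriteLine($"updating {k}");   // side effect inside the delegate
    return v + 1;
});

// Better: keep the delegate pure and act on the returned result exactly once.
var updated = dictionary.AddOrUpdate(key, 1, (k, v) => v + 1);
Console.WriteLine($"updated {key} to {updated}");

A similar caveat applies to GetOrAdd: its value factory can run and have its result thrown away if another thread adds the key first.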



Lock-free algorithms and granular locks



Microsoft did a great job on the performance of concurrent collections rather than simply wrapping every operation in a lock. Studying the sources, you can find many examples of granular locks, lock-free algorithms used instead of locking, and synchronization primitives lighter than Monitor.



What concurrent collections do not give us



From the examples above it should be clear that concurrent collections do not provide complete protection against race conditions, and we must design our code accordingly. But that is not all; there are a couple more points worth knowing about.



Polymorphism with classic collections



Concurrent collections, like the classic ones, implement the IDictionary, ICollection, and IEnumerable interfaces. But part of the API of these interfaces cannot be thread-safe by definition, for example the Add method discussed above.



Concurrent collections implement such contracts without thread safety, and to "hide" the unsafe API they implement these interfaces explicitly. This is worth remembering whenever we pass a concurrent collection to a method that accepts, for example, an ICollection parameter.
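
A small sketch of how the "hidden" API resurfaces through an interface reference; the key and value are illustrative:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

var dictionary = new ConcurrentDictionary<string, int>();

// dictionary.Add("key", 1) does not compile: Add exists only as an explicit interface implementation.
IDictionary<string, int> asInterface = dictionary;

asInterface.Add("key", 1);   // compiles again; throws ArgumentException if the key already exists

// So a method with a signature like void Fill(IDictionary<string, int> target)
// can freely use the exception-based API even when we pass it a concurrent dictionary.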



Also, concurrent collections do not comply with the Liskov substitution principle with respect to classical collections.



For example, the contents of a classic collection cannot be modified during iteration; the following code will throw an InvalidOperationException for the List class:



foreach (var element in list)
{
    list.Remove(element);
}





With concurrent collections, modification during enumeration does not cause an exception, so we can read and write simultaneously from different threads.



Moreover, concurrent collections implement modification during enumeration differently. ConcurrentDictionary simply performs no checks and does not guarantee what the iteration will return, while ConcurrentStack / Queue / Bag lock the collection and create a copy of its current state, which is then iterated over.
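
A sketch of both behaviors; the sizes and values are arbitrary and only serve to overlap the writer and the reader:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var dictionary = new ConcurrentDictionary<int, int>();
for (int i = 0; i < 1000; i++) dictionary[i] = i;

var writer = Task.Run(() =>
{
    for (int i = 1000; i < 2000; i++) dictionary[i] = i;   // writes concurrently with the foreach below
});

// No InvalidOperationException: the enumerator may or may not observe the new items.
int seen = 0;
foreach (var pair in dictionary)
{
    seen++;
}
writer.Wait();
Console.WriteLine(seen);          // typically somewhere between 1000 and 2000

// ConcurrentQueue / Stack / Bag enumerate a snapshot, so this loop terminates:
var queue = new ConcurrentQueue<int>(new[] { 1, 2, 3 });
foreach (var item in queue)
{
    queue.Enqueue(item * 10);     // not visible to the current enumerator
}
Console.WriteLine(queue.Count);   // 6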



Possible performance issues



As mentioned above, ConcurrentBag can "steal" elements from neighboring threads. This can lead to performance issues if you write to and read from a ConcurrentBag on different threads.



Also, concurrent collections take full locks or make a copy when queried about the state of the entire collection (Count, IsEmpty, GetEnumerator, ToArray, etc.), so these operations are significantly slower than in their classic counterparts.



The conclusion: concurrent collections are worth using only when they are really necessary, because this choice is not "free".



Which collections to use, and when

Briefly summarizing the points above:

  - Single-threaded code, or data that is only ever read from multiple threads: classic collections are enough.
  - Writes from multiple threads: either a classic collection guarded by a lock, or a concurrent collection; neither option frees us from thinking about race conditions.
  - Producer-consumer scenarios, where some threads add work and others take it: BlockingCollection on top of ConcurrentQueue / Stack / Bag.
  - A general-purpose thread-safe key-value store: ConcurrentDictionary with its TryXXX and AddOrUpdate methods.

Conclusions



In this article we took a brief look at concurrent collections, when to use them, and what specifics they have. Of course, the article does not exhaust the topic, and for serious work with multi-threaded collections you should dig deeper. The easiest way to do that is to read the source code of the collections you use: it is informative and not at all complicated, and the code is very readable.


