Choosing the right data structure in Swift

Hello again. Before leaving for the weekend, we want to share with you a translation of material that was prepared specifically for the basic course “iOS Developer” .








Deciding which data structure to use to represent a given set of values ​​is often much more difficult than it seems. Since each type of data structure is optimized for a certain number of use cases, choosing the right fit for each data set can often have a big impact on the performance of our code.



The standard Swift library comes with three main data structures - Array



, Dictionary



and Set



, each of which has its own set of optimizations, pluses and minuses. Let's look at some of their characteristics, as well as cases where we may need to go beyond the standard library to find the right data structure for our purposes.



Array linearity



Array



is probably one of the most commonly used data structures in Swift, and there are good reasons for this. It stores its elements sequentially, they are easily sorted in a predictable way, and any values ​​can be stored in it: from structures to instances of classes and other collections.



For example, here we use an array to store a collection of shapes placed on a Canvas



in a drawing application. Then, when we need to render the canvas into an image, we just go through the array to draw each element using the DrawingContext



- as follows:



 struct Canvas { var shapes: [Shape] func render() -> Image { let context = DrawingContext() shapes.forEach(context.draw) return context.makeImage() } }
      
      





When it comes to sequential rendering of all our shapes, as we did above, the array fits perfectly. Arrays not only store their elements in a very efficient way, they also have a guaranteed sorting order, which provides a predictable rendering order without having to do any additional work.



However, like all other data structures, arrays have their drawbacks. In our case, we will encounter one of its drawbacks when we want to remove shapes from canvas. Since the elements of the array are stored by index, we always need to first find which index the figure is associated with before we can remove it:



 extension Canvas { mutating func remove(_ shape: Shape) { guard let index = shapes.firstIndex(of: shape) else { return } shapes.remove(at: index) } }
      
      





At first, the above code may not seem so problematic, but it can become a performance bottleneck for any canvas containing a large number of shapes, since firstIndex



is linear (O (N)) in terms of time complexity .



Although we can get around this limitation where we use our type of Canvas. For example, always referring to figures by index rather than by value or ID - this would make our code more complex and more fragile, since we would always have to be sure that our indexes do not expire when the canvas we are working with will change.



Speed ​​sets



Instead, let's see if we can optimize the Canvas



itself by changing its underlying data structure. Considering the above problem, one of our initial ideas might be to use Set



(sets) instead of Array



. As we already discussed in The power of sets in Swift , one of the significant advantages of sets over arrays is that both insertion and deletion can always be performed in a constant (O (1)) time, since items are stored by hash value, not by index.



By updating Canvas



so that it uses sets, we get the following:



 struct Canvas { var shapes: Set<Shape> func render() -> Image { let context = DrawingContext() shapes.forEach(context.draw) return context.makeImage() } mutating func remove(_ shape: Shape) { shapes.remove(shape) } }
      
      





Again, the above code may look right, and it even compiles without problems. However, although we solved the deletion problem, we also lost our stable rendering order - because, unlike arrays, sets do not give us a guaranteed sorting order - which is a stumbling block in this situation, since it looks like we will draw custom shapes in randomly.



Indexing Indexes



Let's continue to experiment. Now let's see if we can optimize Canvas



by introducing a Dictionary



, which allows us to search for the index of any shape based on its ID. We'll start by changing the access level for our shapes



array to private



so that we can control the insertion of elements using the new add



method. And every time a new figure is added, we also add its index to our dictionary:



 struct Canvas { private var shapes = [Shape]() private var indexes = [Shape.ID : Int]() func render() -> Image { let context = DrawingContext() shapes.forEach(context.draw) return context.makeImage() } mutating func add(_ shape: Shape) { let index = shapes.count indexes[shape.id] = index shapes.append(shape) } }
      
      





Since now we always know at what index the given figure is stored, we can quickly perform deletion in constant time, as when using a set:



 extension Canvas { mutating func remove(_ shape: Shape) { guard let index = indexes[shape.id] else { return } shapes.remove(at: index) indexes[shape.id] = nil } }
      
      





However, there is a pretty serious bug in our new Canvas



implementation. Each time we delete a shape, we actually invalidate all indexes that are higher than the one we just deleted - since each of these elements will move one step to the beginning of the array. We could solve this problem by adjusting these indices after each deletion, but this would again bring us back to O (N) territory, which we tried to avoid from the very beginning.



Our latest implementation has its merits. In general, using a combination of two data structures can be a great idea in such situations, since we can often use the strengths of one data structure to compensate for the shortcomings of the other, and vice versa.



So, let's try again with a different combination, but this time let's start by looking at our real requirements :





Looking at the above requirements, it turns out that although we need a stable sorting order, we really don't need indexes. This would make the linked list perfect for our use case.



Linked lists consist of nodes, where each node contains a link (or link) to the next node in the list, which means that it can be sorted in a predictable way - without the need for any index updates when an item is deleted. However, the Swift standard library (for now) does not contain a linked list type, so if we want to use it, we must first create it.



Create a linked list



Let's start by declaring a List



structure that will track the first and last nodes in our list. We will make both of these properties read-only outside of our type to ensure data consistency:



 struct List<Value> { private(set) var firstNode: Node? private(set) var lastNode: Node? }
      
      





Next, let's create our Node



type (node), which we will make a class, because we want to be able to refer to nodes by reference, and not by value . Our list will be doubly connected, which means that each node will contain a link to both its next neighbor and the previous one. Each node will also store a value - like this:



 extension List { class Node { var value: Value fileprivate(set) weak var previous: Node? fileprivate(set) var next: Node? init(value: Value) { self.value = value } } }
      
      





The reason we make the previous property weak is to avoid the retain-loops that would appear if we kept strong links in both directions. To learn more about avoiding retain cycles, see the article “Memory Management” .
This is actually all the code that we need so that values ​​can be stored in our linked list. But this is only the first part of the puzzle, as in any other collection, we also want to be able to iterate over it and change its contents. Let's start with iterations, which, thanks to the very protocol-oriented Swift design, can be easily implemented by ensuring compliance with the Sequence



protocol and implementing the makeIterator



method:



 extension List: Sequence { func makeIterator() -> AnyIterator<Value> { var node = firstNode return AnyIterator { //   ,    ,    : let value = node?.value node = node?.next return value } } }
      
      





Since the above iteration is very simple, we use the AnyIterator



standard library to avoid the need to implement a custom iterator type. For more complex scenarios, it can be implemented by adding a match to IteratorProtocol



.
Next, let's add an API to modify our linked list, starting with the inserts. We will extend the List



with the append



method, which adds a new node for the inserted value and then returns this node - like this:



 extension List { @discardableResult mutating func append(_ value: Value) -> Node { let node = Node(value: value) node.previous = lastNode lastNode?.next = node lastNode = node if firstNode == nil { firstNode = node } return node } }
      
      





Above, we use the @discardableResult



attribute, which tells the compiler not to generate any warnings if the result of calling our method was not used, since we are not always interested in the node that was created.
Since linked lists are not based on indexes, but on maintaining a chain of values ​​through links, implementing deletions is just a matter of updating the following and previous neighbors of the remote nodes so that they point to each other instead:



 extension List { mutating func remove(_ node: Node) { node.previous?.next = node.next node.next?.previous = node.previous //  « »,        ,    : if firstNode === node { firstNode = node.next } if lastNode === node { lastNode = node.previous } } }
      
      





Based on the foregoing, the initial version of our List is completed, and we are ready to check it in action. Let's update the Canvas to use our new list, as well as a dictionary that allows us to quickly find which node matches a given shape ID:



 struct Canvas { private var shapes = List<Shape>() private var nodes = [Shape.ID : List<Shape>.Node]() func render() -> Image { let context = DrawingContext() shapes.forEach(context.draw) return context.makeImage() } mutating func add(_ shape: Shape) { nodes[shape.id] = shapes.append(shape) } mutating func remove(_ shape: Shape) { guard let node = nodes.removeValue(forKey: shape.id) else { return } shapes.remove(node) } }
      
      





Now we have both quick insertions and deletions, and a predictable sorting order, without the need to add additional complexity to the call process - it's pretty cool! And, since our new List is a completely universal type, now we can use it whenever we again need to store values ​​without an index in a linear way.



Conclusion



Despite the fact that the data structures are so fundamental that they can be found in all kinds of programming languages, the decision about which one to use in each specific situation may still require a significant amount of thought, testing and experimentation, especially if we want to so that our code remains effective as the data set grows.



It is also very likely that the appropriate data structure for each particular situation may change over time as our requirements change, and sometimes using a combination of several data structures, and not just one, can be a way to achieve the required performance characteristics.



We will continue to explore the world of data structures in the following articles, focusing on those that are not yet implemented in the standard library. As with so many other things, sometimes we need to expand our thinking beyond Swift to choose the right data structure for each situation.



You can find me on Twitter or email me if you have any questions, comments or feedback.



Thanks for your attention!



All Articles