Hello again. Before leaving for the weekend, we want to share with you a translation of material that was prepared specifically for the basic course “iOS Developer” .
Deciding which data structure to use to represent a given set of values is often much more difficult than it seems. Since each type of data structure is optimized for a certain number of use cases, choosing the right fit for each data set can often have a big impact on the performance of our code.
The standard Swift library comes with three main data structures -
Array
,
Dictionary
and
Set
, each of which has its own set of optimizations, pluses and minuses. Let's look at some of their characteristics, as well as cases where we may need to go beyond the standard library to find the right data structure for our purposes.
Array linearity
Array
is probably one of the most commonly used data structures in Swift, and there are good reasons for this. It stores its elements sequentially, they are easily sorted in a predictable way, and any values can be stored in it: from structures to instances of classes and other collections.
For example, here we use an array to store a collection of shapes placed on a
Canvas
in a drawing application. Then, when we need to render the canvas into an image, we just go through the array to draw each element using the
DrawingContext
- as follows:
struct Canvas { var shapes: [Shape] func render() -> Image { let context = DrawingContext() shapes.forEach(context.draw) return context.makeImage() } }
When it comes to sequential rendering of all our shapes, as we did above, the array fits perfectly. Arrays not only store their elements in a very efficient way, they also have a guaranteed sorting order, which provides a predictable rendering order without having to do any additional work.
However, like all other data structures, arrays have their drawbacks. In our case, we will encounter one of its drawbacks when we want to remove shapes from canvas. Since the elements of the array are stored by index, we always need to first find which index the figure is associated with before we can remove it:
extension Canvas { mutating func remove(_ shape: Shape) { guard let index = shapes.firstIndex(of: shape) else { return } shapes.remove(at: index) } }
At first, the above code may not seem so problematic, but it can become a performance bottleneck for any canvas containing a large number of shapes, since
firstIndex
is linear (O (N)) in terms of
time complexity .
Although we can get around this limitation where we use our type of Canvas. For example, always referring to figures by index rather than by value or ID - this would make our code more complex and more fragile, since we would always have to be sure that our indexes do not expire when the canvas we are working with will change.
Speed sets
Instead, let's see if we can optimize the
Canvas
itself by changing its underlying data structure. Considering the above problem, one of our initial ideas might be to use
Set
(sets) instead of
Array
. As we already discussed in
The power of sets in Swift , one of the significant advantages of sets over arrays is that both insertion and deletion can always be performed in a constant (O (1)) time, since items are stored by hash value, not by index.
By updating
Canvas
so that it uses sets, we get the following:
struct Canvas { var shapes: Set<Shape> func render() -> Image { let context = DrawingContext() shapes.forEach(context.draw) return context.makeImage() } mutating func remove(_ shape: Shape) { shapes.remove(shape) } }
Again, the above code may look right, and it even compiles without problems. However, although we solved the deletion problem, we also lost our stable rendering order - because, unlike arrays, sets do not give us a guaranteed sorting order - which is a stumbling block in this situation, since it looks like we will draw custom shapes in randomly.
Indexing Indexes
Let's continue to experiment. Now let's see if we can optimize
Canvas
by introducing a
Dictionary
, which allows us to search for the index of any shape based on its ID. We'll start by changing the access level for our
shapes
array to
private
so that we can control the insertion of elements using the new
add
method. And every time a new figure is added, we also add its index to our dictionary:
struct Canvas { private var shapes = [Shape]() private var indexes = [Shape.ID : Int]() func render() -> Image { let context = DrawingContext() shapes.forEach(context.draw) return context.makeImage() } mutating func add(_ shape: Shape) { let index = shapes.count indexes[shape.id] = index shapes.append(shape) } }
Since now we always know at what index the given figure is stored, we can quickly perform deletion in constant time, as when using a set:
extension Canvas { mutating func remove(_ shape: Shape) { guard let index = indexes[shape.id] else { return } shapes.remove(at: index) indexes[shape.id] = nil } }
However, there is a pretty serious bug in our new
Canvas
implementation. Each time we delete a shape, we actually invalidate all indexes that are higher than the one we just deleted - since each of these elements will move one step to the beginning of the array. We could solve this problem by adjusting these indices after each deletion, but this would again bring us back to O (N) territory, which we tried to avoid from the very beginning.
Our latest implementation has its merits. In general, using a combination of two data structures can be a great idea in such situations, since we can often use the strengths of one data structure to compensate for the shortcomings of the other, and vice versa.
So, let's try again with a different combination, but this time let's start by looking at our
real requirements :
- We need both inserts and deletions to have constant time complexity, and it should be possible to delete the figure without knowing its base index.
- We need a guaranteed brute force order to be able to maintain a stable rendering order.
Looking at the above requirements, it turns out that although we need a stable sorting order, we really don't need indexes. This would make the linked list perfect for our use case.
Linked lists consist of nodes, where each node contains a link (or link) to the next node in the list, which means that it can be sorted in a predictable way - without the need for any index updates when an item is deleted. However, the Swift standard library (for now) does not contain a linked list type, so if we want to use it, we must first create it.
Create a linked list
Let's start by declaring a
List
structure that will track the first and last nodes in our list. We will make both of these properties read-only outside of our type to ensure data consistency:
struct List<Value> { private(set) var firstNode: Node? private(set) var lastNode: Node? }
Next, let's create our
Node
type (node), which we will make a class, because we want to be able to refer to nodes
by reference, and not by value . Our list will be doubly connected, which means that each node will contain a link to both its next neighbor and the previous one. Each node will also store a value - like this:
extension List { class Node { var value: Value fileprivate(set) weak var previous: Node? fileprivate(set) var next: Node? init(value: Value) { self.value = value } } }
The reason we make the previous property weak is to avoid the retain-loops that would appear if we kept strong links in both directions. To learn more about avoiding retain cycles, see the article “Memory Management” .
This is actually all the code that we need so that values can be stored in our linked list. But this is only the first part of the puzzle, as in any other collection, we also want to be able to iterate over it and change its contents. Let's start with iterations, which, thanks to the very protocol-oriented Swift design, can be easily implemented by ensuring compliance with the
Sequence
protocol and implementing the
makeIterator
method:
extension List: Sequence { func makeIterator() -> AnyIterator<Value> { var node = firstNode return AnyIterator {
Since the above iteration is very simple, we use the AnyIterator
standard library to avoid the need to implement a custom iterator type. For more complex scenarios, it can be implemented by adding a match to IteratorProtocol
.
Next, let's add an API to modify our linked list, starting with the inserts. We will extend the
List
with the
append
method, which adds a new node for the inserted value and then returns this node - like this:
extension List { @discardableResult mutating func append(_ value: Value) -> Node { let node = Node(value: value) node.previous = lastNode lastNode?.next = node lastNode = node if firstNode == nil { firstNode = node } return node } }
Above, we use the @discardableResult
attribute, which tells the compiler not to generate any warnings if the result of calling our method was not used, since we are not always interested in the node that was created.
Since linked lists are not based on indexes, but on maintaining a chain of values through links, implementing deletions is just a matter of updating the following and previous neighbors of the remote nodes so that they point to each other instead:
extension List { mutating func remove(_ node: Node) { node.previous?.next = node.next node.next?.previous = node.previous
Based on the foregoing, the initial version of our List is completed, and we are ready to check it in action. Let's update the Canvas to use our new list, as well as a dictionary that allows us to quickly find which node matches a given shape ID:
struct Canvas { private var shapes = List<Shape>() private var nodes = [Shape.ID : List<Shape>.Node]() func render() -> Image { let context = DrawingContext() shapes.forEach(context.draw) return context.makeImage() } mutating func add(_ shape: Shape) { nodes[shape.id] = shapes.append(shape) } mutating func remove(_ shape: Shape) { guard let node = nodes.removeValue(forKey: shape.id) else { return } shapes.remove(node) } }
Now we have both quick insertions and deletions, and a predictable sorting order, without the need to add additional complexity to the call process - it's pretty cool! And, since our new List is a completely universal type, now we can use it whenever we again need to store values without an index in a linear way.
Conclusion
Despite the fact that the data structures are so fundamental that they can be found in all kinds of programming languages, the decision about which one to use in each specific situation may still require a significant amount of thought, testing and experimentation, especially if we want to so that our code remains effective as the data set grows.
It is also very likely that the appropriate data structure for each particular situation may change over time as our requirements change, and sometimes using a combination of several data structures, and not just one, can be a way to achieve the required performance characteristics.
We will continue to explore the world of data structures in the following articles, focusing on those that are not yet implemented in the standard library. As with so many other things, sometimes we need to expand our thinking beyond Swift to choose the right data structure for each situation.
You can find me
on Twitter or email me if you have any questions, comments or feedback.
Thanks for your attention!