Data structures for graph storage: a review of existing and two "almost new"

Hello.



In this note, I decided to list the main data structures used to store graphs in computer science, and also talk about a couple of other structures that somehow crystallized out by me.



So, let's begin. But not from the very beginning - I think what a graph is and what they are (oriented, non-oriented, weighted, unweighted, with multiple edges and loops or without them), we all already know.



So let's go. What are the options for data structures for "graph storage" we have.



1. Matrix data structures



1.1 Adjacency matrix. The adjacency matrix is ​​a matrix where the row and column headings correspond to the numbers of the vertices of the graph, and the value of each of its elements a (i, j) is determined by the presence or absence of edges between the vertices i and j (it is clear that such a matrix for an undirected graph will be symmetrical, well, or you can agree that we store all the values ​​only above the main diagonal). For unweighted graphs, a (i, j) can be specified by the number of edges from i to j (if there is no such edge, then a (i, j) = 0), and for weighted ones, by the weight (total weight) of the mentioned edges.



1.2 The incidence matrix. In this case, our graph is also stored in a table in which, as a rule, row numbers correspond to the numbers of its vertices, and column numbers correspond to pre-numbered edges. If the vertex and the edge are incident to each other, then a nonzero value is written in the corresponding cell (for undirected graphs, 1 is written in the case of incidence of the vertex and edge, for oriented graphs “1” if the edge “leaves” the vertex and “-1” if it it “enters” (it is remembered quite easily, because the minus sign also seems to be “included” in the number “-1”)). For weighted graphs, again, instead of 1 and -1, you can specify the total weight of the edge.



2. Enumeration data structures



2.1 adjacency list. Well, everything seems to be simple. In general, each vertex of a graph can be associated with any enumeration structure (list, vector, array, ...), in which the numbers of all vertices adjacent to this one will be stored. For oriented graphs, we will only list vertices in which there is a “directed” edge from the vertex of the attribute. For weighted graphs, the implementation will be more complicated.



2.2 List of ribs. Pretty popular data structure. The list of edges, as Captain Evidence tells us, is actually a list of edges of the graph, each of which is defined by an initial vertex, an end vertex (for undirected graphs, the order is not important here, although different rules can be used for unification, for example, specify vertices in order ascending) and weight (only for weighted graphs).



About the above lists, the matrices can be viewed in more detail (and with illustrations), for example, here .



2.3 Adjacency Array. Not the most common structure. At its core, it is a form of “packing” adjacency lists into one enumeration structure (array, vector). The first n (by the number of graph vertices) elements of such an array contain start indices of the same array, starting from which all vertices adjacent to this are written in it in a row.



Here I found the most understandable (for myself) explanation: ejuo.livejournal.com/4518.html



3. Adjacency Vector and Associative Adjacency Array



It happened so that the author of these lines, not being a professional programmer, but periodically dealing with graphs, most often dealt with lists of edges. Indeed, it is convenient if the graph has multiple loops and edges. And now, in the development of the classical lists of edges, I propose to pay attention to their “development / branching / modification / mutation”, namely: the adjacency vector and the associative array of adjacency.



3.1 Adjacency vector



Case (A1): Unweighted Count



We will call the adjacency vector for an unweighted graph an ordered set of an even number of integers (a [2i], a [2i + 1], ..., where i is numbered c 0), in which each pair of numbers a [2i], a [2i +1] defines the edge of the graph between the vertices a [2i] and a [2i + 1], respectively.

This recording format does not contain information whether the graph is oriented (both options are possible). When using the digraph format, it is assumed that the edge is directed from a [2i] to a [2i + 1]. Hereinafter: for undirected graphs, if necessary, the requirements for the order of recording vertices can be applied (for example, so that the vertex with the lower value of the number assigned to it goes first).



In C ++, it is advisable to specify the adjacency vector using std :: vector, from which the name of this data structure was chosen.



Case (a2): unweighted graph, integer edge weights



By analogy with case (a1), we call the adjacency vector for a weighted graph with integer edge weights an ordered set (dynamic array) of numbers (a [3i], a [3i + 1], a [3i + 2], ..., where i is numbered c 0), where each "triplet" of the numbers a [3i], a [3i + 1], a [3i + 2] defines the edge of the graph between the vertices under the numbers a [3i] and a [3i + 1], respectively, and the value of a [3i + 2] is the weight of this edge. Such a graph can also be either oriented or not.



Case (b): unweighted graph, non-integer edge weights



Since the heterogeneous elements cannot be stored in one array (vector), the following implementation is possible, for example. The graph is stored in a pair of vectors, in which the first vector is the adjacency vector of the graph without specifying weights, and the second vector contains the corresponding weights (a possible implementation for C ++ is std :: pair <std :: vector, std :: vector>). Thus, for an edge defined by a pair of vertices under the indices 2i, 2i + 1 of the first vector, the weight will be equal to the element under the index i of the second vector.



Well, why is this necessary?



Well, to the author of these lines for solving a number of problems this seemed quite useful. Well, from a formal point of view, there will be such advantages:





However, I must admit, this “listot” does not imply quick access to the rib. And here the Associative adjacency array hurries to help, about which - below.



3.2 Associative adjacency array



So, if for us access to a particular edge, its weight and other properties is critical, and the memory requirements do not allow the use of an adjacency matrix, then let's think about how you can change the adjacency vector to solve this problem. So, the key is the edge of the graph, which can be defined as an ordered pair of integers. What does it look like? Could it be a key in an associative array? And if so, why don't we implement this? Let us have such an associative array, where each key - an ordered pair of integers - will be associated with a value - an integer or a real number that specifies the weight of the edge. In C ++, it is advisable to implement this structure based on the std :: map container (std :: map <std :: pair <int, int>, int> or std :: map <std :: pair <int, int>, double>) , or std :: multimap if multiple edges are assumed. Well, and here we have a structure for storing graphs, which takes up less memory than “matrix” structures, can define graphs with multiple loops and edges and even without strict requirements for the non-negativity of vertex numbers (I don’t know who needs it, but still).



4. Data structures at least “flood”, but something is missing



And the truth is: when solving a number of problems, we may need to assign some attributes to the edges of the graph and, accordingly, to store them. If it is possible to unambiguously reduce these features to integers, then it is possible to store such “graphs with additional features” using extended versions of the adjacency vector and associative adjacency array.



So, let us have an unweighted graph, for each edge of which it is necessary to store, for example, 2 additional attributes specified by integers. In this case, it is possible to specify its adjacency vector as an ordered set of not “pairs”, but “quartets” of integers (a [2i], a [2i + 1], a [2i + 2], a [2i + 3] .. .), where a [2i + 2] and a [2i + 3] will determine the features of the corresponding edge. For a graph with integer edge weights, the order is generally similar (the only difference is that the signs follow the edge weight and are given by the elements a [2i + 3] and a [2i + 4], and the edge itself will be specified not 4, but 5 ordered numbers). And for a graph with non-integer edge weights, the attributes can be written to its unweighted component.



When using an associative adjacency array for graphs with integer edge weights, it is possible to specify not an individual number, but an array (vector) of numbers that specify, in addition to the edge weight, all its other necessary attributes. At the same time, the inconvenience for the case of non-integer weights will be the need to specify a character with a floating point number (yes, this is an inconvenience, but if there are not so many such signs, and if you do not set them too "tricky" double, then it may be nothing) . And so, in C ++, extended associative adjacency arrays can be defined as follows: std :: map <std :: pair <int, int>, std :: vector> or std :: map <std :: pair <int, int>, std :: vector, with the first value in "vector-value-by-key" is the weight of the edge, and then the numerical designations of its features are located.



Literature:



About graphs and algorithms in general:



1. Cormen, Thomas H., Leiserson, Charles I., Rivest, Ronald L., Stein, Clifford. Algorithms: construction and analysis, 2nd edition: Per. from English - M.: Williams Publishing House, 2011.

2. Harari Frank. Graph theory. M .: Mir, 1973.

The report of the author about these same vector and associative array of adjacency:

3. Chernoukhov S.A. Adjacency vector and associative adjacency array as ways of representing and storing graphs / SA Chernouhov. Adjacency vector and adjacency map as data structures to represent a graph // Collection of articles of the International scientific and practical conference "Problems of implementing the results of innovative developments and ways to solve them" (Saratov, 09/14/2019). - Sterlitamak: AMI, 2019, p. 65-69

Useful internet sources on the topic:

4.prog-cpp.ru/data-graph

5. ejuo.livejournal.com/4518.html



All Articles