In the examples, I will use a data set containing 52,608,000 records.
Using simple analytical queries as an example, I will demonstrate that even a weak computer can be turned into a decent tool for analyzing a "decent" amount of data without much effort.
After setting up a few simple experiments, we will see that a regular table is not a suitable source for analytical queries.
If the reader can easily decipher the abbreviations OLTP and OLAP, it may make sense to skip directly to the Columnstore section.
Two approaches to working with data
I will be brief here, because there is more than enough information on this topic on the Internet.
So, at the highest level, there are only two approaches to working with data: OLTP and OLAP.
OLTP stands for Online Transaction Processing. In essence, it is about online processing of short transactions that touch a small amount of data, for example creating, updating or deleting an order. In the vast majority of cases an order is an extremely small amount of data, and while processing it you do not have to fear the long locks imposed by modern RDBMSs.
OLAP stands for Online Analytical Processing, that is, analytical processing of large volumes of data at a time. Any report relies on this approach, because in the vast majority of cases a report produces consolidated, aggregated figures broken down by certain dimensions.
Each approach has its own technology: for OLTP it is, for example, PostgreSQL, and for OLAP it is Microsoft SQL Server Analysis Services. While PostgreSQL uses the well-known row-based format for storing data in tables, several different formats have been invented for OLAP: multidimensional tables, key-value stores and my favorite, columnstore. More on the latter below.
Why are two approaches needed?
It has long been noticed that any data warehouse sooner or later faces two types of load: frequent reads (and, of course, writes and updates) of extremely small amounts of data, and rare reads of very large amounts of data. In essence, this is the activity of, say, a cash register and a manager. The cash register, working all day long, fills the storage with small chunks of data, and by the end of the day the accumulated volume, if business is going well, reaches an impressive size. In turn, at the end of the day the manager wants to know how much money the cash register earned during the day.
So, in OLTP we have tables and indexes. These two tools are great for recording cash register activity with all the details. Indexes provide a quick lookup of a previously recorded order, so changing an order is easy. But to satisfy the manager's needs, we have to scan the entire volume of data accumulated during the day. Moreover, as a rule, the manager does not need all the details of every order. What he really needs to know is how much money the cash registers made overall. It does not matter where a register was located, when the lunch break was, who operated it, and so on. OLAP exists so that the system can answer, in a short period of time, the question of how much the company has earned as a whole, without sequentially reading every order and all of its details. Can OLAP use the same tables and indexes as OLTP? The answer is no, or at least it should not. Firstly, OLAP simply does not need all the details recorded in the tables; this problem is solved by storing data in formats other than two-dimensional tables. Secondly, the analyzed information is often scattered across different tables, which entails joining them many times, including self-joins. To solve this problem, a special database schema is usually designed. This schema is optimized for OLAP load, just as a normal, normalized schema is optimized for OLTP load.
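To make the contrast concrete, here is a minimal sketch with a hypothetical orders table (the table and column names are invented for this illustration and do not appear in the experiments below):

-- OLTP: work with a single order found by its key
select * from orders where order_id = 123;
update orders set amount = 150.00 where order_id = 123;
-- OLAP: aggregate the whole day across all cash registers
select sum(amount) from orders where order_date = '2019-10-01';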
What happens when OLAP uses an OLTP schema
In fact, I introduced this section so that this article clearly meets my own requirements for the format of such material, i.e. problem, solution, conclusion.
Let's list a number of disadvantages of using an OLTP schema for data analysis (a query sketch illustrating some of them follows the list).
- Too many indexes.
Often you have to create special indexes just to support reports. These indexes effectively implement an OLAP storage scheme on top of the OLTP one. They are not used by the OLTP part of the application, yet they put load on it, require constant maintenance and take up disk space.
- The amount of data read exceeds what is required.
- Lack of a clear data schema.
Information that reports present in a single form is often spread across different tables and requires constant transformation on the fly. The simplest example is the amount of revenue, which consists of cash and non-cash money. Another striking example is data hierarchies. Since application development proceeds incrementally and it is not always known what will be needed in the future, hierarchies with the same meaning may end up stored in different tables. And although on-the-fly transformation is actively used in OLAP as well, these are slightly different things.
- Excessive query complexity.
Since an OLTP schema differs from an OLAP one, a tightly coupled software layer is needed to bring the OLTP data into the required shape.
- Complexity of support, debugging and development.
In general, the more complex the code base, the harder it is to keep it in working order. This is an axiom.
- Complexity of test coverage.
Much ink has been spilled in discussions about how to get a database populated with all the test scenarios, but suffice it to say that with a simpler data schema the task of covering it with tests becomes many times easier.
- Endless performance debugging.
There is a high probability that a user will order a report that brings the database server to a crawl, and this probability grows over time. It should be noted that OLAP is also prone to this problem, but unlike OLTP, OLAP has a much larger resource margin in this respect.
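As a sketch of the "lack of a clear data schema" and "excessive query complexity" points, imagine revenue split between hypothetical cash_payments and card_payments tables (all names here are invented for the illustration):

select coalesce(c.total, 0) + coalesce(n.total, 0) as revenue
  from (select sum(amount) as total from cash_payments where pay_date = '2019-10-01') as c
 cross join (select sum(amount) as total from card_payments where pay_date = '2019-10-01') as n;

Every report that needs total revenue has to repeat this kind of on-the-fly transformation, which is exactly the complexity a dedicated OLAP schema is meant to remove.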
Columnstore
This article will focus on the columnstore storage format, but without low-level details. Other formats mentioned above also deserve attention, but this is a topic for another article.
The columnstore format has actually been known for about 30 years, but until recently it was not implemented in RDBMSs. The essence of columnstore is that data is stored not by rows but by columns. That is, on a single page (the well-known 8 KB) the server stores values of only one field, and so on for each field of the table in turn. This is done so that extra information does not have to be read. Imagine a table with 10 fields and a query whose SELECT list names only one of them. If it were a regular table stored in row-based format, the server would be forced to read all 10 fields and return only one; it would read 9 times more information than necessary. Columnstore solves this problem completely, because its storage format allows reading only the requested field. All of this follows from the fact that the unit of storage in an RDBMS is the page: the server always writes and reads at least one page, and the only question is how many fields are present on it.
How Columnstore Can Really Help
To answer this, we need exact numbers. Let's get them. But which numbers can give an accurate picture?
- The amount of disk space.
- Query performance.
- Fault tolerance.
- Ease of implementation.
- What new skills should a developer have to work with new structures.
Disk space
Let's create a simple table, fill it with data and check how much space it takes.
create foreign table cstore_table
(
    trd date,
    org int,
    op int,
    it int,
    wh int,
    m1 numeric(32, 2),
    m2 numeric(32, 2),
    m3 numeric(32, 2),
    m4 numeric(32, 2),
    m5 numeric(32, 2)
)
server cstore_server
options(compression 'pglz');
As you may have noticed, I created a foreign table. The fact is that PostgreSQL has no built-in columnstore support, but it does have a powerful extension mechanism, and one of the extensions makes it possible to create columnstore tables (links are at the end of the article). The options and fields mean the following; a minimal setup sketch for the extension itself follows the list.
- pglz - tells the extension that the data should be compressed using the built-in algorithm in PostgreSQL;
- trd - transaction date;
- org, op, it, wh - analytical dimensions;
- m1, m2, m3, m4, m5 - numeric indicators, or measures.
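For completeness, the one-time setup of the extension looks roughly like this. This is a sketch based on the cstore_fdw documentation rather than part of the measurements; cstore_server is the server name referenced in the table definition above.

-- postgresql.conf must contain: shared_preload_libraries = 'cstore_fdw'
create extension cstore_fdw;
create server cstore_server foreign data wrapper cstore_fdw;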
Let's insert a "decent" amount of data and see how much space it takes on disk, and at the same time measure the insert performance. Since I run my experiments on a home laptop, I am somewhat limited in the amount of data. In addition, and this is even a good thing, I will use an HDD, on which the guest OS, Fedora 30, runs. The host OS is Windows 10 Home Edition, the processor is an Intel Core 7, and the guest OS received 4 GB of RAM. The PostgreSQL version is PostgreSQL 10.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1), 64-bit. I will experiment with a data set of 52,608,000 records.
explain (analyze)
insert into cstore_table
select '2010-01-01'::date + make_interval(days => d) as trd
     , op
     , org
     , wh
     , it
     , 100 as m1
     , 100 as m2
     , 100 as m3
     , 100 as m4
     , 100 as m5
  from generate_series(0, 1) as op
 cross join generate_series(1, 2) as org
 cross join generate_series(1, 3) as wh
 cross join generate_series(1, 4000) as it
 cross join generate_series(0, 1095) as d;
The execution plan is as follows:
Insert on cstore_table (cost = 0.01..24902714242540.01 rows = 1000000000000000 width = 150) (actual time = 119560.456..119560.456 rows = 0 loops = 1)
----> Nested Loop (cost = 0.01..24902714242540.01 rows = 1000000000000000 width = 150) (actual time = 1.823..22339.976 rows = 52608000 loops = 1)
----------> Function Scan on generate_series d (cost = 0.00..10.00 rows = 1000 width = 4) (actual time = 0.151..2.198 rows = 1096 loops = 1)
----------> Materialize (cost = 0.01..27284555030.01 rows = 1000000000000 width = 16) (actual time = 0.002..3.196 rows = 48000 loops = 1096)
----------------> Nested Loop (cost = 0.01..17401742530.01 rows = 1000000000000 width = 16) (actual time = 1.461..15.072 rows = 48000 loops = 1)
----------------------> Function Scan on generate_series it (cost = 0.00..10.00 rows = 1000 width = 4) (actual time = 1.159..2.007 rows = 4000 loops = 1)
----------------------> Materialize (cost = 0.01..26312333.01 rows = 1000000000 width = 12) (actual time = 0.000..0.001 rows = 12 loops = 4000)
----------------------------> Nested Loop (cost = 0.01..16429520.01 rows = 1000000000 width = 12) (actual time = 0.257..0.485 rows = 12 loops = 1)
----------------------------------> Function Scan on generate_series wh (cost = 0.00..10.00 rows = 1000 width = 4) (actual time = 0.046..0.049 rows = 3 loops = 1)
----------------------------------> Materialize (cost = 0.01..28917.01 rows = 1000000 width = 8) (actual time = 0.070..0.139 rows = 4 loops = 3)
---------------------------------------> Nested Loop (cost = 0.01..20010.01 rows = 1000000 width = 8) (actual time = 0.173..0.366 rows = 4 loops = 1)
-------------------------------------------> Function Scan on generate_series op (cost = 0.00..10.00 rows = 1000 width = 4) (actual time = 0.076..0.079 rows = 2 loops = 1)
---------------------------------------------> Function Scan on generate_series org (cost = 0.00..10.00 rows = 1000 width = 4) (actual time = 0.043..0.047 rows = 2 loops = 2)
Planning time: 0.439 ms
Execution time: 119692.051 ms
Total time - 1.994867517 minutes
Dataset creation time - 22.339976 seconds
Insertion time - 1.620341333 minutes
I did not manage to measure the disk space occupied by the table using PostgreSQL functions. I am not sure why, but they returned 0; perhaps this is the standard behavior for foreign tables. So I used a file manager instead. The disk space occupied is 226.2 MB. To judge whether that is a lot or a little, let's compare it with a regular table.
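For reference, the check that returned 0 was presumably something along these lines; pg_total_relation_size is a standard PostgreSQL function, it simply has nothing to report for a cstore_fdw foreign table:

select pg_size_pretty(pg_total_relation_size('cstore_table'));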
explain (analyze)
create table rbstore_table as
select '2010-01-01'::date + make_interval(days => d) as trd
     , op
     , org
     , wh
     , it
     , 100 as m1
     , 100 as m2
     , 100 as m3
     , 100 as m4
     , 100 as m5
  from generate_series(0, 1) as op
 cross join generate_series(1, 2) as org
 cross join generate_series(1, 3) as wh
 cross join generate_series(1, 4000) as it
 cross join generate_series(0, 1095) as d;
The execution plan is as follows:
Nested Loop (cost = 0.01..22402714242540.01 rows = 1000000000000000 width = 44) (actual time = 0.585..23781.942 rows = 52608000 loops = 1)
---> Function Scan on generate_series d (cost = 0.00..10.00 rows = 1000 width = 4) (actual time = 0.091..2.130 rows = 1096 loops = 1)
---> Materialize (cost = 0.01..27284555030.01 rows = 1000000000000 width = 16) (actual time = 0.001..3.574 rows = 48000 loops = 1096)
----------> Nested Loop (cost = 0.01..17401742530.01 rows = 1000000000000 width = 16) (actual time = 0.489..14.044 rows = 48000 loops = 1)
----------------> Function Scan on generate_series it (cost = 0.00..10.00 rows = 1000 width = 4) (actual time = 0.477..1.352 rows = 4000 loops = 1 )
----------------> Materialize (cost = 0.01..26312333.01 rows = 1000000000 width = 12) (actual time = 0.000..0.001 rows = 12 loops = 4000)
----------------------> Nested Loop (cost = 0.01..16429520.01 rows = 1000000000 width = 12) (actual time = 0.010..0.019 rows = 12 loops = 1)
----------------------------> Function Scan on generate_series wh (cost = 0.00..10.00 rows = 1000 width = 4) (actual time = 0.003..0.003 rows = 3 loops = 1)
----------------------------> Materialize (cost = 0.01..28917.01 rows = 1000000 width = 8) (actual time = 0.002..0.004 rows = 4 loops = 3)
----------------------------------> Nested Loop (cost = 0.01..20010.01 rows = 1000000 width = 8) (actual time = 0.006..0.009 rows = 4 loops = 1)
----------------------------------------> Function Scan on generate_series op (cost = 0.00..10.00 rows = 1000 width = 4) (actual time = 0.002..0.002 rows = 2 loops = 1)
----------------------------------------> Function Scan on generate_series org (cost = 0.00..10.00 rows = 1000 width = 4) (actual time = 0.001..0.001 rows = 2 loops = 2)
Planning time: 0.569 ms
Execution time: 378883.989 ms

The time spent executing this plan does not interest us, because in real life such inserts are not expected. What we are interested in is how much disk space this table occupies. A query against the system functions returned 3.75 GB.
So, cstore_table takes 226 MB and rbstore_table takes 3.75 GB. The 16.99-fold difference is striking, but it is unlikely that the same ratio will be achieved in production, primarily because of how real data is distributed. As a rule, the difference will be smaller, around 5 times.
But wait a minute: nobody uses raw data in row-based format for analysis. For reporting, indexed data is used, for example. And since the "raw" data will always be there, we need to compare the columnstore size with the size of the indexes. Let's create at least one index: an index on the date field and the operation type, trd + op.
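The statement itself is not shown here, but judging by the index name that appears in the plans below (trd_op_ix), it was roughly the following, with the size checked via the standard pg_relation_size function:

create index trd_op_ix on rbstore_table (trd, op);
select pg_size_pretty(pg_relation_size('trd_op_ix'));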
So, I indexed only two fields, and the index took 1583 MB, which is far more than the entire cstore_table. And as a rule, an OLAP load requires far more than one index. It is worth noting here that cstore_table needs no additional indexing: the table itself acts as an index covering any query.
From all of the above, a simple conclusion follows: columnstore tables can significantly reduce the amount of disk space used.
Query Performance
To evaluate performance, let's run a query that returns summary data for a specific day and a specific type of operation.
explain (analyze, costs, buffers) select sum(m1) from cstore_table where trd = '2011-01-01' and op = 1;
The execution plan is as follows:
Aggregate (cost = 793602.69..793602.70 rows = 1 width = 32) (actual time = 79.708..79.708 rows = 1 loops = 1)
--Buffers: shared hit = 44226
---> Foreign Scan on cstore_table (cost = 0.00..793544.70 rows = 23197 width = 5) (actual time = 23.209..76.628 rows = 24000 loops = 1)
-------- Filter: ((trd = '2011-01-01' :: date) AND (op = 1))
-------- Rows Removed by Filter: 26000
-------- CStore File: /var/lib/pgsql/10/data/cstore_fdw/14028/16417
-------- CStore File Size: 120818897
-------- Buffers: shared hit = 44226
Planning time: 0.165 ms
Execution time: 79.887 ms
explain (analyze, costs, buffers) select sum(m1) from rbstore_table where trd = '2011-01-01' and op = 1;
The execution plan is as follows:
Aggregate (cost = 40053.80..40053.81 rows = 1 width = 8) (actual time = 389.183..389.183 rows = 1 loops = 1)
--Buffers: shared read = 545
---> Index Scan using trd_op_ix on rbstore_table (cost = 0.56..39996.70 rows = 22841 width = 4) (actual time = 55.955..385.283 rows = 24000 loops = 1)
-------- Index Cond: ((trd = '2011-01-01 00:00:00' :: timestamp without time zone) AND (op = 1))
-------- Buffers: shared read = 545
Planning time: 112.175 ms
Execution time: 389.219 ms

389.219 ms vs 79.887 ms. Here we see that even on a relatively small amount of data, a columnstore table is significantly faster than an index on a row-based table.
Let's change the query and try to get the aggregate for the whole of 2011.
explain (analyze, costs, buffers) select sum(m1) from cstore_table where trd between '2011-01-01' and '2011-12-31' and op = 1;
The execution plan is as follows:
Aggregate (cost = 946625.58..946625.59 rows = 1 width = 32) (actual time = 3123.604..3123.604 rows = 1 loops = 1)
--Buffers: shared hit = 44226
---> Foreign Scan on cstore_table (cost = 0.00..925064.70 rows = 8624349 width = 5) (actual time = 21.728..2100.665 rows = 8760000 loops = 1)
-------- Filter: ((trd >= '2011-01-01' :: date) AND (trd <= '2011-12-31' :: date) AND (op = 1))
-------- Rows Removed by Filter: 8760000
-------- CStore File: /var/lib/pgsql/10/data/cstore_fdw/14028/16411
-------- CStore File Size: 120818897
-------- Buffers: shared hit = 44226
Planning time: 0.212 ms
Execution time: 3123.960 ms
explain (analyze, costs, buffers) select sum(m1) from rbstore_table where trd between '2011-01-01' and '2011-12-31' and op = 1;
The execution plan is as follows:
Finalize Aggregate (cost = 885214.33..885214.34 rows = 1 width = 8) (actual time = 98512.560..98512.560 rows = 1 loops = 1)
--Buffers: shared hit = 2565 read = 489099
---> Gather (cost = 885214.12..885214.33 rows = 2 width = 8) (actual time = 98427.034..98523.194 rows = 3 loops = 1)
-------- Workers Planned: 2
-------- Workers Launched: 2
-------- Buffers: shared hit = 2565 read = 489099
---------> Partial Aggregate (cost = 884214.12..884214.13 rows = 1 width = 8) (actual time = 97907.608..97907.608 rows = 1 loops = 3)
-------------- Buffers: shared hit = 2565 read = 489099
---------------> Parallel Seq Scan on rbstore_table (cost = 0.00..875264.00 rows = 3580047 width = 4) (actual time = 40820.004..97405.250 rows = 2920000 loops = 3)
--------------------- Filter: ((trd >= '2011-01-01 00:00:00' :: timestamp without time zone) AND (trd <= '2011-12-31 00:00:00' :: timestamp without time zone) AND (op = 1))
-------------------- Rows Removed by Filter: 14616000
-------------------- Buffers: shared hit = 2565 read = 489099
Planning time: 7.899 ms
Execution time: 98523.278 ms

98523.278 ms vs 3123.960 ms. Perhaps a partial index would help here, but it is better not to take the risk and instead build a suitable row-based structure that stores precomputed values.
Manual Aggregates
A suitable structure for manual aggregates can be a regular row-based table containing precomputed values. For example, it can contain a record for 2011 with operation type 1, where the fields m1, m2, m3, m4 and m5 store values aggregated precisely for those analytical dimensions. With a sufficient set of aggregates and indexes, analytical queries acquire unprecedented performance. Interestingly, Microsoft SQL Server Analysis Services has a special wizard that lets you configure the number and depth of precomputed values.
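As a minimal sketch of such a structure (the agg_by_year table and its index are hypothetical and not part of the experiments above):

-- precompute yearly totals per operation type
create table agg_by_year as
select date_trunc('year', trd)::date as trd_year
     , op
     , sum(m1) as m1
     , sum(m2) as m2
     , sum(m3) as m3
     , sum(m4) as m4
     , sum(m5) as m5
  from rbstore_table
 group by date_trunc('year', trd), op;

create index agg_by_year_ix on agg_by_year (trd_year, op);

-- the yearly report now reads a handful of rows instead of millions
select m1 from agg_by_year where trd_year = '2011-01-01' and op = 1;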
This solution has the following advantages and disadvantages:
- Real-time analytics.
Please do not confuse this with the strict meaning of the term "real-time analytics". Here we are only talking about the fact that, in the vast majority of cases, the aggregates are incremented within an acceptable period of time. This advantage is actually debatable, but let's not dwell on it; the fact remains that the architecture of the solution keeps the aggregates "fresh" almost all the time.
- Complete independence from data volume.
This is a very serious advantage. No matter how much data comes in, sooner or later it will be processed and the aggregates obtained.
- Relative complexity.
To achieve real-time analytics and independence from data volume, the solution has to use advanced techniques such as multithreading and manual lock management at the DBMS level.
- Difficulty of testing.
This applies to both unit testing and manual testing. I think there is no need to explain to the reader that catching multithreading bugs is not an easy task.
- Increased disk space requirements.
The actual use of columnstore
Here we must again plunge into theory and look more closely at what analytical data actually is.
Take the average head of an enterprise. As a rule, he or she is concerned with two global questions: "How are things going at the moment?" and "What has changed lately?"
To answer the question "How are things going at the moment?" we do not need historical data at all. That is, it does not matter how things were going a month ago.
To keep a finger on the pulse, this question is asked often. This type of data analysis is called operational.
To answer the question "What has changed lately?" we do need historical data. Moreover, the analysis is usually performed over equal time intervals: a month is compared with a month, a year with a year, and so on. Of course, the system should not prevent the user from comparing arbitrary periods, but such a case must be recognized as rare, because comparing a closed year with an unfinished half-year makes little sense. A distinctive feature of comparative analysis is that it is not needed as often as operational analysis. We will call this type of analysis historical.
Obviously, operational analysis must happen quickly, so it places high demands on performance. Historical analysis does not impose such strict requirements, although its performance should still remain at a fairly high level, if only so that the analytics system itself stays competitive.
So, in accordance with the two types of analysis, we can distinguish two types of analytical data: operational and historical. From the user's point of view, it should not be noticeable which kind of data they are working with at any given moment.
It is from these considerations that database servers gained the ability to partition tables into separate sections.
With regard to columnstore, it is possible to mix partitions in row-based and columnstore formats. Operational data is subject to frequent changes, which rules out storing it in columnstore format; and since there is never very much operational data, it can be stored in row-based format.
Historical data does not change. There is a lot of it, so the columnstore format suits it better. Recall that the performance of heavy queries against a columnstore source is higher than against a row-based source.
Let's look at an example of all of the above.
Below I create the main warehouse table and attach the operational and historical partitions to it.
create table warehouse
(
    trd date,
    org int,
    op int,
    it int,
    wh int,
    m1 numeric(32, 2),
    m2 numeric(32, 2),
    m3 numeric(32, 2),
    m4 numeric(32, 2),
    m5 numeric(32, 2)
)
partition by range(trd);

create foreign table historycal_data
(
    trd date,
    org int,
    op int,
    it int,
    wh int,
    m1 numeric(32, 2),
    m2 numeric(32, 2),
    m3 numeric(32, 2),
    m4 numeric(32, 2),
    m5 numeric(32, 2)
)
server cstore_server
options(compression 'pglz');

insert into historycal_data
select '2010-01-01'::date + make_interval(days => d) as trd
     , op
     , org
     , wh
     , it
     , 100 as m1
     , 100 as m2
     , 100 as m3
     , 100 as m4
     , 100 as m5
  from generate_series(0, 1) as op
 cross join generate_series(1, 2) as org
 cross join generate_series(1, 3) as wh
 cross join generate_series(1, 4000) as it
 cross join generate_series(0, (1095 - 31)) as d;

analyze historycal_data;

create table operational_data as
select ('2012-12-01'::date + make_interval(days => d))::date as trd
     , op
     , org
     , wh
     , it
     , 100::numeric(32, 2) as m1
     , 100::numeric(32, 2) as m2
     , 100::numeric(32, 2) as m3
     , 100::numeric(32, 2) as m4
     , 100::numeric(32, 2) as m5
  from generate_series(0, 1) as op
 cross join generate_series(1, 2) as org
 cross join generate_series(1, 3) as wh
 cross join generate_series(1, 4000) as it
 cross join generate_series(0, 30) as d;

create index trd_op_ix on operational_data (trd, op);

analyze operational_data;

alter table warehouse attach partition operational_data for values from ('2012-12-01') to ('2112-01-01');
alter table warehouse attach partition historycal_data for values from ('2010-01-01') to ('2012-12-01');
Everything is ready. Let's try to run a couple of reports. We'll start by requesting data for one day of the current month.
explain (analyze, costs, buffers) select sum(m1) from warehouse where trd = '2012-12-01' and op = 1;
Aggregate (cost = 15203.37..15203.38 rows = 1 width = 32) (actual time = 17.320..17.320 rows = 1 loops = 1)
--Buffers: shared hit = 3 read = 515
---> Append (cost = 532.59..15140.89 rows = 24991 width = 5) (actual time = 1.924..13.838 rows = 24000 loops = 1)
------- Buffers: shared hit = 3 read = 515
---------> Bitmap Heap Scan on operational_data (cost = 532.59..15140.89 rows = 24991 width = 5) (actual time = 1.924..11.992 rows = 24000 loops = 1)
--------------- Recheck Cond: ((trd = '2012-12-01' :: date) AND (op = 1))
--------------- Heap Blocks: exact = 449
--------------- Buffers: shared hit = 3 read = 515
----------------> Bitmap Index Scan on trd_op_ix (cost = 0.00..526.34 rows = 24991 width = 0) (actual time = 1.877..1.877 rows = 24000 loops = 1 )
--------------------- Index Cond: ((trd = '2012-12-01' :: date) AND (op = 1))
--------------------- Buffers: shared hit = 2 read = 67
Planning time: 0.388 ms
Execution time: 100.941 ms

Now let's request data for the whole of 2012, which contains 8,784,000 transactions.
explain (analyze, costs, buffers) select sum(m1) from warehouse where trd between '2012-01-01' and '2012-12-31' and op = 1;
Aggregate (cost = 960685.82..960685.83 rows = 1 width = 32) (actual time = 4124.681..4124.681 rows = 1 loops = 1)
--Buffers: shared hit = 45591 read = 11282
---> Append (cost = 0.00..938846.60 rows = 8735687 width = 5) (actual time = 66.581..3036.394 rows = 8784000 loops = 1)
--------- Buffers: shared hit = 45591 read = 11282
----------> Foreign Scan on historycal_data (cost = 0.00..898899.60 rows = 7994117 width = 5) (actual time = 66.579..2193.801 rows = 8040000 loops = 1)
--------------- Filter: ((trd >= '2012-01-01' :: date) AND (trd <= '2012-12-31' :: date) AND (op = 1))
--------------- Rows Removed by Filter: 8040000
--------------- CStore File: /var/lib/pgsql/10/data/cstore_fdw/14028/16448
--------------- CStore File Size: 117401470
--------------- Buffers: shared hit = 42966
----------> Seq Scan on operational_data (cost = 0.00..39947.00 rows = 741570 width = 5) (actual time = 0.019..284.824 rows = 744000 loops = 1)
--------------- Filter: ((trd >= '2012-01-01' :: date) AND (trd <= '2012-12-31' :: date) AND (op = 1))
--------------- Rows Removed by Filter: 744000
--------------- Buffers: shared hit = 2625 read = 11282
Planning time: 0.256 ms
Execution time: 4125.239 ms

Finally, let's see what happens if a user, without any malicious intent, orders a report on all transactions in the system, of which there are 52,608,000.
explain (analyze, costs, buffers) select sum(m1) from warehouse
Aggregate (cost = 672940.20..672940.21 rows = 1 width = 32) (actual time = 15907.886..15907.886 rows = 1 loops = 1)
--Buffers: shared hit = 17075 read = 11154
---> Append (cost = 0.00..541420.20 rows = 52608000 width = 5) (actual time = 0.192..9115.144 rows = 52608000 loops = 1)
--------- Buffers: shared hit = 17075 read = 11154
----------> Foreign Scan on historycal_data (cost = 0.00..512633.20 rows = 51120000 width = 5) (actual time = 0.191..5376.449 rows = 51120000 loops = 1)
--------------- CStore File: /var/lib/pgsql/10/data/cstore_fdw/14028/16448
--------------- CStore File Size: 117401470
--------------- Buffers: shared hit = 14322
----------> Seq Scan on operational_data (cost = 0.00..28787.00 rows = 1488000 width = 5) (actual time = 0.032..246.978 rows = 1488000 loops = 1)
--------------- Buffers: shared hit = 2753 read = 11154
Planning time: 0.157 ms
Execution time: 15908.096 ms

Please note that I am still writing this article as if nothing had happened. I did not even have to reboot my not-so-powerful laptop with an HDD and 4 GB of RAM, although the issue of resource consumption deserves a more careful study.
Fault tolerance
Fault tolerance was partially tested right at the time of writing this article: my laptop is alive, and, in general, I did not notice any slowdowns in its work beyond the usual ones.
I ask the reader to forgive me for not having had time to study the fault tolerance question in detail, but I can say that the extension in question does have fault tolerance: backups are possible.
Ease of implementation
As it turned out, when creating a table that stores data in columnstore format, there are practically no options other than the compression algorithm, and compression itself is essential here.
The format itself has a certain internal structure. By setting the appropriate parameters, you can speed up analytical queries to some extent or adjust the degree of compression; a sketch of what that looks like is given below.
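As a sketch, assuming the stripe_row_count and block_row_count options documented for cstore_fdw (they were not used in the experiments above, and the values shown are the documented defaults), tuning the format could look like this:

create foreign table cstore_table_tuned
(
    trd date,
    m1 numeric(32, 2)
)
server cstore_server
options(compression 'pglz', stripe_row_count '150000', block_row_count '10000');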
As demonstrated above, creating a columnstore table is no hassle at all. The extension can work with 40 PostgreSQL data types; the webinars on the extension covered all the supported types.
What new skills should a developer have to work with new structures
An SQL developer does not need any special skills to write queries against columnstore tables. Such a table behaves in every query like a regular row-based table, although this does not remove the need for query optimization.
Conclusion
In this article I showed how a table with columnstore storage can be useful: it saves disk space and provides high performance for analytical queries. The ease of working with such a table automatically reduces the cost of building a full-fledged analytical data warehouse, because using it does not require developing complex, hard-to-debug algorithms, and testing becomes simpler.
Although the experiments above inspire optimism, many questions remain unexplored, for example, which query plan will be generated when a columnstore table is joined with other tables. I hope to continue this work in the next part; how many parts there will be depends on how cstore_fdw behaves on more or less real data.
Links to additional materials
Short review cstore_fdw
cstore_fdw on github
Roadmap cstore_fdw