Replacing EAV with JSONB in ​​PostgreSQL

TL; DR: JSONB can greatly simplify database schema development without sacrificing query performance.


Introduction



Let us give a classic example, probably, of one of the oldest use cases of relational databases (database): we have an entity, and it is necessary to preserve certain properties (attributes) of this entity. But not all instances may have the same set of properties, in addition in the future, the possible addition of more properties.



The easiest way to solve this problem is to create a column in the database table for each property value, and simply fill in those that are needed for a specific entity instance. Fine! The problem is resolved ... until your table contains millions of records and you do not need to add a new record.



Consider the EAV ( Entity-Attribute-Value ) pattern, it is quite common. One table contains entities (records), another table contains names of properties (attributes), and the third table associates entities with their attributes and contains the value of these attributes for the current entity. This gives you the opportunity to have different sets of properties for different objects, as well as add properties on the fly, without changing the structure of the database.



Nevertheless, I would not write this note if there were no shortcomings in the approach using EVA. So, for example, to obtain one or more entities that have 1 attribute each, 2 joins (joins) are required in the query: the first is a join with the attribute table, the second is the join with a value table. If an entity has 2 attributes, then 4 joins are already needed! In addition, all attributes are usually stored as strings, which leads to type casting for both the result and the WHERE clause. If you write a lot of queries, then this is rather wasteful in terms of resource use.



Despite these obvious flaws, EAV has long been used to solve these kinds of problems. These were inevitable flaws, and there was simply no better alternative.

But then a new “technology” appeared in PostgreSQL ...



Starting with PostgreSQL 9.4, a JSONB data type has been added for storing binary JSON data. Although storing JSON in this format usually takes a little more space and time than plain text JSON, operations with it are much faster. JSONB also supports indexing, which makes querying them even faster.



The JSONB data type allows us to replace the bulky EAV pattern by adding just one JSONB column to our entity table, which greatly simplifies database design. But many argue that this should be accompanied by a decrease in productivity ... That's why I appeared in this article.



Test database setup



For this comparison, I created a database on a new installation of PostgreSQL 9.5 on the $ 80 build of DigitalOcean Ubuntu 14.04. After setting some parameters in postgresql.conf, I ran this script using psql. The following tables were created to represent the data as EAV:



CREATE TABLE entity ( id SERIAL PRIMARY KEY, name TEXT, description TEXT ); CREATE TABLE entity_attribute ( id SERIAL PRIMARY KEY, name TEXT ); CREATE TABLE entity_attribute_value ( id SERIAL PRIMARY KEY, entity_id INT REFERENCES entity(id), entity_attribute_id INT REFERENCES entity_attribute(id), value TEXT );
      
      





Below is a table where the same data will be stored, but with attributes in the JSONB type column - properties .



 CREATE TABLE entity_jsonb ( id SERIAL PRIMARY KEY, name TEXT, description TEXT, properties JSONB );
      
      





It looks a lot easier, right? Then, 10 million records were added to the entity tables ( entity & entity_jsonb ), and accordingly, the same table data was filled using the EAV pattern and the approach with the JSONB column - entity_jsonb.properties . Thus, we received several different data types among the entire set of properties. Sample data:



 { id: 1 name: "Entity1" description: "Test entity no. 1" properties: { color: "red" lenght: 120 width: 3.1882420 hassomething: true country: "Belgium" } }
      
      





So, now we have the same data, for two options. Let's start comparing implementations at work!



Design simplification



It was said earlier that the design of the database was greatly simplified: one table, by using the JSONB column for properties, instead of using three tables for EAV. But how is this reflected in the requests? Updating one property of an entity is as follows:



 -- EAV UPDATE entity_attribute_value SET value = 'blue' WHERE entity_attribute_id = 1 AND entity_id = 120; -- JSONB UPDATE entity_jsonb SET properties = jsonb_set(properties, '{"color"}', '"blue"') WHERE id = 120;
      
      





As you can see, the last request does not look easier. To update the value of a property in a JSONB object, we must use the jsonb_set () function, and must pass our new value as a JSONB object. However, we do not need to know any identifier in advance. Looking at the EAV example, we need to know both entity_id and entity_attribute_id in order to update. If you want to update a property in a JSONB column based on the name of the object, then this is all done in one simple row.



Now let's choose the entity that we just updated, according to the condition of its new color:



 -- EAV SELECT e.name FROM entity e INNER JOIN entity_attribute_value eav ON e.id = eav.entity_id INNER JOIN entity_attribute ea ON eav.entity_attribute_id = ea.id WHERE ea.name = 'color' AND eav.value = 'blue'; -- JSONB SELECT name FROM entity_jsonb WHERE properties ->> 'color' = 'blue';
      
      





I think we can agree that the second is shorter (without join!), And therefore more readable. Here is the victory of JSONB! We use the JSON - >> operator to get the color as a text value from a JSONB object. There is also a second way to achieve the same result in the JSONB model using the @> operator:



 -- JSONB SELECT name FROM entity_jsonb WHERE properties @> '{"color": "blue"}';
      
      





This is a bit more complicated: we check to see if the JSON object in the properties column contains the object to the right of the @> operator. Less readable, more productive (see below).



Simplify the use of JSONB even more when you need to select multiple properties at once. This is where the JSONB approach really fits: we simply select properties as additional columns in our result set without the need for joins:



 -- JSONB SELECT name , properties ->> 'color' , properties ->> 'country' FROM entity_jsonb WHERE id = 120;
      
      





With EAV, you will need 2 joins for each property you want to request. In my opinion, the above queries show a great simplification in database design. See more examples of how to write JSONB requests, also in this post.

Now it's time to talk about performance.



Performance



To compare performance, I used EXPLAIN ANALYZE in the queries to calculate the runtime. Each request was executed at least three times because the first time the query planner takes longer. At first, I ran queries without any indexes. Obviously, this served as an advantage of JSONB, since the join required for EAV could not use indexes (foreign key fields were not indexed). After that, I created an index for 2 columns of foreign keys in the EAV table of values, as well as a GIN index for the JSONB column.



Data updates showed the following results in time (in ms). Note that the scale is logarithmic:



image



We see that JSONB is much (> 50,000-x) faster than EAV if you do not use indexes, for the reason indicated above. When we index the columns with primary keys, the difference almost disappears, but JSONB is still 1.3 times faster than EAV. Please note that the index in the JSONB column has no effect here, since we do not use the property column in the evaluation criteria.



To select data based on a property value, we get the following results (normal scale):



image



You may notice that JSONB is faster again than EAV without indexes, but when EAV is with indexes, it still works faster than JSONB. But then I saw that the time for JSONB requests was the same, this prompted me to the fact that the GIN index did not work. Apparently, when you use the GIN index for a column with filled properties, it only works when using the inclusion operator @>. I used this in a new test, which had a huge impact on time: only 0.153 ms! This is 15,000 times faster than EAV, and 25,000 times faster than operator - >>.



I think it was fast enough!



DB table size



Let's compare the table sizes for both approaches. In psql, we can show the size of all tables and indexes with the \ dti + command



image



For the EAV approach, the size of the tables is about 3068 MB, and the indexes are up to 3427 MB, which in total gives 6.43 GB. Using the JSONB approach, 1817 MB for the table and 318 MB for the indexes are used, which is 2.08 GB. It turns out 3 times less! This fact surprised me a bit because we store property names in every JSONB object.



But all the same, the numbers speak for themselves: in EAV we store 2 integer foreign keys for the attribute value, as a result of which we get 8 bytes of additional data. In addition, in EAV, all property values ​​are stored as text, while JSONB will use numerical and logical values ​​internally where possible, resulting in less volume.



Summary



In general, I think that storing entity properties in JSONB format can greatly simplify the design and maintenance of your database. If you execute a lot of queries, then everything that is stored in the same table with the entity will actually work more efficiently. And the fact that this simplifies the interaction between the data is already a plus, but the resulting database is 3 times smaller in volume.



Also, according to the test, we can conclude that the performance loss is very slight. In some cases, JSONB even works faster than EAV, which makes it even better. However, this benchmark, of course, does not cover all aspects (for example, entities with a very large number of properties, a significant increase in the number of properties of existing data, ...), therefore, if you have any suggestions on how to improve them, please Feel free to leave a comment!



All Articles