Data Mesh: how to work with data without a monolith

Hello, Habr! We at Dodo Pizza Engineering really love data (and who doesn't these days?). This is the story of how we plan to accumulate all the data of the Dodo Pizza world and give any employee of the company convenient access to it. Bonus task: to save the nerves of the Data Engineering team.









Like true Plyushkins (Gogol's compulsive hoarder), we save all kinds of information about the work of our pizzerias:









Several teams are currently responsible for working with data at Dodo Pizza; one of them is the Data Engineering team. They (that is, we) now have a task: to give any employee of the company convenient access to this array of data.







When we began thinking about how to do this and started discussing the task, we came across a very interesting approach to data management: Data Mesh (there is a huge, excellent article about it here). Its ideas mapped very well onto how we wanted to build our system. The rest of the article is our rethinking of the approach and how we see it being implemented at Dodo Pizza Engineering.







What do we mean by "data"



To get started, let's decide what we mean by data at Dodo Pizza Engineering:









In order for Dodo Pizza's business to use and rely on this data, it is important that the following conditions are met:









Given all these requirements, we came to the conclusion that data at Dodo is a product, just like a public service API. Accordingly, the team that owns the service should also own its data, and data schema changes must always be backward compatible.
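To illustrate what "backward compatible" can mean in practice, here is a minimal sketch. The event name and fields are hypothetical, not Dodo's actual schemas: a new field is added as optional with a default, so consumers that still receive old-format events keep working, and unknown keys are ignored so old consumers tolerate newer producers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event published by an order service. Version 1 had only
# order_id and total; version 2 adds delivery_minutes as an optional
# field with a default - a backward-compatible schema change.
@dataclass
class OrderCompleted:
    order_id: str
    total: float
    delivery_minutes: Optional[int] = None

def parse_event(payload: dict) -> OrderCompleted:
    # Drop unknown keys, so an older consumer also survives a newer producer.
    known = set(OrderCompleted.__dataclass_fields__)
    return OrderCompleted(**{k: v for k, v in payload.items() if k in known})

# An old (v1) payload still parses after the schema change.
old = parse_event({"order_id": "A-1", "total": 599.0})
new = parse_event({"order_id": "A-2", "total": 799.0, "delivery_minutes": 42})
```

The same rules (additive changes only, defaults for new fields, tolerate unknown fields) apply regardless of the serialization format.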







The traditional approach: Data Lake



For reliable storage and processing of big data there is a traditional approach, adopted by many companies that work with this kind of information: the Data Lake. In this approach, data engineers collect information from all components of the system and put it into one large storage (this could be, for example, Hadoop, Azure Kusto, Apache Cassandra, or even a MySQL replica, if the data fits into it).







These same engineers then write queries against this storage. Implementing this approach at Dodo Pizza Engineering would mean that the Data Engineering team owns the data schema in the analytic storage.







In this scenario, the team turns into very sad cats, and here is why:









It turns out that the team sits at the intersection of a huge number of needs and is unlikely to be able to satisfy them all, while living under constant time pressure and stress. We really do not want that. So we had to think about how to solve these problems while still being able to analyze the data.


Flowing from Data Lake to Data Mesh



Fortunately, we are not the only ones asking this question. A similar problem has already been solved in the industry (hallelujah!), just in another area: application deployment. Yes, I am talking about the DevOps approach, where the team itself decides how to deploy the product it creates.







A similar approach to the problems of the Data Lake was proposed by Zhamak Dehghani, a ThoughtWorks consultant. Watching how Netflix and Spotify solve such problems, she wrote an amazing article, How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh (the link to it was at the beginning of this article). The main ideas we took away from it:









Meanwhile, the Data Engineering team ...



If you imagine that all of this can be realized with a snap of the fingers, two questions remain:







What will the Data Engineering team do now? Dodo Pizza Engineering already has a platform/SRE team whose task is to give developers tools for easy deployment of services. The Data Engineering team will perform the same role, only for data.







Turning operational data into analytical data is a complex process. Making that analytics available to the entire company is even harder. Solving exactly these problems is what the Data Engineering team will do.







We are going to provide feature teams with a convenient set of tools and practices with which they can publish data from their service to the rest of the company. We will also be responsible for the shared infrastructure parts of the data pipeline: queues, reliable storage, clusters for running transformations on the data.
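A minimal sketch of the publishing contract we have in mind: a feature team calls a publish function with a domain event, and the platform-owned pipeline delivers it toward the analytic storage. Here an in-memory queue stands in for a real broker, and all names are illustrative assumptions, not a finished API.

```python
import json
import queue

# Stand-in for the platform-owned message broker.
event_bus: "queue.Queue[str]" = queue.Queue()

def publish(domain: str, event_type: str, payload: dict) -> None:
    # Wrap the payload in an envelope carrying the owning domain,
    # so consumers know which team to ask about the schema.
    envelope = {"domain": domain, "type": event_type, "payload": payload}
    event_bus.put(json.dumps(envelope))

def consume_one() -> dict:
    # The pipeline side: take one event off the queue and decode it.
    return json.loads(event_bus.get())

publish("orders", "OrderCompleted", {"order_id": "A-1", "total": 599.0})
received = consume_one()
```

The important part is the shape of the contract, not the transport: the feature team owns the envelope contents, the platform owns the delivery.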







How will Data Engineering skills appear inside feature teams? The job of a feature team is getting harder. Of course, we could try to hire one Data Engineer for each of our teams, but that is really hard: finding a person with a good background in data processing and convincing them to work inside a product team is not easy.







A great advantage of Dodo is that we love internal learning. So our plan is this: the Data Engineering team starts publishing the data of a few services itself, crying and pricking itself but continuing to eat the cactus, as the Russian joke goes. As soon as we see that we have a working publication process, we start teaching it to the feature teams.







We have several ways to do this:







  1. A DevForum talk, in which we will explain what the process we created looks like, what tools exist, and how to use them most effectively.
  2. Speaking at DevForum will also help us gather feedback from product developers. After that, we will be able to join product teams, help them solve their data-publication problems, and organize training for the teams.


Data consumption



So far I have talked a lot about publishing data. But there is also consumption. What about that side?







We have a wonderful BI team that writes very complex reports for the management company. Inside Dodo IS there are also many reports for our partners that help them manage their pizzerias. In our new model we think of all of them as data consumers with their own data domains, and it is the consumers who will be responsible for those domains. Sometimes a consumer domain can be described with a single query against the analytic storage, and that is fine, but we understand this will not always work. That is why we want the platform we create for product teams to be usable by data consumers as well (in the case of reports inside Dodo IS, those consumers will simply be the teams themselves).
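As an example of the simple case, here is a consumer domain ("daily revenue per pizzeria") that really is just one query against the analytic store. The table, the pizzeria names, and the in-memory SQLite database standing in for the real storage are all illustrative assumptions.

```python
import sqlite3

# In-memory SQLite stands in for the shared analytic storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (pizzeria TEXT, day TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Syktyvkar-1", "2019-11-01", 599.0),
     ("Syktyvkar-1", "2019-11-01", 799.0),
     ("Moscow-3", "2019-11-01", 450.0)],
)

# The whole consumer domain fits into one aggregation query.
daily_revenue = conn.execute(
    "SELECT pizzeria, day, SUM(total) FROM orders GROUP BY pizzeria, day"
).fetchall()
```

When a domain outgrows a single query, the consumer team falls back to the same publishing platform as the product teams.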







This is how we see working with data at Dodo Pizza Engineering. We would be glad to read your thoughts on this in the comments.







