In the article translated below, Robert Martin starts from ideas very similar to those in Yegor Bugaenko's discussions of ORM, but arrives at different conclusions. Personally, I find Yegor's approach appealing, but I think Martin covers the topic in more depth. It seems to me that anyone who has ever wondered what place an ORM should occupy, and why we need objects whose fields are all public, should hear his opinion. The article is written as a dialogue in which a more experienced programmer discusses a problem with a less experienced one.
What is a class?
A class is a specification of many similar objects.
What is an object?
An object is a set of functions that perform actions with encapsulated data.
Or rather, an object is a set of functions that operate on data whose existence is implied.
What do you mean by "implied"?
Since the object has functions, we can assume that there is data as well, but the data is not directly accessible and is not visible from the outside at all.
Isn't the data in the object?
Maybe it is, but there is no rule that says it must be. From the user's point of view, an object is nothing more than a set of functions. The data those functions work with must exist somewhere, but its location is unknown to the user.
All right, let's suppose so.
Well, what is a data structure?
A data structure is a collection of related items.
Or, in other words, a data structure is a set of data elements operated on by functions whose existence is implied.
OK, OK. I get it. The functions that work with a data structure are not defined inside that structure, but from the very existence of the data structure we can conclude that something must work with it.
Right. Now, what do you notice about these two definitions?
They are in a sense opposites of each other.
Indeed. They complement each other. They fit together like a hand in a glove.
- An object is a set of functions that operate on data elements whose existence is implied.
- A data structure is a set of data elements operated on by functions whose existence is implied.
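The two definitions can be put side by side in a minimal Python sketch. The names `Account`, `AccountRecord`, and `apply_interest` are hypothetical and chosen only for illustration; they do not come from the article.

```python
from dataclasses import dataclass


class Account:
    """An object: callers see only functions; where the balance lives is hidden."""

    def __init__(self, opening_balance: int) -> None:
        # Double underscore triggers name mangling, keeping this off the public surface.
        self.__balance = opening_balance

    def deposit(self, amount: int) -> None:
        self.__balance += amount

    def balance(self) -> int:
        return self.__balance


@dataclass
class AccountRecord:
    """A data structure: the fields are exposed and no behavior is defined here."""
    owner: str
    balance: int


def apply_interest(record: AccountRecord, percent: int) -> None:
    """One of the implied functions that lives outside the structure."""
    record.balance += record.balance * percent // 100
```

In the first case the functions are explicit and the data can only be inferred; in the second the data is explicit and the functions can only be inferred.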
Wow! So it turns out that objects and data structures are not the same thing!
Right. Data structures are DTOs.
And tables in databases are not objects either, right?
True again. Databases contain data structures, not objects.
Wait a minute. Don't ORMs map database tables to objects?
Of course not. You cannot map database tables to objects. Tables in the database are data structures, not objects.
Then what do ORMs do?
They transfer data from one structure to another.
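As a rough illustration of "transferring data from one structure to another", here is a sketch that uses Python's standard `sqlite3` module in place of a real ORM. The `customers` table, the `CustomerRow` structure, and `load_customers` are assumptions made up for this example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CustomerRow:
    """Plain data structure mirroring one row of the hypothetical table."""
    id: int
    name: str


def load_customers(conn: sqlite3.Connection) -> list[CustomerRow]:
    """Copy each relational row into an in-memory structure; no behavior is added."""
    cursor = conn.execute("SELECT id, name FROM customers ORDER BY id")
    return [CustomerRow(id=row[0], name=row[1]) for row in cursor]


# Hypothetical in-memory database used only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob")],
)
rows = load_customers(conn)
```

What comes out is still a data structure with open fields. Whether a business object later uses it is outside the loader's knowledge.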
So they have nothing to do with objects?
Nothing at all. Strictly speaking, there is no such thing as an ORM in the sense of a technology that maps relational data to objects, because database tables cannot be mapped to objects.
But I was told that ORMs build business objects.
No, ORMs retrieve from the database the data that business objects work with.
But don't those data structures end up inside business objects?
Maybe they do, maybe they don't. The ORM knows nothing about that.
But the difference is purely semantic.
Well no. There are far-reaching consequences.
For example?
For example, the design of the database schema and the design of business objects. Business objects define business behavior. The database schema defines the structure of business data. These two structures are constrained by very different forces. The structure of business data is not necessarily the best structure for business behavior.
Hmm. That's confusing.
Think about it this way. The database schema is not designed for one single application; it is intended for use throughout the enterprise. Therefore, the data structure is a compromise between several different applications.
OK, I see.
Good. Now think about every single application. The object model of each application describes how the behavior of the application is structured. Each application will have its own object model to better match the behavior of the application.
Aah, I see. Since the database schema is a compromise between different applications, it will not fit the object model of any individual application.
Right! Objects and data structures are constrained by different things. They very rarely fit together. People call this the object-relational impedance mismatch.
I've heard of that. But I thought ORMs were exactly what fixed the impedance mismatch.
And now you know better. Objects and data structures are complementary, not isomorphic.
What?
They are opposites, not something similar.
Opposites?
Yes, in a very interesting sense. You see, objects and data structures imply diametrically opposed control structures.
What?
Think of a set of classes that implement some kind of common interface. For example, imagine classes that represent two-dimensional figures, in which there are functions for calculating the area and perimeter of a figure.
Why does every example about objects have to be about shapes?
Let's look at two different types of shapes: Squares and Circles. It is clear that the functions for calculating the area and perimeter of these classes use different data structures. It is also clear that these operations are invoked through dynamic polymorphism.
Slow down please, nothing is clear.
There are two different functions for calculating the area, one for the Square, the other for the Circle. When a function is called to calculate the area of a particular object, it is this object that decides which particular function to call. This is called dynamic polymorphism.
Okay. Of course. An object knows how its methods are implemented. Naturally.
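The object scenario described so far might look like the following Python sketch. The `Shape` interface and the `total_area` helper are illustrative assumptions; the dialogue only names Square, Circle, area, and perimeter.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """The common interface: callers see only these functions."""

    @abstractmethod
    def area(self) -> float: ...

    @abstractmethod
    def perimeter(self) -> float: ...


class Square(Shape):
    def __init__(self, side: float) -> None:
        self._side = side

    def area(self) -> float:
        return self._side ** 2

    def perimeter(self) -> float:
        return 4 * self._side


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self._radius = radius

    def area(self) -> float:
        return math.pi * self._radius ** 2

    def perimeter(self) -> float:
        return 2 * math.pi * self._radius


def total_area(shapes: list[Shape]) -> float:
    # No type check here: each object selects its own area() at runtime.
    return sum(shape.area() for shape in shapes)
```

`total_area` never asks which kind of shape it holds; the choice of implementation is made by the object itself, which is the dynamic polymorphism the dialogue refers to.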
Now let's turn these objects into data structures. We will use discriminated unions.
Discriminated what?
Discriminated unions. You know: C++, pointers, the union keyword, and a flag that identifies the type of the structure. In our case these are just two different data structures, one for the Square and one for the Circle. The Circle has a center point and a radius, plus a type code that identifies it as a Circle.
Will the type code field be an enum?
Well, yes. And the Square will have its upper left point and the length of its side, and also an enum that indicates its type.
Okay. There will be two structures with a type code.
Right. Now let's look at the function for the area. There will probably be a switch in it, right?
Well, of course, for the two types. One branch for the Square and one for the Circle. And the perimeter function will need a similar switch.
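The dialogue has C++ unions in mind. The same idea can be approximated in Python with tagged dataclasses, as in this sketch. The field name `type_code` and the `Point` structure are assumptions, and an `if` chain stands in for the switch.

```python
import math
from dataclasses import dataclass
from enum import Enum


class ShapeType(Enum):
    SQUARE = 1
    CIRCLE = 2


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Square:
    top_left: Point
    side: float
    type_code: ShapeType = ShapeType.SQUARE


@dataclass
class Circle:
    center: Point
    radius: float
    type_code: ShapeType = ShapeType.CIRCLE


def area(shape) -> float:
    # The "switch": one branch per type code, all inside a single function.
    if shape.type_code is ShapeType.SQUARE:
        return shape.side ** 2
    if shape.type_code is ShapeType.CIRCLE:
        return math.pi * shape.radius ** 2
    raise ValueError(f"unknown type code: {shape.type_code}")


def perimeter(shape) -> float:
    # A second, structurally identical switch.
    if shape.type_code is ShapeType.SQUARE:
        return 4 * shape.side
    if shape.type_code is ShapeType.CIRCLE:
        return 2 * math.pi * shape.radius
    raise ValueError(f"unknown type code: {shape.type_code}")
```

Here the structures carry no behavior at all; every operation lives in a free function that inspects the type code.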
Right again. Now think about these two scenarios. In the scenario with objects, the two implementations of the area function are independent of each other and belong (in a sense) directly to the type. The area function for the Square belongs to the Square, and the area function for the Circle belongs to the Circle.
Okay, I see where you're going with this. In the scenario with data structures, both implementations of the area function sit in the same function; they do not "belong" (whatever that word means) to the type.
It gets better. In the case of objects, if you need to add a Triangle type, what code has to change?
Do not change anything at all. Just make a new Triangle class. Although no, you probably need to fix the code that creates the objects.
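A short sketch of that answer, in Python. The `make_shape` factory is a hypothetical stand-in for "the code that creates the objects", and the Triangle is assumed to be described by its three side lengths.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self) -> float: ...


class Square(Shape):  # existing class, untouched
    def __init__(self, side: float) -> None:
        self._side = side

    def area(self) -> float:
        return self._side ** 2


def total_area(shapes: list[Shape]) -> float:  # existing caller, untouched
    return sum(shape.area() for shape in shapes)


class Triangle(Shape):  # the new type is just one new class
    def __init__(self, a: float, b: float, c: float) -> None:
        self._a, self._b, self._c = a, b, c

    def area(self) -> float:
        # Heron's formula from the three side lengths.
        s = (self._a + self._b + self._c) / 2
        return math.sqrt(s * (s - self._a) * (s - self._b) * (s - self._c))


def make_shape(kind: str, *dimensions: float) -> Shape:
    if kind == "square":
        return Square(*dimensions)
    if kind == "triangle":  # the only existing function that had to change
        return Triangle(*dimensions)
    raise ValueError(f"unknown shape kind: {kind}")
```

`Square` and `total_area` are exactly as they were before the Triangle existed; only the creation code learned about the new class.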
Right. So, when adding a new type, the changes are negligible. Now suppose that we need to add a new function, for example a function that finds the center.
Then you have to add it to all three types, Circle, Square and Triangle.
Good. It turns out that adding new functions is difficult, because you have to make changes in each class.
But with data structures, everything is different. In order to add a Triangle, you have to change each function to add branches to handle the Triangle in each switch.
Right. It’s difficult to add types; you have to edit each function.
But in order to add a function for the center, nothing existing needs to be changed.
Yeah. Adding functions is easy.
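To see why, consider this Python sketch of the data-structure scenario with a new `center` function. The tagged structures are the same illustrative assumption as before, and the sketch further assumes screen-style coordinates in which y grows downward from the upper left point.

```python
from dataclasses import dataclass
from enum import Enum


class ShapeType(Enum):
    SQUARE = 1
    CIRCLE = 2


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Square:
    top_left: Point
    side: float
    type_code: ShapeType = ShapeType.SQUARE


@dataclass
class Circle:
    center: Point
    radius: float
    type_code: ShapeType = ShapeType.CIRCLE


def center(shape) -> Point:
    """The new operation: added without editing any structure or existing function."""
    if shape.type_code is ShapeType.SQUARE:
        half = shape.side / 2
        # Assumption: y grows downward, so the center is below the top-left corner.
        return Point(shape.top_left.x + half, shape.top_left.y + half)
    if shape.type_code is ShapeType.CIRCLE:
        return shape.center
    raise ValueError(f"unknown type code: {shape.type_code}")
```

The price shows up when a Triangle arrives: `center` and every other function with such a switch would each need a new branch.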
Wow. It turns out that these two approaches are directly opposite.
Definitely. To summarize:
- Adding new functions to classes is difficult; you have to change every class
- Adding new functions to data structures is simple; you just add the function and change nothing else
- Adding new types to classes is simple; you just add a new class
- Adding new types to data structures is difficult; you have to fix every function
Yes. Opposites. Opposites in a curious sense. That is, if it is known in advance that new functions need to be added, it is convenient to use data structures. But if you know in advance that you have to add new types, then you need to use classes.
Good observation! But today we need to think about one more thing. There is another point in which data structures and classes are opposites of each other. Dependencies.
Dependencies?
Yes, the direction of the dependencies in the source code.
Okay, I will ask. What is the difference?
Let's look at the case of structures. Each function contains a switch that selects the desired implementation based on the type code in the union.
Yes, that is right. So what?
Let's look at the function call for the area. The calling code depends on the function for the area, and the function for the area depends on each specific implementation.
And what do you mean when you say "depends"?
Imagine that each implementation of the area function is extracted into a separate function. That is, there will be circleArea, squareArea and triangleArea functions.
Well, it turns out that in the switch branches there will simply be calls to these functions.
Imagine that these functions are in different files.
Then the file with the switch will contain an import, use, or include for each file with those functions.
Right. This is a dependency at the source code level. One source file depends on another. Which way does this dependency point?
The source code with switch depends on the source code in which the implementations are located.
What about the code calling the function for the area?
The calling code depends on the code with switch, which depends on all implementations.
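The chain of dependencies can be sketched in a single Python listing, with comments marking where file boundaries would be. The file names, the simplified `ShapeData` structure, and the snake_case function names are assumptions made for this example.

```python
import math
from dataclasses import dataclass
from enum import Enum


class ShapeType(Enum):
    SQUARE = 1
    CIRCLE = 2
    TRIANGLE = 3


@dataclass
class ShapeData:
    """Simplified stand-in for the union: a type code plus raw dimensions."""
    type_code: ShapeType
    dimensions: tuple


# --- would live in square_area.py ---
def square_area(shape: ShapeData) -> float:
    (side,) = shape.dimensions
    return side ** 2


# --- would live in circle_area.py ---
def circle_area(shape: ShapeData) -> float:
    (radius,) = shape.dimensions
    return math.pi * radius ** 2


# --- would live in triangle_area.py ---
def triangle_area(shape: ShapeData) -> float:
    a, b, c = shape.dimensions
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


# --- would live in area.py, which has to import all three files above ---
def area(shape: ShapeData) -> float:
    if shape.type_code is ShapeType.SQUARE:
        return square_area(shape)
    if shape.type_code is ShapeType.CIRCLE:
        return circle_area(shape)
    if shape.type_code is ShapeType.TRIANGLE:
        return triangle_area(shape)
    raise ValueError(f"unknown type code: {shape.type_code}")


# --- would live in caller.py, which imports area.py and, through it, everything ---
def describe(shape: ShapeData) -> str:
    return f"{shape.type_code.name.lower()} area: {area(shape):.2f}"
```

Every import points the same way as the call: `describe` needs `area`, and `area` needs each concrete implementation.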
Right. In all sources, the arrow is directed in the direction of the call, from the calling code to the implementation. So if you want to make a tiny change in these implementations ...
Alright, alright, I see what you're getting at. A change in any of the implementations will entail recompilation of all files with switch, and this will lead to the fact that everything that calls this switch will be recompiled, for example, in our case, the function for the area.
Yes. At least it will be so for languages that use file modification dates in order to understand what needs to be rebuilt.
And these are generally all systems with static typing, right?
Yes, and some other systems without it.
That's a lot to rebuild.
And a lot to redeploy.
Okay, but is it the other way around with classes?
Yes, because the code that calls the function for the area depends on the interface, and the implementation also depends on this interface.
Clear. The code for the Square class will import or use or include a file with the Shape interface.
Right. The arrow in the implementation files points against the direction of the call: from the implementation code toward the calling code. At least this is the case for statically typed languages. For dynamically typed languages, the code that calls the area function does not depend on anything at all, because linking occurs at runtime.
Yeah, okay. That is, if you make changes to one of the implementations ...
Only the code containing those changes has to be rebuilt and redeployed.
This is because dependencies are directed opposite to the direction of calls.
Yes, we call it dependency inversion.
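For comparison, here is the class scenario laid out the same way in Python, with comments marking hypothetical file boundaries. `UnitBox` is an invented class that deliberately does not inherit from `Shape`, to illustrate the remark about dynamically typed languages.

```python
from abc import ABC, abstractmethod


# --- would live in shape.py: the interface, which depends on nothing ---
class Shape(ABC):
    @abstractmethod
    def area(self) -> float: ...


# --- would live in square.py, which imports shape.py ---
class Square(Shape):
    def __init__(self, side: float) -> None:
        self._side = side

    def area(self) -> float:
        return self._side ** 2


# --- would live in report.py, which imports shape.py only, never square.py ---
def describe(shape: Shape) -> str:
    return f"area: {shape.area():.2f}"


# Python is dynamically typed, so even the Shape annotation above is optional:
# any object with an area() method works, and binding happens at runtime.
class UnitBox:
    def area(self) -> float:
        return 1.0
```

Editing `square.py` leaves `report.py` untouched, because the caller and the implementation both depend on the interface rather than on each other.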
Okay, let’s summarize it all. Classes and data structures are opposed to each other in three senses.
- Functions are explicitly present in classes, and you can only guess at the existence of data. Data is explicitly present in data structures, and you can only guess which functions exist.
- In the case of classes, adding types is simple, but adding functions is difficult. In the case of structures, adding functions is easy, but adding types is difficult.
- Data structures lead to recompilation and redeployment of the calling code. Classes isolate the calling code, so it does not need to be recompiled and redeployed.
Yes, that's right. Every software designer and architect should keep this in mind.