Model of football transfers: digging deeper

Time to pick up where the previous article on studying football transfers left off.







This time you will find out why Simeone is so good, which agent to pick if you are a footballer, and why any of this is worth reading even if you have no interest in football.













We have learned to collect a lot of data.







Look around, at work for example: I am sure you will find some table of supposedly useful data set aside for later, "until we figure out what to do with it". This is partly because the dependencies within such data are highly non-linear and non-intuitive. What you usually want to know is which of these fields deserve the most attention, that is, which of them affected the result the most. Below I will show by example how to use one of the simplest algorithms that lets you do this.







You can find the steps to reproduce everything, along with the full results, in this notebook; below there will be plenty of charts showing the most interesting of them.







But first, a little about the method.







Imagine that you work not with football transfers but with sales data for a store (or, for a slightly less mercantile example, with match results in your MOBA-like game), plus a lot of background information: about the store, the goods and the sellers (or about the chosen heroes, the players and their strength), and so on. You want to achieve a very specific result: increase sales of a certain product (or improve the balance of your game).







In any case, the plan is simple:







  1. understand which parameters affect the final result most strongly (the number of goods sold, or the win rate of a certain character in the game), and how
  2. understand which of these parameters you can actually influence
  3. focus on what is both important (item 1) and changeable (item 2)


It all sounds simple; the only thing left is to deal with item 1.







In fact, this task is far from new, and it is quite easy to solve without any neural networks, using only good old statistics... if you have the data of Magnit. Or of League of Legends. But chances are you do not: you are a chain of a couple of shops or a moderately popular game, and you simply do not have that much data to work with. LoL, on the other hand, collects the results of millions of matches, which cover so many combinations of parameters that you can compare how the choice of teammates influences a given hero's chance of winning, with everything else practically equal. With too little data, apples have to be compared with oranges: we simply do not have enough isolated cases.







To put it simply: in order to understand how, for example, the combination of map and matchup affects the probability of winning in your game, you would ideally need several thousand results in which every parameter except the ones of interest is the same. That is, the same players with the same skill level would have to play the same hero on different maps against different opponents. This is hard to achieve if you are not Riot Games.







But back to the transfers. Imagine that we want to investigate a single parameter, the "football agent", in order to understand, for example, which agency a player should sign with. Clearly, he should go to the agent who gets the highest price for his players. If we apply standard statistical methods, we will see that GestiFute is the most successful at selling its clients, which is quite consistent with its reputation.

But how do we separate selling skill from the strength of the players themselves? After all, it is not that hard to sell at a high price when your clients are Deco, Danny, Pepe, Diego Costa and a humble guy named Cristiano. To be fair, agencies can only be compared if each of them is given the same set of players (both good and not so good) and made to sell them to the same clubs. Such a multiverse is hard to imagine in real life.

We, however, have a model that returns a result (the transfer fee) for any input. We can even make it estimate the fee for Gogua's move from Tambov to CSKA as if it had been handled by Jorge Mendes. And Glushakov's move to Akhmat, and Hazard's to Real, and all the rest. Then we repeat the same trick with every other agency. Perfectly equal conditions. All that remains is to total the price of the entire set of players for each agency, and we have the answer to the question of how much belonging to a brand, for example GestiFute, raises a player's price.

That is, we have analyzed how the target depends on one chosen parameter in isolation, and on that parameter alone.







This is called partial dependence.
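The trick described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_predict` is a hypothetical stand-in for the trained transfer model, and the agency names, markups and players are invented, not taken from the real data.

```python
from statistics import mean

def partial_dependence(predict, rows, feature, values):
    """Average prediction when `feature` is forced to each value in every row."""
    result = {}
    for value in values:
        # Copy each row, overwriting only the feature under study.
        forced = [{**row, feature: value} for row in rows]
        result[value] = mean(predict(row) for row in forced)
    return result

# Hypothetical stand-in for the trained transfer-fee model.
AGENT_MARKUP = {"agency_a": 1.3, "agency_b": 1.0, "no_agent": 0.9}

def toy_predict(row):
    return 10.0 * row["skill"] * AGENT_MARKUP[row["agent"]]

# Invented players: agency_a happens to represent the strongest ones.
players = [
    {"skill": 9.0, "agent": "agency_a"},
    {"skill": 8.0, "agent": "agency_a"},
    {"skill": 3.0, "agent": "agency_b"},
    {"skill": 2.0, "agent": "no_agent"},
]

# Naive comparison: average predicted fee of each agency's own clients.
raw = {
    agent: mean(toy_predict(p) for p in players if p["agent"] == agent)
    for agent in AGENT_MARKUP
}

# Partial dependence: every agency "sells" the same four players.
pd_agent = partial_dependence(toy_predict, players, "agent", list(AGENT_MARKUP))

print("raw averages:      ", raw)
print("partial dependence:", pd_agent)
```

In the raw averages agency_a looks several times better than agency_b, simply because its clients are stronger. Once every agency is given the same players, the gap shrinks to the markup built into the toy model.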







Any parameter can be analyzed this way, which is what we will do now.







Age



The first thing I pointed the algorithm at was the player's age, and I got this picture:













It must be said that it plunged me into gloom, because it did not match my intuition at all. Something is broken here, I thought. I know perfectly well that player prices peak at around 25-27 and that 17-year-olds are definitely not the most expensive. I confirmed this by plotting the same dependence straight from the source data, without any model.













Yes, that is right: this is how player prices behave depending on age, a broad hump.







But after thinking a little about what the model had drawn for me, I realized that it was right. This was the moment when I first believed that it really works, that there is something in it. The model gave me a result that was unexpected at first glance yet turned out to be correct, and it let me look at the object of study from a slightly different angle.







What do we see on the first chart and why are we used to the second?







In fact, it is very hard to relate the transfer fee to a player's age using raw data alone. After all, why does a player become more expensive by the age of 25? Because he is getting older? No: he simply plays more matches, scores more, starts playing for the national team and, in the end, becomes more popular with fans. All of this does tend to come bundled with age and experience, but in our data each of these is a separate parameter. The first chart shows the dependence on age alone, in the ideal scenario where everything else is equal. In that case, of course, a player who scored 20 goals for his club last season and regularly plays for the national team will cost more at 19 than exactly the same player with the same numbers, playing in the same place, but at 25!
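A toy example can make the difference between the two charts concrete. Everything here is an assumption made for illustration: the `toy_predict` formula, the `goals` feature and the five players are invented, and the real model's age curve has a different shape.

```python
from statistics import mean

# Hypothetical model: the fee grows with goals and, all else equal, falls with age.
def toy_predict(row):
    return row["goals"] * (40 - row["age"])

# Invented data: a player's output peaks in mid-career.
players = [
    {"age": 18, "goals": 2},
    {"age": 22, "goals": 8},
    {"age": 26, "goals": 15},
    {"age": 30, "goals": 9},
    {"age": 34, "goals": 3},
]
ages = [p["age"] for p in players]

# Raw view: the fee of the player who actually has that age.
raw_by_age = {p["age"]: toy_predict(p) for p in players}

# Partial dependence: give every player each age in turn, keeping goals fixed.
pd_by_age = {
    age: mean(toy_predict({**p, "age": age}) for p in players) for age in ages
}

print("raw:               ", raw_by_age)
print("partial dependence:", pd_by_age)
```

The raw view produces a hump with its peak at 26, because the 26-year-old is also the top scorer. The partial dependence, with goals held fixed, only ever goes down with age.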







It is also interesting that the slope of the curve up to 25 is quite gentle, whereas after that age the price simply collapses. It would be worth thinking about why the difference is so striking.







Season



I suggest looking at the "distilled" growth in player value across seasons:













You can clearly see how, having survived the overheated market of the 90s and the financial crisis of the early 2010s, the chart confidently settles onto an almost exponential growth curve.







And here is the same chart built from the raw data alone. Notice how much less pronounced the growth in player value is in recent years:













Related Parameters



Agency, age and, to a lesser extent, season are examples of a fairly rare category: independent parameters. After all, you can easily imagine a player moving from one agency to another with little effect on his other parameters. But suppose we want to understand which buying club has to pay the most for its players. We could take the single parameter to_club_name and calculate the result. But here it is hard to ignore the related values to_clb_lg_name, to_clb_lg_country and to_clb_lg_group, which describe the league the club plays in. Yes, we could separate them and find out how much more Man Utd pays for players purely because of the strength of its brand and how much the "English markup" adds on its own, but most often we are interested in the combined result. After all, Manchester is not planning to move out of England, so we will investigate the whole group of parameters at once.
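Varying a group of linked parameters together might look like the sketch below. The column names to_club_name and to_clb_lg_country come from the text; the model, the club and country markups, and the `base_value` field are hypothetical placeholders.

```python
from statistics import mean

def grouped_partial_dependence(predict, rows, assignments):
    """Like partial dependence, but each candidate overwrites several linked columns at once."""
    result = {}
    for label, forced_values in assignments.items():
        forced = [{**row, **forced_values} for row in rows]
        result[label] = mean(predict(row) for row in forced)
    return result

# Hypothetical markups standing in for what a trained model would have learned.
CLUB_MARKUP = {"Club A": 1.2, "Club B": 0.9}
COUNTRY_MARKUP = {"England": 1.5, "Belgium": 0.8}

def toy_predict(row):
    return (
        row["base_value"]
        * CLUB_MARKUP[row["to_club_name"]]
        * COUNTRY_MARKUP[row["to_clb_lg_country"]]
    )

# Invented transfers.
transfers = [
    {"base_value": 10.0, "to_club_name": "Club A", "to_clb_lg_country": "England"},
    {"base_value": 20.0, "to_club_name": "Club B", "to_clb_lg_country": "Belgium"},
]

# Each buyer carries its own league country along with it.
buyers = {
    "Club A": {"to_club_name": "Club A", "to_clb_lg_country": "England"},
    "Club B": {"to_club_name": "Club B", "to_clb_lg_country": "Belgium"},
}

pd_buyer = grouped_partial_dependence(toy_predict, transfers, buyers)
print(pd_buyer)
```

Forcing the club name alone would evaluate combinations that never occur, such as Club A playing in Belgium. Overwriting the linked columns together keeps every synthetic row plausible.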







What do the numbers on the charts mean?

The number after the bar is the coefficient showing by how much this parameter value raises the transfer price relative to the average.

The number inside the bar, along with its shade, indicates the number of transfers with this parameter value.
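One plausible way to arrive at such a pair of numbers is sketched below. It assumes that "the average" means the unweighted mean of the partial-dependence values; the article does not spell out the exact baseline, and the sample values are invented.

```python
from collections import Counter
from statistics import mean

def relative_coefficients(pd_values, observed):
    """Return {value: (coefficient, count)} for every value of one parameter.

    coefficient = partial-dependence value / mean over all values (an assumed baseline)
    count       = how many real transfers actually carry that value
    """
    baseline = mean(pd_values.values())
    counts = Counter(observed)
    return {value: (pd / baseline, counts[value]) for value, pd in pd_values.items()}

# Invented partial-dependence results and observed transfers.
pd_values = {"England": 15.0, "Spain": 10.0, "Belgium": 5.0}
observed = ["England", "England", "Spain", "Belgium", "England"]

bars = relative_coefficients(pd_values, observed)
for value, (coefficient, count) in bars.items():
    print(f"{value}: x{coefficient:.2f} ({count} transfers)")
```

The count matters when reading the charts: a large coefficient backed by only a handful of transfers deserves less trust than a moderate one backed by hundreds.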







Club Buyer









That the 20 most generous buyers include 18 English clubs and one royal club from Madrid does not really surprise me, but third (!) place for Anzhi Makhachkala shows once again that it deserves a prize in the "Party Like A Russian" category.

By the way, only one club on this list no longer exists.













If the chart above was the anti-top, here we have the clubs whose brand, on the contrary, lets them buy the same players below the market price. The Belgians have taken over!







"Where are the Portuguese?" you ask. All in good time, I answer, to you and to myself.







Club Seller









The clubs that sell at the highest markup. Second place for Shakhtar is pleasing (well-deserved respect); the presence of most of the top clubs of Brazil and Argentina is understandable; then come Sevilla and, finally, Benfica, with the highest number of sales.

But the most interesting thing, of course, would be a closer look at Atalanta. Whom did it sell so well over those 10 years? And nearly fifty of them at that? Let me remind you that the data covers 2008-2018, which is far from today's Atalanta with Gasperini and the Champions League!













At the bottom are the Dutch: it was said in those years that the Netherlands was where quality players could be bought most cheaply. And, unexpectedly, Zenit and Wolfsburg...







Club Performance



Since we have data on who sells at the highest markup and who buys at the lowest, we cannot help but look at who has the largest gap between the two, in relative terms.













A fascinating picture. The Belgians; the Argentines and Brazilians; Besiktas and Alkmaar, which clearly deserve a closer look; and finally Benfica and Porto with the most deals.







Anderlecht deserves a special mention: in the top 10 of the list, with more than 100 transfers.













The anti-top, this time in terms of "efficiency", is again dominated by English clubs, with a small dash of Barcelona. The 0.5 for Manchester United is just scary.







It has to be said here that any data shows only what it shows. In this case, that is the club's markup on purchase divided by the club's markup on sale.

It is no accident that I put "efficiency" in quotation marks. The best clubs in the world have no need to top an "efficiency" ranking: their task is not to sell players as dearly as possible but to take their best career years, squeeze out the maximum and turn it into results for the club. An excellent player at a top club may spend years coming on only as a substitute; this lowers his price, but if the club needs it, that is what will happen. If a top club sells a player to a mid-table club (and such deals affect the chart more, since a move from one top club to another barely changes the overall balance), it most often means that he did not make the grade (or at least that is how it usually looks), so selling him on to another top club, and it is the top clubs that inflate a player's price, becomes harder.







In other words, if comparing "efficiency" on this chart makes sense at all, then only between clubs of the same category (top clubs, donor clubs, Belgian clubs :), and so on).













As, for example, here: the "efficiency" of Russian clubs. CSKA's lead is entirely expected. Spartak surprised me until I remembered that they did, after all, sell their players well. For example, they managed to sell Cavenaghi, who had failed in Russia, for almost the same price they had paid for him.







Zenit and Anzhi we have already discussed.







League









And here is the "secret" behind the English lead in the anti-tops (and the Belgian lead in the tops). The English markup in all its ugliness. It is larger than the markups of all the other top-6 countries combined.







Buying Coach









Well, what can I say... Top coaches are like top clubs: you hire them to deliver results, not to increase the value of players. On the contrary, you will have to overpay to buy the players they need. It is surprising to see Mancini only at the end of this list, albeit with a large number of players. It is no less surprising to find Jardim and Pochettino at the top. But let us treat that as awarding them the title of "top coach".







Selling Coach









Jardim, Lucescu and Pochettino buy at high prices, but they also sell at high prices. It is no surprise that each of them is best known for working, in the period covered, at a club that commands respect (Monaco, Shakhtar and Tottenham). Simeone is a god: the playing system, bringing Atlético up among the leaders, an incredible rise in the value of his players.







In one line



The biggest markup when buying, by club-coach pair: 1. Guus Hiddink, Anzhi; 2. Sir Alex Ferguson, you know where; 3. Louis van Gaal, United.







Among those with a significant number of transfers, the cheapest buyer of all was Senol Gunes at Besiktas; again, this is worth a closer look.







The same club-coach pairs, but for sales: 1. Mircea Lucescu, Shakhtar; 2. Diego Simeone, Atlético; 3. Leonardo Jardim, Monaco.







That Diego is only second here suggests, paradoxically, that before Atlético he sold players even better.







Agent



And finally, the ranking of agents by effectiveness:













We have already talked about GestiFute. But now we can deliver a verdict: yes, they are the best.







They are good in their own right. They not only have Cristiano Ronaldo and other great players; they also know how to sell them, better, in fact, than anyone else in the world.







The first two agencies are also worth an especially careful look, although they work almost exclusively in the Italian market.







Undisclosed agents, apparently, stay undisclosed for a reason: they sell at high prices no matter what.







Further evidence that Mino Raiola is no match for Jorge Mendes: he is only slightly above the market. PR is doing its job.







Well, if you are a young Russian footballer, then you are already doing well. Feel free to choose an agency from the top of this list; you will not lose out.







That is all for now. There are many more results, but I have selected the ones that seem most interesting to me.

A closer look at these and other charts, along with the full tables, is available in the notebook. And here I ran even more experiments on this data.







And most importantly, I want to remind you that this method works with almost any set of tabular data. First you determine which parameters affect the result most strongly (feature importance), and then, using the method described in this article (partial dependence), which values of those parameters you need in order to maximize the chosen target.
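The first of these two steps can be sketched as follows. The article does not say which importance measure was used, so this example swaps in permutation importance as one common choice; the model, the feature names and the data are all invented for illustration.

```python
import random
from statistics import mean

def permutation_importance(predict, rows, targets, features, n_repeats=20, seed=0):
    """Average increase in mean absolute error after shuffling one column at a time."""
    rng = random.Random(seed)

    def error(data):
        return mean(abs(predict(row) - target) for row, target in zip(data, targets))

    baseline = error(rows)
    importance = {}
    for feature in features:
        increases = []
        for _ in range(n_repeats):
            # Shuffle this column only, breaking its link to the target.
            column = [row[feature] for row in rows]
            rng.shuffle(column)
            shuffled = [{**row, feature: value} for row, value in zip(rows, column)]
            increases.append(error(shuffled) - baseline)
        importance[feature] = mean(increases)
    return importance

# Hypothetical model that ignores the shirt number entirely.
def toy_predict(row):
    return 10.0 * row["skill"] - 0.5 * row["age"]

rows = [
    {"skill": 9.0, "age": 24, "shirt_number": 7},
    {"skill": 7.0, "age": 30, "shirt_number": 10},
    {"skill": 5.0, "age": 19, "shirt_number": 4},
    {"skill": 3.0, "age": 27, "shirt_number": 23},
    {"skill": 1.0, "age": 33, "shirt_number": 1},
]
targets = [toy_predict(row) for row in rows]  # pretend the model fits perfectly

importance = permutation_importance(
    toy_predict, rows, targets, ["skill", "age", "shirt_number"]
)
print(importance)
```

Features whose shuffling hurts the prediction most are the ones worth a partial-dependence chart next; a feature the model never uses, like the shirt number here, scores exactly zero.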







Good luck with your experiments; I would be interested to hear how they turn out for you :)







