Habrastatistics: how Habr lives without Geektimes

Hello, Habr.



This article is a logical continuation of the ranking of the best Habr articles of 2018. The year is not over yet, but the rules changed over the summer, so it seemed interesting to see whether that affected anything.







In addition to the statistics themselves, there will be an updated rating of articles, as well as some source code for those who are interested in how this works.



For those who are interested in what happened, the rest is under the cut. Those who want a more detailed analysis of the site's sections can also see the next part.



Initial data



This rating is unofficial, and I do not have any insider data. As you can easily see from the browser's address bar, all articles on Habr are numbered sequentially. The rest is a technicality: we simply read all the articles one after another in a loop (in a single thread and with pauses, so as not to load the server). The values themselves were obtained with a simple parser in Python (the sources are here) and saved in a CSV file that looks approximately like this:



2019-08-11T22:36Z,https://habr.com/ru/post/463197/,"Blazor + MVVM = Silverlight , ",votes:11,votesplus:17,votesmin:6,bookmarks:40,views:5300,comments:73

2019-08-11T05:26Z,https://habr.com/ru/news/t/463199/," NASA ",votes:15,votesplus:15,votesmin:0,bookmarks:2,views:1700,comments:7
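The sequential crawl described above might look roughly like the sketch below. It is not the author's parser (that one is linked above): the function name, the `fetch` callback and the one-second default pause are assumptions made for illustration. Only the URL pattern is taken from the sample rows, and news items, which use a different path in the sample, are ignored here.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple

# URL pattern taken from the sample rows above; everything else is hypothetical.
POST_URL = "https://habr.com/ru/post/{post_id}/"


def crawl(post_ids: Iterable[int],
          fetch: Callable[[str], Optional[str]],
          pause_sec: float = 1.0) -> Iterator[Tuple[int, str]]:
    """Visit post ids one by one in a single thread, sleeping between requests.

    `fetch` is a caller-supplied function (an assumption of this sketch) that
    returns the page HTML, or None when the post is missing or hidden.
    """
    for post_id in post_ids:
        html = fetch(POST_URL.format(post_id=post_id))
        if html is not None:
            yield post_id, html
        time.sleep(pause_sec)  # pause so as not to load the server


if __name__ == "__main__":
    # Stand-in fetcher so the sketch runs without touching the network.
    fake_site = {POST_URL.format(post_id=463197): "<html>Blazor</html>"}
    for post_id, html in crawl(range(463196, 463199), fake_site.get, pause_sec=0.0):
        print(post_id, len(html))
```

Passing the fetcher in as a parameter keeps the loop itself trivial to test; a real run would plug in an HTTP client and a parser for the counters.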








Processing



For processing we will use Python, Pandas and Matplotlib. Those who are not interested in statistics can skip this part and go straight to the articles.



First, we need to load the dataset into memory and select the data for the desired year.



    import datetime

    import pandas as pd
    import matplotlib.dates as mdates
    from matplotlib.ticker import FormatStrFormatter
    from pandas.plotting import register_matplotlib_converters

    # on_bad_lines='error' (pandas >= 1.3) replaces error_bad_lines=True,
    # which was removed in pandas 2.0
    df = pd.read_csv("habr.csv", sep=',', encoding='utf-8', on_bad_lines='error', quotechar='"', comment='#')
    dates = pd.to_datetime(df['datetime'], format='%Y-%m-%dT%H:%MZ')
    df['datetime'] = dates

    # Keep only the articles published in the selected year
    year = 2019
    df = df[(df['datetime'] >= pd.Timestamp(datetime.date(year, 1, 1))) &
            (df['datetime'] < pd.Timestamp(datetime.date(year+1, 1, 1)))]
    print(df.shape)
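To make the timestamp format and the half-open year interval explicit, here is a small standard-library sketch of the same filter. The helper name `in_year` is hypothetical; the format string and the sample timestamp come from the code and the CSV rows above.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%MZ"  # same format string as in the pandas code


def in_year(timestamp: str, year: int) -> bool:
    """True if the timestamp falls in [Jan 1 of year, Jan 1 of year + 1)."""
    dt = datetime.strptime(timestamp, TS_FORMAT)
    return datetime(year, 1, 1) <= dt < datetime(year + 1, 1, 1)


if __name__ == "__main__":
    print(in_year("2019-08-11T22:36Z", 2019))
    print(in_year("2018-12-31T23:59Z", 2019))
```

The upper bound is exclusive, so an article stamped exactly at midnight on January 1 of the following year is not counted twice.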
      
      





It turns out that this year (although it is not over yet), 12,715 articles had been published at the time of writing. For comparison, the whole of 2018 had 15,904. That is a lot: about 43 articles per day (and these are only the ones with a positive rating; how many articles are submitted in total and end up negative or deleted, one can only guess, or roughly estimate from the gaps among the identifiers).
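The remark about gaps among identifiers can be turned into a rough estimate. The sketch below is an assumption about how one might do it, not something taken from the author's sources: it extracts the numeric ID from each URL and counts how many IDs between the smallest and the largest one seen are absent from the dataset. Absent IDs may be drafts, deleted or hidden posts, or other kinds of content, so the result is an upper bound at best. The names `post_id` and `count_missing_ids` are hypothetical.

```python
import re
from typing import Iterable, Optional

ID_PATTERN = re.compile(r"/(\d+)/?$")  # trailing numeric id, as in the sample URLs


def post_id(url: str) -> Optional[int]:
    """Extract the numeric id at the end of an article URL, if any."""
    match = ID_PATTERN.search(url)
    return int(match.group(1)) if match else None


def count_missing_ids(urls: Iterable[str]) -> int:
    """Count ids between the smallest and largest seen that are not present."""
    ids = {i for i in map(post_id, urls) if i is not None}
    if not ids:
        return 0
    return (max(ids) - min(ids) + 1) - len(ids)


if __name__ == "__main__":
    sample = [
        "https://habr.com/ru/post/463197/",
        "https://habr.com/ru/news/t/463199/",
    ]
    print(count_missing_ids(sample))  # prints 1: id 463198 is absent
```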



Select the necessary fields from the dataset. As metrics, we will use the number of views, the number of comments, the rating values and the number of bookmarks.



    def to_float(s):
        # "bookmarks:22" => 22.0
        # Only digits are kept, so a minus sign would be dropped
        num = ''.join(i for i in s if i.isdigit())
        return float(num)

    def to_int(s):
        # "bookmarks:22" => 22
        num = ''.join(i for i in s if i.isdigit())
        return int(num)

    def to_date(dt):
        return dt.date()

    date = dates.map(to_date, na_action=None)
    views = df["views"].map(to_int, na_action=None)
    bookmarks = df["bookmarks"].map(to_int, na_action=None)
    votes = df["votes"].map(to_float, na_action=None)
    votes_up = df["up"].map(to_float, na_action=None)
    votes_down = df["down"].map(to_float, na_action=None)
    comments = df["comments"].map(to_int, na_action=None)

    df['date'] = date
    df['views'] = views
    df['votes'] = votes
    df['bookmarks'] = bookmarks
    df['up'] = votes_up
    df['down'] = votes_down
    # The parsed comment counts were computed but never stored; store them too
    df['comments'] = comments
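For reference, here is how one raw row of the sample file splits into fields and how the `key:value` cells become numbers. This is a standard-library sketch rather than the author's code: the row is adapted from the sample above with a placeholder title, and the helper name `parse_row` is made up.

```python
import csv
import io
from typing import Dict, Union


def parse_row(line: str) -> Dict[str, Union[str, int]]:
    """Split one CSV line and convert cells like "views:5300" to integers."""
    # csv handles the quoted title, which may itself contain commas
    fields = next(csv.reader(io.StringIO(line), quotechar='"'))
    timestamp, url, title = fields[:3]
    row: Dict[str, Union[str, int]] = {
        "datetime": timestamp,
        "url": url,
        "title": title.strip(),
    }
    for cell in fields[3:]:
        key, _, value = cell.partition(":")
        row[key] = int(value)
    return row


if __name__ == "__main__":
    # Placeholder title; the counters are those of the first sample row.
    line = ('2019-08-11T22:36Z,https://habr.com/ru/post/463197/,'
            '"Some title, with a comma",votes:11,votesplus:17,votesmin:6,'
            'bookmarks:40,views:5300,comments:73')
    print(parse_row(line))
```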
      
      





Now the data has been added to the dataset, and we can use it. Let's group the data by day, take the median values for each day and smooth them with a 20-day rolling mean.



    g = df.groupby(['date'])
    days_count = g.size().reset_index(name='counts')
    year_days = days_count['date'].values

    # numeric_only=True skips the text columns (URL, title), which have no median
    grouped = g.median(numeric_only=True).reset_index()
    grouped['counts'] = days_count['counts']

    counts_per_day = grouped['counts'].values
    counts_per_day_avg = grouped['counts'].rolling(window=20).mean()
    view_per_day = grouped['views'].values
    view_per_day_avg = grouped['views'].rolling(window=20).mean()
    votes_per_day = grouped['votes'].values
    votes_per_day_avg = grouped['votes'].rolling(window=20).mean()
    bookmarks_per_day = grouped['bookmarks'].values
    bookmarks_per_day_avg = grouped['bookmarks'].rolling(window=20).mean()
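To see what the two aggregation steps do, here is a dependency-free sketch on toy numbers: a median within each day, then a trailing moving average across days. The window of 3 and the toy values are illustrative only (the code above uses 20), and the helper names are hypothetical. As with pandas' `rolling(window=n).mean()` under default settings, the first n-1 positions have no value.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, List, Optional, Sequence, Tuple


def daily_median(rows: Sequence[Tuple[str, float]]) -> Dict[str, float]:
    """Group (day, value) pairs by day and take the median of each group."""
    by_day: Dict[str, List[float]] = defaultdict(list)
    for day, value in rows:
        by_day[day].append(value)
    return {day: median(values) for day, values in sorted(by_day.items())}


def rolling_mean(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing mean over `window` points; None until the window is full."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(values[i + 1 - window:i + 1]))
    return out


if __name__ == "__main__":
    rows = [("2019-01-01", 100), ("2019-01-01", 300), ("2019-01-01", 5000),
            ("2019-01-02", 200), ("2019-01-03", 400), ("2019-01-04", 600)]
    medians = daily_median(rows)
    print(medians)
    print(rolling_mean(list(medians.values()), window=3))
```

The median keeps a single viral article (5000 views on the first toy day) from dominating that day's value, which a plain mean would not.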
      
      





Now for the fun part: the charts.



Let's look at the number of publications on Habr in 2019.



    import matplotlib.pyplot as plt

    plt.rcParams["figure.figsize"] = (16, 8)
    fig, ax = plt.subplots()
    plt.bar(year_days, counts_per_day, label='Articles/day')
    plt.plot(year_days, counts_per_day_avg, 'g-', label='Articles avg/day')
    plt.xticks(rotation=45)
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%Y"))
    ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
    plt.legend(loc='best')
    plt.tight_layout()
    plt.show()
      
      







The result is interesting. As you can see, Habr was rather "shaky" during the year: the number of publications per day fluctuated noticeably. I don't know the reason.







For comparison, 2018 looks a bit "smoother":







In general, I do not see any drastic decrease in the number of published articles in 2019 on the chart. On the contrary, the number even seems to have grown a little since the summer.



But the following two graphs depress me a little more.



Average views per article:





Average rating per article:





As you can see, the average number of views decreases slightly during the year. This can be explained by the fact that new articles have not yet been indexed by search engines and are not found as often. The decrease in the average rating per article is harder to explain. It feels as if readers either simply do not have time to look through so many articles or do not pay attention to the ratings. From the point of view of the author reward program, this trend is very unpleasant.



By the way, this was not the case in 2018: the chart is more or less even.







In general, the owners of the resource have something to think about.



But let's not talk about sad things. In general, we can say that Habr "survived" the summer changes quite successfully, and the number of articles on the site did not decrease.



Rating



Now, the rating itself. Congratulations to those who made it in. I remind you once again that the rating is unofficial; I may have missed something, and if some article definitely belongs here but is missing, write to me and I will add it manually. As the rating, I use calculated metrics, which, it seems to me, turned out to be quite interesting.
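The exact formulas behind these calculated metrics are not given in the article. A plausible sketch, under stated assumptions, is to divide one counter by the number of views and sort, ignoring articles below some view threshold so that tiny denominators do not dominate. The `Article` type, the `top_by_ratio` helper and the 10,000-view threshold are all hypothetical; the sample numbers are taken from the lists below.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Article:
    title: str
    views: int
    comments: int
    up: float
    down: float


def top_by_ratio(articles: Sequence[Article],
                 numerator: Callable[[Article], float],
                 min_views: int = 10_000,   # assumed cut-off, not from the article
                 limit: int = 20) -> List[Article]:
    """Rank articles by numerator(article) / views, highest first."""
    eligible = [a for a in articles if a.views >= min_views]
    ranked = sorted(eligible, key=lambda a: numerator(a) / a.views, reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    sample = [
        Article("LED lies of unprecedented proportions", 241_000, 569, 364.0, 1.0),
        Article("User reward to the authors of Habr", 26_400, 245, 320.0, 0.0),
        Article("Ukrainian lessons", 60_400, 1672, 285.0, 41.0),
    ]
    by_rating = top_by_ratio(sample, lambda a: a.up - a.down)
    by_comments = top_by_ratio(sample, lambda a: a.comments)
    print([a.title for a in by_rating])
    print([a.title for a in by_comments])
```

The same helper covers the rating-to-views, comments-to-views and bookmarks-to-views tops by swapping the `numerator` function.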



Top Viewed Articles



LED lies of unprecedented proportions 241,000 views, 569 comments, rating + 364.0 / -1.0

'Blowjob article': scientists processed 109 hours of oral sex to develop an AI that sucks a member 236,000 views, 361 comments, rating + 240.0 / -68.0

What the designer smoked: an unusual firearm 235,000 views, 123 comments, rating + 119.0 / -9.0

How I did not work for a year at Sberbank 233,000 views, 580 comments, rating + 449.0 / -14.0

Scientists have found the oldest living vertebrate on Earth 221,000 views, 211 comments, rating + 82.0 / -14.0

Smart bulbs thrown into the trash are a valuable source of personal information 219,000 views, 147 comments, rating + 73.0 / -11.0

Development King 178,000 views, 668 comments, rating + 315.0 / -60.0

Fraudsters and EDS - everything is very bad 175,000 views, 778 comments, rating + 356.0 / -0.0

The series 'Chernobyl': watch and think 172,000 views, 803 comments, rating + 164.0 / -25.0

The worst UI sound control 166,000 views, 176 comments, rating + 292.0 / -30.0

Honest programmer resume 165,000 views, 283 comments, rating + 410.0 / -40.0

I ruin developers' lives with my code reviews and I'm sorry 164,000 views, 12 comments, rating + 33.0 / -3.0

How Megafon slept on mobile subscriptions 162,000 views, 676 comments, rating + 624.0 / -2.0

Riot on Pikabu. Users massively go to Reddit 160,000 views, 484 comments, rating + 215.0 / -41.0

Cheap and expensive AAA batteries 159,000 views, 382 comments, rating + 363.0 / -6.0

Retired at 22 156,000 views, 922 comments, rating + 259.0 / -100.0

Man without a smartphone 152,000 views, 736 comments, rating + 173.0 / -25.0

Want eternal LEDs? Uncover soldering irons and files. Or homemade homemade lighting 149,000 views, 262 comments, rating + 94.0 / -6.0

What you don’t need to do if your phone is stolen 144,000 views, 638 comments, rating + 259.0 / -27.0

February 1, 2019 your site may stop functioning 143,000 views, 162 comments, rating + 89.0 / -8.0



Top articles on the ratio of ratings to views



Weaken the nuts, part 2: the voting period for publications and other changes 14,000 views, rating + 238.0 / -3.0

Pretty fanciful 'Elements' of Euclid in TeX 10,800 views, rating + 136.0 / -0.0

User reward to the authors of Habr 26,400 views, rating + 320.0 / -0.0

Sending error messages in publications 18,900 views, rating + 179.0 / -2.0

Hello world! Or Habr in English, v1.0 21,000 views, rating + 178.0 / -2.0

Life on particles 34,000 views, rating + 267.0 / -2.0

Civilization of Springs, 5/5 25,800 views, rating + 201.0 / -1.0

We play Tetris on the electromechanical screen 16,300 views, rating + 124.0 / -0.0

Recreating fonts from a CRT screen 13,400 views, rating + 101.0 / -0.0

The mathematical model of the game Dobble 14,600 views, rating + 110.0 / -0.0

An important message about invites in the profile 18,300 views, rating + 137.0 / -8.0

Weaken the nuts in the Habr rules 48,300 views, rating + 338.0 / -13.0

Street magic codec comparison. We reveal the secrets 21,700 views, rating + 144.0 / -0.0

Smart parser for numbers recorded in words 20,500 views, rating + 136.0 / -1.0

Generic and metaprogramming models: Go, Rust, Swift, D and others 17,000 views, rating + 110.0 / -2.0

I create a global knowledge base on batteries 22,200 views, rating + 139.0 / -0.0

How I wrote and published a book about Moscow State University, or 12 critical errors 21,600 views, rating + 134.0 / -0.0

About kote, wife, two sons, the idea ... and not only. A story with a continuation 43,000 views, rating + 269.0 / -8.0

Computed video in 755 megapixels: plenoptics yesterday, today and tomorrow 41,500 views, rating + 244.0 / -0.0

The plot density in retail 27,500 views, rating + 160.0 / -1.0



Top articles on the ratio of comments to views



Github began to block user repositories from Crimea, Cuba, Iran, North Korea and Syria 44,500 views, 1,309 comments, rating + 115.0 / -6.0

Ukrainian lessons 60,400 views, 1,672 comments, rating + 285.0 / -41.0

Weaken the nuts in the Habr rules 48,300 views, 1,285 comments, rating + 338.0 / -13.0

A meeting against the isolation of Runet 50,900 views, 923 comments, rating + 204.0 / -32.0

How to ride two wheels to work 47,100 views, 781 comments, rating + 113.0 / -10.0

Plane crash in Sheremetyevo: historical analogies 82,400 views, 1,211 comments, rating + 147.0 / -11.0

Engineers save people lost in the forest, but the forest has not yet surrendered 28,900 views, 423 comments, rating + 132.0 / -1.0

Rally against isolation of the Runet 63,300 views, 820 comments, rating + 182.0 / -20.0

How the protection of children from information is arranged - and the enchanting story about where it first came from (18+) 65,400 views, 811 comments, rating + 175.0 / -2.0

Hello world! Or Habr in English, v1.0 21,000 views, 249 comments, rating + 178.0 / -2.0

How to buy potatoes correctly if you are color blind 51,800 views, 607 comments, rating + 135.0 / -3.0

How it feels to be a free software maintainer 22,900 views, 259 comments, rating + 129.0 / -3.0

Weaken the nuts, part 2: the voting period for publications and other changes 14,000 views, 158 comments, rating + 238.0 / -3.0

Pilot production of electronics for a minimum price 34,200 views, 382 comments, rating + 165.0 / -3.0

How do we equip Megaphone 39,800 views, 405 comments, rating + 140.0 / -6.0

Nuclear wars of the distant past? 83,400 views, 843 comments, rating + 133.0 / -5.0

Hello world! Or English-speaking Habr, v1.0 60,300 views, 591 comments, rating + 268.0 / -7.0

Space as a vague recollection 43,200 views, 402 comments, rating + 190.0 / -7.0

User reward to the authors of Habr 26,400 views, 245 comments, rating + 320.0 / -0.0

The principles of the free market in the understanding of the United States 56,300 views, 502 comments, rating + 160.0 / -44.0



Top most controversial articles



State and T-killers 752 comments, rating + 83.0 / -80.0, 15,100 views

These toxic guys: they poison projects 120 comments, rating + 67.0 / -51.0, 50,300 views

Why do you teach Go 70 comments, rating + 76.0 / -57.0, 23,100 views

I read 80 resumes, I have questions 635 comments, rating + 135.0 / -94.0, 90,700 views

Why it is actually impossible to be a vegetarian 940 comments, rating + 76.0 / -52.0, 51,600 views

Functional programming: a wacky toy that kills labor productivity. Part 1 394 comments, rating + 100.0 / -68.0, 54,000 views

We wrote the most useful code in our life, but threw it in the trash. Together with us 259 comments, rating + 101.0 / -63.0, 62,900 views

Appeal in Apple 96 comments, rating + 90.0 / -52.0, 39,300 views

Why does Windows not steer in 2019, or CHYDNT? 881 comments, rating + 123.0 / -70.0, 75,000 views

I'm not real 246 comments, rating + 105.0 / -59.0, 63,900 views

Five frightening trends of modern development 262 comments, rating + 95.0 / -52.0, 77,400 views

The faster you forget OOP, the better for you and your programs 1,271 comments, rating + 131.0 / -63.0, 128,000 views

A year behind the wheel of an electric vehicle 1,098 comments, rating + 131.0 / -58.0, 71,800 views

I’ll stop kicking good to throw 179 comments, rating + 147.0 / -62.0, 34,400 views

Catch me if you can 215 comments, rating + 141.0 / -58.0, 65,400 views

Retired at 22 922 comments, rating + 259.0 / -100.0, 156,000 views

Psychiatrist's response to the article 'Sick and Healthy' 272 comments, rating + 154.0 / -55.0, 43,400 views

New programming languages imperceptibly kill our connection with reality 764 comments, rating + 164.0 / -52.0, 106,000 views

Last stage alcoholism 597 comments, rating + 208.0 / -60.0, 123,000 views

'Blowjob article': scientists processed 109 hours of oral sex to develop an AI that sucks a member 361 comments, rating + 240.0 / -68.0, 236,000 views
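The ordering of the list above is consistent with sorting by the ratio of downvotes to upvotes, although the article does not state the formula or any cut-off. A minimal sketch under that assumption follows; the function name and the minimum-downvotes filter are hypothetical, and the sample figures are taken from the lists in this article.

```python
from typing import List, Sequence, Tuple

Entry = Tuple[str, float, float]  # (title, upvotes, downvotes)


def most_controversial(entries: Sequence[Entry],
                       min_down: float = 50.0) -> List[Entry]:
    """Sort by downvotes / upvotes, highest first.

    min_down is an assumed filter that keeps articles with only a handful
    of votes out of the ranking.
    """
    eligible = [e for e in entries if e[2] >= min_down and e[1] > 0]
    return sorted(eligible, key=lambda e: e[2] / e[1], reverse=True)


if __name__ == "__main__":
    sample = [
        ("Retired at 22", 259.0, 100.0),
        ("State and T-killers", 83.0, 80.0),
        ("LED lies of unprecedented proportions", 364.0, 1.0),
        ("Why do you teach Go", 76.0, 57.0),
    ]
    for title, up, down in most_controversial(sample):
        print(f"{title}: {down / up:.2f}")
```

Under this metric an article with nearly equal votes for and against ranks above one that simply collected the largest number of downvotes.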



Top rated articles



How Megafon slept on mobile subscriptions, 676 comments, rating + 624.0 / -2.0, 162,000 views

'Mobile content' is free, without SMS and registrations. Megaphone fraud details, 474 comments, rating + 488.0 / -8.0, 112,000 views

Innovations in Russian, 612 comments, rating + 480.0 / -33.0, 127,000 views

How I did not work for a year at Sberbank, 580 comments, rating + 449.0 / -14.0, 233,000 views

How Protonmail is blocked in Russia, 398 comments, rating + 418.0 / -7.0, 102,000 views

10 years in IT with a diagnosis of schizophrenia, survival tips, 281 comments, rating + 403.0 / -8.0, 122,000 views

An honest resume of a programmer, 283 comments, rating + 410.0 / -40.0, 165,000 views

When 'a' is not equal to 'a'. In the wake of one hack, 64 comments, rating + 374.0 / -5.0, 74,600 views

Increase it! Modern increase in resolution, 214 comments, rating + 366.0 / -1.0, 104,000 views

LED lies of unprecedented proportions, 569 comments, rating + 364.0 / -1.0, 241,000 views

Cheap and expensive AAA batteries, 382 comments, rating + 363.0 / -6.0, 159,000 views

Fraudsters and EDS - everything is very bad, 778 comments, rating + 356.0 / -0.0, 175,000 views

Japan: a country of such common sense that it is sometimes irrational for us, 483 comments, rating + 365.0 / -12.0, 138,000 views

Weaken the nuts in the Habr rules, 1,285 comments, rating + 338.0 / -13.0, 48,300 views

User reward to the authors of Habr, 245 comments, rating + 320.0 / -0.0, 26,400 views

How I caught a hacker, 273 comments, rating + 305.0 / -6.0, 110,000 views

Myths of modern popular physics, 556 comments, rating + 304.0 / -6.0, 99,600 views

Now good developers are measured by views and subscribers - and this is bad, 486 comments, rating + 324.0 / -26.0, 74,800 views

Survive in a head-on collision, and why amnesia is not what you think, 165 comments, rating + 297.0 / -4.0, 61,800 views

Port scanner in the personal account of Rostelecom, 194 comments, rating + 300.0 / -8.0, 111,000 views



Top Articles by Number of Bookmarks



42 Google Advanced Search Operators (full list) 47,100 views, 917 bookmarks

How to become a Java developer in 1.5 years 88,500 views, 894 bookmarks

Sampler. Console utility for visualizing the result of any shell commands 58,400 views, 801 bookmarks

HBO, thank you for reminding me ... 'Chernobyl first-aid kit' of a Belarusian pharmacist 88,500 views, 797 bookmarks

Practical Tips, Examples, and SSH Tunnels 40,000 views, 787 bookmarks

256 lines of bare C ++: writing a ray tracer from scratch in a few hours 60,000 views, 745 bookmarks

Asynchronous programming (full course) 36,700 views, 690 bookmarks

'Burnt' employees: is there a way out? 116,000 views, 688 bookmarks

An extensive overview of Python interviews. Tips and Tricks 28,400 views, 687 bookmarks

15 Machine Learning Books for Beginners 18,700 views, 670 bookmarks

Lecture course in JavaScript and Node.js in the KPI 52,500 views, 656 bookmarks

How I write math notes on LaTeX in Vim 58,100 views, 652 bookmarks

What I learned from my bitter experience (over 30 years in software development) 100,000 views, 651 bookmarks

A selection of useful slides from Julia Evans 41,000 views, 587 bookmarks

HTTP headers for responsible developer 33,600 views, 566 bookmarks

N + 7 useful books 42,700 views, 563 bookmarks

Hacking CAN bus auto. Virtual dashboard 60,700 views, 562 bookmarks

Careful relocation to the Netherlands with his wife and mortgage. Part 1: job search 76,200 views, 555 bookmarks

TCP vs UDP or the future of network protocols 50,300 views, 538 bookmarks

Best Linux distributions for older computers 66,000 views, 523 bookmarks



Top by Bookmark to View Ratio



15 machine learning books for beginners 670 bookmarks, 18,700 views

Music for your projects: 12 thematic resources with tracks licensed under Creative Commons 477 bookmarks, 18,100 views

An extensive overview of Python interviews. Tips and Tricks 687 bookmarks, 28,400 views

A selection of datasets for machine learning 455 bookmarks, 19,000 views

Dungeon generator based on nodes of graph 304 bookmarks, 12,700 views

A simple explanation of path search algorithms and A * 316 bookmarks, 13,500 views

Web tools, or where to start a pentester? 421 bookmarks, 18,800 views

Learning Docker, Part 2: Terms and Concepts 341 bookmarks, 15,600 views

Learning Docker, Part 3: Dockerfile Files 297 bookmarks, 13,800 views

Tools for analyzing and debugging .NET applications 244 bookmarks, 11,600 views

How to debug environment variables in Linux 322 bookmarks, 15,900 views

How to take the first steps in robotics? 224 bookmarks, 11,200 views

Labyrinths: classification, generation, search for solutions 318 bookmarks, 16,000 views

Practical Tips, Examples, and SSH Tunnels 787 bookmarks, 40,000 views

Lecture Course 'Fundamentals of Digital Signal Processing' 418 bookmarks, 21,400 views

42 Google Advanced Search Operators (full list) 917 bookmarks, 47,100 views

3D Game Shaders for Beginners 239 bookmarks, 12,400 views

Point bypass PKH locks on a router with OpenWrt using WireGuard and DNSCrypt 302 bookmarks, 15,700 views

Developing the skill of using grouping and data visualization in Python 192 bookmarks, 10,000 views

Another Github 2: machine learning, datasets and Jupyter Notebooks 265 bookmarks, 13,900 views



Top Commented Articles



Ukrainian lessons 1,672 comments, 60,400 views

Rocket 9M729. A few words about the "violator" of the INF Treaty 1,371 comments, 83,000 views

Github started blocking user repositories from Crimea, Cuba, Iran, North Korea and Syria 1,309 comments, 44,500 views

Weaken nuts in Habr rules 1,285 comments, 48,300 views

The faster you forget OOP, the better for you and your programs 1,271 comments, 128,000 views

Plane crash in Sheremetyevo: historical analogies 1,211 comments, 82,400 views

How did generation Y turn into a burnt out generation? 1,122 comments, 81,500 views

Electric car is not for me 1,116 comments, 50,700 views

A year behind the wheel of an electric vehicle 1,098 comments, 71,800 views

The current state of consciousness science 1,021 comments, 27,500 views

Finland summed up the preliminary results of the experiment with guaranteed basic income 999 comments, 62,100 views

Conversation about a fair economy 997 comments, 7,700 views

Why it is really impossible to be a vegetarian 940 comments, 51,600 views

Honey, we kill the Internet 933 comments, 120,000 views

Rally against isolation of RuNet 923 comments, 50,900 views

Retired at 22 922 comments, 156,000 views

Choosing a car for an IT specialist, or tips for teapots from a teapot 914 comments, 43,400 views

Why Senior Developers Can't get a job 901 comments, 119,000 views

The plan returned to the economy 892 comments, 27,800 views

Personal city teleportator 889 comments, 40,800 views



Finally, the last one: the anti-top by number of dislikes



Retired at 22, 922 comments, rating + 259.0 / -100.0

I read 80 resumes, I have questions, 635 comments, rating + 135.0 / -94.0

Darling, we kill the Internet, 933 comments, rating + 392.0 / -83.0

State and T-killers, 752 comments, rating + 83.0 / -80.0

Why does Windows not steer in 2019, or CHYaNT?, 881 comments, rating + 123.0 / -70.0

Functional programming: a stupid toy that kills labor productivity. Part 1, 394 comments, rating + 100.0 / -68.0

'Article about blowjob': scientists processed 109 hours of oral sex to develop an AI that sucks a member, 361 comments, rating + 240.0 / -68.0

We wrote the most useful code in our life, but he was thrown in the trash. Together with us, 259 comments, rating + 101.0 / -63.0

The faster you forget OOP, the better for you and your programs, 1,271 comments, rating + 131.0 / -63.0

I will stop throwing the good in the trash, 179 comments, rating + 147.0 / -62.0

Development King, 668 comments, rating + 315.0 / -60.0

Last stage alcoholism, 597 comments, rating + 208.0 / -60.0

I'm not real, 246 comments, rating + 105.0 / -59.0

Catch me if you can, 215 comments, rating + 141.0 / -58.0

A year behind the wheel of an electric car, 1,098 comments, rating + 131.0 / -58.0

Why do you need to learn Go, 70 comments, rating + 76.0 / -57.0

Psychiatrist’s response to the article 'Sick and Healthy', 272 comments, rating + 154.0 / -55.0

Appeal at Apple, 96 comments, rating + 90.0 / -52.0

New programming languages quietly kill our connection with reality, 764 comments, rating + 164.0 / -52.0

Five frightening trends of modern development, 262 comments, rating + 95.0 / -52.0



Phew. I have a few more interesting selections, but I won't bore the readers.



Conclusion



While building the rating, I noticed two points that seemed interesting.



First, 60% of the top still consists of articles in the Geektimes genre. Whether there will be fewer of them next year, and what Habr will look like without articles about beer, space, medicine and so on, I do not know. Readers will definitely lose something. We'll see.



Second, the bookmark top turned out to be unexpectedly high-quality. This is psychologically understandable: readers may not pay attention to the rating, but if they need an article, they will bookmark it. And this is exactly where the greatest concentration of useful and serious articles is. I think the site owners should somehow consider linking the number of bookmarks to the reward program if they want to increase this particular category of articles on Habr.



Something like this. I hope it was informative.



The list of articles turned out to be long, but that is probably for the better. Happy reading, everyone.


