The city falls asleep, the Habr users wake up

If the number of comments under an article is shooting toward 1000, you can be sure that, whatever topic the author stated, a flame war is raging inside: hotbeds of political fire surrounded by armchair experts on every subject, remote psychiatric diagnoses based on profile picture and nickname, ad hominem attacks, sarcastic jabs more caustic than xenomorph blood, and, of course, the obligatory dish on such occasions: mutual accusations that your opponent is arguing purely for pay and/or on duty. A duty which, apparently, is both dangerous and difficult, and at first glance seems invisible, and thirty pieces of silver do not just lie around on the road.



The funniest thing in this situation is that people deeply afflicted by the "someone is wrong on the Internet" syndrome often spend a hell of a lot of time and nerves, completely for free, proving to someone equally afflicted that he is doing exactly the same thing for money or on orders. Looking for logic here? There isn't any. This is the Internet, baby.



Take one of the relatively recent posts about alleged territorial discrimination on GitLab. Four days have passed since the article was published, and, of course, the discussion long ago drifted far away from the originally stated topic. Phrases like these come up:

A real person has nothing to counter a professional, salaried commenter with ...



The user (so-and-so) spends a simply unrealistic amount of time on comments ...

At the same time, his activity does not show the patterns typically inherent to the average user ...



P.S. But this gave me the idea of writing a parser/analyzer for such commenters) Showing activity by the hour, the amount of time per day, per week, etc. ... A good topic for an article)

Hold on. What exactly are these patterns “typically inherent to the average user”? The author of that phrase is, unfortunately, no longer around in that thread to ask, so we will have to proceed by guesswork.



The question I want to put before you is this: is it possible at all to use statistical methods to pick out these patterns with at least some reliability, and so build a formal classifier that distinguishes casual commenters from professional ones? Imagine: “according to the Habr-Botometer, you are 76% likely to be a Kremlin bot.” That would be much cooler than karma raids on each other.

Unfortunately, my expertise is not enough even to suggest which way to dig to solve such a problem. Nevertheless, last night I knocked together a small, primitive parser which (since a user's comment page is open even to unauthenticated visitors) does two things for now: a) it collects statistics on all of a user's comments (for now, just the timestamp) and stores them in a MySQL database; b) it draws a time chart, marking on it the comment events taken from this database. Even without any clever analysis, the results turned out to be rather amusing. This is what the chart of my own comments looks like. Explanations are underneath. It is best viewed in a separate window at a scale of 100% or more.



image



The horizontal axis is the time of day: each pixel is one minute, the gray gridlines are one hour apart, and the full width of a row is one day. Days run from bottom to top along the vertical axis, where the gridlines are 365 days apart.
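The layout described above can be sketched as a small coordinate mapping. This is only an illustration of the geometry, not the parser's actual code: the names `to_pixel`, `first_day` and `height` are hypothetical, and it assumes image coordinates with `y = 0` at the top, so the earliest day ends up in the bottom row.

```python
from datetime import date, datetime

MINUTES_PER_DAY = 24 * 60  # chart width in pixels: one pixel per minute


def to_pixel(ts: datetime, first_day: date, height: int):
    """Map a comment timestamp to (x, y) image coordinates.

    x is the minute of the day (0..1439).
    y is counted from the top of the image, so first_day lands
    on the bottom row and later days move upward.
    """
    x = ts.hour * 60 + ts.minute
    day_index = (ts.date() - first_day).days
    y = height - 1 - day_index
    return x, y


def hour_gridlines():
    """x positions of the gray hourly gridlines."""
    return [h * 60 for h in range(24)]


def year_gridlines(height: int):
    """y positions of the horizontal gridlines, one every 365 days."""
    return [height - 1 - d for d in range(0, height, 365)]
```

A drawing library of your choice can then set one pixel per comment at the returned coordinates.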



There is nothing particularly interesting in my chart. You can see that I like to sleep 7-8 hours, often go to bed after midnight, and sometimes hold commenting marathons lasting many hours, and that my activity over the past year is roughly equal to or greater than that of the previous five years combined.

Or take comrade gecube, who kept a vow of silence for three and a half years, and then the dam burst ...



image



A typical Habr commenter's activity chart looks something like this (this is QtRoS):



image



A distinct "sleepy hollow" on the left, somewhere in the European night, and leisurely commenting during daylight hours, possibly with breaks of up to half a year.
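One simple way to put a number on the "sleepy hollow" is to count comments per hour of the day and look for the quietest contiguous stretch, wrapping around midnight. The sketch below is an assumption about how one might do it, not something the parser does; the function names and the 7-hour default window are made up for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def hourly_histogram(timestamps: Iterable[datetime]) -> List[int]:
    """Number of comments in each hour of the day, index 0..23."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(h, 0) for h in range(24)]


def quietest_window(hist: List[int], width: int = 7) -> Tuple[int, int]:
    """Find the quietest run of `width` consecutive hours.

    The window is circular, so it may wrap past midnight.
    Returns (start_hour, comments_inside_window); on ties the
    earliest start hour wins.
    """
    best_start, best_total = 0, None
    for start in range(24):
        total = sum(hist[(start + k) % 24] for k in range(width))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, best_total
```

The share of all comments that falls inside this window is a rough measure of how pronounced the hollow is: close to zero for a regular sleeper, close to `width / 24` for a chart with no hollow at all.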



But not all charts are so boring! How do you like this:



image



For more than two years, our colleague apparently kept shifting his biorhythms so that his sleep drifted from the European night to somewhere over the Mid-Atlantic Ridge, evenly and gradually at that, and then spent another two years getting back to the shores of Portugal. On foot? Swimming? I can't come up with a plausible explanation ... In the first three hours after waking, comments fly like machine-gun fire, and by the end of the day he only looks in about once an hour to check how things are going.
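A gradual drift like this could be tracked numerically by computing, month by month, the circular mean of the time of day at which comments are posted. A plain average would break at midnight (23:00 and 01:00 would average to noon), so the sketch below averages angles on a 24-hour circle instead. It tracks the center of activity rather than sleep itself, and all names in it are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def circular_mean_hour(timestamps):
    """Circular mean of the time of day, in hours within [0, 24).

    Returns None when the mean is undefined (no data, or the
    comments are spread perfectly evenly around the clock).
    """
    s = c = 0.0
    for ts in timestamps:
        angle = 2 * math.pi * (ts.hour * 60 + ts.minute) / 1440
        s += math.sin(angle)
        c += math.cos(angle)
    if math.hypot(s, c) < 1e-9:
        return None
    hours = (math.atan2(s, c) / (2 * math.pi) * 24) % 24.0
    # Guard against floating-point rounding up to exactly 24.0.
    return 0.0 if hours >= 24.0 else hours


def monthly_activity_center(timestamps):
    """Map (year, month) to the circular mean hour of that month."""
    by_month = defaultdict(list)
    for ts in timestamps:
        by_month[(ts.year, ts.month)].append(ts)
    return {m: circular_mean_hour(v) for m, v in sorted(by_month.items())}
```

Plotted over several years, a steadily sliding center would correspond to the diagonal band in the chart. Note that daylight saving changes in the stored timestamps would show up as sudden one-hour jumps.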



That was, by the way, 0xd34df00d.



And here’s another riddle:



image



For four and a half years, this colleague held out without a single comment. Evidently he spent them in some secret monastery learning to go without sleep for days, judging by how many comments land in the "sleepy hollow".



But the most interesting thing here is the anomaly at the 16th hour, which lasts more than three years and gradually fades over the last year. A smoke break? Walking the dog? Jogging? What else could tear a Habr user away from the comment feed at the height of the working day with such daily regularity? I'm a slacker and a lazybones myself, and I can't imagine the kind of self-discipline that the esteemed khim can afford.
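A narrow daily gap like this one could be flagged automatically by comparing each hour's comment count with both of its neighboring hours. The sketch below takes a list of 24 hourly counts and reports hours that are much quieter than both neighbors; the function name and the 0.5 threshold are arbitrary assumptions. Requiring both neighbors to be busy keeps the edges of the nightly "sleepy hollow" from being reported as gaps.

```python
def local_dips(hist, threshold=0.5):
    """Hours whose count is below `threshold` times both neighbours.

    `hist` is a list of 24 comment counts, one per hour of the day.
    Neighbours wrap around midnight. Hours next to an empty hour are
    skipped, so the borders of the night-time gap are not flagged.
    """
    dips = []
    for h in range(24):
        quieter_neighbour = min(hist[(h - 1) % 24], hist[(h + 1) % 24])
        if quieter_neighbour > 0 and hist[h] < threshold * quieter_neighbour:
            dips.append(h)
    return dips
```

Running this per year rather than over the whole history would also show whether a gap fades over time.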



Finally, the last diagram to think about:



image



It has no pronounced "sleepy hollow" at all. One can only just make out that more comments are sent in the afternoon than in the morning.



With all Komsomol severity, I urge the esteemed MTyrz to disarm before the party and honestly admit how many grandpas, grandmas, granddaughters, dogs and mice run his account and scribble comments.



And finally, a leading question: might anyone be interested enough in all this to want to develop the parser code and/or get a database dump or access to it, and so on? My own knowledge of data mining and data visualization methods hardly goes beyond general erudition, and I can hardly think of anything cleverer or more interesting than these simple little charts. If anyone is interested, write to me on Telegram (nickname in my profile).



Thanks for your attention!



UPD. I have posted the sources on GitHub.


