On the oddities of Habr statistics

I had noticed ratings behaving strangely before, but recently the strangeness became too obvious to ignore. So I decided to investigate the problem with the scientific methods available to me, namely by analyzing the dynamics of pluses and minuses. What if I had only imagined it?



I'm not much of a programmer, but I can manage the basics. So I wrote a simple utility that collects statistics from the panel of a Habr post: pluses, minuses, views, bookmarks, and so on.







The statistics are plotted as graphs, and studying them turned up a couple more surprises, smaller ones. But first things first.



Strangeness 1.

This is the one that started my statistical study.



It seemed strange to me that in the first hours after publication some of my posts dropped sharply into the negative, then climbed back to zero, and in the end earned the expected positive rating. Why is that?



I was just about to publish another post, in two parts, and decided to subject it to statistical dissection.



I published the first part, launched the utility at the same time, and began to wait for the result. Unfortunately, during the night, while I was asleep, the program stopped collecting information because of a bug I had introduced. The next morning I fixed the error, but the statistics do not cover the full day. Still, the trends are clear from the hours that were recorded.



The data cover the first 14 hours after publication; the interval between measurements is 10 minutes.
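The collection scheme described above can be sketched roughly as follows. This is not the author's utility, only a minimal illustration: `fetch` is a hypothetical callable that returns the current counters of a post as a dictionary, and a failed request is stored as `None` so that one error does not stop the whole run.

```python
import time
from typing import Callable, Dict, List, Optional

Counters = Dict[str, int]


def collect(fetch: Callable[[], Counters],
            samples: int,
            interval_s: float = 600.0,
            sleep: Callable[[float], None] = time.sleep
            ) -> List[Optional[Counters]]:
    """Poll `fetch` `samples` times, `interval_s` seconds apart.

    `fetch` is a hypothetical function supplied by the caller; how it
    reads the post panel is outside this sketch.
    """
    log: List[Optional[Counters]] = []
    for i in range(samples):
        try:
            log.append(fetch())
        except Exception:
            # Keep the time slot but mark the measurement as missing.
            log.append(None)
        if i < samples - 1:
            sleep(interval_s)
    return log


# 14 hours at one measurement every 10 minutes.
SAMPLES_14H = 14 * 60 // 10
```

Keeping a `None` in place of a failed measurement preserves the time axis, so gaps caused by a bug or a lost connection stay visible in the data instead of silently shifting later samples.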







Your eyes do not deceive you: most of the minuses fall in the first hour of the post's life. The post first dropped sharply into the negative, then recovered. Here are the numbers behind the graph:







And this despite the fact that views increase smoothly!







The steps that appear once the count reaches the thousands are explained by the Habr panel starting to abbreviate the numbers: there is nowhere to get the exact view count from (it could probably be taken from third-party services, but I did not use them).
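Why abbreviation produces steps can be shown with a small sketch. The exact format is an assumption made for illustration (thousands with one decimal place, truncated); the real panel may round or format differently.

```python
def abbreviate(views: int) -> str:
    """Assumed panel format: exact below 1000, otherwise thousands
    truncated to one decimal place, e.g. 1234 -> '1.2k'."""
    if views < 1000:
        return str(views)
    hundreds = views // 100
    return f"{hundreds // 10}.{hundreds % 10}k"


def displayed_value(views: int) -> int:
    """The number a scraper can recover from the abbreviated label."""
    if views < 1000:
        return views
    return (views // 100) * 100


# A smoothly growing counter becomes a staircase once it passes 1000.
smooth = [950, 1010, 1080, 1150, 1220]
steps = [displayed_value(v) for v in smooth]
```

With this format every recovered value above 1000 is a multiple of 100, so the plotted curve can only move in steps of 100 views.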



I'm no specialist in statistics, but as far as I understand, such a distribution of minuses is abnormal?!



Look, bookmarks are distributed more or less evenly over the recording period:







Comments are also evenly distributed:







There are bursts of activity and lulls, but they too are spread across the whole period: commenting dies down, then resumes.



The same goes for subscribers, where there is a slight, uniform increase:







Karma did not change during the recording period, so I do not show it. The rating is calculated by Habr itself, so there is no point in showing it either.



All indicators change in proportion to the number of views; only the minuses are off: an outburst of ill will falls on the first hour after publication. I observed the same thing with my previous posts. But whereas earlier these were, so to speak, personal impressions, now they have been confirmed by measurement.
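One simple way to put a number on "the minuses are concentrated in the first hour" is to compare, for each counter, the share of its total growth that happened in the first few measurements. The sketch below does that; the two series are made-up numbers for illustration, not the measurements from this post.

```python
from typing import List


def increments(cumulative: List[int]) -> List[int]:
    """Per-interval increases of a cumulative counter."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def early_share(cumulative: List[int], first_n: int) -> float:
    """Fraction of the total growth that occurred in the first
    `first_n` intervals (6 intervals of 10 minutes = 1 hour)."""
    total = cumulative[-1] - cumulative[0]
    if total == 0:
        return 0.0
    return (cumulative[first_n] - cumulative[0]) / total


# Hypothetical cumulative counters sampled every 10 minutes for 2 hours.
minuses = [0, 3, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7]
views = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200]

minus_share = early_share(minuses, 6)
view_share = early_share(views, 6)
```

If minuses simply followed views, the two shares would be close to each other; a much larger share for minuses is the asymmetry described above.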



In my purely noob opinion, such a distribution means that there are several users on the site who deliberately watch the newest posts and minus some of them, for reasons known only to themselves. I write “some of the posts” because I have noticed this effect not only on my own publications. In all cases the effect is pronounced; otherwise I simply would not have noticed it.



I have four versions of why this is happening.



Version 1. Mental deviation. Sick people deliberately lie in wait for authors they dislike and minus them in order to do harm.



I do not believe in this version.



Version 2. A psychological effect. Which one, I do not know. Why would readers first minus a post in unison and then plus it just as unanimously? Do they minus it as off-topic, and then the pluses arrive once the connoisseurs of beauty are in the majority? I do not know.



If there are psychologists among the readers, let them have their weighty say.



Version 3. Government operatives are at work. Why their bosses would want Habr posts minused, God only knows. Then again, there are such operatives not only in our country. Who can figure them out, the Russophobes?!



Version 4. The combined effect of the previously mentioned factors.



It is quite conceivable.



Be that as it may, the minusers do manage to reduce the number of views. I am not familiar with the rules by which Habr posts get into the top lists, and I do not even know whether these algorithms are public, but one thing is obvious to me: early minusing keeps the targeted posts from reaching the top, or more precisely delays their getting there, which in turn reduces the number of views significantly, several times over.



As far as I understand, there are no effective ways to combat this evil. The only way is non-anonymous voting: only then could one determine which profiles systematically track and minus the latest posts. However, there is no named voting on Habr (or rather, the names are not made public).



But it is not that simple.



As I said, the material was published in parts. After publishing the second part I expected a similar picture: an initial drop into the negative followed by a climb into the positive. However, the effect turned out to be much smoother: the post never went negative.



By the time the second part was published the bug had been fixed, so the data cover a full day:







Where the smoothing came from, I do not know. Perhaps it is because the post was published on a Saturday (do minusers not work on Saturdays?) or because it was the continuation of previously published material.



However, the distribution of the minuses is still uneven: all the minuses fall in the first half of the recording period, and the minusing ends much earlier than the plusing. Meanwhile, views are distributed over the period exactly as last time, evenly:







The jump at about three in the afternoon is nothing mysterious: my Internet connection was down for an hour, and the utility could not connect to the site.







Everything else is completely standard.



Bookmarks:







Comments: as last time, periods of activity alternate with periods of silence.







Karma. An increase of a couple of points was recorded; not all at once, of course:







And subscribers. The total number remained unchanged (apparently, those who wanted to subscribe did so when the first part was published). Only at around one in the afternoon did a single fluctuation occur: someone unsubscribed, possibly by mistake, and immediately subscribed again. If it was a different person who subscribed, the two cancelled out: the total number of subscribers did not change.







So, post indicators behave in an understandable and predictable way. All indicators, except for minuses. Since I see no obvious reason for this, I find the minus peak at least strange.



Strangeness 2.

Sometimes the number of views decreases (which, of course, is impossible), but soon returns to normal.



I caught it by accident while debugging the program, before the export/import function had been added, so the corresponding zigzag is missing from the chart. You will have to take my word for it: I observed this effect twice. A post has several thousand views, the count suddenly drops by a couple of hundred, and after 10-20 minutes it returns to the previous level (not counting the natural increase).
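Since a view counter should never decrease, such dips are easy to flag automatically once the measurements are saved. A minimal sketch, assuming the view counts are available as a plain list in measurement order (the sample series is invented for illustration):

```python
from typing import List, Tuple


def find_dips(views: List[int]) -> List[Tuple[int, int]]:
    """Return (index, drop) for every sample that is lower than the
    previous one; a monotonic counter yields an empty list."""
    return [(i, views[i - 1] - views[i])
            for i in range(1, len(views))
            if views[i] < views[i - 1]]


# Invented example: a drop of 200 views at the third measurement.
dips = find_dips([3000, 3100, 2900, 3100, 3200])
```

Each reported pair gives the position of the anomalous measurement and the size of the drop, which is enough to mark the zigzag on a chart.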



This one is quite simple: a bug on the site. There is nothing more to think about.



Strangeness 3.

This one seemed much stranger to me than the first, human-driven effect and the second, technical one. Pluses do not arrive one at a time, evenly distributed over the period, but in blocks. Yet plusing is not like commenting, where an answer naturally follows a question; each plus is an individual act!



Take a look at the result graphs published above: the blocks are noticeable.



Knowledgeable people pointed me to the Poisson distribution, but I am not able to calculate the probability myself. If you can, please do. To me it already seems obvious that the number of double pluses is far above the norm.



Here are the numbers for the pluses on the first part of the post. The graph shows how the pluses split into single, double, and triple positions, that is, measurement intervals that received one, two, or three pluses. As mentioned earlier, the measurement interval is 10 minutes.







With 30 pokes into 84 cells, two cells were hit three times. I do not know how well that agrees with probability theory ...
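A rough way to check this is to compute how many cells one would expect to hold exactly k pluses if each of the 30 pluses landed independently and with equal probability in any of the 84 ten-minute cells. That uniformity is a simplifying assumption of this sketch, not something the data guarantee, and the function name is only illustrative.

```python
from math import comb


def expected_cells(pluses: int, cells: int, k: int) -> float:
    """Expected number of cells holding exactly k pluses when every plus
    falls independently and uniformly into one of `cells` cells
    (binomial model; close to Poisson with mean pluses / cells)."""
    p = 1.0 / cells
    prob_k = comb(pluses, k) * p ** k * (1 - p) ** (pluses - k)
    return cells * prob_k


# Expected number of empty, single, double and triple cells
# for 30 pluses in 84 cells.
for k in range(4):
    print(k, round(expected_cells(30, 84, k), 2))
```

Under these assumptions the model expects well under one triple cell (about 0.4), so two observed triples are above the expectation. Whether that is significant is a separate question, since real voting activity is not uniform over 14 hours, and uneven activity by itself produces more clustering than this simple model predicts.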



Data for the second part of the post (since its measurement period is longer, I trim it to the duration of the first part for comparability):







Incidentally, here one of the single pluses is adjacent in time to the triple, so within some 20 minutes there was a surge of plusing (29% of all pluses were cast). And this did not happen in the first minutes after publication.



The ratio between single, double, and triple positions is approximately the same as for the first part. The smaller share of measurements containing ratings is explained by ratings being cast less often: measurements were taken, but no pluses were recorded.



I cannot explain this block-plusing effect at all, not in any way. Minuses do not seem to show such “blocky” behavior.



Do the emitters of good send their suggestions in batches, switching on and off? Heh heh heh ...



PS

If anyone has a desire to analyze the statistics of posts using more advanced methods or check arithmetic, the files with the source data are here:

yadi.sk/d/iN4SL6tzsGEQxw



I do not insist on my doubts; maybe I am wrong, especially since I do not know the first thing about statistics. I hope that the comments of professional statisticians, psychologists, and other interested users will clear up my perplexity.



Thank you for your attention.


