Word frequencies of drug users, alcoholics and tobacco smokers compared with other people
According to US statistics, 10% of the US population aged 12 and older suffers from some form of dependence, officially termed substance use disorder (SUD). The figure is probably much higher in the Russian Federation. According to RBC, 10% of the population there drinks surrogate alcohol (medicinal lotions, hawthorn tincture, window cleaner, bootleg vodka and so on), and many times more people drink legal alcohol.
In recent years, people have begun to spend a huge amount of time on social networks, where they communicate, exchange ideas and so on. That produces a volume of data large enough to train a machine learning system, and many people with addictions are active on social networks. Researchers have now shown that drug users, alcoholics and tobacco smokers can be identified automatically from their vocabulary and cultural interests (music, films).
Perhaps drug users will even be filtered automatically on the Internet in the future: for example, they could be barred from registering on some sites or given a special profile icon.
Specialists from the Department of Information Systems at the University of Maryland and the Addiction Recovery Research Center at the Virginia Tech Carilion Research Institute have developed a machine learning system that automatically identifies addicted people as well as people at risk of addiction, distinguishing the latter from actual drug addicts and alcoholics.
As is well known, addiction to certain substances inevitably affects a person's social activity and correlates with personality traits. For example, people who regularly smoke tobacco score significantly higher on “openness to experience” but significantly lower on “conscientiousness” than non-smokers (see Campbell et al., 2014). Alcohol consumption is positively correlated with sociability and extraversion (Cook et al., 1998).
Dozens of other papers have also found a link between regular use of a substance and features of personality and social behavior. Very often substance use correlates with reduced “conscientiousness”, a personality trait associated with self-discipline, dutiful performance of obligations and striving to achieve goals. The correlation is understandable, because these are precisely the traits needed to overcome an addiction.
Science also knows of risk factors that increase the likelihood of addiction: age, gender, impulsiveness, pleasure seeking, reaction to novelty, a tendency to exercise and an impoverished environment (Carroll et al., 2009). Other factors that raise the risk include the social environment (neighbors), the family environment (relatives) and social norms.
Previously, scientists carried out such studies with surveys, but the huge amount of information on social networks now makes it possible to study people's behavior without leaving the computer. To train their machine learning system, the American researchers used the database compiled by the myPersonality project from 2007 to 2012. myPersonality was a popular Facebook application in which people took psychological tests and described their personality and habits in detail; drug users, alcoholics and tobacco smokers were among them.
The psychological profiles were linked to the users' activity on the social network: 22 million status updates from 153 thousand users, averaging 143 messages and 1,730 words per person. Non-English-speaking users and those who had written fewer than 500 words were excluded, which left 21 million posts from 106,509 people. After low-frequency words (those occurring fewer than 50 times in the database) were filtered out, the vocabulary consisted of 73,935 words.
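The two filtering steps described above (dropping users with too little text, then dropping rare words) can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the whitespace tokenizer and the input layout (a dict mapping a user id to a list of status updates) are hypothetical and are not taken from the paper, and the language filter is omitted.

```python
from collections import Counter

MIN_WORDS_PER_USER = 500  # threshold mentioned in the article
MIN_WORD_FREQUENCY = 50   # threshold mentioned in the article


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption, not the paper's)."""
    return text.lower().split()


def build_corpus(posts_by_user,
                 min_user_words=MIN_WORDS_PER_USER,
                 min_word_freq=MIN_WORD_FREQUENCY):
    """Return (tokens_by_user, vocabulary) after both filtering steps."""
    # Step 1: tokenize every status update and drop users with too little text.
    tokens_by_user = {}
    for user, posts in posts_by_user.items():
        tokens = [tok for post in posts for tok in tokenize(post)]
        if len(tokens) >= min_user_words:
            tokens_by_user[user] = tokens

    # Step 2: count words over the remaining users and drop the rare ones.
    counts = Counter()
    for tokens in tokens_by_user.values():
        counts.update(tokens)
    vocabulary = {word for word, n in counts.items() if n >= min_word_freq}
    return tokens_by_user, vocabulary


# Toy data with lowered thresholds, just to show the call.
toy_posts = {
    "alice": ["good morning world", "good morning world"],  # 6 words
    "bob": ["hi"],                                          # 1 word
}
toy_users, toy_vocab = build_corpus(toy_posts, min_user_words=5, min_word_freq=2)
```

With the toy thresholds, only "alice" survives the first step, and every word she used twice stays in the vocabulary.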
The researchers took into account earlier work showing that a person's personality is easier to infer from likes than from words. They therefore also compiled a database of likes from 5.1 million users.
After training, the system could predict a person's dependence quite reliably, whether or not that person had taken the psychological tests. Tobacco smoking is predicted best, at 86%, followed by drug use at 84% and alcohol use at 81% (the paper reports these figures as AUC rather than plain accuracy).
The analysis of likes and word frequencies among drug users and alcoholics is also of real scientific interest: it shows concretely how the interests and behavior of a dependent person differ from those of a person who does not use substances.
In terms of word frequency, swearing (fuck, shit) is more common among dependent people (drugs, tobacco). Among the interests of alcoholics, the film “V for Vendetta” leads, while drug users like to listen to Radiohead, The Cure and Depeche Mode.
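One simple way to surface words that one group uses more often than another is to compare relative frequencies. The sketch below is a hypothetical illustration: the article does not say how the paper ranks words, so the ratio with a small smoothing constant is an assumption, as are the function names and the toy texts.

```python
from collections import Counter


def relative_frequencies(texts):
    """Map each word to its share of all tokens in `texts`."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def overused_words(group_texts, reference_texts, smoothing=1e-6):
    """Rank words by how much more often the group uses them than the reference.

    The smoothing constant (an assumed value) avoids division by zero for
    words that never occur in the reference texts.
    """
    group = relative_frequencies(group_texts)
    reference = relative_frequencies(reference_texts)
    ratios = {
        word: freq / (reference.get(word, 0.0) + smoothing)
        for word, freq in group.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Made-up toy texts, not data from the study.
group_posts = ["damn this day", "damn again"]
reference_posts = ["nice day", "this day again"]
ranked = overused_words(group_posts, reference_posts)
```

A ratio well above 1 marks a word as characteristic of the group and a ratio below 1 marks it as underused. On real data the comparison would also need a significance test, which this sketch leaves out.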
The scientific article was published on May 16, 2017 on the preprint site arXiv.org (arXiv:1705.05633).