Neural networks will help you understand application and service privacy policies

The privacy policies of sites and applications, which describe how users' personal data is processed, are usually written by lawyers ... and for lawyers. For a mere mortal, grasping their essence can be difficult. One Hacker News member took on the problem and built a machine-learning tool that helps make sense of privacy policies.



Below, we talk about this project and other efforts to "digest" privacy policies.





Photo: Ashley Batz / Unsplash



What's the problem



This year, The New York Times examined the privacy policies of 150 sites and applications. The editors analyzed them with the Lexile framework, which scores the complexity of a text based on sentence length and vocabulary. The analysis showed that most of the documents are written in language that is hard even for specialists and university students to follow, let alone schoolchildren. Vague wording makes it difficult to understand what companies actually do with personal data: what information they collect, for what purposes, how they process it, and to whom they pass it on.
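Lexile itself is a proprietary metric, but the general idea of scoring text by sentence length and word complexity is easy to illustrate. Below is a minimal sketch that uses the Flesch Reading Ease formula as a stand-in (the crude syllable counter and the sample sentence are assumptions for illustration, not part of the NYT methodology):

```python
# Rough readability scoring: Flesch Reading Ease combines average sentence
# length with a word-complexity proxy (syllables per word).
# Lower scores mean harder text.
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count groups of vowels; good enough for illustration.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / max(1, len(sentences))
    syllables_per_word = syllables / max(1, len(words))
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

sample = ("We may share information with affiliates, service providers and "
          "other third parties for purposes described in this policy.")
print(round(flesch_reading_ease(sample), 1))
```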



On average, a privacy policy runs about 2,500 words, and in some cases the figure exceeds 8,000. It is hard to say how long a careful, thoughtful read of such a document would take; by some estimates, up to 30 minutes.


Back in 2008, researchers at Carnegie Mellon University calculated that the average Internet user would need 181 to 304 hours (Table 7) to read the privacy policies of the sites visited over a year. That estimate did not even include the time needed to read full terms-of-service agreements, of which privacy policies are only a small part. There is reason to believe the situation has only gotten worse since then.
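To get a feel for where numbers of this magnitude come from, here is the back-of-the-envelope arithmetic; the reading speed and the number of policies are assumptions for illustration, not the CMU study's actual inputs:

```python
# Rough reading-time arithmetic (illustrative assumptions, not the CMU data).
AVG_POLICY_WORDS = 2500      # average policy length cited above
READING_SPEED_WPM = 200      # assumed careful-reading speed, words per minute
POLICIES_PER_YEAR = 1000     # assumed number of distinct policies encountered

minutes_per_policy = AVG_POLICY_WORDS / READING_SPEED_WPM
hours_per_year = minutes_per_policy * POLICIES_PER_YEAR / 60
print(f"{minutes_per_policy:.1f} min per policy, ~{hours_per_year:.0f} h per year")
```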



For example, in the late 90s Google explained how it collects and uses personal data in 600 words. Over the past 20 years, the document has grown sevenfold. But not all policies are convoluted and confusing: in their study, the NYT editors noted that the BBC's document is simple and concise, without a pile of legal terms. There are projects that aim to spread this practice across the IT industry and, if not unify privacy policies outright, at least make them easier for users to understand.



A neural network will read it for you



A Hacker News member has developed Guard, a utility that parses application privacy policies using machine learning algorithms. The algorithms look for "gray language" in the text of the agreement, wording that leaves room for interpretation. According to the developer, the tool lets users understand what exactly they are agreeing to.
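Guard's actual model is not described in detail, but the general approach of classifying policy sentences as vague or specific can be sketched in a few lines. The tiny hand-labelled training set below is made up purely for illustration:

```python
# Toy illustration of flagging "gray language" in policy sentences
# (not Guard's actual model): TF-IDF features plus logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hand-labelled examples: 1 = vague ("gray"), 0 = specific.
sentences = [
    "We may share your data with selected partners from time to time.",
    "Your information can be used for other business purposes as needed.",
    "We store your email address for 30 days and then delete it.",
    "We never sell your personal data to advertisers.",
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(sentences, labels)

test = "Data may be disclosed to third parties where appropriate."
print(model.predict_proba([test])[0][1])  # probability the sentence is "gray"
```

In practice such a classifier would need thousands of labelled sentences and stronger text features, but the pipeline structure stays the same.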



The utility also shows how many data-leak incidents a particular company has had. The service is still quite young, and its library of applications remains small: Twitter, Instagram, Netflix, Telegram, Waze, Spotify, Reddit and several others.



Guard also has analogues: Terms of Service; Didn't Read (ToS;DR) and TLDRLegal. They likewise evaluate the privacy policies of individual sites, but they operate on a crowdsourcing model: instead of a neural network, the text is assessed by volunteers and enthusiasts. As such tools spread, they should have a positive effect on the security of personal data online.



Privacy Commons standardizes everything



This is something like Creative Commons, but for privacy policies. The idea is to create a clear, easy-to-understand structure describing what personal data a company collects, how it protects it, and with whom it shares it. Mozilla worked on a similar project back in 2011: the company's specialists proposed special icons for sites that denoted a company's policies and approach to handling personal data. But the project has yet to leave beta.
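Neither Privacy Commons nor the Mozilla icon project produced a finished specification, but the kind of machine-readable summary they envisioned is easy to imagine. The field names below are hypothetical, chosen only to illustrate the idea:

```python
# Hypothetical sketch of a standardized, machine-readable policy summary.
# Field names are illustrative; no official Privacy Commons schema exists.
from dataclasses import dataclass, field

@dataclass
class PolicySummary:
    collects: list = field(default_factory=list)     # categories of personal data collected
    shared_with: list = field(default_factory=list)  # third parties receiving the data
    retention_days: int = 0                          # how long the data is stored
    encrypted_at_rest: bool = False                  # basic protection measure

example = PolicySummary(
    collects=["email", "ip_address"],
    shared_with=["analytics_provider"],
    retention_days=90,
    encrypted_at_rest=True,
)
print(example)
```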



"Some form of standardization would make privacy policies transparent and eliminate gray areas," comments Sergey Belkin, marketing director of IT-GRAD and 1cloud.ru. "But people have been talking about implementing Privacy Commons for at least ten years, and the process has not moved forward. Still, with the introduction of the European GDPR and the ePrivacy Regulation, there is a chance that companies will in practice come around to standardization."


Browser warns of violations



There are protocols that let sites inform the browser about the personal data they intend to collect. For example, the W3C consortium at one time worked on the Platform for Privacy Preferences (P3P). Users told the browser which personal data they were willing to share, and the browser checked that list of preferences against the privacy policy of P3P-enabled sites. If discrepancies appeared, the browser warned the user.
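The real protocol used a dedicated policy vocabulary transmitted in HTTP headers; the sketch below is a deliberately simplified model of the matching step, with made-up data categories:

```python
# Simplified model of the P3P idea: compare the user's preferences with the
# site's declared policy and warn about mismatches (category names are made up).
user_allows = {"email", "cookies"}                  # data the user agrees to share
site_declares = {"email", "cookies", "location"}    # data the site says it collects

violations = site_declares - user_allows
if violations:
    print("Warning: site requests data you did not allow:", ", ".join(sorted(violations)))
```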





Photo: Kai Brame / Unsplash



But after a while, work on P3P was wound down, since most sites simply ignored it. In a sense, though, its function is performed today by the cookie banners that arrived with the new regulations: sites ask users to choose which personal data they are willing to share. W3C is now working on another standard, Do Not Track (DNT). It adds a browser setting that tells sites whether the user objects to being tracked. DNT is expected to fare better than P3P: it is already supported by companies such as Mozilla, Google and Microsoft.
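On the wire, the DNT signal is just an HTTP request header with the value "1" when the user opts out. A minimal server-side sketch of honoring it might look like this (the handler and the response bodies are hypothetical, shown only to illustrate where the check goes):

```python
# Minimal sketch: a server inspects the DNT request header and skips
# tracking when it equals "1".
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        dnt = self.headers.get("DNT")   # "1" means the user opts out of tracking
        if dnt != "1":
            pass  # set tracking cookies / record analytics here
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        body = b"tracking disabled\n" if dnt == "1" else b"tracking enabled\n"
        self.wfile.write(body)

# To try it locally (blocks until interrupted):
# HTTPServer(("localhost", 8000), Handler).serve_forever()
```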




Step-by-step instructions for administering the 1cloud cloud: we go through the most frequently asked questions about virtual servers, billing, and SSL certificates.




