In connection with the constant increase in the level of development of information technologies, electronic documents become more convenient and popular in use every year and begin to dominate traditional paper media. Therefore, it is very important to pay attention in time to protecting the content of information not only on traditional paper media, but also on electronic documents. Every large company that has commercial, state and other secrets wants to prevent possible information leakage and compromise of classified information, and if a leak is detected, take measures to stop the leak and identify the violator.
A bit about protection options
To perform these tasks, certain protective elements are introduced. Such elements can be barcodes, visible tags, electronic tags, but hidden tags are the most interesting. One of the most prominent representatives are watermarks, they can be applied to paper or added before printing on a printer. It’s no secret to anyone that printers put their watermarks (yellow dots and other marks) when printing, but we will consider other artifacts that can be put on the computer screen at the employee’s workplace. Such artifacts are generated by a special software package that draws artifacts on top of the user's workspace, minimizing the visibility of the artifacts themselves and without interfering with the user's work. These technologies have ancient roots in terms of scientific developments and the algorithms used to represent hidden information, but are quite rare in the modern world. Basically, this approach is found in the military sphere and on paper, to quickly identify unscrupulous employees. These technologies are just beginning to develop the commercial environment. Nowadays, visible watermarks are actively used to protect the copyright of different media files, but invisible ones are quite rare. But they cause the greatest interest.
Security artifacts
Invisible to humans Watermarks form various artifacts, which can, in principle, be invisible to the human eye, can be masked in the image in the form of very small dots. We will consider visible objects, since invisible to the eye can be located outside the standard color space of most monitors. These artifacts are of particular value due to their high degree of invisibility. However, it is impossible to make the CEH completely invisible. In the process of their implementation, a certain kind of distortion of the container image is introduced into the image, some kind of artifacts appear on it. Consider 2 types of objects:
- Cyclic
- Chaotic (contributed by image conversion)
Cyclic represent a certain finite sequence of repeating elements, which is repeated more than once on the screen image (Fig. 1).
Random artifacts can be caused by various kinds of transformations of the superimposed image (Fig. 2), for example, the introduction of a hologram.
Fig. 1 Loop artifacts
Fig. 2 Chaotic artifacts
To begin, consider the options for recognizing cyclic artifacts. These artifacts may include:
- text watermarks repeating across the screen
- binary sequences
- a set of random points in each grid cell
All of these artifacts are applied directly on top of the displayed content, respectively, they can be recognized by identifying the local extremes of the histogram of each of the color channels and accordingly cutting out all other colors. This method involves working with combinations of local extremes of each channel of the histogram. The problem rests in the search for local extremes on a sufficiently complex image with many sharply shifting details, the histogram looks very sawtooth, which makes this approach inapplicable. You can try to apply various filters, but they will introduce their own distortions, which ultimately can lead to the inability to identify the watermark. There is also the option of recognizing artifact data using certain border detectors (for example, a Canny border detector). These approaches have their place for artifacts that are quite sharp in transition, detectors can select image contours and further select color ranges inside the contours to binarize the image in order to further highlight the artifacts themselves, but these methods require sufficiently fine tuning to select the desired contours, as well as the subsequent binarization of the image itself relative to the colors in the selected contours. These algorithms are considered rather unreliable and try to use more stable and independent of the type of color components of the image.
Fig. 3 Watermark after conversion
As for the chaotic artifacts that were mentioned earlier, the algorithms for their recognition will be radically different. Since the formation of chaotic artifacts is assumed by superimposing a certain watermark on the image, which is transformed by any of the transformations (for example, the discrete Fourier transform). Artifacts from such transformations are distributed throughout the screen and it is difficult to identify their regularity. Based on this, the watermark will be located throughout the image in the form of "random" artifacts. Recognition of such a watermark comes down to direct image conversion using the conversion functions. The result of the conversion is shown in the figure (Fig. 3).
But there are a number of problems that prevent the recognition of a watermark in not ideal conditions. Depending on the type of conversion, there can be various difficulties, for example, the inability to recognize a document obtained by photographing at a large angle relative to the screen, or simply a picture of poor quality, or capturing a screen saved in a file with high loss compression. All these problems lead to the complication of identifying a watermark; in the case of an image taken at an angle, it is necessary to apply either more complex transformations or apply affine transformations for the image, but none of the other guarantees a complete restoration of the watermark. If you consider the case of screen capture, two problems arise, the first is distortion when displayed on the screen itself, the second is distortion when saving the image itself from the screen. The first one is quite difficult to control due to the fact that there are matrices for monitors of different quality, and due to the absence of one color or another, they interpolate the color depending on their color representation, thereby introducing distortions into the watermark itself. The second is even more difficult, because you can save a screenshot in any format and, accordingly, lose part of the color range, therefore, we can simply lose the watermark itself.
Implementation Issues
In the modern world, there are many algorithms for embedding watermarks, but none guarantee 100% the possibility of further recognition of a watermark after its implementation. The main difficulty is the definition of a set of reproduction conditions that may arise in each case. As mentioned earlier, it is difficult to create a recognition algorithm that takes into account all the possible features of distortion and attempts to damage the watermark. For example, if you apply a Gaussian filter to the current image, and the artifacts in the original image were quite small and contrasting with the background of the image, then it becomes either impossible to recognize them or a part of the watermark will be lost. Consider the case of a photograph, with a high degree of probability there will be moire (Fig. 5) and a “grid” on it (Fig. 4). Moire arises due to the discreteness of the matrix of the screen and the discreteness of the matrix of the shooting equipment, in this situation it turns out that two mesh images overlap each other. The grid is likely to partially overlap the watermark artifacts and cause a recognition problem, moire, in turn, in some methods of embedding a watermark makes it impossible to recognize, since it overlaps part of the image with a watermark.
Fig. 4 Image Grid
Fig. 5 moire
In order to increase the threshold for watermark recognition, it is necessary to apply algorithms based on self-learning neural networks and during operation, which will themselves learn to recognize watermark images. Now there are a huge number of neural network tools and services, for example, from Google. If desired, you can find a set of reference images and teach the neural network to recognize the necessary artifacts. This approach has the most promising chances of identifying even severely distorted watermarks, but for fast detection it requires large computing power and enough training for correct detection.
Everything described seems simple enough, but the deeper you go into these issues, the more you realize that to recognize watermarks you need to spend a lot of time implementing any of the algorithms, and even more time to bring it to the necessary probability of recognizing each image.