ABBYY Mobile Web Capture: High-quality photos of documents right in your smartphone’s browser




Our customers often use a mobile phone to photograph a document and send it to a car-sharing company, visa center, telecom operator, bank, or other organization. A photo of the document is enough to rent a car, activate a SIM card, or apply for a loan. But getting a good-quality image from a smartphone can be difficult. We have managed to solve this problem.



There are now many iOS and Android apps for "mobile scanning" of documents. But how many apps do you already have on your phone? Why spend time installing yet another one if you can do without it?



It is much easier to photograph a document directly in the mobile browser, which every smartphone already has. That is why we created ABBYY Mobile Web Capture. It is a JavaScript API, that is, an SDK that our customers embed in their web pages and web applications. It captures a good picture directly in the web browser on the most popular mobile operating systems and sends it to a server or to the cloud for further processing. Today we will explain how this technology works.



ABBYY Mobile Web Capture prompts the user to photograph the required document from the video stream in a mobile browser. Invoices, driver's licenses, passports, contracts, questionnaires, application forms: any document can be processed.



The new product uses Image Capture, our mobile technology for automatic image capture, which we ported to JavaScript. The core of the algorithm is written in C++, so we used WebAssembly to bring it to the web browser. In addition, we created UI components responsible for working with the camera and added them to the JS API so that developers can easily embed capture from the video stream in their web application or website. To keep integration simple, the product distribution includes the source code of a sample web page that shows how to use our API correctly. In practice, a developer just needs to copy this code to their website. It is no more complicated than, for example, adding a traffic analytics counter.
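
As a rough illustration of the WebAssembly route (this is not ABBYY's actual module), the sketch below instantiates a tiny hand-assembled WebAssembly module and calls its exported function from JavaScript. The module bytes, the `add` export, and the `loadNativeCore` helper are all illustrative assumptions; a real C++ core would be compiled to WebAssembly by a toolchain and would expose image-processing functions instead.

```javascript
// Minimal hand-assembled WebAssembly module that exports add(i32, i32) -> i32.
// Illustrative only: a real capture core would be compiled from C++.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // function body
]);

// Hypothetical helper: compile and instantiate the module, return its exports.
function loadNativeCore(bytes) {
  const wasmModule = new WebAssembly.Module(bytes);
  const instance = new WebAssembly.Instance(wasmModule, {});
  return instance.exports;
}

const core = loadNativeCore(wasmBytes);
console.log(core.add(2, 3));
```

Synchronous compilation is used here only because the module is tiny. In a browser, a larger module would normally be loaded asynchronously, for example with `WebAssembly.instantiateStreaming`.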



After that, you can turn on the camera right in the browser and point it at the document. ABBYY Mobile Web Capture then does the following:



1. When a document enters the frame, we find its borders in the video stream.



2. Next, a clear image of the document is captured automatically. To do this, we need to make sure that the person has stopped moving the camera and has already "aimed" at the desired document. We do not rely on the phone's sensors, because a person may hold the phone still while the other hand, holding the document, moves. To handle this, we evaluate whether the picture itself is moving, that is, we compute the offset of objects from frame to frame. If it is minimal, capture can begin. We also check sharpness. In this way the SDK automatically catches the moment when the document should be photographed to get a high-quality image. You do not have to press any buttons.





3. After capturing the frame with the document, we crop it to the document's borders and straighten it.
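
The capture decision in step 2 can be sketched roughly as follows. This is not the SDK's algorithm: as a stand-in for the frame-to-frame offset it uses the mean absolute pixel difference between consecutive grayscale frames, and as a stand-in for the sharpness check it uses the variance of a 4-neighbour Laplacian. The frame format, all function names, and both thresholds are assumptions made for illustration.

```javascript
// Assumed frame format: { width, height, data }, where data holds one
// grayscale value (0..255) per pixel in row-major order.
function makeFrame(width, height, valueAt) {
  const data = new Array(width * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) data[y * width + x] = valueAt(x, y);
  }
  return { width, height, data };
}

// Motion proxy: mean absolute per-pixel difference between two equal-size frames.
function meanAbsDiff(prev, curr) {
  let sum = 0;
  for (let i = 0; i < curr.data.length; i++) {
    sum += Math.abs(curr.data[i] - prev.data[i]);
  }
  return sum / curr.data.length;
}

// Sharpness proxy: variance of the 4-neighbour Laplacian over interior pixels.
function laplacianVariance(frame) {
  const { width, height, data } = frame;
  let sum = 0;
  let sumSq = 0;
  let n = 0;
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      const i = y * width + x;
      const lap =
        data[i - 1] + data[i + 1] + data[i - width] + data[i + width] - 4 * data[i];
      sum += lap;
      sumSq += lap * lap;
      n++;
    }
  }
  if (n === 0) return 0;
  const mean = sum / n;
  return sumSq / n - mean * mean;
}

// Capture only when the picture is nearly still and sharp enough.
// Both thresholds are arbitrary illustrative values.
function shouldCapture(prev, curr, { maxMotion = 2, minSharpness = 100 } = {}) {
  return meanAbsDiff(prev, curr) <= maxMotion && laplacianVariance(curr) >= minSharpness;
}

const sharp = makeFrame(8, 8, (x, y) => ((x + y) % 2) * 255); // checkerboard
const flat = makeFrame(8, 8, () => 128); // no detail at all
console.log(shouldCapture(sharp, sharp), shouldCapture(flat, flat));
```

In a browser, frames like these could be obtained by drawing the video element onto a canvas and reading the pixels back with `getImageData`. Comparing image content rather than reading motion sensors matches the reasoning in step 2: it also reacts when the document moves while the phone stays still.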







ABBYY Mobile Web Capture produces a good picture that can then be sent for recognition, for example to ABBYY FlexiCapture, and it will be processed properly. Our projects with ABBYY FlexiCapture show that customers often find it more convenient to submit documents with a smartphone than with a scanner. But images obtained this way often turn out blurry, and then they simply cannot be processed properly. The person who sent the photo from the phone is then asked to retake the picture, which is not always convenient for the client.



When we developed ABBYY Mobile Web Capture, we found that photographing a document from a video stream in a browser is not so simple. First, finding the document's borders and estimating the frame's offset and sharpness require computational resources, so we had to optimize the code so that the video stream in the browser does not slow down. Second, on iOS we found that Safari could not deliver video at a resolution higher than HD. The pictures we captured on iOS, even on high-end devices such as the iPhone XS, were not very good. They simply could not be recognized, because recognizing an A4 document set in 10-point type requires an image of Full HD resolution or higher. We filed bug reports with Apple and asked them to make high-resolution camera access possible from Safari. They fixed it in iOS 12.2! Without this, our product would not work as it does now. Now the SDK gives you good pictures, and you can do whatever you want with them.
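
For readers who want to check the resolution point themselves, here is a minimal sketch using the standard `getUserMedia` API. It is not part of the SDK: the helper names `buildCameraConstraints`, `isFullHdOrBetter`, and `openDocumentCamera` are hypothetical, and the Full HD threshold (1920 by 1080) is simply taken from the paragraph above.

```javascript
// Constraints for navigator.mediaDevices.getUserMedia:
// prefer the rear camera and ask for Full HD as the ideal resolution.
function buildCameraConstraints() {
  return {
    audio: false,
    video: {
      facingMode: { ideal: 'environment' },
      width: { ideal: 1920 },
      height: { ideal: 1080 },
    },
  };
}

// True if the delivered resolution is at least Full HD in either orientation.
// `settings` has the shape returned by MediaStreamTrack.getSettings().
function isFullHdOrBetter(settings) {
  const longSide = Math.max(settings.width, settings.height);
  const shortSide = Math.min(settings.width, settings.height);
  return longSide >= 1920 && shortSide >= 1080;
}

// Browser-only: open the camera and report whether the stream reaches Full HD.
async function openDocumentCamera() {
  const stream = await navigator.mediaDevices.getUserMedia(buildCameraConstraints());
  const settings = stream.getVideoTracks()[0].getSettings();
  return { stream, fullHd: isFullHdOrBetter(settings) };
}

console.log(isFullHdOrBetter({ width: 1280, height: 720 }));
```

Because the values are passed as `ideal`, the browser is free to deliver a lower resolution, which is why the sketch inspects the settings of the track it actually received instead of trusting the request.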



Of course, when we built the SDK, we looked at what tasks end users need to solve. Here is a little about them.



You need to open an account, take out a bank loan, buy insurance, rent a car, or sign up for another service for the first time



Imagine you go to the bank. Actually, no: if you have to go there, something has already gone wrong. Many people hate going to the bank. You think: "Ugh, that's at least half an hour gone." So if you need a cash loan as quickly as possible, you will most likely pick up your phone and google the available options. Suppose you find information about a loan, and the site lets you fill out an application online. It is convenient to do this right on the site rather than in an app: an app has to be installed, and you don't yet know whether you will get the loan at all. Why clutter your phone with unnecessary apps in advance? So you press the button, and then you need to enter your personal data.



It used to work like this: the bank asks you to hold the document up to the camera or to upload an existing photo from the gallery. People do this, but the pictures may be blurry or may not show a document at all. In our projects' experience, people often get confused and upload the wrong file entirely. All of this is sent to the bank, the image is unusable, and the effort is wasted.



Some banks try to get around the problem by putting "instant" recognition on the back end. But that is another story: the client photographs the document and sends it to the back office for recognition, which takes about a minute. When you are signing up for a service from a mobile phone, a minute is a very long time. During that time you will most likely decide that everything is stuck, then close the page and call tech support, or drop everything and go to another site to apply for a loan.



With ABBYY Mobile Web Capture, the client does not need to install anything. They shoot the document from the video stream. The technology processes the photo and improves the image, which can then easily be sent to the bank.



You need to process a photo from your phone to send to a government agency, visa center, or bank



Do you have a passport? Sometimes the document itself is not at hand, but you have photographed it and the image is stored in your smartphone's gallery. This always comes in handy when you need to register somewhere or buy tickets on an airline's website. Sometimes you may be asked to send a photo of the main spread of your passport. What if the photo is not very clear or was taken against a carpet? Will the visa center accept it? Probably, but it is better not to risk it.



You can upload such a photo to a website that uses ABBYY Mobile Web Capture, and the technology will find the document's borders in the image. If it finds none, we show a warning and point out to the client that they have probably uploaded the wrong file. In addition, we try to evaluate the quality of the document image to understand whether it is suitable for further recognition. (We say "try" because this feature is currently in technology preview, but we are actively working to improve it.)
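
The warning logic for an uploaded photo might look roughly like the sketch below. It assumes a hypothetical border detector that returns either four corner points or nothing. The function names, the result shape, and the extra "document too small" check with its 20% threshold are illustrative assumptions, not the SDK's behavior.

```javascript
// Area of a polygon given as ordered {x, y} corners (shoelace formula).
function quadArea(corners) {
  let twice = 0;
  for (let i = 0; i < corners.length; i++) {
    const a = corners[i];
    const b = corners[(i + 1) % corners.length];
    twice += a.x * b.y - b.x * a.y;
  }
  return Math.abs(twice) / 2;
}

// `corners` is null when the (hypothetical) detector found no document borders.
// `minCoverage` is an assumed threshold: the share of the image the document must fill.
function checkUploadedPhoto(imageSize, corners, minCoverage = 0.2) {
  if (!corners || corners.length !== 4) {
    return {
      ok: false,
      warning: 'No document found. You may have uploaded the wrong file.',
    };
  }
  const coverage = quadArea(corners) / (imageSize.width * imageSize.height);
  if (coverage < minCoverage) {
    return { ok: false, warning: 'The document is too small in the photo.' };
  }
  return { ok: true, warning: null };
}

const size = { width: 100, height: 100 };
const fullPage = [
  { x: 0, y: 0 },
  { x: 100, y: 0 },
  { x: 100, y: 100 },
  { x: 0, y: 100 },
];
console.log(checkUploadedPhoto(size, null).warning);
console.log(checkUploadedPhoto(size, fullPage).ok);
```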



You need to fill out a card in a CRM without unnecessary hassle



Here is an example. We have a potential customer, a car dealer, who needs to know everything about the people it sells cars to. To do this, employees enter customer data in a web portal, such as a CRM system. They ask the client for a driver's license, scan it, and then retype the data into the computer. Why do employees use a web portal and not a mobile app? The answer is simple: the dealer's main job is to sell cars and serve customers, not to write a lot of code for an internal application. So it is important for the company to quickly build a solution that works on all platforms.



With ABBYY Mobile Web Capture, this business process can be simplified: an employee just takes a picture of the document with a smartphone and then sends the image for recognition and processing in another of our products, ABBYY FlexiCapture. In the end, this saves time and improves data quality.



To be honest, I have personally run into several situations where scanning documents directly in the browser would have been very useful. For example, when I applied for a visa a year ago, I spent about an hour photographing all the necessary documents, transferring them to a computer, saving them in the required format, and uploading them to the site. With this approach, everything could simply have been photographed in 15 minutes.



I really hope that soon ABBYY Mobile Web Capture will be used on many sites and will help simplify tasks that require photographing documents!



Olga Titova, Product Owner, Mobile SDK


