Deep Learning OCR on Document and Receipt Photos in Javascript

A little optical character recognition (OCR) can go a long way toward expanding your options in terms of functionality. Rather than go through the entire Deep Learning process, we are instead going to use an API with a pre-trained OCR AI on board. Believe me, this is going to simplify matters immensely.

So first we need to import the API client via this script tag, by adding it between the head tags of a page, or in your HTML file.

<script src="https://cdn.cloudmersive.com/jsclient/cloudmersive-ocr-client.js"></script>

Now go ahead and and call our OCR function like this:

var CloudmersiveOcrApiClient = require('cloudmersive-ocr-api-client');var defaultClient = CloudmersiveOcrApiClient.ApiClient.instance;// Configure API key authorization: Apikeyvar Apikey = defaultClient.authentications['Apikey'];Apikey.apiKey = 'YOUR API KEY';// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)//Apikey.apiKeyPrefix = 'Token';var apiInstance = new CloudmersiveOcrApiClient.ImageOcrApi();var imageFile = "/path/to/file"; // File | Image file to perform OCR on.  Common file formats such as PNG, JPEG are supported.var opts = {'recognitionMode': "recognitionMode_example", // String | Optional; possible values are 'Basic' which provides basic recognition and is not resillient to page rotation, skew or low quality images uses 1-2 API calls; 'Normal' which provides highly fault tolerant OCR recognition uses 14-16 API calls; and 'Advanced' which provides the highest quality and most fault-tolerant recognition uses 28-30 API calls.  Default recognition mode is 'Advanced''language': "language_example" // String | Optional, language of the input document, default is English (ENG).  Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle-French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Hatian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old - Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old-Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish)};var callback = function(error, data, response) {if (error) {console.error(error);} else {console.log('API called successfully. Returned data: ' + data);}};apiInstance.imageOcrPhotoToText(imageFile, opts, callback);

Specify the language, if known, to provide more accurate results. If you need faster results, you may change the recognition mode, though this might sacrifice accuracy. Then input your image and the API will return your results shortly thereafter.

Image for post
Image for post

Written by

There’s an API for that. Cloudmersive is a leader in Highly Scalable Cloud APIs.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store