How to convert a scanned image into text in Node.JS using OCR

Optical character recognition is absolutely the bee’s knees when it comes to converting images into text. This is normally the result of lengthy forays into the realm of Deep Learning and other such difficult endeavors. So how can we keep this simple? Well, what if there was an OCR system already set up that you could access through an API? Let’s take a look.

So we need to set up our client by adding this reference to our packages.json file.

"dependencies": {
"cloudmersive-ocr-api-client": "^1.2.7"
}

When that’s done with, continue by invoking this function here:

var CloudmersiveOcrApiClient = require('cloudmersive-ocr-api-client');var defaultClient = CloudmersiveOcrApiClient.ApiClient.instance;// Configure API key authorization: Apikeyvar Apikey = defaultClient.authentications['Apikey'];Apikey.apiKey = 'YOUR API KEY';// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)//Apikey.apiKeyPrefix = 'Token';var apiInstance = new CloudmersiveOcrApiClient.ImageOcrApi();var imageFile = "/path/to/file"; // File | Image file to perform OCR on.  Common file formats such as PNG, JPEG are supported.var opts = {'recognitionMode': "recognitionMode_example", // String | Optional; possible values are 'Basic' which provides basic recognition and is not resillient to page rotation, skew or low quality images uses 1-2 API calls; 'Normal' which provides highly fault tolerant OCR recognition uses 14-16 API calls; and 'Advanced' which provides the highest quality and most fault-tolerant recognition uses 28-30 API calls.  Default recognition mode is 'Advanced''language': "language_example", // String | Optional, language of the input document, default is English (ENG).  Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle-French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Hatian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old - Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old-Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish)'preprocessing': "preprocessing_example" // String | Optional, preprocessing mode, default is 'Auto'.  Possible values are None (no preprocessing of the image), and Auto (automatic image enhancement of the image before OCR is applied; this is recommended).};var callback = function(error, data, response) {if (error) {console.error(error);} else {console.log('API called successfully. Returned data: ' + data);}};apiInstance.imageOcrPost(imageFile, opts, callback);

If you know the language the input image will be in, this can greatly increase the effectiveness of the algorithm. Otherwise, that’s about all there is to it.

Image for post
Image for post

Written by

There’s an API for that. Cloudmersive is a leader in Highly Scalable Cloud APIs.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store