How to convert a scanned document into text with location with OCR in C# .NET Framework
3 min readJan 31, 2020
Today we will be looking at how to employ an API to simplify the task of performing optical character recognition (OCR). How simple, you ask? Five minutes is all it will take. Let’s get right to it.
To begin, we will use NuGet to install our API Client via the Package Manager console:
Install-Package Cloudmersive.APIClient.NET.OCR -Version 2.1.4
Now call ImageOcrImageWordsWithLocation and optionally select a target language from over 100 that are supported.
using System;using System.Diagnostics;using Cloudmersive.APIClient.NET.OCR.Api;using Cloudmersive.APIClient.NET.OCR.Client;using Cloudmersive.APIClient.NET.OCR.Model;namespace Example{public class ImageOcrImageWordsWithLocationExample{public void main(){// Configure API key authorization: ApikeyConfiguration.Default.AddApiKey("Apikey", "YOUR_API_KEY");// Uncomment below to setup prefix (e.g. Bearer) for API key, if needed// Configuration.Default.AddApiKeyPrefix("Apikey", "Bearer");var apiInstance = new ImageOcrApi();var imageFile = new System.IO.Stream(); // System.IO.Stream | Image file to perform OCR on. Common file formats such as PNG, JPEG are supported.var language = language_example; // string | Optional, language of the input document, default is English (ENG). Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle-French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Hatian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old - Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old-Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish) (optional)var preprocessing = preprocessing_example; // string | Optional, preprocessing mode, default is 'Auto'. Possible values are None (no preprocessing of the image), and Auto (automatic image enhancement of the image before OCR is applied; this is recommended). (optional)try{// Convert a scanned image into words with locationImageToWordsWithLocationResult result = apiInstance.ImageOcrImageWordsWithLocation(imageFile, language, preprocessing);Debug.WriteLine(result);}catch (Exception e){Debug.Print("Exception when calling ImageOcrApi.ImageOcrImageWordsWithLocation: " + e.Message );}}}}
Done! Our response will include the text found in the image as well as some location data.