How to convert a scanned image into text in Java using OCR
Extracting text from an image is an incredibly useful function, allowing you to instantly turn a stack of physical documents into files on your computer. This has a wide range of applications; everything from scanner apps to office software can benefit. Today we will be skipping all of the hard work normally required to set this up, and instead getting straight to the results. We can accomplish this remarkably quickly by employing an API to do all the hard stuff. Let us begin.
Our initial step is to set up our library, which we will dynamically compile with Jitpack. Toward this end, we will need to add references to Maven POM, as follows:
Repository.
<repositories>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
</repositories>
Dependency.
<dependencies>
<dependency>
<groupId>com.github.Cloudmersive</groupId>
<artifactId>Cloudmersive.APIClient.Java</artifactId>
<version>v2.75</version>
</dependency>
</dependencies>
Now we invoke imageOcrPost with the following code:
// Import classes://import com.cloudmersive.client.invoker.ApiClient;//import com.cloudmersive.client.invoker.ApiException;//import com.cloudmersive.client.invoker.Configuration;//import com.cloudmersive.client.invoker.auth.*;//import com.cloudmersive.client.ImageOcrApi;ApiClient defaultClient = Configuration.getDefaultApiClient();// Configure API key authorization: ApikeyApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");Apikey.setApiKey("YOUR API KEY");// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)//Apikey.setApiKeyPrefix("Token");ImageOcrApi apiInstance = new ImageOcrApi();File imageFile = new File("/path/to/file"); // File | Image file to perform OCR on. Common file formats such as PNG, JPEG are supported.String recognitionMode = "recognitionMode_example"; // String | Optional; possible values are 'Basic' which provides basic recognition and is not resillient to page rotation, skew or low quality images uses 1-2 API calls; 'Normal' which provides highly fault tolerant OCR recognition uses 14-16 API calls; and 'Advanced' which provides the highest quality and most fault-tolerant recognition uses 28-30 API calls. Default recognition mode is 'Advanced'String language = "language_example"; // String | Optional, language of the input document, default is English (ENG). Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle-French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Hatian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old - Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old-Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish)String preprocessing = "preprocessing_example"; // String | Optional, preprocessing mode, default is 'Auto'. Possible values are None (no preprocessing of the image), and Auto (automatic image enhancement of the image before OCR is applied; this is recommended).try {ImageToTextResponse result = apiInstance.imageOcrPost(imageFile, recognitionMode, language, preprocessing);System.out.println(result);} catch (ApiException e) {System.err.println("Exception when calling ImageOcrApi#imageOcrPost");e.printStackTrace();}
As you can see, we must provide an imageFile, and can optionally provide the language of the text, which will improve accuracy. Note that this function is specifically designed for scanned images; if you would like to extract text from photos taken with a camera (such as from a smart phone), you should employ imageOcrPhotoToText instead, as this will adjust for camera skewing, etc.