How to extract information from a form photo in Java using OCR

Cloudmersive
4 min read · Jan 26, 2020

Today we will look at how to quickly set up form recognition using optical character recognition (OCR). This lets us extract information from the fields we define, and it supports handwriting recognition as well. Unlike a manual approach, which can be very time-consuming, the setup takes about 5 minutes. Let’s get right to it.

First, we will set up our library. We will install it through Jitpack, so we must add two references to our Maven POM file (pom.xml), as follows.

1. Repository

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

2. Dependency

<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v2.75</version>
    </dependency>
</dependencies>

Now we must call imageOcrPhotoRecognizeForm, as demonstrated below.

// Import classes:
//import java.io.File;
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.model.FormRecognitionResult;
//import com.cloudmersive.client.ImageOcrApi;

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

ImageOcrApi apiInstance = new ImageOcrApi();

// File | Image file to perform OCR on. Common file formats such as PNG, JPEG are supported.
File imageFile = new File("/path/to/file");

// Object | Form field definitions
Object formTemplateDefinition = null;

// String | Optional, enable advanced recognition mode by specifying 'Advanced',
// enable handwriting recognition by specifying 'EnableHandwriting'. Default is disabled.
String recognitionMode = "recognitionMode_example";

// String | Optional, preprocessing mode, default is 'Auto'. Possible values are
// None (no preprocessing of the image), and Auto (automatic image enhancement of the
// image - including automatic unrotation of the image - before OCR is applied; this is
// recommended). Set this to 'None' if you do not want to use automatic image unrotation
// and enhancement.
String preprocessing = "preprocessing_example";

// String | Optional, diagnostics mode, default is 'false'. Possible values are
// 'true' (will set DiagnosticImage to a diagnostic PNG image in the result), and
// 'false' (no diagnostics are enabled; this is recommended for best performance).
String diagnostics = "diagnostics_example";

// String | Optional, language of the input document, default is English (ENG).
// Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified),
// ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic),
// AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali),
// BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano),
// CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha),
// ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque),
// FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle-French),
// GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Haitian), HEB (Hebrew),
// HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian),
// ISL (Icelandic), ITA (Italian), ITA-OLD (Old - Italian), JAV (Javanese), JPN (Japanese),
// KAN (Kannada), KAT (Georgian), KAT-OLD (Old-Georgian), KAZ (Kazakh), KHM (Central Khmer),
// KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian),
// LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese),
// MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya),
// PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian),
// RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian),
// SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian),
// SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil),
// TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish),
// UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek),
// VIE (Vietnamese), YID (Yiddish)
String language = "language_example";

try {
    FormRecognitionResult result = apiInstance.imageOcrPhotoRecognizeForm(
            imageFile, formTemplateDefinition, recognitionMode, preprocessing, diagnostics, language);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling ImageOcrApi#imageOcrPhotoRecognizeForm");
    e.printStackTrace();
}
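The snippet above hardcodes the API key and uses placeholder strings such as "recognitionMode_example". One way to catch a missing key or a mistyped option before any request is sent is a small pre-flight check. The following is only a sketch: the class name OcrCallPreflight, its helper methods, and the environment variable CLOUDMERSIVE_API_KEY are hypothetical names chosen for illustration and are not part of the Cloudmersive client. The allowed values are copied from the parameter comments above, and the check assumes they must match exactly as written there.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class OcrCallPreflight {

    // Hypothetical environment variable name; use whatever fits your deployment.
    static final String API_KEY_VAR = "CLOUDMERSIVE_API_KEY";

    // Values named in the parameter comments of the snippet above.
    static final List<String> RECOGNITION_MODES = Arrays.asList("Advanced", "EnableHandwriting");
    static final List<String> PREPROCESSING_MODES = Arrays.asList("None", "Auto");
    static final List<String> DIAGNOSTICS_MODES = Arrays.asList("true", "false");

    /** Returns the trimmed API key from the given environment map, or throws if it is missing or blank. */
    static String requireApiKey(Map<String, String> env) {
        String key = env.get(API_KEY_VAR);
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalStateException("Set " + API_KEY_VAR + " before calling the API");
        }
        return key.trim();
    }

    /** Returns value unchanged if it is one of the allowed values, otherwise throws. */
    static String requireOneOf(String name, String value, List<String> allowed) {
        if (!allowed.contains(value)) {
            throw new IllegalArgumentException(
                    name + " must be one of " + allowed + " but was '" + value + "'");
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            String apiKey = requireApiKey(System.getenv());
            String recognitionMode = requireOneOf("recognitionMode", "EnableHandwriting", RECOGNITION_MODES);
            String preprocessing = requireOneOf("preprocessing", "Auto", PREPROCESSING_MODES);
            String diagnostics = requireOneOf("diagnostics", "false", DIAGNOSTICS_MODES);
            // At this point the values could be handed to Apikey.setApiKey(...) and
            // imageOcrPhotoRecognizeForm(...) as in the snippet above.
            System.out.println("Settings look valid: " + recognitionMode + ", "
                    + preprocessing + ", " + diagnostics + " (key length " + apiKey.length() + ")");
        } catch (IllegalStateException | IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Reading the key from the environment keeps it out of source control, and validating the option strings locally turns a typo into an immediate, readable error.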

The call accepts several optional parameters for customization. Specifying the language is recommended because it helps accuracy. Handwriting recognition is enabled by setting recognitionMode to 'EnableHandwriting'. Here is an example of the response structure, shown with placeholder values:

{
  "Successful": true,
  "FieldValueExtractionResult": [
    {
      "TargetField": {
        "FieldID": "string",
        "LeftAnchor": "string",
        "TopAnchor": "string",
        "BottomAnchor": "string",
        "AnchorMode": "string",
        "DataType": "string",
        "TargetDigitCount": 0,
        "MinimumCharacterCount": 0,
        "AllowNumericDigits": true,
        "VerticalAlignmentType": "string",
        "HorizontalAlignmentType": "string",
        "TargetFieldWidth_Relative": 0,
        "TargetFieldHeight_Relative": 0,
        "TargetFieldHorizontalAdjustment": 0,
        "TargetFieldVerticalAdjustment": 0,
        "Ignore": [
          "string"
        ],
        "Options": "string"
      },
      "FieldValues": [
        {
          "Text": "string",
          "XLeft": 0,
          "YTop": 0,
          "Width": 0,
          "Height": 0,
          "BoundingPoints": [
            {
              "X": 0,
              "Y": 0
            }
          ],
          "ConfidenceLevel": 0
        }
      ]
    }
  ],
  "TableValueExtractionResults": [
    {
      "TableDefinition": {
        "TableID": "string",
        "ColumnDefinitions": [
          {
            "ColumnID": "string",
            "TopAnchor": "string",
            "AnchorMode": "string",
            "DataType": "string",
            "MinimumCharacterCount": 0,
            "AllowNumericDigits": true
          }
        ],
        "TargetTableHeight_Relative": 0,
        "TargetRowHeight_Relative": 0
      },
      "TableRowsResult": [
        {
          "TableRowCellsResult": [
            {
              "ColumnID": "string",
              "CellValues": [
                {
                  "Text": "string",
                  "XLeft": 0,
                  "YTop": 0,
                  "Width": 0,
                  "Height": 0,
                  "BoundingPoints": [
                    {
                      "X": 0,
                      "Y": 0
                    }
                  ],
                  "ConfidenceLevel": 0
                }
              ]
            }
          ]
        }
      ]
    }
  ],
  "Diagnostics": [
    "string"
  ],
  "BestMatchFormSettingName": "string"
}
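
Each entry in FieldValues carries a Text and a ConfidenceLevel, so a natural next step is to flag uncertain values for manual review. The sketch below does not use the Cloudmersive model classes: ExtractedValue is a hypothetical flat holder that pairs a FieldID with one Text and ConfidenceLevel from the response, and the sample values are made up for illustration. The scale of ConfidenceLevel is not stated here, so the threshold is left to the caller and the 0.8 in the example is only an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FormResultReview {

    /** Hypothetical flat view of one FieldValues entry together with its TargetField.FieldID. */
    static final class ExtractedValue {
        final String fieldId;
        final String text;
        final double confidenceLevel;

        ExtractedValue(String fieldId, String text, double confidenceLevel) {
            this.fieldId = fieldId;
            this.text = text;
            this.confidenceLevel = confidenceLevel;
        }
    }

    /** Returns the values whose confidence is strictly below minConfidence, in input order. */
    static List<ExtractedValue> needsReview(List<ExtractedValue> values, double minConfidence) {
        List<ExtractedValue> flagged = new ArrayList<>();
        for (ExtractedValue value : values) {
            if (value.confidenceLevel < minConfidence) {
                flagged.add(value);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Made-up sample values for illustration only; this is not real API output.
        List<ExtractedValue> values = Arrays.asList(
                new ExtractedValue("FirstName", "Ada", 0.95),
                new ExtractedValue("PolicyNumber", "A1B2", 0.40));

        // The 0.8 threshold is an assumption; tune it against your own forms.
        for (ExtractedValue value : needsReview(values, 0.8)) {
            System.out.println(value.fieldId + " -> \"" + value.text
                    + "\" (confidence " + value.confidenceLevel + ")");
        }
    }
}
```

In a real integration the list would be filled by walking FieldValueExtractionResult in the response and copying out the FieldID, Text, and ConfidenceLevel of each value.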

