How to extract information from a form photo in Python using OCR

Having the ability to automate form submissions can save a lot of man hours. With modern OCR technology this capability is well within our reach, but it needs some additional help to optimize the process. In today’s tutorial less, we are going to explore an API option that can handle photo skewing, as well as extract information from specific form fields in the photo, and even handwriting recognition. All this can be accessed in a few simple steps, saving you the hours with deep learning that would normally be required.

To use our API we will need to install its client with pip install.

pip install cloudmersive-ocr-api-client

And now here’s our API function call:

from __future__ import print_functionimport timeimport cloudmersive_ocr_api_clientfrom cloudmersive_ocr_api_client.rest import ApiExceptionfrom pprint import pprint# Configure API key authorization: Apikeyconfiguration = cloudmersive_ocr_api_client.Configuration()configuration.api_key['Apikey'] = 'YOUR_API_KEY'# Uncomment below to setup prefix (e.g. Bearer) for API key, if needed# configuration.api_key_prefix['Apikey'] = 'Bearer'# create an instance of the API classapi_instance = cloudmersive_ocr_api_client.ImageOcrApi(cloudmersive_ocr_api_client.ApiClient(configuration))image_file = '/path/to/file' # file | Image file to perform OCR on.  Common file formats such as PNG, JPEG are supported.form_template_definition = NULL # object | Form field definitions (optional)recognition_mode = 'recognition_mode_example' # str | Optional, enable advanced recognition mode by specifying 'Advanced', enable handwriting recognition by specifying 'EnableHandwriting'.  Default is disabled. (optional)preprocessing = 'preprocessing_example' # str | Optional, preprocessing mode, default is 'Auto'.  Possible values are None (no preprocessing of the image), and Auto (automatic image enhancement of the image - including automatic unrotation of the image - before OCR is applied; this is recommended).  Set this to 'None' if you do not want to use automatic image unrotation and enhancement. (optional)diagnostics = 'diagnostics_example' # str | Optional, diagnostics mode, default is 'false'.  Possible values are 'true' (will set DiagnosticImage to a diagnostic PNG image in the result), and 'false' (no diagnostics are enabled; this is recommended for best performance). (optional)language = 'language_example' # str | Optional, language of the input document, default is English (ENG).  Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle-French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Hatian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old - Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old-Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish) (optional)try:# Recognize a photo of a form, extract key fields and business informationapi_response = api_instance.image_ocr_photo_recognize_form(image_file, form_template_definition=form_template_definition, recognition_mode=recognition_mode, preprocessing=preprocessing, diagnostics=diagnostics, language=language)pprint(api_response)except ApiException as e:print("Exception when calling ImageOcrApi->image_ocr_photo_recognize_form: %s\n" % e)

Now we can set up our various parameters, such as language, preprocessing, handwriting recognition, and form template. And that’s all there is to it!

Image for post
Image for post

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store