Convert a Scanned Image into Text in Python

Optical character recognition (OCR) keeps making it easier to bridge our digital and analog lives. Enabling an application to read a physical document saves countless hours of manual transcription. With the Cloudmersive OCR API, your application can convert an uploaded image (common formats such as JPEG and PNG are supported) into text. This API is designed to handle documents with large amounts of text, so it’s well suited to processing and digitizing key information from your physical documents. Ready-to-run code is available from the Cloudmersive API console page; below, we’ll walk through how to connect in Python.

First, install the Python SDK with the following command:

pip install cloudmersive-ocr-api-client
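
To confirm the install worked before moving on, a quick check like the sketch below can help. It uses only the standard library; the helper name `is_installed` is hypothetical, and the module name `cloudmersive_ocr_api_client` is taken from the import statement used later in this walkthrough.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    name = "cloudmersive_ocr_api_client"
    status = "found" if is_installed(name) else "missing; re-run the pip command"
    print(f"{name}: {status}")
```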

After that, copy the two snippets below to set up the API call. In the second of the two, you’ll need to provide your Cloudmersive API key for authorization.

from __future__ import print_function
import cloudmersive_ocr_api_client
from cloudmersive_ocr_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_ocr_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'
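
Hard-coding the key is fine for a first test, but you may prefer to keep it out of source control. Below is a minimal sketch that reads it from an environment variable instead; the variable name `CLOUDMERSIVE_API_KEY` and the helper `load_api_key` are assumptions for illustration and are not part of the SDK.

```python
import os


def load_api_key(env_var="CLOUDMERSIVE_API_KEY"):
    """Read the API key from the environment, failing early if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage with the configuration object from the snippet above:
# configuration.api_key['Apikey'] = load_api_key()
```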

For the final step, copy the code below and review the parameter comments. Along with the file for OCR processing, you can optionally set the recognition mode, the preprocessing mode, and the language of the input document (English by default), choosing from the dozens of language codes listed below:

# Create an instance of the API class
api_instance = cloudmersive_ocr_api_client.ImageOcrApi(cloudmersive_ocr_api_client.ApiClient(configuration))
image_file = '/path/to/inputfile' # file | Image file to perform OCR on. Common file formats such as PNG and JPEG are supported.
recognition_mode = 'Advanced' # str | Optional; possible values are 'Basic' (basic recognition, not resilient to page rotation, skew, or low-quality images; uses 1-2 API calls), 'Normal' (highly fault-tolerant recognition; uses 26-30 API calls), and 'Advanced' (highest quality, most fault-tolerant recognition; uses 28-30 API calls). Default recognition mode is 'Advanced'.
language = 'ENG' # str | Optional; language of the input document, default is English (ENG). Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Haitian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish).
preprocessing = 'Auto' # str | Optional; preprocessing mode, default is 'Auto'. Possible values are 'None' (no preprocessing of the image) and 'Auto' (automatic image enhancement before OCR is applied; recommended).

try:
    # Convert a scanned image into text
    api_response = api_instance.image_ocr_post(image_file, recognition_mode=recognition_mode, language=language, preprocessing=preprocessing)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling ImageOcrApi->image_ocr_post: %s\n" % e)
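
Because the optional parameters are plain strings, a mistyped value is easy to miss. One option is to validate them before making the call, as in the sketch below. The allowed values are copied from the parameter comments above; the helper name `build_ocr_options` is hypothetical, only a handful of the language codes are included for brevity, and `'None'` is assumed to be passed as a literal string since the parameter is documented as a str.

```python
RECOGNITION_MODES = {"Basic", "Normal", "Advanced"}
PREPROCESSING_MODES = {"None", "Auto"}  # assumed to be literal strings
# Subset of the language codes listed above; extend as needed.
LANGUAGES = {"ENG", "ARA", "ZHO", "DEU", "FRA", "SPA", "JPN", "KOR", "RUS"}


def build_ocr_options(recognition_mode="Advanced", language="ENG", preprocessing="Auto"):
    """Validate the optional parameters and return them as keyword arguments."""
    if recognition_mode not in RECOGNITION_MODES:
        raise ValueError(f"Unknown recognition_mode: {recognition_mode!r}")
    if language not in LANGUAGES:
        raise ValueError(f"Unknown language: {language!r}")
    if preprocessing not in PREPROCESSING_MODES:
        raise ValueError(f"Unknown preprocessing: {preprocessing!r}")
    return {
        "recognition_mode": recognition_mode,
        "language": language,
        "preprocessing": preprocessing,
    }


# Usage: api_instance.image_ocr_post(image_file, **build_ocr_options(language="DEU"))
```

The defaults mirror the ones stated in the comments ('Advanced', 'ENG', 'Auto'), so calling the helper with no arguments should behave like omitting the optional parameters.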

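The call above pretty-prints the whole response object. If you only need the recognized text, a small helper along these lines may be enough. This is a sketch under an assumption: it expects the response to expose the text as a `text_result` attribute, which you should confirm against the SDK documentation. The helper name `extract_text` is hypothetical, and a stand-in object is used so the example runs without calling the API.

```python
from types import SimpleNamespace


def extract_text(api_response, default=""):
    """Return the recognized text, or a default if the attribute is missing or empty."""
    text = getattr(api_response, "text_result", None)  # attribute name is an assumption
    return text.strip() if text else default


# Stand-in object so the sketch runs without calling the API.
fake_response = SimpleNamespace(text_result="  Hello, world!\n")
print(extract_text(fake_response))
```
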