How to Convert a Scanned Image of a Document to Plain Text using C/C++

Sep 6, 2023

Once we scan our documents, we’re only one step away from digitizing their contents — all we need is an Optical Character Recognition (OCR) service.

Using the code below, we can call a free OCR API designed specifically to convert scanned documents into plain text. The API returns the recognized text as a string, along with a confidence score indicating how reliable the recognition is likely to be.
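To make the confidence score concrete, here is a minimal sketch of how we might read it out of a response body and compare it against a threshold of our own choosing. It assumes the response is JSON and that the score lives in a numeric field called MeanConfidenceLevel; that field name, the helper names, and the threshold are assumptions for illustration, so check them against a real response. The scan is deliberately naive and is not a substitute for a proper JSON parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Finds "key": <number> in a JSON string and returns the number, or
   `fallback` if the key is missing or not followed by a number.
   Naive text scan for illustration only, not a real JSON parser. */
double json_number_field(const char *json, const char *key, double fallback) {
    char pattern[64];
    size_t klen;
    const char *p;
    char *end;
    double value;

    assert(json != NULL && key != NULL);
    klen = strlen(key);
    if (klen + 3 > sizeof pattern) return fallback;

    /* Build the quoted key so we do not match a longer key by accident */
    pattern[0] = '"';
    memcpy(pattern + 1, key, klen);
    pattern[klen + 1] = '"';
    pattern[klen + 2] = '\0';

    p = strstr(json, pattern);
    if (p == NULL) return fallback;
    p += klen + 2;                      /* step past the closing quote */
    while (*p == ' ' || *p == '\t') p++;
    if (*p != ':') return fallback;
    p++;

    value = strtod(p, &end);            /* strtod skips leading whitespace */
    return (end == p) ? fallback : value;
}

/* Returns 1 when the reported confidence meets the caller's threshold.
   "MeanConfidenceLevel" is an ASSUMED field name, not confirmed here. */
int ocr_result_is_reliable(const char *json, double threshold) {
    double confidence = json_number_field(json, "MeanConfidenceLevel", -1.0);
    return confidence >= threshold;
}
```

A caller could pass the raw response body and a threshold, then decide whether to accept the text or flag the scan for manual review.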

We first need to install libcurl in our project:

libcurl/7.75.0

After that, we can copy the following code into our file to structure our API call:

#include <stdio.h>
#include <curl/curl.h>

int main(void) {
    CURL *curl;
    CURLcode res;
    curl = curl_easy_init();
    if(curl) {
        curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "POST");
        curl_easy_setopt(curl, CURLOPT_URL, "https://api.cloudmersive.com/ocr/image/toText");
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
        curl_easy_setopt(curl, CURLOPT_DEFAULT_PROTOCOL, "https");

        /* Request headers: replace each <string> with one of the values described below */
        struct curl_slist *headers = NULL;
        headers = curl_slist_append(headers, "recognitionMode: <string>");
        headers = curl_slist_append(headers, "language: <string>");
        headers = curl_slist_append(headers, "preprocessing: <string>");
        headers = curl_slist_append(headers, "Content-Type: multipart/form-data");
        headers = curl_slist_append(headers, "Apikey: YOUR-API-KEY-HERE");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

        /* Attach the scanned image as the multipart form field "imageFile" */
        curl_mime *mime;
        curl_mimepart *part;
        mime = curl_mime_init(curl);
        part = curl_mime_addpart(mime);
        curl_mime_name(part, "imageFile");
        curl_mime_filedata(part, "/path/to/file");
        curl_easy_setopt(curl, CURLOPT_MIMEPOST, mime);

        /* Send the request; by default libcurl writes the response body to stdout */
        res = curl_easy_perform(curl);
        if(res != CURLE_OK)
            fprintf(stderr, "Request failed: %s\n", curl_easy_strerror(res));

        /* Release the handle, the form data, and the header list */
        curl_easy_cleanup(curl);
        curl_mime_free(mime);
        curl_slist_free_all(headers);
    }
    return 0;
}
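The call above leaves response handling to libcurl, which by default prints the body to standard output. If we want the text in memory instead, one common approach is to register a write callback. The sketch below shows such a callback in isolation; the struct and function names (response_buf, collect_response) are hypothetical, and the callback deliberately avoids any libcurl types so it can be compiled and tested on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growable buffer that accumulates the response body. */
struct response_buf {
    char *data;   /* NUL-terminated contents, or NULL while empty */
    size_t len;   /* bytes stored, not counting the trailing NUL */
};

/* Has the signature libcurl expects for CURLOPT_WRITEFUNCTION.
   libcurl may call it several times, once per received chunk.
   Returning a count different from the bytes received aborts the transfer. */
size_t collect_response(char *ptr, size_t size, size_t nmemb, void *userdata) {
    struct response_buf *buf = userdata;
    size_t chunk = size * nmemb;
    char *grown;

    assert(buf != NULL);
    grown = realloc(buf->data, buf->len + chunk + 1);
    if (grown == NULL) return 0;        /* out of memory: signal failure */

    memcpy(grown + buf->len, ptr, chunk);
    buf->data = grown;
    buf->len += chunk;
    buf->data[buf->len] = '\0';         /* keep the buffer usable as a C string */
    return chunk;
}
```

To use it, we would declare a struct response_buf initialized to {NULL, 0}, pass collect_response via CURLOPT_WRITEFUNCTION and the buffer's address via CURLOPT_WRITEDATA before calling curl_easy_perform, and call free() on the data pointer once we are done with the text.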

We can authorize our requests with a free-tier API key to make up to 800 API calls per month (with no additional commitment).
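Rather than hard-coding the key into the source file, we might prefer to read it at run time and build the Apikey header from it. The following is a minimal sketch under that assumption; the environment variable name CLOUDMERSIVE_API_KEY and both helper functions are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes "Apikey: <key>" into out. Returns 0 on success, or -1 if the key
   is missing or empty, or if the header would not fit in outlen bytes. */
int build_apikey_header(const char *key, char *out, size_t outlen) {
    assert(out != NULL && outlen > 0);
    if (key == NULL || key[0] == '\0') return -1;
    /* sizeof "Apikey: " counts the prefix plus the terminating NUL */
    if (strlen(key) + sizeof "Apikey: " > outlen) return -1;
    snprintf(out, outlen, "Apikey: %s", key);
    return 0;
}

/* Convenience wrapper that reads the key from the environment.
   CLOUDMERSIVE_API_KEY is a hypothetical variable name for this sketch. */
int apikey_header_from_env(char *out, size_t outlen) {
    return build_apikey_header(getenv("CLOUDMERSIVE_API_KEY"), out, outlen);
}
```

On success, the resulting string can be handed to curl_slist_append in place of the "Apikey: YOUR-API-KEY-HERE" literal; on failure, we can stop before sending a request that would be rejected.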

In addition, we can customize the following request details:

Recognition Mode: Basic, Normal or Advanced (default is Advanced)
Language: three-letter language abbreviation; many common languages supported (default is ENG)
Preprocessing: Auto or None. This engages further image preparation prior to performing the OCR operation (default is Auto).
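Since these options are sent as plain header strings, a typo would only surface once the API responds. As a rough sketch, we could check the values locally before building the headers. The helper names below are hypothetical, the comparison is assumed to be case-sensitive, and the language check only verifies the three-letter shape, not whether the API actually supports that language.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if value equals one of the n allowed strings. */
static int is_one_of(const char *value, const char *const allowed[], size_t n) {
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(value, allowed[i]) == 0) return 1;
    return 0;
}

/* Recognition Mode: Basic, Normal or Advanced */
int valid_recognition_mode(const char *v) {
    static const char *const modes[] = {"Basic", "Normal", "Advanced"};
    return v != NULL && is_one_of(v, modes, 3);
}

/* Preprocessing: Auto or None */
int valid_preprocessing(const char *v) {
    static const char *const options[] = {"Auto", "None"};
    return v != NULL && is_one_of(v, options, 2);
}

/* Language: exactly three letters (shape check only) */
int valid_language(const char *v) {
    int i;
    if (v == NULL || strlen(v) != 3) return 0;
    for (i = 0; i < 3; i++)
        if (!isalpha((unsigned char)v[i])) return 0;
    return 1;
}

/* Formats "name: value" into out; returns 0 on success, -1 if it does not fit. */
int format_header(const char *name, const char *value, char *out, size_t outlen) {
    int n;
    assert(name != NULL && value != NULL && out != NULL);
    n = snprintf(out, outlen, "%s: %s", name, value);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For example, after confirming that valid_language("ENG") succeeds, format_header("language", "ENG", ...) produces the string "language: ENG", which can replace the "language: <string>" placeholder in the request.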


Written by Cloudmersive
