How to Convert a PDF to Text with OCR using C/C++

Cloudmersive
2 min read · Sep 6, 2023


Raster (image-based) PDF files can be converted to text using Optical Character Recognition (OCR) services.

Using the code below, we can take advantage of a free API that extracts the text contents of raster PDFs and returns them as a plain text string.

We’ll first need to install libcurl in our project:

libcurl/7.75.0

Then we can copy & paste the ready-to-run code below directly into our file. We’ll need to provide a free-tier Cloudmersive API key (this allows 800 API calls per month with no additional commitment) to authorize our requests:

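#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    CURL *curl;
    CURLcode res = CURLE_FAILED_INIT;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    curl = curl_easy_init();
    if (curl) {
        struct curl_slist *headers = NULL;
        curl_mime *mime;
        curl_mimepart *part;

        curl_easy_setopt(curl, CURLOPT_URL, "https://api.cloudmersive.com/ocr/pdf/toText");
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
        curl_easy_setopt(curl, CURLOPT_DEFAULT_PROTOCOL, "https");

        /* Optional OCR settings (defaults: recognitionMode Advanced, language ENG).
           Uncomment and edit to override; never send the literal "<string>". */
        /* headers = curl_slist_append(headers, "recognitionMode: Normal"); */
        /* headers = curl_slist_append(headers, "language: ENG"); */
        /* headers = curl_slist_append(headers, "preprocessing: <string>"); */
        headers = curl_slist_append(headers, "Apikey: YOUR-API-KEY-HERE");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

        /* Attach the PDF as a multipart form field. CURLOPT_MIMEPOST makes this
           a POST and sets the multipart Content-Type header (with its boundary). */
        mime = curl_mime_init(curl);
        part = curl_mime_addpart(mime);
        curl_mime_name(part, "imageFile");
        curl_mime_filedata(part, "/path/to/file");
        curl_easy_setopt(curl, CURLOPT_MIMEPOST, mime);

        /* With no write callback set, libcurl prints the response body to stdout. */
        res = curl_easy_perform(curl);
        if (res != CURLE_OK)
            fprintf(stderr, "Request failed: %s\n", curl_easy_strerror(res));

        curl_mime_free(mime);
        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}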

We can optionally set the recognitionMode header to Basic, Normal, or Advanced (the default) to control how fault-tolerant the recognition is, and we can set the language header (default ENG) using the language’s three-letter abbreviation.
