How to Convert Any Document File to JPG in Python

Let’s say you have a few thousand document files in a random assortment of formats, with no organization to speak of. Converting all of them into image files would be a true headache: first you would have to identify each file’s format (files are sometimes mislabeled or lack an extension entirely), then set up parsing for each of those formats. And once that’s done, there’s still the matter of splitting out the pages and generating an image from each one, which is a quagmire in and of itself. I’m going to toss you a lifesaver in the form of an API that will deal with ALL of this mess in your stead. Its setup is easy:

First, we install the client library:

pip install cloudmersive-convert-api-client

Now we call convert_document_autodetect_to_jpg:

from __future__ import print_function
import time
import cloudmersive_convert_api_client
from cloudmersive_convert_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_convert_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'
# Uncomment below to setup prefix (e.g. Bearer) for API key, if needed
# configuration.api_key_prefix['Apikey'] = 'Bearer'

# create an instance of the API class
api_instance = cloudmersive_convert_api_client.ConvertDocumentApi(cloudmersive_convert_api_client.ApiClient(configuration))
input_file = '/path/to/file'  # file | Input file to perform the operation on.
quality = 56  # int | Optional; Set the JPEG quality level; lowest quality is 1 (highest compression), highest quality (lowest compression) is 100; recommended value is 75. Default value is 75. (optional)

try:
    # Convert Document to JPG/JPEG image array
    api_response = api_instance.convert_document_autodetect_to_jpg(input_file, quality=quality)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling ConvertDocumentApi->convert_document_autodetect_to_jpg: %s\n" % e)

And that’s it! Whatever document you pass in as an argument, the API identifies its format and returns the pages as an array of JPGs, each one a byte array. Super simple.
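If you want those JPGs on disk rather than printed to the console, something along these lines may help. Treat it as a rough sketch, not the client's documented usage: the helper names save_pages and convert_folder are made up for illustration, and it assumes each returned page object exposes page_number and content attributes, with content being either raw bytes or a base64 string. Check the response object in your client version before relying on those names.

```python
import base64
from pathlib import Path


def save_pages(pages, out_dir, stem):
    """Write each page to <out_dir>/<stem>_pageNNN.jpg and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, page in enumerate(pages, start=1):
        # Assumed attribute names; fall back to the loop index if no page number is set.
        number = int(getattr(page, "page_number", None) or index)
        content = page.content
        # Assumption: the payload may arrive as a base64 string instead of raw bytes.
        if isinstance(content, str):
            content = base64.b64decode(content)
        target = out_dir / f"{stem}_page{number:03d}.jpg"
        target.write_bytes(content)
        written.append(target)
    return written


def convert_folder(src_dir, out_dir, convert):
    """Run convert(path) on every file in src_dir and save the pages it returns.

    convert is a function you supply: it takes a file path string and returns
    a list of page objects (for example, a thin wrapper around the API call).
    """
    results = {}
    for path in sorted(Path(src_dir).iterdir()):
        if path.is_file():
            # Use the full file name as the prefix so a.doc and a.pdf do not collide.
            results[path.name] = save_pages(convert(str(path)), out_dir, path.name)
    return results
```

With the response from the call above, usage might look like save_pages(api_response.jpg_result_pages, 'output', 'report'), where jpg_result_pages is again an assumed attribute name. For the folder-of-thousands scenario, convert_folder takes the conversion step as a plain function, so the API call, error handling, and any retry logic stay in one place that you control.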


Written by

There’s an API for that. Cloudmersive is a leader in Highly Scalable Cloud APIs.
