Let’s say you have a few thousand document files in a random assortment of formats, with no organization to speak of. Converting all of them into image files would be a true headache. First you would have to identify each file’s format (files are sometimes mislabeled or lack an extension entirely), then set up parsing for each of those formats. And once that’s done, there’s still the matter of splitting out the pages and generating images from them, which is a quagmire in and of itself. I’m going to toss you a lifesaver in the form of an API that will deal with ALL of this mess in your stead. Its setup is easy:
First we install the client library:
pip install cloudmersive-convert-api-client
Now we call convert_document_autodetect_to_jpg:
from __future__ import print_function
import time
import cloudmersive_convert_api_client
from cloudmersive_convert_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_convert_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'
# Uncomment below to setup prefix (e.g. Bearer) for API key, if needed
# configuration.api_key_prefix['Apikey'] = 'Bearer'

# create an instance of the API class
api_instance = cloudmersive_convert_api_client.ConvertDocumentApi(cloudmersive_convert_api_client.ApiClient(configuration))
input_file = '/path/to/file' # file | Input file to perform the operation on.
# int | Optional; JPEG quality level from 1 (lowest quality, highest compression)
# to 100 (highest quality, lowest compression); recommended and default value is 75.
quality = 56

try:
    # Convert Document to JPG/JPEG image array
    api_response = api_instance.convert_document_autodetect_to_jpg(input_file, quality=quality)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling ConvertDocumentApi->convert_document_autodetect_to_jpg: %s\n" % e)
And that’s it! Whatever document type you pass in as an argument is identified automatically, and the result comes back as an array of JPG images, each one a byte array. Super simple.
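Since the images arrive as byte arrays, you will probably want to write them to disk. Here is a minimal sketch of one way to do that. The helper save_pages is hypothetical (it is not part of the client library), and it assumes each returned page object exposes page_number and content attributes. Check the response printed by pprint above for the actual field names before relying on it. It also accepts base64 text in case the client returns the image content as a string rather than raw bytes.

```python
import base64
from pathlib import Path


def save_pages(pages, out_dir, stem):
    """Write each page to out_dir as <stem>_page<N>.jpg and return the paths.

    Hypothetical helper: assumes every page has a `content` attribute and,
    optionally, a `page_number` attribute.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, page in enumerate(pages, start=1):
        # Fall back to the loop position if no page number is present.
        number = getattr(page, "page_number", None) or index
        content = page.content
        # Decode base64 text; raw bytes are written unchanged.
        if isinstance(content, str):
            content = base64.b64decode(content)
        target = out_dir / f"{stem}_page{number}.jpg"
        target.write_bytes(content)
        written.append(target)
    return written
```

Assuming the response keeps its pages in an attribute named jpg_result_pages (again, an assumption to verify against your own output), the call would look like save_pages(api_response.jpg_result_pages, "output", "report").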