How to convert a Rasterized PDF into a DOCX in Python

Cloudmersive
2 min read · May 11, 2020

In this article we will convert a PDF into a Word DOCX at high fidelity by first rasterizing the PDF and then converting the rasterized version. That would be difficult to do by hand, but today an API does all the grunt work.

Install the API client with pip:

pip install cloudmersive-convert-api-client
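Before moving on, it can help to confirm that the package actually landed in the Python environment you are running. Here is a minimal sketch of such a check using only the standard library; the helper name is_installed is our own and not part of the Cloudmersive client.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when no top-level module with this name exists.
    return importlib.util.find_spec(module_name) is not None


# The import name below is the one used by the client in this article.
print(is_installed("cloudmersive_convert_api_client"))
```

If this prints False, the pip command most likely ran against a different interpreter or virtual environment than the one running your script.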

Next comes the call to convert_document_pdf_to_docx_rasterize. As the following block of code shows, you first use your API key to create an instance of the Convert Document API, then call the function with the path to your input file.

from __future__ import print_function
import time
import cloudmersive_convert_api_client
from cloudmersive_convert_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_convert_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'
# Uncomment below to setup prefix (e.g. Bearer) for API key, if needed
# configuration.api_key_prefix['Apikey'] = 'Bearer'

# create an instance of the API class
api_instance = cloudmersive_convert_api_client.ConvertDocumentApi(cloudmersive_convert_api_client.ApiClient(configuration))
input_file = '/path/to/file' # file | Input file to perform the operation on.

try:
    # Convert PDF to Word DOCX Document based on rasterized version of the PDF
    api_response = api_instance.convert_document_pdf_to_docx_rasterize(input_file)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling ConvertDocumentApi->convert_document_pdf_to_docx_rasterize: %s\n" % e)
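The sample above only prints the response. In practice you will probably want the converted document on disk, so here is a rough sketch of a wrapper that does that. The names convert_and_save and output_path are hypothetical, and the sketch assumes the call returns the DOCX contents as raw bytes. Check the client documentation for the actual return type before relying on it.

```python
import os


def convert_and_save(api_instance, input_path, output_path):
    """Convert a PDF via the rasterize call and write the result to disk.

    Returns the number of bytes written.
    """
    # Fail early with a clear error instead of sending a bad path to the API.
    if not os.path.isfile(input_path):
        raise FileNotFoundError("Input PDF not found: %s" % input_path)

    result = api_instance.convert_document_pdf_to_docx_rasterize(input_path)

    # Assumption: the response is the DOCX file as bytes. If the client
    # returns something else, stop rather than write a corrupt file.
    if not isinstance(result, (bytes, bytearray)):
        raise TypeError("Expected bytes, got %s" % type(result).__name__)

    with open(output_path, "wb") as out_file:
        out_file.write(result)
    return len(result)
```

With the api_instance created earlier, a call would look like convert_and_save(api_instance, '/path/to/file', 'output.docx'), still wrapped in the same try/except for ApiException. Passing the API instance in as an argument also makes the helper easy to exercise with a stand-in object, without spending API calls.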

Whew, that was easy. Got other document conversion needs? Look into the rest of this library, which has plenty of other useful functions like this one.

Written by Cloudmersive

There’s an API for that. Cloudmersive is a leader in Highly Scalable Cloud APIs.
