How to convert PDF to Text TXT in Python

Cloudmersive
May 31, 2020


Our goal in today’s lesson is to extract all of the text from a PDF and create a plain text TXT file using Python. If you thought that this would be a time-consuming process, you would normally be right. Today, however, I’m going to show you the mother of all shortcuts, and we will be finished in very short order.

To start, install the Cloudmersive Convert API client with pip:

pip install cloudmersive-convert-api-client

Now the API call can be made with the example code below. Optionally, you can specify how whitespace characters should be handled.

from __future__ import print_function
import time
import cloudmersive_convert_api_client
from cloudmersive_convert_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_convert_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'
# Uncomment below to setup prefix (e.g. Bearer) for API key, if needed
# configuration.api_key_prefix['Apikey'] = 'Bearer'

# create an instance of the API class
api_instance = cloudmersive_convert_api_client.ConvertDocumentApi(cloudmersive_convert_api_client.ApiClient(configuration))
input_file = '/path/to/file' # file | Input file to perform the operation on.

# str | Optional; specify how whitespace should be handled when converting PDF to text.
# Possible values are 'preserveWhitespace', which will attempt to preserve whitespace
# in the document and relative positioning of text within the document, and
# 'minimizeWhitespace', which will not insert additional spaces into the document
# in most cases. Default is 'preserveWhitespace'.
text_formatting_mode = 'preserveWhitespace'

try:
    # Convert PDF Document to Text (txt)
    api_response = api_instance.convert_document_pdf_to_txt(input_file, text_formatting_mode=text_formatting_mode)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling ConvertDocumentApi->convert_document_pdf_to_txt: %s\n" % e)
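The example above prints the response rather than writing a file, so one more small step is needed to end up with an actual TXT file. Here is a minimal sketch of that step. It assumes the response object exposes the extracted text as a `text_result` attribute; check this against the client version you have installed. The helper name `save_text_result` and the output path are hypothetical and used only for illustration, and a stand-in object replaces the real API response so the sketch runs offline.

```python
import os
import tempfile
from types import SimpleNamespace


def save_text_result(api_response, output_path):
    """Write the extracted text from a conversion response to a UTF-8 .txt file.

    Assumption: the response exposes the text as a `text_result` attribute.
    """
    text = getattr(api_response, "text_result", None)
    if text is None:
        raise ValueError("response has no text_result to write")
    with open(output_path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return output_path


if __name__ == "__main__":
    # Stand-in for the real API response so this sketch runs without an API key.
    fake_response = SimpleNamespace(successful=True, text_result="Hello from a PDF\n")
    target = os.path.join(tempfile.gettempdir(), "output.txt")
    print(save_text_result(fake_response, target))
```

In a real run you would pass the `api_response` returned by `convert_document_pdf_to_txt` in place of the stand-in object.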

Alright, that’s done! Incredibly easy. Now get out there and start enjoying all that free time you just created.
