How to Convert HTML to Text in Python

Cloudmersive
Mar 13, 2024

Without writing any new code, we can easily get plain text from HTML documents in Python.

To perform the HTML-to-text conversion, we can call a free API using the ready-to-run Python code below.

Our first step is to install the client SDK with pip:

pip install cloudmersive-convert-api-client

After that, we can add the imports and call the function (passing in our HTML file path) by copying the following Python example into our file:

import cloudmersive_convert_api_client
from cloudmersive_convert_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_convert_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'



# create an instance of the API class
api_instance = cloudmersive_convert_api_client.ConvertDocumentApi(cloudmersive_convert_api_client.ApiClient(configuration))
input_file = '/path/to/inputfile' # file | Input file to perform the operation on.

try:
    # HTML Document file to Text (txt)
    api_response = api_instance.convert_document_html_to_txt(input_file)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling ConvertDocumentApi->convert_document_html_to_txt: %s\n" % e)
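
The example above only prints the response. To keep the converted text, we could write it to a file with something like the minimal sketch below. Note that `save_text_result` is a hypothetical helper, not part of the SDK, and it assumes the response object exposes the converted text as a `text_result` attribute; that should be checked against the SDK documentation. If the attribute is missing, the helper falls back to the string form of the response.

```python
from pathlib import Path


def save_text_result(api_response, output_path):
    """Write the converted text from an API response to output_path.

    Hypothetical helper for illustration; not part of the Cloudmersive SDK.
    """
    # Assumption: the response carries the plain text in `text_result`.
    text = getattr(api_response, "text_result", None)
    if text is None:
        # Fall back to the string form if the attribute is absent or empty.
        text = str(api_response)
    Path(output_path).write_text(text, encoding="utf-8")
    return text
```

With this in place, calling `save_text_result(api_response, 'output.txt')` right after the API call would store the result next to our script.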

Before we make our API call, let’s get a free Cloudmersive API key to authorize our request and use it in place of 'YOUR_API_KEY' in the code. This allows us to make up to 800 API calls each month; once we reach that limit, our total simply resets the following month.
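
Rather than pasting the key directly into the script, we might prefer to read it from an environment variable. The following is a minimal sketch of that idea; the helper name `load_api_key` and the variable name `CLOUDMERSIVE_API_KEY` are assumptions chosen for illustration, not names defined by the SDK.

```python
import os


def load_api_key(env_var="CLOUDMERSIVE_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption; use whatever name suits your setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

We could then set the key with `configuration.api_key['Apikey'] = load_api_key()`, which keeps the key out of source control.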

All done! Now we have a low-code, low-maintenance method for getting plain text from our HTML documents.
