How to Convert HTML to Text in Python
Without writing any new code, we can easily get plain text from HTML documents in Python.
To perform our HTML to text conversion, we can simply call a free API using the ready-to-run Python code below.
Our first step is to install the client SDK with pip:
pip install cloudmersive-convert-api-client
After that, we can add the imports and call the function (with our HTML file path included) by copying the following Python examples into our file:
from __future__ import print_function
import time
import cloudmersive_convert_api_client
from cloudmersive_convert_api_client.rest import ApiException
from pprint import pprint
# Configure API key authorization: Apikey
configuration = cloudmersive_convert_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'
# create an instance of the API class
api_instance = cloudmersive_convert_api_client.ConvertDocumentApi(cloudmersive_convert_api_client.ApiClient(configuration))
input_file = '/path/to/inputfile' # file | Input file to perform the operation on.
try:
    # HTML Document file to Text (txt)
    api_response = api_instance.convert_document_html_to_txt(input_file)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling ConvertDocumentApi->convert_document_html_to_txt: %s\n" % e)
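The example above only prints the response. If we want to keep the plain text, a small helper can write it to a file. The following is a minimal sketch, not part of the SDK: save_text_result is a hypothetical helper, and it assumes the response object exposes the converted text as a text_result attribute (worth confirming against the printed response). A SimpleNamespace stands in for a real response so the sketch runs without an API call.

```python
import os
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_text_result(api_response, out_path):
    """Write the converted text to out_path and return its length in characters."""
    # Assumption: the converted text is available as `text_result`.
    text = getattr(api_response, "text_result", None)
    if text is None:
        raise ValueError("response has no text_result to save")
    Path(out_path).write_text(text, encoding="utf-8")
    return len(text)


# Stand-in for a real API response (hypothetical values).
fake_response = SimpleNamespace(successful=True, text_result="Hello, world")
demo_path = os.path.join(tempfile.mkdtemp(), "output.txt")
chars_written = save_text_result(fake_response, demo_path)
```

In the real script, we would pass api_response from the try block instead of fake_response.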
Before we make our API call, let’s get a free Cloudmersive API key to authorize our request and use it in place of 'YOUR_API_KEY'. The free key allows up to 800 API calls each month; if we hit that limit, our total simply resets the following month.
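Rather than pasting the key directly into the script, one option is to read it from an environment variable. This is a minimal sketch under assumptions: the variable name CLOUDMERSIVE_API_KEY and the helper load_api_key are hypothetical choices for illustration, not something the SDK defines.

```python
import os


def load_api_key(env_var="CLOUDMERSIVE_API_KEY"):
    """Return the API key stored in env_var, or raise if it is missing or blank."""
    # Hypothetical variable name; use whatever name suits your environment.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With this in place, the configuration line could become configuration.api_key['Apikey'] = load_api_key(), which keeps the key out of the source file.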
All done! Now we have a low-code, low-maintenance method for getting plain text from our HTML documents.