How to convert an HTML document file to text (TXT) in Python
Today’s goal: converting HTML documents into plain text (TXT). I’ll show you how to get this done with an absolute minimum of effort, using a single API function.
First comes a quick install of our API client:
pip install cloudmersive-convert-api-client
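If you want to confirm that the package landed in the same Python interpreter you’ll run your script with, a quick check along these lines should do it. It’s a minimal sketch that only uses the standard library’s importlib; the helper name client_is_installed is hypothetical, and it assumes the importable module name is cloudmersive_convert_api_client, matching the import used in this article’s code.

```python
import importlib.util


def client_is_installed(module_name: str = "cloudmersive_convert_api_client") -> bool:
    """Return True if the current interpreter can locate the given top-level module."""
    # find_spec returns None when a top-level module cannot be found on sys.path
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Client installed:", client_is_installed())
```

If this prints False, the pip you ran most likely belongs to a different interpreter; running `python -m pip install cloudmersive-convert-api-client` ties the install to the interpreter you invoke.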
The function we need here is called convert_document_html_to_txt, and implementing it is pretty straightforward. We start by configuring our API key and creating an instance of the API class, then use that instance for our function call. Here’s how the code looks:
from __future__ import print_function
import cloudmersive_convert_api_client
from cloudmersive_convert_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_convert_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'
# Uncomment below to set up a prefix (e.g. Bearer) for the API key, if needed
# configuration.api_key_prefix['Apikey'] = 'Bearer'

# Create an instance of the API class
api_instance = cloudmersive_convert_api_client.ConvertDocumentApi(
    cloudmersive_convert_api_client.ApiClient(configuration))
input_file = '/path/to/file'  # file | Input file to perform the operation on.

try:
    # HTML Document file to Text (txt)
    api_response = api_instance.convert_document_html_to_txt(input_file)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling ConvertDocumentApi->convert_document_html_to_txt: %s\n" % e)
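Printing the response is handy for a first test, but usually you’ll want the text in an actual .txt file. Here’s a minimal sketch of a helper for that. The name save_text_result is hypothetical, and it assumes the response object exposes `successful` and `text_result` attributes (the fields we expect on the client’s TextConversionResult model; check the pprint output to confirm before relying on them).

```python
from pathlib import Path


def save_text_result(api_response, output_path):
    """Write the plain text from a conversion response to output_path.

    Assumes the response has `successful` (bool) and `text_result` (str)
    attributes; returns the Path that was written.
    """
    # Refuse to write anything if the service did not report success
    if not getattr(api_response, "successful", False):
        raise RuntimeError("Conversion did not report success")
    # Treat a missing text_result as empty text rather than writing "None"
    text = getattr(api_response, "text_result", None) or ""
    path = Path(output_path)
    # UTF-8 keeps any non-ASCII characters extracted from the HTML intact
    path.write_text(text, encoding="utf-8")
    return path
```

In the script above, you could call `save_text_result(api_response, 'output.txt')` inside the try block, right after the pprint line.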
All that remains is to point input_file at your HTML document, swap in your API key, and send the request. In a matter of moments, the response will be in your hot little hands.
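If you have a whole folder of HTML documents, the same call can simply be looped. Below is a minimal sketch under a few assumptions: convert_folder is a hypothetical helper, `api_instance` is the ConvertDocumentApi instance created earlier (anything with a compatible convert_document_html_to_txt method works), and the response is assumed to carry `successful` and `text_result` attributes.

```python
from pathlib import Path


def convert_folder(api_instance, folder):
    """Convert every .html/.htm file in `folder` to a .txt file beside it.

    Returns the list of .txt paths that were written.
    """
    written = []
    # sorted() lists the folder up front, so newly written .txt files are not revisited
    for html_path in sorted(Path(folder).iterdir()):
        # Only pick up HTML files; skip subfolders and other file types
        if not html_path.is_file() or html_path.suffix.lower() not in (".html", ".htm"):
            continue
        # Pass a file path string, as in the single-file example
        response = api_instance.convert_document_html_to_txt(str(html_path))
        if not getattr(response, "successful", False):
            print("Skipping %s: conversion not successful" % html_path.name)
            continue
        # e.g. page.html -> page.txt in the same folder
        txt_path = html_path.with_suffix(".txt")
        txt_path.write_text(response.text_result or "", encoding="utf-8")
        written.append(txt_path)
    return written
```

Passing the API instance in as a parameter, rather than creating it inside the function, keeps your key configuration in one place and makes the helper easy to try out with a stand-in object.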