May 5, 2020

How to remove HTML from a text string in Python

The danger of script-based attacks is all too real. A single well-placed snippet of HTML in a user submission can compromise your whole system. One solid line of defense is to strip any HTML out of incoming text strings before you store or display them. Sound difficult? You would be right, except that I have a lovely API that’s going to get the job done for us with minimal setup. Let’s get started.

We begin by installing the client with pip:

pip install cloudmersive-convert-api-client

Next, we create an instance of the API class and then call edit_text_remove_html on it:

from __future__ import print_function
import time
import cloudmersive_convert_api_client
from cloudmersive_convert_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_convert_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'
# Uncomment below to setup prefix (e.g. Bearer) for API key, if needed
# configuration.api_key_prefix['Apikey'] = 'Bearer'

# Create an instance of the API class
api_instance = cloudmersive_convert_api_client.EditTextApi(cloudmersive_convert_api_client.ApiClient(configuration))

# RemoveHtmlFromTextRequest | Input request; populate it with the text to clean
# before sending (see the client's RemoveHtmlFromTextRequest model for its fields)
request = cloudmersive_convert_api_client.RemoveHtmlFromTextRequest()

try:
    # Remove HTML from text string
    api_response = api_instance.edit_text_remove_html(request)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling EditTextApi->edit_text_remove_html: %s\n" % e)
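If you want a rough idea of what "HTML removed" should look like before wiring up an API key, a small local stand-in can help. The sketch below uses only Python's standard library; the names _TextExtractor and strip_html_locally are made up for this illustration and are not part of the Cloudmersive client. It is not how the API works internally, and it is not a substitute for a real sanitizer. It simply keeps text nodes and drops tags, along with anything inside script or style elements.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style> bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_html_locally(text):
    """Return the text content of `text` with tags removed (illustrative only)."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return parser.text()


if __name__ == "__main__":
    print(strip_html_locally("<p>Hello <b>world</b></p><script>alert(1)</script>"))
```

Comparing this output against the API response for a few sample inputs is a quick way to confirm the request is being populated and sent the way you expect.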

And there you have it, very simple. All of our APIs work in a similarly quick fashion. This same client hosts a number of other useful functions for working with HTML, as well as Excel, Word, and other file formats.