How to convert a website URL page to text (TXT) in Python
Our goal today is to start with a URL for a web page and end up with that page’s content as plain text, ready to go into a TXT file. As you might expect, this is a bit complicated to pull off if you start from scratch. So how about a nice little shortcut instead?
Using a Cloudmersive API will make this task easy as pie, so let’s install the client for document conversion now:
pip install cloudmersive-convert-api-client
And the next thing to do is make an instance of our API, followed by using said instance to call convert_web_url_to_txt. This is pretty straightforward, as you can see from this example code I have for you here:
from __future__ import print_function
import time
import cloudmersive_convert_api_client
from cloudmersive_convert_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_convert_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'
# Uncomment below to setup prefix (e.g. Bearer) for API key, if needed
# configuration.api_key_prefix['Apikey'] = 'Bearer'

# Create an instance of the API class
api_instance = cloudmersive_convert_api_client.ConvertWebApi(cloudmersive_convert_api_client.ApiClient(configuration))

# UrlToTextRequest | HTML to Text request parameters
# Set the address of the page you want to convert on this object before sending it
request = cloudmersive_convert_api_client.UrlToTextRequest()

try:
    # Convert website URL page to text (txt)
    api_response = api_instance.convert_web_url_to_txt(request)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling ConvertWebApi->convert_web_url_to_txt: %s\n" % e)
Once the request comes back from the server, the response holds the page’s content as text, ready to be saved as a TXT file.
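The example above only prints the response, so here is a minimal sketch of the last step: writing that text out to an actual TXT file. The helper name save_response_text and the default file name page.txt are made up for illustration, and the attribute name text_content_result is an assumption about the response object, so check the client's model documentation before relying on it. If the attribute is missing, the sketch falls back to the string form of whatever it was given.

```python
from pathlib import Path


def save_response_text(api_response, path="page.txt"):
    """Write the text carried by a conversion response to a UTF-8 file.

    Assumption: the response exposes the converted text as an attribute
    named `text_content_result`. If that attribute is absent, the string
    form of the response is written instead.
    """
    text = getattr(api_response, "text_content_result", None)
    if text is None:
        text = str(api_response)

    out = Path(path)
    # Explicit UTF-8 so non-ASCII page content is written the same way on every platform
    out.write_text(text, encoding="utf-8")
    return out
```

With this in place, calling save_response_text(api_response) right after the pprint line would leave you with page.txt in the current working directory.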