How to Extract Links from an HTML File in Python

If you have an older web application, it may be carrying security risks you don't know about. Extracting the links from its HTML files is an efficient way to identify the other applications, related websites, and web technologies it points to, any of which could affect your business. To help test this aspect of web security, the following API extracts the resolved URLs from an input HTML file.
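To make "resolved URLs" concrete, here is a minimal offline sketch using only Python's standard library. It is an illustration under simple assumptions, not how the Cloudmersive API works internally: it treats `href` and `src` attributes as links, and it resolves relative values against a base URL you supply with `urllib.parse.urljoin`. The names `LinkExtractor` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href/src attribute values and resolves them against a base URL."""

    # Assumption: only these attributes are treated as links in this sketch
    LINK_ATTRS = {"href", "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Skip missing or empty attribute values
            if name in self.LINK_ATTRS and value:
                # urljoin turns relative paths such as "/about" into absolute URLs
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<a href="/about">About</a> <img src="logo.png"> <a href="https://example.org/x">X</a>'
print(extract_links(sample, "https://example.com/docs/"))
# ['https://example.com/about', 'https://example.com/docs/logo.png', 'https://example.org/x']
```

Note how the same relative value resolves differently depending on the base URL; that is why a base URL matters when a page uses relative links.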

To use the Cloudmersive API, install the Python client by running `pip install cloudmersive-convert-api-client`.

Next, configure the API key, create an instance of the API class, and call the function with the following code:

```python
import cloudmersive_convert_api_client
from cloudmersive_convert_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_convert_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'

# Create an instance of the API class
api_instance = cloudmersive_convert_api_client.EditHtmlApi(cloudmersive_convert_api_client.ApiClient(configuration))

input_file = '/path/to/inputfile'  # file | Optional: input HTML file to perform the operation on
# input_file_url = 'input_file_url_example'  # str | Optional: URL of a file to use as input instead of input_file
base_url = 'base_url_example'  # str | Optional: base URL of the page, for resolving relative links

try:
    # Extract resolved link URLs from HTML File
    api_response = api_instance.edit_html_html_get_links(input_file=input_file, base_url=base_url)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling EditHtmlApi->edit_html_html_get_links: %s\n" % e)
```

The API processes the file quickly and returns a response listing each link it identified. Don't forget to get your free API key from the Cloudmersive website; the client needs it to authorize your requests.
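Once you have the links, one way to spot linked applications and related websites is to group them by host. The sketch below assumes you have already collected the URL strings from the API response into a plain Python list; the article doesn't show the response object's field names, so this code doesn't read them directly. The helper `group_links_by_host` is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_links_by_host(urls, own_host):
    """Group URLs by host and list the hosts that differ from own_host."""
    groups = defaultdict(list)
    for url in urls:
        # .hostname is lowercased with any port removed; it is None for
        # links without a network location, such as mailto: links
        host = urlsplit(url).hostname or ""
        groups[host].append(url)
    # Hosts other than your own are candidates for third-party dependencies
    external = sorted(h for h in groups if h and h != own_host.lower())
    return dict(groups), external


# Assumption: these strings were copied out of the API response
links = [
    "https://example.com/about",
    "https://cdn.example.net/lib.js",
    "https://Example.com/contact",
    "mailto:team@example.com",
]
groups, external = group_links_by_host(links, "example.com")
print(external)  # ['cdn.example.net']
```

Links without a host, such as `mailto:` addresses, end up under the empty-string key, so they stay visible without being counted as external sites.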

There’s an API for that. Cloudmersive is a leader in Highly Scalable Cloud APIs.
