How to Extract Links from an HTML File in Python

Cloudmersive
Feb 24, 2021

If you maintain an older web application, it may be hiding security risks in its markup. Extracting the links from its HTML files is an efficient way to identify the linked applications, related websites, and web technologies the page depends on, any of which could affect your business. To help you audit this aspect of web security, the API below extracts the resolved (absolute) URLs from an input HTML file.
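To make "resolved URLs" concrete, here is a minimal offline sketch that uses only Python's standard library (`html.parser` and `urllib.parse.urljoin`). It is not how Cloudmersive implements the API. It simply collects link attributes and resolves relative ones against a base URL, which appears to be the role of the `base_url` parameter in the API call. The function name `extract_resolved_links` and the set of tag/attribute pairs treated as links are assumptions chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: which tag/attribute pairs count as "links" is our own choice here.
LINK_ATTRS = {
    "a": "href",
    "link": "href",
    "script": "src",
    "img": "src",
    "iframe": "src",
    "form": "action",
}


class _LinkCollector(HTMLParser):
    """Collects raw (possibly relative) link values in document order."""

    def __init__(self):
        super().__init__()
        self.raw_links = []
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The first <base href="..."> in a page sets the base for relative links.
        if tag == "base":
            if self.base_href is None and attrs.get("href"):
                self.base_href = attrs["href"]
            return
        attr = LINK_ATTRS.get(tag)
        if attr and attrs.get(attr):
            self.raw_links.append(attrs[attr])


def extract_resolved_links(html, base_url=""):
    """Hypothetical helper: return absolute URLs for every link in `html`."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    # A <base> element in the page takes precedence over the caller's base URL.
    base = urljoin(base_url, parser.base_href) if parser.base_href else base_url
    return [urljoin(base, link) for link in parser.raw_links]


if __name__ == "__main__":
    page = (
        '<a href="/about">About</a>'
        '<img src="logo.png">'
        '<script src="https://cdn.example.com/app.js"></script>'
    )
    for url in extract_resolved_links(page, "https://mydomain.com/blog/"):
        print(url)
```

With a base of `https://mydomain.com/blog/`, the root-relative `/about` resolves to `https://mydomain.com/about`, the relative `logo.png` resolves to `https://mydomain.com/blog/logo.png`, and the already-absolute script URL is left unchanged. The hosted API presumably handles more edge cases than this sketch does.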

To use the Cloudmersive API, run this command to install the client:

pip install cloudmersive-convert-api-client

Next, configure your API key, create an instance of the API class, and call the function:

from __future__ import print_function
import cloudmersive_convert_api_client
from cloudmersive_convert_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_convert_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'

# Create an instance of the API class
api_instance = cloudmersive_convert_api_client.EditHtmlApi(cloudmersive_convert_api_client.ApiClient(configuration))

# Replace the placeholder values below with your own inputs
input_file = '/path/to/inputfile'  # file | Optional: Input file to perform the operation on. (optional)
input_file_url = 'input_file_url_example'  # str | Optional: URL of a file to operate on as input. (optional)
base_url = 'base_url_example'  # str | Optional: Base URL of the page, such as https://mydomain.com (optional)

try:
    # Extract resolved link URLs from HTML File
    api_response = api_instance.edit_html_html_get_links(input_file=input_file, input_file_url=input_file_url, base_url=base_url)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling EditHtmlApi->edit_html_html_get_links: %s\n" % e)

The call returns quickly, and the response lists each link identified in the HTML. Don't forget to get your free API key from the Cloudmersive website and substitute it for YOUR_API_KEY, since requests are authenticated with that key.
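Once you have the list of links, a natural next step in a security review is to separate third-party hosts from your own, since those are the outside sites and services your page depends on. The sketch below assumes you have already pulled the URL strings out of the API response (the response object's exact structure isn't shown in this post, so that step is left to you). The function name `external_hosts` and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def external_hosts(urls, own_host):
    """Hypothetical helper: count links per host, excluding your own host.

    Assumption: subdomains of `own_host` are counted as external, which
    keeps the comparison simple but may be stricter than you want.
    """
    own = own_host.lower()
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        # Skip mailto:, javascript:, tel:, and other non-web schemes.
        if parts.scheme not in ("http", "https"):
            continue
        host = parts.hostname or ""  # urlsplit already lowercases the hostname
        if host and host != own:
            counts[host] += 1
    return counts


if __name__ == "__main__":
    links = [
        "https://mydomain.com/about",
        "https://cdn.example.com/app.js",
        "http://cdn.example.com/lib.js",
        "mailto:admin@mydomain.com",
        "https://tracker.example.net/p.gif",
    ]
    for host, count in external_hosts(links, "mydomain.com").most_common():
        print(host, count)
```

For the sample list, this reports `cdn.example.com` with two links and `tracker.example.net` with one. Each external host is a candidate to review for outdated libraries or unexpected third-party tracking.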
