How to Extract URLs from an HTML File in Java

Cloudmersive
2 min read · Mar 2, 2021

Extracting URLs from an HTML file is useful in several situations. For example, if your web application lets users upload a bookmark backup file, extracting the resolved links and their titles automatically saves users from entering them by hand. It can also serve as an added security check, letting you confirm that an uploaded file contains no hidden links that could compromise your application. The following API extracts resolved URLs from an HTML file to help with these and other tasks.

To use the API in Java, we will install the Maven SDK by adding a reference to the repository in pom.xml:

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

Next, we will add a reference to the dependency:

<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v3.90</version>
    </dependency>
</dependencies>

Once the installation is complete, we can add the imports, configure the API key, and call the function:

// Import classes (uncomment these at the top of your source file):
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.EditHtmlApi;
//import com.cloudmersive.client.model.HtmlGetLinksResponse;
//import java.io.File;

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

EditHtmlApi apiInstance = new EditHtmlApi();
File inputFile = new File("/path/to/inputfile"); // File | Optional: Input file to perform the operation on.
String inputFileUrl = "inputFileUrl_example"; // String | Optional: URL of a file to operate on as input.
String baseUrl = "baseUrl_example"; // String | Optional: Base URL of the page, such as https://mydomain.com

try {
    // Send the HTML file to the API and print the links it returns
    HtmlGetLinksResponse result = apiInstance.editHtmlHtmlGetLinks(inputFile, inputFileUrl, baseUrl);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling EditHtmlApi#editHtmlHtmlGetLinks");
    e.printStackTrace();
}

It's as simple as that: the response contains the list of links found in the file.
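To make "resolved" more concrete, here is a minimal sketch, independent of the Cloudmersive SDK, of how a relative href can be resolved against a page's base URL using the JDK's java.net.URI. The class name LinkResolutionSketch, its two helper methods, and the sample URLs are all assumptions made for illustration; the API's own resolution logic may differ.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of the Cloudmersive SDK.
class LinkResolutionSketch {

    // Resolve a possibly relative href against the page's base URL.
    // The base URL should include a path (at least a trailing slash).
    // URI.create throws IllegalArgumentException for malformed input.
    static String resolve(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href.trim()).toString();
    }

    // True when the resolved link points to a different host than the base URL.
    // Links without a host (for example mailto:) are not reported as external.
    static boolean isExternal(String baseUrl, String resolvedUrl) {
        String baseHost = URI.create(baseUrl).getHost();
        String linkHost = URI.create(resolvedUrl).getHost();
        return linkHost != null && !linkHost.equalsIgnoreCase(baseHost);
    }

    public static void main(String[] args) {
        // Sample values chosen for this sketch
        String baseUrl = "https://mydomain.com/docs/index.html";
        String[] hrefs = { "guide.html", "/about", "https://example.org/x" };

        for (String href : hrefs) {
            String resolved = resolve(baseUrl, href);
            String note = isExternal(baseUrl, resolved) ? " (external)" : "";
            System.out.println(resolved + note);
        }
    }
}
```

A host comparison like isExternal is one simple way to approach the security use case mentioned earlier: links that resolve to an unexpected host can be flagged for review before they are stored or displayed.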
