How to Convert a Website URL to Text (TXT) in Java
In today’s lesson, we will demonstrate a quick and easy method that takes a website URL and extracts all of its text. The manual approach would be to download the page, parse the HTML, and then extract the text content from it. This gets messy quickly and requires handling a lot of edge cases. To make this easier, we will use a fully automated solution instead.
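For comparison, here is roughly what the manual route looks like. This is a minimal sketch that uses only the JDK (Java 11+ `java.net.http.HttpClient`), not the Cloudmersive library. The class and method names are hypothetical, and the regex-based tag stripping is deliberately naive: it is not a real HTML parser and only decodes a handful of entities, which is exactly where the messy edge cases come from.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Hypothetical, naive sketch of the manual approach: download, strip markup, clean up.
 * Regex tag stripping is NOT an HTML parser; it only illustrates the steps involved.
 */
public class ManualHtmlToText {

    // Step 1: download the raw HTML of the page (Java 11+).
    static String download(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Steps 2 and 3: drop non-visible content, strip tags, decode a few entities.
    static String htmlToText(String html) {
        String text = html
                // <script> and <style> contents are not visible text
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                // HTML comments
                .replaceAll("(?s)<!--.*?-->", " ")
                // every remaining tag becomes a space so adjacent words do not merge
                .replaceAll("<[^>]+>", " ")
                // only a few common entities; &amp; is decoded last to avoid double-decoding
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&amp;", "&");
        // collapse the whitespace left behind by the removed markup
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "https://www.example.com";
        System.out.println(htmlToText(download(url)));
    }
}
```

Even this short version ignores malformed markup, the full entity table, character encodings other than the response default, and layout such as line breaks between blocks, which is why a ready-made conversion API is attractive.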
To begin, we need to add two entries to our Maven pom.xml. First, the repository reference for JitPack:
<repositories>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
</repositories>
Then, the dependency for the Cloudmersive Java API client library that we will be using:
<dependencies>
<dependency>
<groupId>com.github.Cloudmersive</groupId>
<artifactId>Cloudmersive.APIClient.Java</artifactId>
<version>v3.54</version>
</dependency>
</dependencies>
Now we just have to call the convertWebUrlToTxt function, passing in our API key and the URL of the page we want to convert.
import com.cloudmersive.client.ConvertWebApi;
import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.ApiKeyAuth;
import com.cloudmersive.client.model.UrlToTextRequest;
import com.cloudmersive.client.model.UrlToTextResponse;

public class UrlToTextExample {
    public static void main(String[] args) {
        ApiClient defaultClient = Configuration.getDefaultApiClient();

        // Configure API key authorization: Apikey
        ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
        Apikey.setApiKey("YOUR API KEY");
        // Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
        //Apikey.setApiKeyPrefix("Token");

        ConvertWebApi apiInstance = new ConvertWebApi();

        // HTML to Text request parameters: the URL of the page to convert
        // (setUrl is assumed to be the setter for the request model's Url field)
        UrlToTextRequest input = new UrlToTextRequest();
        input.setUrl("https://www.example.com");

        try {
            UrlToTextResponse result = apiInstance.convertWebUrlToTxt(input);
            System.out.println(result);
        } catch (ApiException e) {
            System.err.println("Exception when calling ConvertWebApi#convertWebUrlToTxt");
            e.printStackTrace();
        }
    }
}
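Since the goal is a TXT file, one more optional step is to write the extracted text to disk. Below is a minimal sketch using java.nio.file (Java 11+). It assumes you have already pulled the plain text out of the UrlToTextResponse; the getter for that (something like getText()) is an assumption, so check the model class in your client version. The class and method names here are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveAsTxt {

    // Writes the extracted page text to a UTF-8 .txt file, creating parent
    // directories if needed, and returns the path that was written.
    static Path saveAsTxt(String text, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Default options create the file or truncate an existing one
        return Files.writeString(target, text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder: in practice this would be the text held by the
        // UrlToTextResponse (getter name assumed, e.g. result.getText()).
        String extracted = "Text extracted from the page goes here.";
        Path written = saveAsTxt(extracted, Path.of("output", "page.txt"));
        System.out.println("Saved " + Files.size(written) + " bytes to " + written);
    }
}
```

Writing with an explicit UTF-8 charset keeps non-ASCII characters from the page intact regardless of the platform's default encoding.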
And that’s it! No problem.