How to Convert a URL Page to Text in Power Automate
The text content we see on any given web page is wrapped in HTML tags and displayed with CSS formatting. When we extract text from a web page, we pull that plain text out of the HTML markup and CSS styling that surround it, and that opens up a number of options.
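To make the idea of "pulling text out of the markup" concrete, here is a minimal sketch using Python's standard-library html.parser module. It keeps visible text nodes and skips script and style content. This only illustrates the general concept; it is not a description of how the Cloudmersive connector works internally, and the names TextExtractor and html_to_text are our own.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, skipping script/style content."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0  # > 0 while we are inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible text, ignoring whitespace-only runs between tags.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        # One text node per line in the result.
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


if __name__ == "__main__":
    page = ("<html><head><style>p{color:red}</style></head>"
            "<body><h1>Hello</h1><p>World <b>!</b></p></body></html>")
    print(html_to_text(page))
```

A hand-rolled parser like this is fine for a quick demonstration, but real-world pages (scripts that inject content, odd encodings, malformed HTML) are exactly why offloading the job to a conversion service is attractive.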
We can use the plain text content from a web page to support NLP pipelines (e.g., preparing text for sentiment analysis or similar tasks), and we can also use it to identify keywords for indexing and other text-based search optimization. The list goes on. The question is: how do we efficiently extract plain text from a URL and immediately make that content available to other applications in our system?
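As a rough illustration of the keyword use case, the sketch below counts the most frequent non-stop-words in a block of extracted text. The tiny stop-word list and the top_keywords function are hypothetical examples of ours; a production indexing pipeline would use a fuller stop-word list and likely a weighting scheme such as TF-IDF.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list (hypothetical; real pipelines use fuller lists).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with", "we", "our"}


def top_keywords(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop-words in text, with their counts."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Drop stop words and very short tokens before counting.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Ties are returned in the order the words were first encountered.
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "Convert documents. Convert URLs to text. Text conversion API."
    print(top_keywords(sample, 2))
```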
Convert a URL to Text in Power Automate
Thankfully, there’s a pretty easy answer to that question: we can build a Power Automate flow that uses the Cloudmersive Document Conversion connector to handle URL text extraction. The Document Conversion connector returns a plain text string to our flow, and we can then process that text with whichever application connectors suit our needs.
In this article, we’ll walk through a quick example flow that demonstrates converting a URL to plain text with the Cloudmersive Document Conversion connector. To facilitate a quick test, we’ll build an instant cloud flow.
We’ll start by adding a new action and typing “Cloudmersive” into the search bar. This brings up a list of Cloudmersive connectors (each corresponding to a different Cloudmersive API service). We’re looking for the Document Conversion connector on this list.
To view the connector’s full list of actions, we’ll click the “See more” option to the right of the connector name.
From here, we’ll search for an action called Convert website URL page to text (txt).
After we select this action, our first order of business is to create and authorize our Document Conversion connection. To do so, we’ll need to acquire a free Cloudmersive API key (we can get one by creating a free account on the Cloudmersive website).
Once our connection is authorized, we’ll enter a test URL in the Input/Url parameter. In my example, I’m using a link to the Cloudmersive home page.
At this point, we’re all set and ready to test our flow. If we want to create a more complex test than this, we can compose an array of URL inputs, or we can use the dynamic output from this action to feed another Power Automate action (e.g., an AI Builder connector action).
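If we do move to an array of URL inputs, the flow logic amounts to looping over the array (for example, with an Apply to each control) and running the conversion action once per URL. The Python sketch below mirrors that loop outside Power Automate. The convert parameter is a hypothetical stand-in for the Convert website URL page to text action, and recording failures per URL instead of stopping is a design choice of this sketch, not a statement about how Power Automate handles failed iterations.

```python
from typing import Callable, Dict, List


def convert_all(urls: List[str], convert: Callable[[str], str]) -> Dict[str, str]:
    """Run `convert` on each URL, recording failures instead of stopping the loop."""
    results: Dict[str, str] = {}
    for url in urls:
        try:
            results[url] = convert(url)
        except Exception as exc:
            # Keep processing the remaining URLs; note the error for this one.
            results[url] = f"ERROR: {exc}"
    return results


def fake_convert(url: str) -> str:
    """Hypothetical placeholder for the real URL-to-text conversion call."""
    if not url.startswith("https://"):
        raise ValueError("unsupported scheme")
    return f"text of {url}"


if __name__ == "__main__":
    for url, text in convert_all(["https://example.com", "ftp://example.org"], fake_convert).items():
        print(url, "->", text)
```

Passing the converter in as a function keeps the loop testable without network access; swapping fake_convert for a real call is the only change needed.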
After we run our test, we’ll open our Convert website URL page to text action and review the output.
In the response, we’ll find a TextContentResult string containing line endings that correspond with our OS (e.g., \r\n on Windows). This string holds all the text from the original web page, and we can use subsequent flow actions (or Power Automate expressions) to clean it up for downstream use.
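Here is a minimal sketch of that cleanup step in Python, for cases where the text ends up in a script or service downstream of the flow. It assumes the action’s output body is a JSON object with a TextContentResult field (any other fields are ignored). It then normalizes \r\n and \r line endings to \n, trims each line, and collapses runs of blank lines. The function names are our own. Inside the flow itself, similar normalization can be done with expression functions such as replace().

```python
import json
import re


def extract_text(response_body: str) -> str:
    """Pull the TextContentResult field out of a JSON response body (assumed shape)."""
    return json.loads(response_body).get("TextContentResult", "")


def clean_text(raw: str) -> str:
    """Normalize line endings and tidy whitespace in extracted page text."""
    # Convert Windows (\r\n) and old Mac (\r) line endings to \n.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Trim surrounding whitespace on each line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Collapse three or more consecutive newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__":
    body = json.dumps({"TextContentResult": "Home\r\n\r\n\r\n  Products  \r\nContact\r"})
    print(clean_text(extract_text(body)))
```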