How to Convert Word DOCX to Plain Text in Power Automate

Cloudmersive
Nov 13, 2024


Every DOCX document stores its plain text inside its file structure (a DOCX file is a ZIP archive of XML parts), and that text is organized and modified by various document styles and settings before it's displayed as rich text in the MS Word application.

Rich text is great for creating an intentional aesthetic display, but it’s a hindrance in many other contexts (e.g., text analysis for search optimization, accessibility for those without access to the MS Word application, etc.).

If we want to extract plain text from a DOCX document, we can do so in one of two ways: manually or programmatically. The former option entails selecting and copying all text from within an open DOCX file, and the latter entails using a programming library designed to navigate the DOCX file structure.
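
To make the programmatic option a little more concrete, here's a minimal Python sketch that reads text straight out of a DOCX archive using only the standard library. It isn't how the Cloudmersive connector works internally; it's just an illustration, and it assumes the main body text lives in word/document.xml (the usual location). The function name extract_docx_text is made up for this example, and the sketch ignores headers, footers, tables-of-contents fields, and other parts a full library would handle.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(source) -> str:
    """Return the body text of a DOCX file, one line per paragraph.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    # A DOCX file is a ZIP archive; the main document body is an XML part.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each w:p element is a paragraph; its text is split across w:t elements.
    for paragraph in root.iter(W_NS + "p"):
        pieces = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```

We could call this with a path, e.g. extract_docx_text("example.docx"), where the file name is a placeholder for any DOCX file we have on hand.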

Convert DOCX to Text in Power Automate

In this article, we’ll strike a happy medium between manual and programmatic text extraction methods. We’ll build a no-code Power Automate flow that extracts plain text from a file in our file system, cleans the text by removing unnecessary whitespace, and saves the text as a plain text file. We’ll handle our text extraction step using the Cloudmersive Document Conversion connector.

First, we’ll select the option to create a manually triggered, instant cloud flow. We’ll use this option so we can test with full control over our data.

In the first step of our flow, we’ll use a Get file content action to retrieve an example DOCX document from our file system. I’ll be retrieving a file from a OneDrive for Business folder in my example.

In our next step, we’ll add our text-extracting action. To find it, we’ll first search for Cloudmersive connectors, and we’ll locate the Cloudmersive Document Conversion connector with the green logo.

We’ll click “See more” to view the full actions list, and from there, we’ll search for an action called Convert Word DOCX Document to Text (txt).

Once we select this action, we’ll need to create our Cloudmersive Document Conversion connection if we haven’t already. To do that, we’ll need a free Cloudmersive API key from the Cloudmersive website. A free key allows up to 800 API calls per month with zero commitments; once it’s generated for us, we can simply copy the key string and paste it into the appropriate Power Automate field.

To configure our DOCX to Text request, we’ll add our file content and file name (this can be any random file name) into our two primary request parameters.

If we click “Show all” to view the Advanced parameters, we’ll find one additional parameter which gives us the option to control how whitespace is handled in our resulting text string (note that the description of this parameter gets cut off, unfortunately).

If we leave this blank, we’ll engage the default setting: “minimizeWhitespace”. This setting means the underlying service will NOT attempt to add any additional whitespace to the text string — it’ll keep the original paragraph spacing structure. If we wanted the underlying service to attempt to structure the resulting text string with MORE whitespace (in an effort to more accurately match the aesthetic of the original DOCX formatting), we could apply the setting “preserveWhitespace” here.
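
For anyone curious what a request like this might look like outside of Power Automate, here's a rough Python sketch that builds (but doesn't send) an equivalent HTTP request. Several details are assumptions on my part rather than things shown in this walkthrough: the endpoint URL, the Apikey and textFormattingMode header names, the inputFile form field, and the TextResult field in the JSON response. Please check them against the Cloudmersive API documentation before relying on them. The function names are hypothetical.

```python
import json
import urllib.request
import uuid

# Assumed endpoint for the DOCX to Text operation; verify in the official docs.
API_URL = "https://api.cloudmersive.com/convert/docx/to/txt"


def build_docx_to_text_request(file_content, file_name, api_key,
                               text_formatting_mode=None):
    """Build a multipart POST request carrying the DOCX bytes (not sent here)."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # "inputFile" is the assumed form field name for the uploaded document.
        f'Content-Disposition: form-data; name="inputFile"; '
        f'filename="{file_name}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        file_content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Apikey": api_key,  # assumed header name for the API key
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    # Leaving the mode unset corresponds to leaving the parameter blank,
    # which the article describes as defaulting to "minimizeWhitespace".
    if text_formatting_mode is not None:
        headers["textFormattingMode"] = text_formatting_mode  # assumed name
    return urllib.request.Request(API_URL, data=body, headers=headers,
                                  method="POST")


def read_text_result(response_body: bytes) -> str:
    """Pull the text out of the JSON response (field name is an assumption)."""
    return json.loads(response_body)["TextResult"]
```

Sending the request would then be a matter of passing it to urllib.request.urlopen and handing the response bytes to read_text_result. In the flow itself, the connector takes care of all of this for us.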

Now that we’ve configured our DOCX to Text conversion step, we’ll finish up our flow with a Create file action. This isn’t strictly necessary, but in the context of this demonstration, we’ll use this action to save our output text string in a .txt file so we can visualize the result.

In my Create file action, I wrapped the output content from my DOCX to Text conversion action in a trim function. This function trims leading and trailing whitespace from a text string; it’s a useful function to employ any time we’re dealing with text string dynamic content in Power Automate.
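
As a rough analogy for the trim-then-save step, here's a small Python sketch. It isn't Power Automate code; str.strip() simply plays the role of the trim function, and the helper name save_trimmed_text and the UTF-8 encoding are choices made for this example.

```python
from pathlib import Path


def save_trimmed_text(text: str, destination: str) -> Path:
    """Strip leading/trailing whitespace and write the result to a .txt file.

    Hypothetical helper; whitespace inside the text is left untouched.
    """
    path = Path(destination)
    # strip() removes whitespace (spaces, tabs, newlines) at both ends only.
    path.write_text(text.strip(), encoding="utf-8")
    return path
```

Note that only the ends of the string are affected, so any blank lines between blocks of text in the middle of the output are preserved.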

Now that we’ve created our text file, we’ll save our flow and run a test.

We’ll find our text file in the folder we specified, and when we open that file, we’ll find our original DOCX text neatly structured in text lines.

We’ll notice there isn’t any leading or trailing whitespace thanks to our trim function. The whitespace in the middle of our file serves to divide text content from each page in our original document (page 1, page 2, etc.).
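
If we wanted to work with those blocks separately downstream, one option is to split the text wherever blank lines appear. The Python sketch below assumes the blocks are separated by one or more blank lines, as in my result; that separator is an observation from this example rather than a documented guarantee, and split_into_blocks is a made-up name.

```python
import re
from typing import List


def split_into_blocks(text: str) -> List[str]:
    """Split text into blocks separated by one or more blank lines.

    Hypothetical helper; assumes blank lines mark the boundaries.
    """
    # A newline, optional whitespace, then another newline marks a boundary.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Drop empty pieces and tidy stray whitespace (e.g. leftover "\r").
    return [block.strip() for block in blocks if block.strip()]
```

Single line breaks inside a block are kept, so multi-line paragraphs stay together.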

We can now use our resulting text content any way we want!
