How to Validate Word DOCX Documents in Power Automate
Word (DOCX) documents can have all kinds of problems, ranging from formatting issues and content errors to file integrity problems.
If we’re building Power Automate flows that deal with a high volume of DOCX files, it’s worth checking those files for problems before we make them available to a wider audience.
Thankfully, there’s an easy way to do that in Power Automate. We can validate DOCX files using an API available in the Cloudmersive Document Conversion connector library.
This API detects whether the file in question is a valid DOCX document to begin with (a file can bear a .docx extension without containing DOCX contents) and provides a detailed ErrorsAndWarnings response array enumerating any directly identifiable issues. The response model looks like this:
{
  "DocumentIsValid": true,
  "PasswordProtected": true,
  "ErrorCount": 0,
  "WarningCount": 0,
  "ErrorsAndWarnings": [
    {
      "Description": "string",
      "Path": "string",
      "Uri": "string",
      "IsError": true
    }
  ]
}
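To make the response model a little more concrete, here is a minimal Python sketch of how a payload with this shape could be read outside of Power Automate. The field names come from the response model above; the sample values and the summarize_validation helper are hypothetical and only there for illustration.

```python
import json

# Sample payload shaped like the response model above. The values are
# illustrative placeholders, not output captured from a real API call.
SAMPLE_RESPONSE = """
{
  "DocumentIsValid": false,
  "PasswordProtected": false,
  "ErrorCount": 1,
  "WarningCount": 1,
  "ErrorsAndWarnings": [
    {"Description": "example error", "Path": "/example/path",
     "Uri": "example-uri", "IsError": true},
    {"Description": "example warning", "Path": "/example/path",
     "Uri": "example-uri", "IsError": false}
  ]
}
"""


def summarize_validation(raw_json: str) -> dict:
    """Split ErrorsAndWarnings into errors and warnings using the IsError flag."""
    data = json.loads(raw_json)
    issues = data.get("ErrorsAndWarnings") or []
    return {
        "valid": bool(data.get("DocumentIsValid")),
        "password_protected": bool(data.get("PasswordProtected")),
        "errors": [i["Description"] for i in issues if i.get("IsError")],
        "warnings": [i["Description"] for i in issues if not i.get("IsError")],
    }


summary = summarize_validation(SAMPLE_RESPONSE)
print(summary)
```

In a flow, the same fields are available as dynamic content from the validation action, so a helper like this is only useful when inspecting saved responses.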
It’s easy to add this API to any regular DOCX processing flow. We’ll walk through an example of that now.
To create our test case, we’ll build a manually triggered instant cloud flow.
We’ll imagine this flow is extracting documents from one folder and moving them to another. We’ll use a List files in folder action to get information about DOCX files in our folder, and we’ll follow that with a Get file content action to grab the file bytes using the file identifiers.
Because we’re dealing with an array of files, Power Automate will automatically wrap our Get file content action in a For each control.
Within the For each control, we’ll add our DOCX validation action. We’ll find it by first searching the connector library for Cloudmersive connectors, and then locating the Cloudmersive Document Conversion connector with the green logo.
We’ll click “See more” to view the full actions list, and from there, we’ll CTRL+F search for an action called Validate a Word Document (DOCX). The actions list is organized alphabetically, so we’ll find this action at the very bottom.
Once we select this action, we’ll need to create and authorize our third-party API connection. That requires a Cloudmersive API key, which we can get by visiting the Cloudmersive website and creating a free account, along with a premium Power Automate license. With a free API key, we’ll be able to make up to 800 API calls each month with no commitments.
To configure our DOCX validation request, we’ll simply pass our dynamic file content into the Input File parameter, and then we’ll pass the file name (from our List files in folder action) to the File Name parameter.
Now, to simulate how we might use this action in a production flow, we’ll add a Condition that only accepts files in the True branch when the file is deemed valid. To handle this, we’ll set body/DocumentIsValid equal to “true”.
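The logic of that Condition can be sketched in a few lines of Python. This is only an illustration of the comparison, not how Power Automate evaluates it internally; the passes_condition name is hypothetical, and accepting the string "true" alongside a JSON boolean is an assumption made so the sketch tolerates either way of typing the value.

```python
def passes_condition(response_body: dict) -> bool:
    """Return True only when DocumentIsValid is true (True branch of the Condition)."""
    value = response_body.get("DocumentIsValid")
    # Assumption: the value may arrive as a JSON boolean or as the text "true".
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return value is True


print(passes_condition({"DocumentIsValid": True}))
print(passes_condition({"DocumentIsValid": False}))
```

Anything that isn't explicitly true, including a missing field, falls through to the False branch in this sketch.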
We can test our flow as it is right now if we want, or we can add one final step that places valid DOCX files in a new folder (this can help visualize the result). To do the latter, we’ll simply add a Create file action and write each file (retaining the original file name) to a different folder in our system.
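For readers who think in code, the whole flow (list files, get content, validate, check the condition, create a file) can be approximated with a short local Python sketch. Everything here is a stand-in: copy_valid_files and stub_validator are hypothetical names, the folders are temporary directories, and the stub validator only checks for a ZIP signature instead of calling the Cloudmersive action.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def copy_valid_files(source: Path, target: Path,
                     is_valid: Callable[[bytes, str], bool]) -> List[str]:
    """Copy each .docx file that passes is_valid into target, keeping its name."""
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.glob("*.docx")):         # List files in folder
        content = path.read_bytes()                     # Get file content
        if is_valid(content, path.name):                # validation + Condition
            (target / path.name).write_bytes(content)   # Create file
            copied.append(path.name)
    return copied


def stub_validator(content: bytes, file_name: str) -> bool:
    """Hypothetical stand-in for the validation action: ZIP signature check only."""
    return content.startswith(b"PK")


with tempfile.TemporaryDirectory() as tmp:
    source, target = Path(tmp) / "incoming", Path(tmp) / "validated"
    source.mkdir()
    # Placeholder bytes, not real documents.
    (source / "good.docx").write_bytes(b"PK\x03\x04 placeholder")
    (source / "fake.docx").write_bytes(b"%PDF-1.7 placeholder")
    copied = copy_valid_files(source, target, stub_validator)
    print(copied)
```

Passing the validator in as a parameter keeps the routing logic separate from the check itself, which mirrors how the Condition in the flow only looks at the validation action's output.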
Now we’ll save and test our flow.
My example folder originally contained 9 files, 3 of which were saved with the wrong extension and 1 of which was a different file type entirely. Those 4 files were filtered out, leaving only 5 valid DOCX files in the new folder.
The first file in this folder, a PDF document saved with a .docx extension, was filtered out right away. The API response for this file reported no DOCX-related errors, because the file didn’t use DOCX structure to begin with. It was an entirely different file type masquerading as a DOCX file, and as such, it failed to pass the baseline validation check.
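To illustrate why a renamed PDF fails a baseline check like this, here is a rough local approximation in Python. It is not the method the Cloudmersive API uses (that isn't described here); it only relies on the fact that a DOCX file is a ZIP package that conventionally contains [Content_Types].xml and word/document.xml. The helper names and sample bytes are hypothetical.

```python
import io
import zipfile


def looks_like_docx(content: bytes) -> bool:
    """Rough structural check: ZIP package with the conventional DOCX parts."""
    if not zipfile.is_zipfile(io.BytesIO(content)):
        return False
    with zipfile.ZipFile(io.BytesIO(content)) as package:
        names = set(package.namelist())
    return "[Content_Types].xml" in names and "word/document.xml" in names


def build_minimal_package() -> bytes:
    """Build a placeholder ZIP with DOCX-style part names (not a real document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


fake_docx = b"%PDF-1.7\n% a PDF saved with a .docx extension"
print(looks_like_docx(fake_docx))
print(looks_like_docx(build_minimal_package()))
```

A check like this only confirms the container structure. It says nothing about whether the document's contents are well formed, which is the part the ErrorsAndWarnings array covers.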
With this API in our arsenal, it’s easy to filter out fake or broken DOCX files in our system and make sure they don’t become available to flow stakeholders.