Extract PDF Metadata from SharePoint List Attachments using Power Automate

Jan 13, 2025

There’s a lot of useful information we can gather from a PDF file attachment on a SharePoint list item by simply examining the PDF metadata object. From the PDF metadata object, we can usually obtain information about the PDF title, author, and creator, exactly when the document was created, what keywords the document has stored (if any), the number of pages in the document, and more.

Extracting PDF Metadata via Power Automate

Thanks to Power Automate, we can programmatically extract PDF metadata from a SharePoint list attachment in a simple automated flow. We can use multiple built-in SharePoint connector actions to retrieve list item information & attachment contents, and we can then plug in a Cloudmersive PDF connector action to extract the PDF metadata object.

Once we have PDF metadata available to us as dynamic content in our flow, we can use that information in all kinds of different ways, including updating the original list item with the PDF’s metadata contents.

Walkthrough

In this article, we’ll create an example flow that performs the above-described process in conjunction with a basic SharePoint list.

The list I’ll use in my example expects users to upload contract documents that have been modified by (or originally provided by) clients and customers.

Our example flow will do the following:

Trigger when new items are created on our example list
Extract attachment details and attachment contents from the list item
Determine whether the attachment was a PDF
Extract metadata from the PDF
Update the list item with PDF metadata
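
The steps above can be sketched in Python to make the control flow explicit. This is only an illustration of the logic, not how Power Automate executes it: the four callables stand in for the connector actions, and every name here (process_new_item, get_attachments, get_content, extract_metadata, update_item) is hypothetical.

```python
from typing import Callable, Dict, List


def process_new_item(
    item_id: int,
    get_attachments: Callable[[int], List[Dict[str, str]]],
    get_content: Callable[[int, str], bytes],
    extract_metadata: Callable[[bytes, str], Dict[str, object]],
    update_item: Callable[[int, Dict[str, object]], None],
) -> int:
    """Run steps 2-5 for one newly created list item; return PDFs processed."""
    processed = 0
    # "For each": a list item can carry several attachments.
    for attachment in get_attachments(item_id):
        name = attachment["DisplayName"]
        # Condition: only names containing '.pdf' take the True branch.
        if ".pdf" not in name:
            continue  # False branch: nothing else happens
        content = get_content(item_id, attachment["Id"])
        metadata = extract_metadata(content, name)
        update_item(item_id, metadata)
        processed += 1
    return processed


if __name__ == "__main__":
    # Stub "connector actions" so the sketch runs without SharePoint.
    updates = []
    count = process_new_item(
        7,
        get_attachments=lambda _id: [
            {"Id": "a1", "DisplayName": "contract.pdf"},
            {"Id": "a2", "DisplayName": "notes.docx"},
        ],
        get_content=lambda _id, file_id: b"%PDF-1.7",
        extract_metadata=lambda data, name: {"Author": "Example Author", "PageCount": 3},
        update_item=lambda _id, meta: updates.append((_id, meta)),
    )
    print(count, updates)
```

Note that the update runs inside the loop, so if an item carried more than one PDF, each PDF would write to the same list item in turn.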

Create an Automated Cloud Flow

To begin our example flow, we’ll first select the option to create an Automated cloud flow in Power Automate, and we’ll elect to trigger our automated flow with the SharePoint When an item is created trigger.

Configure the Trigger Action

We’ll now configure our SharePoint When an item is created trigger action to run when items are added to our specific list. To do that, we’ll select our Site Address and List Name from each respective dropdown.

Get Attachments & Attachment Contents from the SharePoint List Item

To get metadata from a SharePoint list attachment PDF, we’ll need to ask Power Automate to retrieve the attachment file bytes. To do that, we’ll use two different SharePoint actions: Get attachments and Get attachment content.

As shown in the above screenshots from my example flow, we’ll configure the Get attachments action with the list item ID from our trigger step, and we’ll then configure the Get attachment content action with both the list item ID and the File Identifier. The File Identifier was retrieved via Get attachments.

We’ll notice Power Automate immediately wraps our Get attachment content action in a For each control when we select the File Identifier value. That’s because Get attachments returns an array (any given list item can have multiple files attached to it), so the action has to run once per attachment.

Create a Condition to Avoid Processing Non-PDF Documents

The action we’ll eventually use for PDF metadata extraction in our flow only works on PDF documents, so we’ll need to stop other document types from moving forward in our flow. Otherwise, we’ll get a bunch of annoying notifications about failed flow runs when our PDF metadata action attempts to carry out its task.

We’ll screen out non-PDF documents by adding a Condition control directly before the Get attachment content action in our flow.

We’ll set the condition to check for DisplayName values that contain the string ‘.pdf’. This will send non-PDF documents into the False branch of the condition, where the flow will simply end without any additional actions.

With our Condition properly configured, we’ll drag the Get attachment content action into the True branch of our condition.
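
One thing worth keeping in mind is that a "contains" check is looser than it looks. The sketch below mirrors the Condition in Python under the assumption that the comparison is case-sensitive (which is how the contains expression function is documented to behave). The is_pdf_strict helper is a hypothetical alternative for comparison, not something the flow in this article uses.

```python
def is_pdf_contains(display_name: str) -> bool:
    """Mirror of the flow's Condition: DisplayName contains '.pdf'."""
    return ".pdf" in display_name


def is_pdf_strict(display_name: str) -> bool:
    """Stricter variant: the name must end in '.pdf', ignoring case."""
    return display_name.lower().endswith(".pdf")


if __name__ == "__main__":
    # Compare both checks on a few representative file names.
    for name in ["contract.pdf", "SCAN.PDF", "draft.pdf.docx", "notes.txt"]:
        print(f"{name!r}: contains={is_pdf_contains(name)}, strict={is_pdf_strict(name)}")
```

Under these assumptions, a file named SCAN.PDF would be skipped by the contains check, while draft.pdf.docx would pass it and then fail at the metadata step. If that matters for a given list, the Condition's "ends with" operator applied to a lower-cased DisplayName is one possible alternative.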

Extract PDF Metadata

We’ll now include the API we’re using to extract PDF metadata. To find it, we’ll search the Power Automate connector library for Cloudmersive connectors, and we’ll locate the Cloudmersive PDF connector with the pink logo.

We’ll click “See more” to view the actions list, and we’ll then use a CTRL+F search to find an action titled Get PDF document metadata.

After selecting this action, we’ll need to create and authorize our Cloudmersive PDF connection before we can move any further. We’ll need a Cloudmersive API key to authorize our connection, and we can get one for free by creating a free account on the Cloudmersive website (this allows up to 800 API calls/month with zero commitment).

Once our connection is saved, we can satisfy our two request parameters with attachment file bytes & file names (i.e., Attachment Content and DisplayName).

Update List Item with PDF Metadata Information

To this point, we’ve asked our flow to extract metadata from list item attachments bearing a ‘.pdf’ extension.

To wrap up our flow, we’ll update the original SharePoint list item with the PDF metadata values that interest us the most. We’ll use SharePoint’s Update item action here.

After we’ve configured our Update item action as shown above (with the list item ID), the Advanced parameters section of this action will populate with our SharePoint list columns.

We’ll leave the Title column blank, but we’ll fill the Contract Document Metadata column (or whatever we named this multiline text column in our own example list) with information from our PDF metadata object.

In my example (shown above), I’ve filled the Contract Document Metadata field with information including the document title, author, and creator (this refers to the application the document was originally created in before being exported as a PDF), and also the page count, creation date, last modification date, and keywords.

It’s important to note that not all of these metadata fields will necessarily be filled in any given PDF. The idea is to capture as much information as possible in one go.
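
Because some fields may come back empty, it can help to decide up front how blanks should appear in the list. The sketch below builds the multiline text in Python from a metadata dictionary. The key names (Title, Author, Creator, PageCount, DateCreated, DateModified, Keywords) are assumptions about what the connector exposes as dynamic content and should be checked against the actual action output; format_metadata is a hypothetical helper.

```python
from typing import Dict, Optional

# Label shown in the list -> assumed key in the metadata object.
FIELDS = [
    ("Title", "Title"),
    ("Author", "Author"),
    ("Creator", "Creator"),
    ("Page count", "PageCount"),
    ("Created", "DateCreated"),
    ("Last modified", "DateModified"),
    ("Keywords", "Keywords"),
]


def format_metadata(metadata: Dict[str, Optional[object]], blank: str = "(not set)") -> str:
    """Return one 'Label: value' line per field, marking missing values."""
    lines = []
    for label, key in FIELDS:
        value = metadata.get(key)
        # Treat None and empty or whitespace-only strings as missing.
        if value is None or str(value).strip() == "":
            value = blank
        lines.append(f"{label}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_metadata({"Author": "Example Author", "PageCount": 12, "Title": ""}))
```

An explicit placeholder makes it clear to list users that a field was absent from the PDF, rather than leaving them to wonder whether the flow failed to fill it in.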

Test the Flow

Now that we’ve finished designing our flow, it’s time to run a quick test. Since we created an Automated cloud flow, we’ll carry out our test by performing the trigger action (i.e., creating a new list item with a PDF attachment).

When our flow finishes running, we’ll find metadata values (as many fields as were available in our test document) updated in the metadata column of our list.

As we can see from my example flow run, two pieces of metadata (the Title and Keywords) weren’t available in my example PDF document, but all the other fields were updated with information from the PDF metadata object. This information can immediately answer some questions about the contract document before list users think to ask them.

Conclusion

In this walkthrough, we learned how to create a flow that triggers when a new list item is created, retrieves file attachment information & contents from that item, determines if the attachment is a PDF, extracts metadata from the PDF, and then updates the original list item with certain metadata values.

We can apply this same flow logic to a variety of different list use cases.