How to Extract PDF Metadata into a SharePoint List Item in Power Automate

Cloudmersive

PDF metadata objects contain useful information about the documents they describe — but that information isn’t listed in plain sight.

To access a PDF’s metadata object, we need to open the file and navigate to the “Document properties” window. There, we’ll find information about the PDF author, creator, subject, keywords, dates created & modified, and more.

By extracting PDF metadata into columns on a SharePoint list, we can place PDF document properties directly in folks’ line of sight. With metadata structured in columns, we can take conditional actions based on what metadata is present within a document, or we can download our SharePoint lists as tabular spreadsheets and analyze them that way.

Extract PDF metadata to a SharePoint list

Power Automate makes it easy to handle this process programmatically. We can, for example, extract PDF metadata directly from PDF email attachments and fill pre-set fields on a SharePoint list from there.

In this article, we’ll walk through an example flow that handles this workflow using Outlook, Cloudmersive & SharePoint connectors in Power Automate. We’ll populate a static list whose columns are designed to capture PDF metadata fields.

We’ll start by creating an Automated cloud flow in Power Automate. We’ll use the Office 365 Outlook trigger called When a new email arrives (V3).

We’ll configure our trigger to check for emails from ourselves (this will make testing the flow easier), and we’ll specify that emails must contain attachments.

Before we blindly start extracting metadata from email attachments, we’ll first implement a Condition to check the Attachments Name value for a “.pdf” extension.
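To make the logic of this check concrete, here is a minimal Python sketch of the same test outside of Power Automate. The function name is hypothetical, and lowercasing the name before comparing is our own assumption about the desired behavior (so that “Report.PDF” also passes); it is not a description of how the Condition control compares strings.

```python
def is_pdf_attachment(attachment_name: str) -> bool:
    """Return True when the attachment file name ends with a .pdf extension.

    Assumption: the comparison should ignore letter case and surrounding
    whitespace, so "Report.PDF" is treated the same as "report.pdf".
    """
    return attachment_name.strip().lower().endswith(".pdf")


if __name__ == "__main__":
    for name in ["invoice.pdf", "Report.PDF", "photo.png", "invoice.pdf.exe"]:
        print(name, is_pdf_attachment(name))
```

Checking the end of the name, rather than whether “.pdf” appears anywhere in it, keeps a file such as “invoice.pdf.exe” out of the True branch.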

In the True branch of our condition, we’ll incorporate our PDF metadata extracting action.

We’ll find the action we need on the Cloudmersive PDF connector actions list.

After we select this action, we’ll need to authorize our API connection with a Cloudmersive API key. We can get a free API key by visiting the Cloudmersive website & setting up a free account (the free tier allows up to 800 API calls each month; it’s the same tier of API key we’re using in this example flow).

We’ll then configure our request by passing the Attachments Content and Attachments Name values as arguments into each respective Get PDF metadata parameter.

Now we’ll use the response from the Get PDF metadata action to populate a SharePoint list item. We’ll use the Create item action from the SharePoint connector to accomplish this.

Once we select our SharePoint site address and List Name, the Advanced parameters will populate with each field of the list we created.

We’ll then pass each field of dynamic content returned by our Get PDF metadata action as an argument into each relevant field of our SharePoint list.
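The field-to-column mapping described above can be sketched in Python as a simple dictionary lookup. Everything named here is an assumption for illustration: the column names stand in for whatever columns exist on our own list, and the metadata keys (Author, Creator, Subject, Keywords, DateCreated, DateModified) mirror the document properties mentioned earlier rather than a confirmed response schema.

```python
from typing import Any, Dict

# Hypothetical mapping: SharePoint column name -> key in the metadata response.
# Both sides are assumptions; adjust them to match the real list and response.
COLUMN_TO_METADATA_KEY: Dict[str, str] = {
    "PDF Author": "Author",
    "PDF Creator": "Creator",
    "PDF Subject": "Subject",
    "PDF Keywords": "Keywords",
    "Date Created": "DateCreated",
    "Date Modified": "DateModified",
}


def build_list_item(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Build a list item by copying each metadata value into its column.

    Metadata fields missing from the response become None, so the item
    always contains every column defined in the mapping.
    """
    return {
        column: metadata.get(key)
        for column, key in COLUMN_TO_METADATA_KEY.items()
    }
```

Filling absent fields with None reflects the fact that many PDFs leave properties such as Subject or Keywords empty, so the list item should still be created with those columns blank.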

To finalize our flow, we’ll use the Add attachment action from the SharePoint connector to attach the original PDF document to the same list item.

We’ll need to make sure we select the same Site Address and List Name as before; after that, we’ll supply the ID value for the list item we created. We’ll then use the Attachments Name and Attachments Content values to set the content of our attached PDF.

We’ll now save our flow and test it by emailing ourselves one or more PDF documents. Because Power Automate automatically wrapped our Condition in a For each control, the flow will create one list item for each PDF we attach.
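The per-attachment behavior of the flow can be sketched as an ordinary loop. This is a simplified model under stated assumptions, not the flow itself: the attachment dictionaries, the `extract_metadata` callable (a stand-in for the Get PDF metadata action), and the `AttachmentName` key are all hypothetical.

```python
from typing import Any, Callable, Dict, List


def process_attachments(
    attachments: List[Dict[str, Any]],
    extract_metadata: Callable[[bytes, str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return one list item per PDF attachment, skipping everything else."""
    created_items: List[Dict[str, Any]] = []
    for attachment in attachments:  # plays the role of the For each control
        name = attachment["name"]
        if not name.lower().endswith(".pdf"):  # plays the role of the Condition
            continue
        # Stand-in for the metadata action; supplied by the caller.
        metadata = extract_metadata(attachment["content"], name)
        item = dict(metadata)
        item["AttachmentName"] = name  # hypothetical key recording the source file
        created_items.append(item)
    return created_items
```

Passing the extractor in as a parameter keeps the sketch runnable without any network access; a test can supply a fake function that returns fixed values.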

When our flow finishes running, we’ll find our PDF metadata populated in each field on our SharePoint list.

We can now download PDFs and their relevant metadata as an Excel or CSV spreadsheet from our list — or we can use the Get item action in Power Automate to take certain actions based on the information contained in one or more list columns.
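As one example of analyzing an exported list, the sketch below counts rows by the value in a single column of a CSV export. The function name and the column name used in the usage example are hypothetical, and the sample rows are made up purely to show the call.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text: str, column: str) -> Counter:
    """Count how many exported rows share each value in the given column.

    Rows where the column is missing or empty are grouped under "(blank)".
    """
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        counts[value or "(blank)"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = "Title,PDF Author\nDoc 1,Alice\nDoc 2,\nDoc 3,Alice\n"
    print(count_by_column(sample, "PDF Author"))
```

A tally like this makes it easy to spot which documents arrived without an author or other property filled in.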

Conclusion

In this article, we learned how to extract PDF metadata directly into a SharePoint list using Cloudmersive and SharePoint connector actions. We demonstrated this in the context of an Automated cloud flow triggered by an Outlook email containing one or more attachments from a specific sender.
