How to Scan SharePoint PDFs for Threats in C# .NET Framework
Threat actors can obfuscate malicious content in PDF files rather easily. Compromised PDF uploads can live undetected within our SharePoint ecosystem for long periods of time, only requiring inadvertent user interaction to execute their malicious contents.
Thankfully, by actively scanning PDFs in SharePoint Site Drive storage, we can proactively check suspicious PDFs for a variety of threat types and block them before SharePoint users inadvertently access and trigger their contents.
Using the code below, we can take advantage of a free API that allows us to directly scan individual files in our SharePoint Online Site Drive for viruses, malware, and various threatening content types that antivirus solutions typically won’t detect.
This latter category includes anything from invalid PDF file structure to unexpected password protection measures, image objects, scripts, and more. We can set custom threat rules as variables in our request to flag PDFs containing unwanted and threatening content types, ensuring even the slightest discrepancies will trigger a CleanResult: False response.
Before we copy code to structure our API call, we’ll need to gather some information from our SharePoint Online account. This includes:
- Client ID
- Client Secret
- SharePoint Domain Name
- Site ID
- Tenant ID (optional)
- File Path (optional)
- Item ID (optional)
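Before making any calls, it can help to confirm that none of the required values in the list above were left blank. The following is a minimal sketch rather than part of the SDK: the class name, the method name, and the dictionary key names are all hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharePointScanSettings
{
    // Hypothetical key names for the four values the list above does not mark as optional.
    public static readonly string[] RequiredKeys =
    {
        "ClientId", "ClientSecret", "SharePointDomainName", "SiteId"
    };

    // Returns the required keys that are absent or blank in the supplied settings,
    // in the same order as RequiredKeys.
    public static List<string> MissingRequired(IDictionary<string, string> settings)
    {
        return RequiredKeys
            .Where(key =>
            {
                string value;
                return !settings.TryGetValue(key, out value) || string.IsNullOrWhiteSpace(value);
            })
            .ToList();
    }
}
```

An empty list from MissingRequired means all four required values are present; Tenant ID, File Path, and Item ID are deliberately not checked because the list marks them as optional.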
With that information ready, we can next grab a free API key to authorize our API calls. With a free API key, we’ll be able to make up to 800 API calls per month with no commitments (once we reach our limit, our total will reset the following month).
Now we can go about structuring our API call in C# .NET Framework. To begin, let’s install the SDK via NuGet by running the command below in our Package Manager console:
Install-Package Cloudmersive.APIClient.NET.VirusScan -Version 3.0.4
Next, let’s copy the code below into our file. We’ll paste our API key into the authorization snippet and then enter each piece of SharePoint Online account information into its respective variable.
using System;
using System.Diagnostics;
using Cloudmersive.APIClient.NET.VirusScan.Api;
using Cloudmersive.APIClient.NET.VirusScan.Client;
using Cloudmersive.APIClient.NET.VirusScan.Model;
namespace Example
{
    public class ScanCloudStorageScanSharePointOnlineFileAdvancedExample
    {
        public void main()
        {
            // Configure API key authorization: Apikey
            Configuration.Default.AddApiKey("Apikey", "YOUR_API_KEY");

            var apiInstance = new ScanCloudStorageApi();
            var clientID = "YOUR_CLIENT_ID"; // string | Client ID access credentials; see description above for instructions on how to get the Client ID from the Azure Active Directory portal.
            var clientSecret = "YOUR_CLIENT_SECRET"; // string | Client Secret access credentials; see description above for instructions on how to get the Client Secret from the Azure Active Directory portal
            var sharepointDomainName = "YOUR_SHAREPOINT_DOMAIN_NAME"; // string | SharePoint Online domain name, such as mydomain.sharepoint.com
            var siteID = "YOUR_SITE_ID"; // string | Site ID (GUID) of the SharePoint site you wish to retrieve the file from
            var tenantID = "YOUR_TENANT_ID"; // string | Tenant ID of your Azure Active Directory (optional)
            var filePath = "YOUR_FILE_PATH"; // string | Path to the file within the drive, such as 'hello.pdf' or '/folder/subfolder/world.pdf'. If the file path contains Unicode characters, you must base64 encode the file path and prepend it with 'base64:', such as: 'base64:6ZWV6ZWV6ZWV6ZWV6ZWV6ZWV'. (optional)
            var itemID = "YOUR_ITEM_ID"; // string | SharePoint itemID, such as a DriveItem Id (optional)
            var allowExecutables = true; // bool? | Set to false to block executable files (program code) from being allowed in the input file. Default is false (recommended). (optional)
            var allowInvalidFiles = true; // bool? | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document. Default is false (recommended). (optional)
            var allowScripts = true; // bool? | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file. Set to true to allow these file types. Default is false (recommended). (optional)
            var allowPasswordProtectedFiles = true; // bool? | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords. Set to true to allow these file types. Default is false (recommended). (optional)
            var allowMacros = true; // bool? | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended). (optional)
            var allowXmlExternalEntities = true; // bool? | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended). (optional)
            var restrictFileTypes = ".pdf"; // string | Comma-separated list of file formats to allow as clean; for example, .pdf,.docx,.png would allow only PDF, Word document and PNG files. All files must pass content verification against this list; if they do not, the result is returned as CleanResult=false. Set to null or an empty string to disable; default is disabled. (optional)

            try
            {
                // Advanced Virus Scan a file in a SharePoint Online Site Drive
                CloudStorageAdvancedVirusScanResult result = apiInstance.ScanCloudStorageScanSharePointOnlineFileAdvanced(clientID, clientSecret, sharepointDomainName, siteID, tenantID, filePath, itemID, allowExecutables, allowInvalidFiles, allowScripts, allowPasswordProtectedFiles, allowMacros, allowXmlExternalEntities, restrictFileTypes);
                Debug.WriteLine(result);
            }
            catch (Exception e)
            {
                Debug.Print("Exception when calling ScanCloudStorageApi.ScanCloudStorageScanSharePointOnlineFileAdvanced: " + e.Message);
            }
        }
    }
}
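The filePath comment in the code above says that a path containing Unicode characters must be base64 encoded and prefixed with 'base64:'. Here is a small sketch of one way to prepare that value. It assumes that "Unicode characters" means anything outside the ASCII range and that the bytes should be UTF-8, which matches the sample string in the comment. The class and method names are hypothetical and not part of the SDK.

```csharp
using System;
using System.Linq;
using System.Text;

public static class SharePointFilePath
{
    // Returns the path unchanged when it is pure ASCII; otherwise returns
    // "base64:" followed by the base64 encoding of the path's UTF-8 bytes.
    public static string Encode(string filePath)
    {
        if (filePath == null) throw new ArgumentNullException(nameof(filePath));

        bool asciiOnly = filePath.All(c => c <= 127);
        if (asciiOnly) return filePath;

        return "base64:" + Convert.ToBase64String(Encoding.UTF8.GetBytes(filePath));
    }
}
```

The return value of SharePointFilePath.Encode could then be assigned to filePath before making the request.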
When we’re ready to make our request, we can block each threat type by setting its Boolean request variable to false. While this article has focused on scanning PDFs, we can scan much more than that, including all major Office files and more than 100 unique image formats.
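If we only want certain formats to pass as clean, the restrictFileTypes parameter expects a comma-separated list such as .pdf,.docx,.png. As a rough sketch, a helper like the one below could build that string from a list of extensions. The class and method names are hypothetical, and adding a missing leading dot is a convenience we are assuming here, not documented API behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RestrictFileTypes
{
    // Builds a comma-separated extension list such as ".pdf,.docx,.png".
    // Blank entries are skipped, and a leading dot is added where missing.
    public static string Build(IEnumerable<string> extensions)
    {
        var cleaned = extensions
            .Where(e => !string.IsNullOrWhiteSpace(e))
            .Select(e => e.Trim())
            .Select(e => e.StartsWith(".", StringComparison.Ordinal) ? e : "." + e);

        return string.Join(",", cleaned);
    }
}
```

Passing an empty list yields an empty string, which the parameter description treats as "restriction disabled".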
Here’s an example response object for reference:
{
  "Successful": true,
  "CleanResult": true,
  "ContainsExecutable": true,
  "ContainsInvalidFile": true,
  "ContainsScript": true,
  "ContainsPasswordProtectedFile": true,
  "ContainsRestrictedFileFormat": true,
  "ContainsMacros": true,
  "VerifiedFileFormat": "string",
  "FoundViruses": [
    {
      "FileName": "string",
      "VirusName": "string"
    }
  ],
  "ErrorDetailedDescription": "string",
  "FileSize": 0,
  "ContentInformation": {
    "ContainsJSON": true,
    "ContainsXML": true,
    "ContainsImage": true,
    "RelevantSubfileName": "string"
  }
}
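To act on a response like this one, we might translate its flags into a readable list of reasons before deciding whether to quarantine a file. The sketch below uses a hypothetical ScanSummary class that mirrors a few of the field names from the example response. It is not the SDK's own result model, so in practice we would copy the values over from the returned result object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified mirror of some fields in the example response above.
public class ScanSummary
{
    public bool CleanResult { get; set; }
    public bool ContainsExecutable { get; set; }
    public bool ContainsInvalidFile { get; set; }
    public bool ContainsScript { get; set; }
    public bool ContainsPasswordProtectedFile { get; set; }
    public bool ContainsRestrictedFileFormat { get; set; }
    public bool ContainsMacros { get; set; }
    public List<string> VirusNames { get; set; } = new List<string>();
}

public static class ScanResultExplainer
{
    // Returns one human-readable reason per raised flag or detected virus.
    // A clean result yields an empty list.
    public static List<string> Reasons(ScanSummary summary)
    {
        var reasons = new List<string>();
        if (summary.CleanResult) return reasons;

        if (summary.ContainsExecutable) reasons.Add("contains executable");
        if (summary.ContainsInvalidFile) reasons.Add("invalid file structure");
        if (summary.ContainsScript) reasons.Add("contains script");
        if (summary.ContainsPasswordProtectedFile) reasons.Add("password protected");
        if (summary.ContainsRestrictedFileFormat) reasons.Add("restricted file format");
        if (summary.ContainsMacros) reasons.Add("contains macros");
        foreach (var name in summary.VirusNames) reasons.Add("virus: " + name);

        return reasons;
    }
}
```

A file could then be blocked or flagged for review whenever Reasons returns a non-empty list.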