How to Validate PDF Files and Scan for Malware in PHP

Cloudmersive
Apr 25, 2024

Using a free API and the PHP code examples below, we can validate PDF documents and scan their contents for viruses and malware in a single call.

This API call rigorously verifies PDF formatting and checks PDF files against more than 17 million virus and malware signatures. By combining these two checks in one low-code solution, we can be more confident about the PDF files clients upload to our servers, without writing a lot of new code.

It’s also worth noting that we can set custom threat rules in our API request to flag specific categories of threatening content, such as executables, scripts, or macros.

We could, for example, set $allow_executables to false, thereby disallowing executables from moving forward in our upload process.

It’s not uncommon for threat actors to disguise executables as PDF files using double extensions (for example, a file named invoice.pdf.exe). If such a file lands somewhere the final extension is hidden (e.g., as an email attachment), it looks like an ordinary PDF, and anyone who opens it can unknowingly launch executable malware.
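Filename checks are no substitute for content scanning (the API's $allow_executables rule inspects the file itself), but a quick local heuristic can catch the double-extension trick early. The function name and extension list below are our own illustrative assumptions, not part of the Cloudmersive SDK:

```php
<?php
// Hypothetical filename heuristic (not part of the Cloudmersive SDK).
// Flags names like "invoice.pdf.exe", where an executable final extension
// sits behind a harmless-looking one.
function hasSuspiciousDoubleExtension(string $filename): bool
{
    // Illustrative list of executable extensions; not exhaustive.
    $executable = ['exe', 'scr', 'bat', 'cmd', 'com', 'msi', 'js', 'vbs', 'ps1'];

    // "invoice.pdf.exe" -> ["invoice", "pdf", "exe"]
    $parts = explode('.', strtolower(basename($filename)));

    // Fewer than three parts means at most one extension.
    if (count($parts) < 3) {
        return false;
    }

    // The final extension is the one the operating system acts on.
    $finalExtension = $parts[count($parts) - 1];
    return in_array($finalExtension, $executable, true);
}

var_dump(hasSuspiciousDoubleExtension('invoice.pdf.exe')); // bool(true)
var_dump(hasSuspiciousDoubleExtension('invoice.pdf'));     // bool(false)
```

Because an attacker can simply rename a file, a false result here only means "not obviously disguised", not "safe"; the content scan still has to run.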

To structure our API call, we’ll begin by installing the client SDK. Let’s run the following Composer command from our command line:

composer require cloudmersive/cloudmersive_virusscan_api_client

Next, let’s turn our attention to authorization. We’ll need a free Cloudmersive API key to authorize our requests (this will allow us to make up to 800 API calls per month with no additional commitments).
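Rather than hard-coding the key as 'YOUR_API_KEY' in source files, we might read it from an environment variable. A minimal sketch, assuming a variable named CLOUDMERSIVE_API_KEY (our own naming choice, not something the SDK requires):

```php
<?php
// Minimal sketch: load the API key from an environment variable.
// The default variable name CLOUDMERSIVE_API_KEY is an assumption.
function loadApiKey(string $envVar = 'CLOUDMERSIVE_API_KEY'): string
{
    $key = getenv($envVar);

    // getenv() returns false when the variable is not set at all.
    if ($key === false || trim($key) === '') {
        throw new RuntimeException("Environment variable $envVar is not set.");
    }
    return trim($key);
}

// Usage with the SDK configuration call from the example below:
// $config = Swagger\Client\Configuration::getDefaultConfiguration()
//     ->setApiKey('Apikey', loadApiKey());
```

Failing loudly when the variable is missing makes a misconfigured server obvious, instead of sending unauthorized requests.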

We can now call the scanFileAdvanced function using the code example below, setting each custom threat rule to our liking:

<?php
require_once(__DIR__ . '/vendor/autoload.php');

// Configure API key authorization: Apikey
$config = Swagger\Client\Configuration::getDefaultConfiguration()->setApiKey('Apikey', 'YOUR_API_KEY');

$apiInstance = new Swagger\Client\Api\ScanApi(
    new GuzzleHttp\Client(),
    $config
);

$input_file = "/path/to/inputfile"; // Path to the file to scan.

// Threat rules: false blocks the content type (the recommended default); true allows it.
$allow_executables = false; // Block executable files (program code).
$allow_invalid_files = false; // Block invalid files, such as a file named .pdf that is not really a valid PDF.
$allow_scripts = false; // Block scripts, such as PHP files, Python scripts, and other embedded script threats.
$allow_password_protected_files = false; // Block password-protected and encrypted files (e.g., encrypted zip and rar files) that could evade scanning.
$allow_macros = false; // Block macros and other content threats embedded in Word, Excel, and PowerPoint documents.
$allow_xml_external_entities = false; // Block XML External Entities and similar threats embedded in XML files.
$allow_insecure_deserialization = false; // Block insecure deserialization threats in JSON and other object serialization files.
$allow_html = false; // Block top-level HTML, which can carry XSS, scripts, and local file accesses. For API keys created before this feature was released, the default is true for backward compatibility.
$restrict_file_types = ".pdf"; // Comma-separated list of formats allowed as clean, e.g. ".pdf,.docx,.png". Files whose content does not match are returned with CleanResult=false. Set to null or "" to disable (the default).

try {
    $result = $apiInstance->scanFileAdvanced($input_file, $allow_executables, $allow_invalid_files, $allow_scripts, $allow_password_protected_files, $allow_macros, $allow_xml_external_entities, $allow_insecure_deserialization, $allow_html, $restrict_file_types);
    print_r($result);
} catch (Exception $e) {
    echo 'Exception when calling ScanApi->scanFileAdvanced: ', $e->getMessage(), PHP_EOL;
}
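Because scanFileAdvanced() takes its threat rules as nine positional arguments after the input file, it is easy to pass them in the wrong order. One option is a small helper that keeps the rules in an associative array and expands them in the same order as the call above. The helper name, the all-false defaults (taken from the "recommended" notes in the SDK comments), and the ".pdf" restriction are our own assumptions:

```php
<?php
// Hypothetical helper (not part of the SDK): returns the threat-rule
// arguments in the positional order used by scanFileAdvanced() above.
function buildScanRules(array $overrides = []): array
{
    $defaults = [
        'allow_executables' => false,
        'allow_invalid_files' => false,
        'allow_scripts' => false,
        'allow_password_protected_files' => false,
        'allow_macros' => false,
        'allow_xml_external_entities' => false,
        'allow_insecure_deserialization' => false,
        'allow_html' => false,
        'restrict_file_types' => '.pdf', // assumption: a PDF-only upload endpoint
    ];

    // Reject misspelled rule names instead of silently ignoring them.
    $unknown = array_diff_key($overrides, $defaults);
    if ($unknown !== []) {
        throw new InvalidArgumentException('Unknown rule(s): ' . implode(', ', array_keys($unknown)));
    }

    // array_replace() keeps the key order of $defaults, so the values
    // come out in the parameter order scanFileAdvanced() expects.
    return array_values(array_replace($defaults, $overrides));
}

// Usage (requires the SDK setup above):
// $result = $apiInstance->scanFileAdvanced($input_file, ...buildScanRules(['allow_html' => true]));
```

The array_values() call strips the string keys, so the spread operator passes the rules positionally rather than as named arguments.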

Regardless of which threat rules we set, the underlying service deep-verifies the file format and performs a virus and malware scan; with $allow_invalid_files set to false, a file that is not really a valid PDF is blocked.
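The API does the real format verification, but a cheap local check can reject files that are obviously not PDFs before we spend an API call on them. A PDF file normally begins with the header "%PDF-"; the sketch below only checks for that header (the function name is hypothetical), so it is a pre-filter, not a validator or a malware check:

```php
<?php
// Hypothetical local pre-filter (not part of the Cloudmersive SDK).
// Checks only that the file starts with the "%PDF-" header bytes.
function looksLikePdf(string $path): bool
{
    // Suppress the warning for unreadable paths; treat them as "not a PDF".
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }

    // Read just the first five bytes and compare them to the header.
    $header = fread($handle, 5);
    fclose($handle);

    return $header === '%PDF-';
}
```

An upload handler could return an error immediately when looksLikePdf($input_file) is false, and otherwise pass the file to scanFileAdvanced() for the full check.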

This service isn’t limited to PDFs, either: we can perform the same scan on Office files, archives, and a wide range of image file formats.
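If we widen an upload form to accept more of these formats, the restrict_file_types parameter takes a comma-separated list such as ".pdf,.docx,.png". A hypothetical helper (our own, not part of the SDK) that normalizes a list of extensions into that form:

```php
<?php
// Hypothetical helper: builds a restrict_file_types value such as
// ".pdf,.docx,.png" from a list of allowed extensions.
function buildRestrictFileTypes(array $extensions): string
{
    $normalized = [];
    foreach ($extensions as $ext) {
        // Accept "pdf", ".pdf" or "PDF" and normalize each to ".pdf".
        $ext = strtolower(ltrim(trim((string) $ext), '.'));
        if ($ext !== '') {
            // Using array keys removes duplicates while keeping insertion order.
            $normalized['.' . $ext] = true;
        }
    }
    return implode(',', array_keys($normalized));
}

echo buildRestrictFileTypes(['pdf', '.DOCX', 'png', 'pdf']), PHP_EOL; // .pdf,.docx,.png
```

An empty list yields an empty string, which, per the SDK comment in the example above, disables the restriction; callers that always want a restriction should check for that case.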
