How to Block JavaScript Injection PDFs in PHP
Malicious PDFs containing JavaScript injections are often designed to execute code in our PDF readers or browsers. It’s important that we block these documents before they reach a vulnerable point.
Thankfully, we can block PDFs containing JavaScript and other malicious code using a deterministic threat scanning method. Specifically, we can call a free API (using PHP code examples provided below) that verifies the contents of PDF file uploads to ensure they rigorously conform to strict PDF formatting standards.
In other words, a PDF containing a malicious JavaScript injection might carry a legitimate .pdf extension, but a closer look at the file’s contents shows that the added code makes the document invalid and risky to open.
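To make "a closer look at the file" concrete, here is a minimal sketch of a naive local pre-check that searches the raw bytes for PDF name objects commonly associated with script actions. The function name findPdfScriptMarkers and the marker list are our own illustrative choices, not part of the API. A check like this is easy to evade (hex-encoded names and compressed object streams hide these markers), so it only illustrates the idea and does not replace a real content scan.

```php
<?php
/**
 * Naive pre-check: reports PDF markers that commonly accompany embedded JavaScript.
 * Illustrative only; obfuscated or compressed objects will not be caught here.
 */
function findPdfScriptMarkers(string $bytes): array
{
    $findings = [];

    // A PDF file normally begins with the "%PDF-" header.
    if (strncmp($bytes, '%PDF-', 5) !== 0) {
        $findings[] = 'missing %PDF- header';
    }

    // Name objects typically associated with script or auto-run actions.
    foreach (['/JavaScript', '/JS', '/OpenAction', '/AA'] as $marker) {
        // Require a non-alphanumeric character (or end of input) after the marker,
        // so that "/JS" does not match a longer name such as "/JSomething".
        $pattern = '~' . preg_quote($marker, '~') . '(?![A-Za-z0-9])~';
        if (preg_match($pattern, $bytes) === 1) {
            $findings[] = 'contains ' . $marker;
        }
    }

    return $findings;
}

// Hand-written fragment resembling a PDF catalog with an auto-run script action.
$sample = "%PDF-1.7\n1 0 obj\n<< /Type /Catalog /OpenAction << /S /JavaScript /JS (app.alert(1)) >> >>\nendobj\n";
print_r(findPdfScriptMarkers($sample));
```

An empty array from this function does not mean a file is safe, which is why the rest of this walkthrough relies on a full content verification scan.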
We can structure our API call in a few quick steps, beginning with PHP client installation. We can run the command below to install the client with Composer:
composer require cloudmersive/cloudmersive_virusscan_api_client
Next, we can turn our attention briefly to API call authorization. We’ll need a free Cloudmersive API key to authorize our requests, which allows up to 800 API calls per month with no additional commitments.
Finally, we can use the remaining code below to perform our PDF scan. We can paste our API key in the 'YOUR_API_KEY' placeholder within the $config snippet.
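If we would rather not paste the key directly into source code, one option is to read it from an environment variable. The sketch below is a minimal example of that idea; the helper name loadApiKey and the variable name CLOUDMERSIVE_API_KEY are assumptions made for illustration, not names defined by the API client.

```php
<?php
/**
 * Reads the API key from an environment variable instead of hardcoding it.
 * The default variable name CLOUDMERSIVE_API_KEY is a hypothetical choice for this sketch.
 */
function loadApiKey(string $envName = 'CLOUDMERSIVE_API_KEY'): string
{
    $key = getenv($envName);

    // getenv() returns false when the variable is not set.
    if ($key === false || trim($key) === '') {
        throw new RuntimeException("Environment variable {$envName} is not set");
    }

    return trim($key);
}

// Possible use with the client configuration:
// $config = Swagger\Client\Configuration::getDefaultConfiguration()->setApiKey('Apikey', loadApiKey());
```

Throwing when the key is missing surfaces a misconfigured deployment immediately, rather than sending unauthenticated requests that fail later.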
Note that this scan will pick up a variety of additional content threats including executables, scripts, macros, unsafe archives (e.g., zip bombs), and more. It also includes basic coverage against known malware families, referencing files against a continuously updated list of more than 17 million virus and malware signatures:
<?php
require_once(__DIR__ . '/vendor/autoload.php');
// Configure API key authorization: Apikey
$config = Swagger\Client\Configuration::getDefaultConfiguration()->setApiKey('Apikey', 'YOUR_API_KEY');
$apiInstance = new Swagger\Client\Api\ScanApi(
    new GuzzleHttp\Client(),
    $config
);
$input_file = "/path/to/inputfile"; // \SplFileObject | Input file to perform the operation on.
$allow_executables = false; // bool | Set to false to block executable files (program code) from being allowed in the input file. Default is false (recommended).
$allow_invalid_files = false; // bool | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document. Default is false (recommended). Must be false for the invalid-PDF check described in this article to block the file.
$allow_scripts = false; // bool | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file. Set to true to allow these file types. Default is false (recommended).
$allow_password_protected_files = false; // bool | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords. Set to true to allow these file types. Default is false (recommended).
$allow_macros = false; // bool | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
$allow_xml_external_entities = false; // bool | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
$allow_insecure_deserialization = false; // bool | Set to false to block Insecure Deserialization and other threats embedded in JSON and other object serialization files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
$allow_html = false; // bool | Set to false to block HTML input in the top level file; HTML can contain XSS, scripts, local file accesses and other threats. Set to true to allow these file types. Default is false (recommended) [for API keys created prior to the release of this feature, default is true for backward compatibility].
$restrict_file_types = ".pdf"; // string | Comma-separated list of file formats to allow as clean; for example, .pdf,.docx,.png would allow only PDF, Word, and PNG files. All files must pass content verification against this list; if they do not, the result is returned as CleanResult=false. Here we allow only PDFs; set to null or an empty string to disable (default is disabled).
try {
    $result = $apiInstance->scanFileAdvanced($input_file, $allow_executables, $allow_invalid_files, $allow_scripts, $allow_password_protected_files, $allow_macros, $allow_xml_external_entities, $allow_insecure_deserialization, $allow_html, $restrict_file_types);
    print_r($result);
} catch (Exception $e) {
    echo 'Exception when calling ScanApi->scanFileAdvanced: ', $e->getMessage(), PHP_EOL;
}
?>
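In an upload workflow, we would probably want a failed scan request to count as a rejection rather than only printing the exception. The sketch below shows one way to do that with a "fail closed" gate. The function name isUploadAccepted and the $scan callable are hypothetical: $scan stands in for whatever wrapper we write around the scan call, and is assumed to return true only for a clean result.

```php
<?php
/**
 * Fail-closed gate: a file is accepted only when the scanner positively reports it as clean.
 * $scan is a hypothetical callable that takes a file path and returns true for a clean result.
 */
function isUploadAccepted(callable $scan, string $path): bool
{
    // A missing or unreadable file cannot be scanned, so it is never accepted.
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }

    try {
        // Anything other than a strict boolean true counts as "not clean".
        return $scan($path) === true;
    } catch (Throwable $e) {
        // Network errors, quota errors, and client exceptions all lead to rejection.
        error_log('Scan failed: ' . $e->getMessage());
        return false;
    }
}

// Demonstration with stand-in scanners instead of a live API call.
$tmp = tempnam(sys_get_temp_dir(), 'pdf');
file_put_contents($tmp, "%PDF-1.7\n");

$alwaysClean = function (string $path): bool {
    return true;
};
$unreachable = function (string $path): bool {
    throw new RuntimeException('API unreachable');
};

var_dump(isUploadAccepted($alwaysClean, $tmp)); // bool(true)
var_dump(isUploadAccepted($unreachable, $tmp)); // bool(false)
unlink($tmp);
```

With the real client, the callable could wrap the scanFileAdvanced call above and return the clean flag from the result object; the generated model is assumed to expose a getter such as getCleanResult(), which is worth confirming against the installed client before relying on it.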
If we test this API with a JS injection PDF file, we’ll get a response that looks something like this:
{
"CleanResult": false,
"ContainsExecutable": false,
"ContainsInvalidFile": true,
"ContainsScript": false,
"ContainsPasswordProtectedFile": false,
"ContainsRestrictedFileFormat": false,
"ContainsMacros": false,
"ContainsXmlExternalEntities": false,
"ContainsInsecureDeserialization": false,
"ContainsHtml": false,
"ContainsUnsafeArchive": false,
"ContainsOleEmbeddedObject": false,
"VerifiedFileFormat": ".pdf",
"FoundViruses": null,
"ContentInformation": {
"ContainsJSON": false,
"ContainsXML": false,
"ContainsImage": false,
"RelevantSubfileName": null
}
}
First and foremost, the initial response value, “CleanResult”: false, indicates the file failed either the deterministic scan or the virus and malware signature scan.
Notice that PDF is confirmed as the verified file format, meaning the PDF would likely open in a PDF reader, but the “ContainsInvalidFile” value is true, meaning the JavaScript contents do not conform to strict PDF formatting standards. This is why the document ultimately received a “CleanResult”: false response.
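To act on a response like this in code, we could translate the boolean flags into readable rejection reasons. The sketch below works on the decoded JSON shown above; the field names come from that sample response, while the function name rejectionReasons and the wording of each reason are our own illustrative choices.

```php
<?php
/**
 * Maps a decoded scan response to human-readable rejection reasons.
 * Field names are taken from the sample response; the reason labels are illustrative.
 */
function rejectionReasons(array $response): array
{
    $flags = [
        'ContainsExecutable' => 'embedded executable',
        'ContainsInvalidFile' => 'file does not match its claimed format',
        'ContainsScript' => 'embedded script',
        'ContainsPasswordProtectedFile' => 'password-protected content',
        'ContainsRestrictedFileFormat' => 'file type not on the allow list',
        'ContainsMacros' => 'embedded macros',
        'ContainsXmlExternalEntities' => 'XML external entities',
        'ContainsInsecureDeserialization' => 'insecure deserialization payload',
        'ContainsHtml' => 'HTML content',
        'ContainsUnsafeArchive' => 'unsafe archive',
        'ContainsOleEmbeddedObject' => 'OLE embedded object',
    ];

    $reasons = [];
    foreach ($flags as $field => $label) {
        if (($response[$field] ?? false) === true) {
            $reasons[] = $label;
        }
    }

    // FoundViruses is null in the sample; a non-empty value indicates signature matches.
    if (!empty($response['FoundViruses'])) {
        $reasons[] = 'virus signature match';
    }

    // A missing or false CleanResult with no specific flag is still treated as a failure.
    if (($response['CleanResult'] ?? false) !== true && $reasons === []) {
        $reasons[] = 'scan did not report a clean result';
    }

    return $reasons;
}

// Abbreviated version of the sample response above.
$json = '{"CleanResult": false, "ContainsInvalidFile": true, "ContainsScript": false, "VerifiedFileFormat": ".pdf", "FoundViruses": null}';
print_r(rejectionReasons(json_decode($json, true)));
```

For the JS injection PDF in our example, this returns a single reason tied to “ContainsInvalidFile”, which we could log or show to the uploader.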
That’s all the code we’ll need — now we can easily block JS injection PDFs and a wide range of additional content threats in our PHP upload process.
We should also remember to keep our PDF rendering and processing technologies up to date at all times, which reduces our applications’ exposure to newly disclosed vulnerabilities if a malicious file does slip through.