How to Scan a PDF for Viruses, Malware, and Other Threats in Node.js

Cloudmersive
4 min read · Apr 11, 2024

When we scan PDFs for threats, we should be looking for viruses, malware, and file formatting/file content abnormalities that might pose a threat to our rendering and processing applications.

Thankfully, using a free virus scanning API, we can simultaneously scan PDFs for viruses and deep-verify the file format to ensure they rigorously conform with PDF formatting standards.

This can help us avoid unleashing viruses and malware into our system, and it can also help prevent attacks that exploit zero-day vulnerabilities (e.g., buffer overflow) in PDF rendering and processing applications.

To structure our API call, we can begin by installing the SDK. We can run the following command to install it with npm:

npm install cloudmersive-virus-api-client --save

Or we can alternatively add this snippet to our package.json:

"dependencies": {
  "cloudmersive-virus-api-client": "^1.1.9"
}

Now we can turn our attention to API call authorization. A free Cloudmersive API key allows up to 800 API calls per month; once we reach the limit, the count simply resets the following month, with no further commitment.
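Rather than pasting the key into source code, we may prefer to read it from the environment. The snippet below is a minimal sketch of that idea: loadApiKey is a hypothetical helper and CLOUDMERSIVE_API_KEY is an assumed variable name, neither of which is defined by the SDK.

```javascript
// Hypothetical helper (not part of the Cloudmersive SDK).
// CLOUDMERSIVE_API_KEY is an assumed environment variable name.
function loadApiKey(env) {
  var key = (env.CLOUDMERSIVE_API_KEY || '').trim();
  if (!key) {
    // Fail early instead of sending requests with an empty key.
    throw new Error('CLOUDMERSIVE_API_KEY is not set');
  }
  return key;
}

// Possible usage once the Apikey object from the sample exists:
// Apikey.apiKey = loadApiKey(process.env);
```

Passing the environment in as an argument keeps the helper easy to test and avoids reading process.env at import time.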

We can then use the code below to call the advanced scan function (scanFileAdvanced). We can set custom threat rules via Boolean request parameters to block invalid PDF files and threatening content types that may be hidden within a PDF:

var fs = require('fs'); // needed for readFileSync below
var CloudmersiveVirusApiClient = require('cloudmersive-virus-api-client');
var defaultClient = CloudmersiveVirusApiClient.ApiClient.instance;

// Configure API key authorization: Apikey
var Apikey = defaultClient.authentications['Apikey'];
Apikey.apiKey = 'YOUR API KEY';



var apiInstance = new CloudmersiveVirusApiClient.ScanApi();

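// readFileSync already returns a Buffer; going through .buffer can expose extra bytes from Node's shared allocation pool.
var inputFile = fs.readFileSync("C:\\temp\\inputfile"); // File | Input file to perform the operation on.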

// Every rule is set to false here so that the corresponding content is blocked, matching the recommended defaults.
var opts = {
  'allowExecutables': false, // Boolean | Set to false to block executable files (program code) from being allowed in the input file. Default is false (recommended).
  'allowInvalidFiles': false, // Boolean | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document. Default is false (recommended).
  'allowScripts': false, // Boolean | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file. Set to true to allow these file types. Default is false (recommended).
  'allowPasswordProtectedFiles': false, // Boolean | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords. Set to true to allow these file types. Default is false (recommended).
  'allowMacros': false, // Boolean | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
  'allowXmlExternalEntities': false, // Boolean | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
  'allowInsecureDeserialization': false, // Boolean | Set to false to block Insecure Deserialization and other threats embedded in JSON and other object serialization files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
  'allowHtml': false, // Boolean | Set to false to block HTML input in the top level file; HTML can contain XSS, scripts, local file accesses and other threats. Set to true to allow these file types. Default is false (recommended) [for API keys created prior to the release of this feature default is true for backward compatibility].
  'restrictFileTypes': ".pdf" // String | Comma-separated list of file formats to allow as clean; for example, .pdf,.docx,.png would allow only PDF, Word document and PNG files. Here only PDF is allowed. All files must pass content verification against this list; if they do not, the result is returned as CleanResult=false. Set to null or an empty string to disable; default is disabled.
};

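var callback = function(error, data, response) {
  if (error) {
    console.error(error);
  } else {
    // Serialize the result so its fields are printed rather than "[object Object]".
    console.log('API called successfully. Returned data: ' + JSON.stringify(data, null, 2));
  }
};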
apiInstance.scanFileAdvanced(inputFile, opts, callback);
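
In a codebase that uses async/await, one option is to wrap the callback-style call in a Promise. The following is a minimal sketch: scanFileAdvancedAsync is a hypothetical helper name, not part of the SDK, and it assumes only the scanFileAdvanced(inputFile, opts, callback) signature used in the sample.

```javascript
// Hypothetical wrapper (not provided by the SDK): converts the
// callback-style scanFileAdvanced call into a Promise.
function scanFileAdvancedAsync(apiInstance, inputFile, opts) {
  return new Promise(function (resolve, reject) {
    apiInstance.scanFileAdvanced(inputFile, opts, function (error, data) {
      if (error) {
        reject(error); // surface API or network errors to the caller
      } else {
        resolve(data); // the scan result
      }
    });
  });
}

// Possible usage inside an async function:
// var result = await scanFileAdvancedAsync(apiInstance, inputFile, opts);
```

Because the wrapper takes the API instance as a parameter, it can be exercised with a stub object in tests without making a network call.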

With the information we’ll get from this API response, we can make smart decisions about what to do next with our PDFs and avoid inadvertently triggering attacks.
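One way to act on the response is to reduce it to an accept-or-quarantine decision. The sketch below assumes a particular response shape: CleanResult is the field named in the parameter description above, while FoundViruses, VirusName, and the Contains... flags are assumed field names that should be checked against the API documentation. decideAction is a hypothetical helper.

```javascript
// Only CleanResult is named in the article; every other field name below
// is an assumption about the response shape.
var ASSUMED_FLAGS = {
  ContainsExecutable: 'executable content',
  ContainsInvalidFile: 'invalid file structure',
  ContainsScript: 'script content',
  ContainsPasswordProtectedFile: 'password-protected content',
  ContainsMacros: 'macros',
  ContainsXmlExternalEntities: 'XML external entities',
  ContainsInsecureDeserialization: 'insecure deserialization',
  ContainsHtml: 'HTML content',
  ContainsRestrictedFileFormat: 'restricted file format'
};

// Hypothetical helper: returns { action, reasons } for a scan result.
function decideAction(result) {
  var reasons = [];
  var viruses = Array.isArray(result.FoundViruses) ? result.FoundViruses : [];
  viruses.forEach(function (v) {
    reasons.push('virus: ' + (v.VirusName || 'unknown'));
  });
  Object.keys(ASSUMED_FLAGS).forEach(function (field) {
    if (result[field] === true) {
      reasons.push(ASSUMED_FLAGS[field]);
    }
  });
  // Treat anything other than an explicit CleanResult === true as unsafe.
  if (result.CleanResult !== true && reasons.length === 0) {
    reasons.push('scan did not report a clean result');
  }
  return {
    action: reasons.length === 0 ? 'accept' : 'quarantine',
    reasons: reasons
  };
}
```

Defaulting to quarantine when CleanResult is missing means an unexpected or partial response is never treated as a clean file.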
