How to Scan Files for Malicious Code in Node.js

Cloudmersive
4 min read · May 21, 2024

To easily scan our form uploads for malicious code, we can simply call a free API that performs 1) a deterministic threat scan and 2) a traditional virus and malware signature scan (referencing a continuously updated list of 17 million+ signatures).

The deterministic scan verifies the contents of each form upload, identifying high-risk content types including macros, scripts, executables, HTML, and more. Files containing malicious code and other high-risk content receive a “CleanResult”: false response in a detailed threat diagnostic.

By performing a deterministic threat scan, we can uncover specially crafted malicious files disguised as normal, safe file uploads.

Let’s look at a quick example. Specially crafted PDFs containing malicious JavaScript injections are identifiably invalid files (i.e., they don’t conform to stringent PDF formatting standards), but many PDF rendering/processing technologies will still attempt to read their contents if they have a valid .pdf extension. Such files don’t contain malware, so they won’t trip any wires in a traditional AV scan, but they can still be used to initiate remote code execution or distributed denial of service (DDoS) attacks.
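
To make the idea a little more concrete, here is a rough sketch of a much simpler check than the one the API performs. It is not the API's method (the article doesn't describe that); it only counts PDF name tokens that are commonly associated with embedded or auto-run JavaScript. The function name and token list are assumptions chosen for illustration, and a check like this is easy to evade with compressed or obfuscated objects.

```javascript
// Naive heuristic for illustration only, NOT how the scanning API works:
// count PDF name tokens often associated with embedded or auto-run JavaScript.
const SUSPICIOUS_PDF_NAMES = ['/JavaScript', '/JS', '/OpenAction'];

function countSuspiciousPdfNames(buffer) {
  // latin1 maps each byte to exactly one character, so binary data survives.
  const text = buffer.toString('latin1');
  const counts = {};
  for (const name of SUSPICIOUS_PDF_NAMES) {
    // \b stops '/JS' from matching a longer name such as '/JSON'.
    const pattern = new RegExp(name + '\\b', 'g');
    counts[name] = (text.match(pattern) || []).length;
  }
  return counts;
}

// Inert sample content, written inline rather than read from disk.
const samplePdf = Buffer.from(
  '%PDF-1.7\n1 0 obj\n<< /OpenAction << /S /JavaScript /JS (app.alert(1)) >> >>\nendobj\n',
  'latin1'
);
console.log(countSuspiciousPdfNames(samplePdf));
// { '/JavaScript': 1, '/JS': 1, '/OpenAction': 1 }
```

A non-zero count doesn't prove a file is malicious, and a zero count doesn't prove it is safe, which is why a full content verification scan is the better tool here.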

If we use this API to perform a deterministic threat scan on a specially crafted PDF file containing a JavaScript injection (note that an inert JS injection example was used for this demonstration), we’ll get an API response like the example below:

{
  "CleanResult": false,
  "ContainsExecutable": false,
  "ContainsInvalidFile": true,
  "ContainsScript": false,
  "ContainsPasswordProtectedFile": false,
  "ContainsRestrictedFileFormat": false,
  "ContainsMacros": false,
  "ContainsXmlExternalEntities": false,
  "ContainsInsecureDeserialization": false,
  "ContainsHtml": false,
  "ContainsUnsafeArchive": false,
  "ContainsOleEmbeddedObject": false,
  "VerifiedFileFormat": ".pdf",
  "FoundViruses": null,
  "ContentInformation": {
    "ContainsJSON": false,
    "ContainsXML": false,
    "ContainsImage": false,
    "RelevantSubfileName": null
  }
}

The “CleanResult”: false response indicates the file is unsafe, and the “ContainsInvalidFile”: true response tells us why: the file does not rigorously conform to PDF formatting standards. The “VerifiedFileFormat”: “.pdf” response still identifies the file as a PDF, however, which indicates that a PDF reader would likely still attempt to open it if left unchecked.
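
In application code, we usually want to turn that response into a short verdict with reasons. The sketch below is one way to do it, assuming the response object exposes the same field names as the JSON sample above; `summarizeScan` and `THREAT_FLAGS` are hypothetical names introduced for illustration and are not part of the SDK.

```javascript
// Threat flags copied from the sample response shown above.
const THREAT_FLAGS = [
  'ContainsExecutable',
  'ContainsInvalidFile',
  'ContainsScript',
  'ContainsPasswordProtectedFile',
  'ContainsRestrictedFileFormat',
  'ContainsMacros',
  'ContainsXmlExternalEntities',
  'ContainsInsecureDeserialization',
  'ContainsHtml',
  'ContainsUnsafeArchive',
  'ContainsOleEmbeddedObject'
];

// Hypothetical helper: reduce a scan response to a verdict plus reasons.
function summarizeScan(result) {
  const reasons = THREAT_FLAGS.filter(function (flag) {
    return result[flag] === true;
  });
  // FoundViruses is null in the sample; assume it is an array when populated.
  const viruses = Array.isArray(result.FoundViruses) ? result.FoundViruses : [];
  return {
    clean: result.CleanResult === true,
    reasons: reasons,
    virusCount: viruses.length,
    verifiedFormat: result.VerifiedFileFormat || null
  };
}

// Abbreviated version of the sample response.
const sampleResponse = {
  CleanResult: false,
  ContainsInvalidFile: true,
  ContainsScript: false,
  VerifiedFileFormat: '.pdf',
  FoundViruses: null
};
console.log(summarizeScan(sampleResponse));
// { clean: false, reasons: [ 'ContainsInvalidFile' ], virusCount: 0, verifiedFormat: '.pdf' }
```

Treating only an explicit `CleanResult === true` as clean means a missing or malformed response is never mistaken for a safe file.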

To take advantage of this API, we can simply copy from the complimentary, ready-to-run code examples provided below. We’ll need a free API key to authorize our API calls (this will allow us to make up to 800 API calls per month with zero additional commitments).

First, let’s run the following NPM command to install the client SDK:

npm install cloudmersive-virus-api-client --save

Alternatively, we can just add the Node client to our package.json:

"dependencies": {
  "cloudmersive-virus-api-client": "^1.1.9"
}

After that, let’s use the below code examples to call the threat scanning function. We can replace the ‘YOUR API KEY’ placeholder snippet with our own API key, and we can set various threat rule variables in the request body:

var fs = require('fs');
var CloudmersiveVirusApiClient = require('cloudmersive-virus-api-client');
var defaultClient = CloudmersiveVirusApiClient.ApiClient.instance;

// Configure API key authorization: Apikey
var Apikey = defaultClient.authentications['Apikey'];
Apikey.apiKey = 'YOUR API KEY';

var apiInstance = new CloudmersiveVirusApiClient.ScanApi();

// readFileSync already returns a Buffer, so no extra conversion is needed.
var inputFile = fs.readFileSync("C:\\temp\\inputfile"); // File | Input file to perform the operation on.

// Each option below is set to false, the recommended default, so that the
// corresponding content type is blocked. Set an option to true to allow it.
var opts = {
  'allowExecutables': false, // Boolean | Set to false to block executable files (program code) from being allowed in the input file. Default is false (recommended).
  'allowInvalidFiles': false, // Boolean | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document. Default is false (recommended).
  'allowScripts': false, // Boolean | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file. Default is false (recommended).
  'allowPasswordProtectedFiles': false, // Boolean | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords. Default is false (recommended).
  'allowMacros': false, // Boolean | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded macros. Default is false (recommended).
  'allowXmlExternalEntities': false, // Boolean | Set to false to block XML External Entities and other threats embedded in XML files. Default is false (recommended).
  'allowInsecureDeserialization': false, // Boolean | Set to false to block Insecure Deserialization and other threats embedded in JSON and other object serialization files. Default is false (recommended).
  'allowHtml': false, // Boolean | Set to false to block HTML input in the top level file; HTML can contain XSS, scripts, local file accesses and other threats. Default is false (recommended) [for API keys created prior to the release of this feature, default is true for backward compatibility].
  'restrictFileTypes': "" // String | Comma-separated list of file formats to allow as clean; for example, .pdf,.docx,.png would allow only PDF, Word document, and PNG files. All files must pass content verification against this list, otherwise the result is returned as CleanResult=false. Set to null or an empty string to disable; default is disabled.
};

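var callback = function(error, data, response) {
  if (error) {
    console.error(error);
  } else {
    // Stringify the result; concatenating the object would print "[object Object]".
    console.log('API called successfully. Returned data: ' + JSON.stringify(data, null, 2));
  }
};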
apiInstance.scanFileAdvanced(inputFile, opts, callback);
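
The call above is callback-based. If the rest of our application uses async/await, one option is to wrap it with Node's built-in `util.promisify`. This is a minimal sketch, not something the SDK provides: `scanFileAdvancedAsync` is a hypothetical helper name, and it assumes the callback receives `(error, data, response)` as in the example above.

```javascript
const util = require('util');

// Hypothetical helper: wrap the callback-style scan method in a Promise.
// `apiInstance` is expected to be a ScanApi instance like the one created above.
function scanFileAdvancedAsync(apiInstance, inputFile, opts) {
  // bind keeps `this` pointing at the API instance inside the SDK method.
  const scan = util.promisify(apiInstance.scanFileAdvanced.bind(apiInstance));
  // Resolves with `data` (the first value after `error`); rejects on error.
  return scan(inputFile, opts);
}

// Usage sketch, inside an async function:
//   const data = await scanFileAdvancedAsync(apiInstance, inputFile, opts);
//   console.log(data.CleanResult);
```

Passing the API instance in as a parameter keeps the helper easy to exercise with a stand-in object during testing.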

We can scan all Office document formats, over 100 image formats, HTML documents, and more.

Just a few lines of code, and we’ve gained a crucial layer of protection in our Node.js form upload application.
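
When wiring the scan into an upload handler, it's worth deciding what happens if the scan itself fails. The sketch below takes a "fail closed" approach: an upload is accepted only when the scan explicitly reports a clean result. `isUploadAcceptable` and `scanFn` are hypothetical names; `scanFn` stands for any function that takes a Buffer and returns a Promise of the scan response.

```javascript
// Hypothetical gate for an upload handler: accept a file only when the scan
// explicitly reports CleanResult === true.
async function isUploadAcceptable(scanFn, fileBuffer) {
  try {
    const result = await scanFn(fileBuffer);
    return Boolean(result) && result.CleanResult === true;
  } catch (err) {
    // Fail closed: a failed or unreachable scan is treated as "not clean".
    console.error('Scan failed, rejecting upload:', err && err.message);
    return false;
  }
}

// Usage sketch, inside an async request handler:
//   if (!(await isUploadAcceptable(myScanFn, uploadedBuffer))) {
//     // respond with an error and discard the file
//   }
```

Failing closed can reject legitimate uploads during an outage or after the monthly call quota is used up, so whether to retry or queue such files is a design choice for each application.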
