How to Scan HTML Files for Malware and Other Threats in Node.js

Cloudmersive
4 min read · Apr 18, 2024

When we open HTML files in our browser, our browser application will typically execute JavaScript and/or any additional dynamic code embedded within the file. Malicious HTML files can initiate remote malware download in this way, catching us off guard if we haven’t properly scanned the file for threats.

Thankfully, using the Node.js code examples provided below, we can take advantage of a free API that takes care of the following:

  1. Verifies HTML file content (ensures the content conforms to the file extension)
  2. Scans HTML file content for threats including viruses, malware, scripts, image objects (which can contain malware themselves), and more.

We could use this API as a simple, convenient & low-code method for protecting our Node.js forms from malicious HTML file uploads.

If we wanted to forbid HTML files from a client-side upload form entirely (a practical security decision in many upload scenarios), we could use this API for that purpose just as easily. We would simply enter a comma-separated list of acceptable file extensions (e.g., ‘.docx,.xlsx,.pdf’) in the ‘restrictFileTypes’ parameter, and any file that failed content verification against those expected extensions would come back with “CleanResult”: false in the response.
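As a rough sketch of that restriction, the helper below turns a list of acceptable extensions into the comma-separated string that ‘restrictFileTypes’ expects. The function name buildRestrictedScanOptions is hypothetical (it is not part of the SDK), and the option names are simply the ones used in the API call later in this article.

```javascript
// Hypothetical helper (not part of the Cloudmersive SDK): builds a request
// options object that only accepts the given file extensions.
function buildRestrictedScanOptions(allowedExtensions) {
  var normalized = allowedExtensions
    .map(function (ext) { return ext.trim().toLowerCase(); })
    .filter(function (ext) { return ext.length > 0; })
    // Make sure every entry starts with a dot, e.g. "pdf" -> ".pdf"
    .map(function (ext) { return ext.charAt(0) === '.' ? ext : '.' + ext; });

  return {
    allowHtml: false,                       // block HTML in the top-level file
    restrictFileTypes: normalized.join(',') // e.g. ".docx,.xlsx,.pdf"
  };
}

var restrictedOpts = buildRestrictedScanOptions(['.docx', 'xlsx', ' PDF ']);
console.log(restrictedOpts.restrictFileTypes);
```

The returned object could be passed as the options argument of the scan call, with any other threat rules added alongside it.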

Let’s walk through a few quick steps to structure our API call.

First, we’ll need to install the client SDK. We can run this command to install it via npm:

npm install cloudmersive-virus-api-client --save

Alternatively, we could add the below snippet to our package.json:

"dependencies": {
  "cloudmersive-virus-api-client": "^1.1.9"
}

Now we can quickly turn our attention to API call authorization. We’ll need a free Cloudmersive API key to make up to 800 API calls per month with zero commitments (once we reach our call limit, our total will simply reset the following month).

We can now use the below code to call the function (we can set the ‘restrictFileTypes’ parameter and other custom threat rules in the request body):

var fs = require('fs');
var CloudmersiveVirusApiClient = require('cloudmersive-virus-api-client');
var defaultClient = CloudmersiveVirusApiClient.ApiClient.instance;

// Configure API key authorization: Apikey
var Apikey = defaultClient.authentications['Apikey'];
Apikey.apiKey = 'YOUR API KEY';

var apiInstance = new CloudmersiveVirusApiClient.ScanApi();

// File | Input file to perform the operation on. fs.readFileSync already returns a Buffer.
var inputFile = fs.readFileSync("C:\\temp\\inputfile");

var opts = {
  'allowExecutables': true, // Boolean | Set to false to block executable files (program code) from being allowed in the input file. Default is false (recommended).
  'allowInvalidFiles': true, // Boolean | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document. Default is false (recommended).
  'allowScripts': true, // Boolean | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file. Set to true to allow these file types. Default is false (recommended).
  'allowPasswordProtectedFiles': true, // Boolean | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords. Set to true to allow these file types. Default is false (recommended).
  'allowMacros': true, // Boolean | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
  'allowXmlExternalEntities': true, // Boolean | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
  'allowInsecureDeserialization': true, // Boolean | Set to false to block Insecure Deserialization and other threats embedded in JSON and other object serialization files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
  'allowHtml': true, // Boolean | Set to false to block HTML input in the top level file; HTML can contain XSS, scripts, local file accesses and other threats. Set to true to allow these file types. Default is false (recommended) [for API keys created prior to the release of this feature default is true for backward compatibility].
  'restrictFileTypes': "restrictFileTypes_example" // String | Specify a restricted set of file formats to allow as clean as a comma-separated list of file formats, such as .pdf,.docx,.png would allow only PDF, PNG and Word document files. All files must pass content verification against this list of file formats, if they do not, then the result will be returned as CleanResult=false. Set restrictFileTypes parameter to null or empty string to disable; default is disabled.
};

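var callback = function(error, data, response) {
  if (error) {
    console.error(error);
  } else {
    // Serialize the result object so its fields are printed instead of "[object Object]"
    console.log('API called successfully. Returned data: ' + JSON.stringify(data, null, 2));
  }
};
apiInstance.scanFileAdvanced(inputFile, opts, callback);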
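In code built around async/await, the callback-style call above can be wrapped in a Promise. The following is a minimal sketch that assumes only the scanFileAdvanced(inputFile, opts, callback) signature used above; scanFileAdvancedAsync is a hypothetical helper name, not something the SDK provides.

```javascript
// Hypothetical wrapper (not part of the SDK): exposes the callback-based
// scanFileAdvanced call as a Promise.
function scanFileAdvancedAsync(apiInstance, inputFile, opts) {
  return new Promise(function (resolve, reject) {
    apiInstance.scanFileAdvanced(inputFile, opts, function (error, data) {
      if (error) {
        reject(error);   // network or API error
      } else {
        resolve(data);   // the scan result object
      }
    });
  });
}
```

Inside an async function this could then be used as var data = await scanFileAdvancedAsync(apiInstance, inputFile, opts); with errors handled by an ordinary try/catch.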

And we can refer to the generic JSON response example below to understand what a full diagnostic might look like:

{
  "CleanResult": true,
  "ContainsExecutable": true,
  "ContainsInvalidFile": true,
  "ContainsScript": true,
  "ContainsPasswordProtectedFile": true,
  "ContainsRestrictedFileFormat": true,
  "ContainsMacros": true,
  "ContainsXmlExternalEntities": true,
  "ContainsInsecureDeserialization": true,
  "ContainsHtml": true,
  "ContainsUnsafeArchive": true,
  "ContainsOleEmbeddedObject": true,
  "VerifiedFileFormat": "string",
  "FoundViruses": [
    {
      "FileName": "string",
      "VirusName": "string"
    }
  ],
  "ContentInformation": {
    "ContainsJSON": true,
    "ContainsXML": true,
    "ContainsImage": true,
    "RelevantSubfileName": "string"
  }
}
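To act on a response like this in an upload handler, we might reduce it to an accept/reject decision plus the reasons behind it. The sketch below is one possible approach, assuming the response has the field names shown in the example above; summarizeScanResult is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): reduces a scan response to an
// accept/reject decision, the threat flags that were raised, and virus names.
function summarizeScanResult(result) {
  var flagNames = [
    'ContainsExecutable', 'ContainsInvalidFile', 'ContainsScript',
    'ContainsPasswordProtectedFile', 'ContainsRestrictedFileFormat',
    'ContainsMacros', 'ContainsXmlExternalEntities',
    'ContainsInsecureDeserialization', 'ContainsHtml',
    'ContainsUnsafeArchive', 'ContainsOleEmbeddedObject'
  ];

  // Keep only the flags that are explicitly true in the response
  var raisedFlags = flagNames.filter(function (name) {
    return result[name] === true;
  });

  // FoundViruses may be missing or null when nothing was detected
  var viruses = (result.FoundViruses || []).map(function (v) {
    return v.VirusName;
  });

  return {
    accepted: result.CleanResult === true,
    raisedFlags: raisedFlags,
    viruses: viruses
  };
}

// Example with a made-up response for an HTML upload that contains a script
var summary = summarizeScanResult({
  CleanResult: false,
  ContainsScript: true,
  ContainsHtml: true,
  FoundViruses: null
});
console.log(summary);
```

Treating anything other than an explicit CleanResult of true as a rejection keeps the handler on the safe side if a field is ever absent.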

That’s all there is to it. Now we can easily scan and/or restrict HTML file uploads with just a few lines of Node.js code.
