How to Scan XML Files for Threats in Node.js

Cloudmersive
4 min readApr 18, 2024

One of the biggest concerns with XML files is XML External Entities. If our applications parse XML data with malicious external references, an attacker might quietly breach our system.

Thankfully, we can use a dynamic threat detection API to identify XXE threats within XML files. This API will also check our XML files for viruses, malware, and a wide range of additional threats (including scripts, for example), so we can feel confident that XML files processed by our server-side Node.js applications are safe before they enter a potentially compromising point in our system.

We can structure our API call in just a few quick steps.

First, we can install the SDK via NPM install:

npm install cloudmersive-virus-api-client --save

Alternatively, we could simply add the following snippet to our package.json:

  "dependencies": {
"cloudmersive-virus-api-client": "^1.1.9"
}

Next, we can copy the below code to call the function. Here, we can set various threat rules in the API request body to exert a bit more control over our scan.

var CloudmersiveVirusApiClient = require('cloudmersive-virus-api-client');
var defaultClient = CloudmersiveVirusApiClient.ApiClient.instance;

// Configure API key authorization: Apikey
var Apikey = defaultClient.authentications['Apikey'];
Apikey.apiKey = 'YOUR API KEY';



var apiInstance = new CloudmersiveVirusApiClient.ScanApi();

var inputFile = Buffer.from(fs.readFileSync("C:\\temp\\inputfile").buffer); // File | Input file to perform the operation on.

var opts = {
'allowExecutables': true, // Boolean | Set to false to block executable files (program code) from being allowed in the input file. Default is false (recommended).
'allowInvalidFiles': true, // Boolean | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document. Default is false (recommended).
'allowScripts': true, // Boolean | Set to false to block script files, such as a PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file. Set to true to allow these file types. Default is false (recommended).
'allowPasswordProtectedFiles': true, // Boolean | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords. Set to true to allow these file types. Default is false (recommended).
'allowMacros': true, // Boolean | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
'allowXmlExternalEntities': true, // Boolean | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
'allowInsecureDeserialization': true, // Boolean | Set to false to block Insecure Deserialization and other threats embedded in JSON and other object serialization files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
'allowHtml': true, // Boolean | Set to false to block HTML input in the top level file; HTML can contain XSS, scripts, local file accesses and other threats. Set to true to allow these file types. Default is false (recommended) [for API keys created prior to the release of this feature default is true for backward compatability].
'restrictFileTypes': "restrictFileTypes_example" // String | Specify a restricted set of file formats to allow as clean as a comma-separated list of file formats, such as .pdf,.docx,.png would allow only PDF, PNG and Word document files. All files must pass content verification against this list of file formats, if they do not, then the result will be returned as CleanResult=false. Set restrictFileTypes parameter to null or empty string to disable; default is disabled.
};

var callback = function(error, data, response) {
if (error) {
console.error(error);
} else {
console.log('API called successfully. Returned data: ' + data);
}
};
apiInstance.scanFileAdvanced(inputFile, opts, callback);

For example, if we wanted to restrict XML files from a Node.js form entirely, we could enter a comma-separated list of acceptable file extensions in the ‘restrictFileTypes’ parameter (e.g., ‘.pdf,.docx,.xlsx’). In doing so, we would block any file types (including XML) that didn’t conform to those specific files’ standards, and as a result, we would automatically flag XML file uploads as threats.

To authorize our API calls, we’ll need a free Cloudmersive API key. This will allow us to make up to 800 API calls per month with no commitments (once we reach our limit, our total will reset the following month).

In the API response body (as shown in the example JSON response below), we’ll see an item indicating if our XML file contained external entity references:

{
"CleanResult": true,
"ContainsExecutable": true,
"ContainsInvalidFile": true,
"ContainsScript": true,
"ContainsPasswordProtectedFile": true,
"ContainsRestrictedFileFormat": true,
"ContainsMacros": true,
"ContainsXmlExternalEntities": true,
"ContainsInsecureDeserialization": true,
"ContainsHtml": true,
"ContainsUnsafeArchive": true,
"ContainsOleEmbeddedObject": true,
"VerifiedFileFormat": "string",
"FoundViruses": [
{
"FileName": "string",
"VirusName": "string"
}
],
"ContentInformation": {
"ContainsJSON": true,
"ContainsXML": true,
"ContainsImage": true,
"RelevantSubfileName": "string"
}
}

And that’s all there is to it — now we can easily implement a dynamic, low-code XML threat scanning solution in our Node.js application.

--

--

Cloudmersive

There’s an API for that. Cloudmersive is a leader in Highly Scalable Cloud APIs.