Why Should I Worry about Invalid File Uploads?

Sep 18, 2023

Apart from causing all kinds of data quality issues, there’s a significant risk associated with allowing unrestricted invalid file uploads to sensitive file storage locations.

Invalid files — meaning those files which contain content that does not match the file extension — can be used to smuggle malicious content past virus & malware threat detection policies. To use one real-world example, files with PDF extensions can actually contain custom HTML content with scripts designed to retrieve malicious external objects over the internet once opened or downloaded.
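To make the mismatch concrete, here is a minimal local sketch that compares a file's leading bytes with the signature expected for its extension. The helper name `extensionMatchesContent` and the small signature table are hypothetical and for illustration only. This is far shallower than the in-depth content verification discussed in this article, and it is not how the API performs its checks.

```javascript
// Hypothetical helper: compares a file's first bytes with the signature
// expected for its extension. Only a few common formats are listed here.
const path = require('path');

const SIGNATURES = {
  '.pdf': [0x25, 0x50, 0x44, 0x46],  // "%PDF"
  '.png': [0x89, 0x50, 0x4e, 0x47],  // "\x89PNG"
  '.zip': [0x50, 0x4b, 0x03, 0x04],  // "PK\x03\x04"
  '.docx': [0x50, 0x4b, 0x03, 0x04], // DOCX files are ZIP containers
};

function extensionMatchesContent(fileName, contents) {
  const expected = SIGNATURES[path.extname(fileName).toLowerCase()];
  if (!expected) return null; // unknown extension: this sketch cannot decide
  if (contents.length < expected.length) return false;
  return expected.every((byte, i) => contents[i] === byte);
}

// An HTML payload saved with a .pdf extension versus a genuine PDF header
const fakePdf = Buffer.from('<html><script src="http://example.invalid/x.js"></script></html>');
const realPdfHeader = Buffer.from('%PDF-1.7\n');

console.log(extensionMatchesContent('invoice.pdf', fakePdf));       // false
console.log(extensionMatchesContent('invoice.pdf', realPdfHeader)); // true
```

A signature check like this only inspects the first few bytes, so a file can pass it and still be malformed or malicious further in. That is why deeper verification combined with virus scanning is worth having.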

Detecting Invalid Files AND Virus & Malware Threats

Thankfully, we can all but eliminate this issue by incorporating in-depth content verification policies alongside our regular antivirus policies.

In fact, using the below Node.js code examples, we can quickly and easily take advantage of an API that simultaneously scans and verifies file contents in a single request. We can set a custom policy against invalid files in the API request body which ensures files with contents that don’t match their extensions are categorically blocked from entering a specific web server location.

Before we include the code to structure our request, let’s first run the following command to install the Virus API client:

npm install cloudmersive-virus-api-client --save

Alternatively, we can add the following snippet to our package.json:

  "dependencies": {
    "cloudmersive-virus-api-client": "^1.1.9"
  }

With the SDK installed, we can use the code below to structure our request. First, let’s make sure the allowInvalidFiles boolean is set to false so that files whose contents don’t match their extensions are blocked. Next, let’s customize the other threat rules to our liking. Finally, let’s supply a free-tier API key in the authentication step; the free tier allows up to 800 API calls per month with zero commitments:

var fs = require('fs'); // needed to read the input file from disk
var CloudmersiveVirusApiClient = require('cloudmersive-virus-api-client');
var defaultClient = CloudmersiveVirusApiClient.ApiClient.instance;

// Configure API key authorization: Apikey
var Apikey = defaultClient.authentications['Apikey'];
Apikey.apiKey = 'YOUR API KEY';

var apiInstance = new CloudmersiveVirusApiClient.ScanApi();

// readFileSync already returns a Buffer holding exactly the file's bytes
var inputFile = fs.readFileSync("C:\\temp\\inputfile"); // File | Input file to perform the operation on.

var opts = { 
  'allowExecutables': true, // Boolean | Set to false to block executable files (program code) from being allowed in the input file.  Default is false (recommended).
  'allowInvalidFiles': false, // Boolean | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document.  Default is false (recommended).
  'allowScripts': true, // Boolean | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file.  Set to true to allow these file types.  Default is false (recommended).
  'allowPasswordProtectedFiles': true, // Boolean | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords.  Set to true to allow these file types.  Default is false (recommended).
  'allowMacros': true, // Boolean | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats.  Set to true to allow these file types.  Default is false (recommended).
  'allowXmlExternalEntities': true, // Boolean | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats.  Set to true to allow these file types.  Default is false (recommended).
  'allowInsecureDeserialization': true, // Boolean | Set to false to block Insecure Deserialization and other threats embedded in JSON and other object serialization files, and other files that contain embedded content threats.  Set to true to allow these file types.  Default is false (recommended).
  'allowHtml': true, // Boolean | Set to false to block HTML input in the top level file; HTML can contain XSS, scripts, local file accesses and other threats.  Set to true to allow these file types.  Default is false (recommended) [for API keys created prior to the release of this feature default is true for backward compatibility].
  'restrictFileTypes': "" // String | Specify a restricted set of file formats to allow as clean as a comma-separated list of file formats, such as .pdf,.docx,.png would allow only PDF, PNG and Word document files.  All files must pass content verification against this list of file formats, if they do not, then the result will be returned as CleanResult=false.  Set restrictFileTypes parameter to null or empty string to disable; default is disabled (an empty string is used here).
};

var callback = function(error, data, response) {
  if (error) {
    console.error(error);
  } else {
    console.log('API called successfully. Returned data: ' + JSON.stringify(data, null, 2));
  }
};
apiInstance.scanFileAdvanced(inputFile, opts, callback);
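
Logging the response is only a starting point; in practice we would want to turn it into an accept-or-reject decision for the upload. The sketch below shows one way that might look. The helper name `evaluateScanResult` is hypothetical. `CleanResult` is mentioned in the option comments above, but the `ContainsInvalidFile` and `FoundViruses` (with `VirusName`) fields are assumptions about the response shape and should be verified against the API documentation.

```javascript
// Hypothetical helper: decides whether to keep an upload based on the scan result.
// Field names other than CleanResult are assumptions; check them against the API docs.
function evaluateScanResult(data) {
  const reasons = [];
  if (!data || data.CleanResult !== true) {
    reasons.push('scan did not return a clean result');
  }
  if (data && data.ContainsInvalidFile === true) {
    reasons.push('content does not match the file extension');
  }
  if (data && Array.isArray(data.FoundViruses) && data.FoundViruses.length > 0) {
    const names = data.FoundViruses.map(function (v) { return v.VirusName; });
    reasons.push('virus signatures found: ' + names.join(', '));
  }
  return { accept: reasons.length === 0, reasons: reasons };
}

// Mock responses for illustration only (not real API output)
const cleanMock = { CleanResult: true, ContainsInvalidFile: false, FoundViruses: null };
const invalidMock = { CleanResult: false, ContainsInvalidFile: true, FoundViruses: null };

console.log(evaluateScanResult(cleanMock).accept);   // true
console.log(evaluateScanResult(invalidMock).accept); // false
```

Inside the callback, the result could then be checked with `var decision = evaluateScanResult(data);` and the file stored only when `decision.accept` is true. Treating a missing or unexpected response as a rejection keeps the policy fail-closed.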

Now we can scan files for more than 17 million virus & malware signatures and avoid invalid file threats — all in one go.

Written by Cloudmersive
