How to Block Unwanted File Upload Formats using Python

Cloudmersive
4 min read · May 25, 2023

Threat actors can exploit your file upload process with an overwhelming number of file formats. Fortunately, most of those formats are unnecessary for your business purposes. For example, if you’re designing a file upload process that lets users store resumes in your database, you can afford to limit acceptable file formats to common options like .DOCX and .PDF. Those are by far the most common document formats for text- or image-based resumes; validating content in those formats is quick, and the restriction won’t inconvenience your trusted users.

Using the Advanced Virus Scan API shown below, you can block unwanted or unnecessary file types by passing a comma-separated string of acceptable file extensions in the restrictFileTypes request parameter. The API scans each file upload for viruses and malware (including ransomware, spyware, and trojans) and then verifies the file’s contents against your custom list. Any file that doesn’t match your list returns CleanResult: False in the API response body, which is the same value assigned to files containing viruses and malware. This lets you clean up your file upload process in one fell swoop.
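If your allowed formats live in a list (for example, in application settings), a small helper can produce the comma-separated string that restrictFileTypes expects. The sketch below is illustrative only: the function name build_restrict_file_types is hypothetical and not part of the SDK, and lowercasing the extensions is an assumption made for consistency with the ".pdf,.docx,.png" example in the parameter description.

```python
def build_restrict_file_types(extensions):
    """Join file extensions into a comma-separated string such as '.pdf,.docx'."""
    normalized = []
    for ext in extensions:
        # Assumption: extensions are trimmed and lowercased for consistency.
        ext = ext.strip().lower()
        if not ext:
            continue  # skip empty entries
        if not ext.startswith("."):
            ext = "." + ext  # the documented format uses a leading dot
        if ext not in normalized:
            normalized.append(ext)  # drop duplicates, keep first-seen order
    return ",".join(normalized)


restrict_file_types = build_restrict_file_types(["pdf", ".DOCX"])
print(restrict_file_types)  # .pdf,.docx
```

The resulting string can then be passed as the restrict_file_types argument in the API call.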

Python code examples are provided below to help you structure your API call. First, install the Python SDK by running the following pip command:

pip install cloudmersive-virus-api-client

After that, copy the code below:

from __future__ import print_function
import time
import cloudmersive_virus_api_client
from cloudmersive_virus_api_client.rest import ApiException
from pprint import pprint

# Configure API key authorization: Apikey
configuration = cloudmersive_virus_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'



# create an instance of the API class
api_instance = cloudmersive_virus_api_client.ScanApi(cloudmersive_virus_api_client.ApiClient(configuration))
input_file = '/path/to/inputfile' # file | Input file to perform the operation on.
allow_executables = True # bool | Set to false to block executable files (program code) from being allowed in the input file. Default is false (recommended). (optional)
allow_invalid_files = True # bool | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word document that is not a valid Word document. Default is false (recommended). (optional)
allow_scripts = True # bool | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file. Set to true to allow these file types. Default is false (recommended). (optional)
allow_password_protected_files = True # bool | Set to false to block password-protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords. Set to true to allow these file types. Default is false (recommended). (optional)
allow_macros = True # bool | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded macros, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended). (optional)
allow_xml_external_entities = True # bool | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended). (optional)
allow_insecure_deserialization = True # bool | Set to false to block Insecure Deserialization and other threats embedded in JSON and other object serialization files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended). (optional)
allow_html = True # bool | Set to false to block HTML input in the top-level file; HTML can contain XSS, scripts, local file accesses and other threats. Set to true to allow these file types. Default is false (recommended) [for API keys created prior to the release of this feature, default is true for backward compatibility]. (optional)
restrict_file_types = '.pdf,.docx' # str | Comma-separated list of file formats to allow as clean; for example, .pdf,.docx,.png would allow only PDF, Word document and PNG files. All files must pass content verification against this list; if they do not, the result is returned as CleanResult=false. Set restrictFileTypes to null or an empty string to disable; default is disabled. (optional)

try:
    # Advanced Scan a file for viruses
    api_response = api_instance.scan_file_advanced(input_file, allow_executables=allow_executables, allow_invalid_files=allow_invalid_files, allow_scripts=allow_scripts, allow_password_protected_files=allow_password_protected_files, allow_macros=allow_macros, allow_xml_external_entities=allow_xml_external_entities, allow_insecure_deserialization=allow_insecure_deserialization, allow_html=allow_html, restrict_file_types=restrict_file_types)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling ScanApi->scan_file_advanced: %s\n" % e)

Now simply include your comma-separated list, configure any other non-malware content threat detection rules to suit your needs, and supply a free-tier API key to authenticate your request. It’s just that easy!
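Once the call returns, your upload handler still has to act on the result. The following is a minimal sketch of that decision, assuming the SDK's response object exposes the CleanResult field as a clean_result attribute; the helper name is_upload_allowed is hypothetical, and the SimpleNamespace objects are stand-ins for a real API response.

```python
from types import SimpleNamespace


def is_upload_allowed(scan_result):
    """Return True only when the scan result is explicitly marked clean."""
    # Assumption: the response object exposes CleanResult as `clean_result`.
    # Anything other than an explicit True (missing field, None, False) is rejected.
    return getattr(scan_result, "clean_result", None) is True


# Stand-in objects for illustration; in practice, pass the api_response
# returned by scan_file_advanced.
clean = SimpleNamespace(clean_result=True)
blocked = SimpleNamespace(clean_result=False)

print(is_upload_allowed(clean))    # True
print(is_upload_allowed(blocked))  # False
```

Because a disallowed file format and a detected virus both produce CleanResult: False, a single check like this covers both cases. Rejecting anything that is not explicitly clean keeps the handler on the safe side if the response is missing or malformed.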
