How to Block HTML Uploads in PHP

May 7, 2024

If a file contains HTML, it could also contain scripts designed to initiate a remote connection with a malicious server.

To block files containing HTML from moving forward in our file upload process, we’ll need to perform an in-depth verification of each file’s contents.
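
To see why the contents matter more than the file name, here is a minimal sketch of a local heuristic. The `looksLikeHtml` helper and its marker list are hypothetical, introduced only for illustration; a substring check like this is easy to evade and is not a substitute for a real content scan.

```php
<?php
// Hypothetical helper: a naive heuristic that inspects file bytes, not the file name.
function looksLikeHtml(string $bytes): bool
{
    // Only the first 4 KB are checked, case-insensitively, for common HTML markers.
    $head = strtolower(substr($bytes, 0, 4096));
    foreach (['<!doctype html', '<html', '<script', '<iframe'] as $marker) {
        if (strpos($head, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// A file uploaded as "photo.png" can still carry HTML; the extension proves nothing.
$disguised = "<html><body><script>fetch('https://example.invalid/')</script></body></html>";
var_dump(looksLikeHtml($disguised));          // bool(true)
var_dump(looksLikeHtml("\x89PNG\r\n\x1a\n")); // bool(false)
```

A check like this misses encoded, obfuscated, or embedded HTML, which is why a dedicated scanning service is the more dependable option.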

Thankfully, we can take care of that by incorporating a free API into our form upload process.

Using the code examples provided below, we can call an API that performs a deterministic threat scan, checking each file’s contents for HTML, scripts, macros, and other types of threatening content.

This API also checks files against a list of more than 17 million virus and malware signatures that is updated every 15 minutes, so we’ll know if established malware threats are present in the file as well.

We can structure our API call in two quick steps. First, let’s run the command below to install the PHP client with Composer:

composer require cloudmersive/cloudmersive_virusscan_api_client

Next, let’s turn our attention to API call authorization. We can grab a free Cloudmersive API key to make up to 800 API calls per month with zero commitments (once we reach our total, it’ll just reset the following month).

We can now use the code below to call the function. We provide our API key in the $config line, and we can set the $allow_html variable to false to specifically block files containing HTML content:

<?php
require_once(__DIR__ . '/vendor/autoload.php');

// Configure API key authorization: Apikey
$config = Swagger\Client\Configuration::getDefaultConfiguration()->setApiKey('Apikey', 'YOUR_API_KEY');

$apiInstance = new Swagger\Client\Api\ScanApi(
    new GuzzleHttp\Client(),
    $config
);
$input_file = "/path/to/inputfile"; // \SplFileObject | Input file to perform the operation on.
$allow_executables = true; // bool | Set to false to block executable files (program code) from being allowed in the input file.  Default is false (recommended).
$allow_invalid_files = true; // bool | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document.  Default is false (recommended).
$allow_scripts = true; // bool | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file.  Set to true to allow these file types.  Default is false (recommended).
$allow_password_protected_files = true; // bool | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords.  Set to true to allow these file types.  Default is false (recommended).
$allow_macros = true; // bool | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats.  Set to true to allow these file types.  Default is false (recommended).
$allow_xml_external_entities = true; // bool | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats.  Set to true to allow these file types.  Default is false (recommended).
$allow_insecure_deserialization = true; // bool | Set to false to block Insecure Deserialization and other threats embedded in JSON and other object serialization files, and other files that contain embedded content threats.  Set to true to allow these file types.  Default is false (recommended).
$allow_html = false; // bool | Set to false to block HTML input in the top level file; HTML can contain XSS, scripts, local file accesses and other threats.  Set to true to allow these file types.  Default is false (recommended) [for API keys created prior to the release of this feature default is true for backward compatibility].
$restrict_file_types = ""; // string | Optional comma-separated list of file formats to allow as clean; for example, .pdf,.docx,.png would allow only PDF, Word document and PNG files.  All files must pass content verification against this list of file formats; if they do not, the result will be returned as CleanResult=false.  Set to null or empty string to disable; default is disabled.

try {
    $result = $apiInstance->scanFileAdvanced($input_file, $allow_executables, $allow_invalid_files, $allow_scripts, $allow_password_protected_files, $allow_macros, $allow_xml_external_entities, $allow_insecure_deserialization, $allow_html, $restrict_file_types);
    print_r($result);
} catch (Exception $e) {
    echo 'Exception when calling ScanApi->scanFileAdvanced: ', $e->getMessage(), PHP_EOL;
}
?>
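
Once the scan returns, we still need to decide whether the upload moves forward. The following is a minimal sketch of that decision, not the client library's own API: the `uploadRejectionReason` helper is hypothetical, and the field names `CleanResult`, `ContainsHtml`, and `FoundViruses` are assumptions about the shape of the JSON response that should be checked against the current API documentation.

```php
<?php
// Hypothetical helper: turns a decoded scan response into a rejection reason.
// Assumed fields: CleanResult (bool), ContainsHtml (bool), FoundViruses (array|null).
function uploadRejectionReason(array $scan): ?string
{
    if (!empty($scan['ContainsHtml'])) {
        return 'File contains HTML';
    }
    if (!empty($scan['FoundViruses'])) {
        return 'File matched a known virus or malware signature';
    }
    if (empty($scan['CleanResult'])) {
        // Missing or false CleanResult is treated as a failure (fail closed).
        return 'File failed the content scan';
    }
    return null; // nothing objectionable found
}

// Hand-written sample data for illustration only, not real API output.
$sample = ['CleanResult' => false, 'ContainsHtml' => true, 'FoundViruses' => null];
echo uploadRejectionReason($sample) ?? 'accepted', PHP_EOL; // prints "File contains HTML"
```

Inside the try block above, one way to obtain such an array might be json_decode((string) $result, true), assuming the generated model serializes itself to JSON when cast to a string. Only when the helper returns null would we go on to store the uploaded file.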

That’s all there is to it. Now we can incorporate a threat scanning API into our form uploads to block hidden HTML and a variety of additional threats.

Written by Cloudmersive