How to Check File Uploads for HTML using PHP
Files containing HTML code can house XSS attempts, malicious scripts, local file access attempts and a variety of other threats — so it’s important to detect and block these files before they enter application storage. Thankfully, the Virus Scanning API provided below is designed to scan files for viruses, malware, and non-content threats including HTML, so you can easily set a custom policy ($allow_html) to “False” in the request body to quickly identify and neutralize HTML threats.
You can easily take advantage of this API in two steps using complementary, ready-to-run PHP code examples. To start, run the following command from the command line to install the PHP client using Composer:
composer require cloudmersive/cloudmersive_virusscan_api_client
Next, copy the code examples below to structure your API request in PHP. Provide a free-tier API key in the $config parameter (you can get one by registering a free account on the Cloudmersive website) and set the HTML policy to “False”:
<?php
require_once(__DIR__ . '/vendor/autoload.php');
// Configure API key authorization: Apikey
$config = Swagger\Client\Configuration::getDefaultConfiguration()->setApiKey('Apikey', 'YOUR_API_KEY');
$apiInstance = new Swagger\Client\Api\ScanApi(
new GuzzleHttp\Client(),
$config
);
$input_file = "/path/to/inputfile"; // \SplFileObject | Input file to perform the operation on.
$allow_executables = true; // bool | Set to false to block executable files (program code) from being allowed in the input file. Default is false (recommended).
$allow_invalid_files = true; // bool | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document. Default is false (recommended).
$allow_scripts = true; // bool | Set to false to block script files, such as a PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file. Set to true to allow these file types. Default is false (recommended).
$allow_password_protected_files = true; // bool | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords. Set to true to allow these file types. Default is false (recommended).
$allow_macros = true; // bool | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
$allow_xml_external_entities = true; // bool | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
$allow_insecure_deserialization = true; // bool | Set to false to block Insecure Deserialization and other threats embedded in JSON and other object serialization files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
$allow_html = true; // bool | Set to false to block HTML input in the top level file; HTML can contain XSS, scripts, local file accesses and other threats. Set to true to allow these file types. Default is false (recommended) [for API keys created prior to the release of this feature default is true for backward compatability].
$restrict_file_types = "restrict_file_types_example"; // string | Specify a restricted set of file formats to allow as clean as a comma-separated list of file formats, such as .pdf,.docx,.png would allow only PDF, PNG and Word document files. All files must pass content verification against this list of file formats, if they do not, then the result will be returned as CleanResult=false. Set restrictFileTypes parameter to null or empty string to disable; default is disabled.
try {
$result = $apiInstance->scanFileAdvanced($input_file, $allow_executables, $allow_invalid_files, $allow_scripts, $allow_password_protected_files, $allow_macros, $allow_xml_external_entities, $allow_insecure_deserialization, $allow_html, $restrict_file_types);
print_r($result);
} catch (Exception $e) {
echo 'Exception when calling ScanApi->scanFileAdvanced: ', $e->getMessage(), PHP_EOL;
}
?>
Each file passed through this API will be scanned for millions of potential virus and malware signatures (including ransomware, spyware, and trojans) and receive full content verification to detect HTML and other hidden threats.