How to Block XML External Entities with a Free API in PHP

Cloudmersive
4 min read · May 2, 2024

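XML external entities are entities whose content is loaded from outside the document type definition (DTD) they're declared in, such as a local file path or a remote URL. Because the XML parser fetches that content on the document's behalf, they present a considerable security risk.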

If we allow XML external entities to enter our system, they can interfere with the way our application processes XML data. This can allow a threat actor to view confidential files on our server, and in some cases, it can even allow them to interact directly with backend systems. Attacks that exploit our system in this way are known as XXE (XML external entity) attacks.
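To make the threat concrete, here is a minimal sketch of what an XXE payload typically looks like, along with a naive local check. The helper name declaresExternalEntity is hypothetical and not part of any library, and the pattern match is only an illustration of what "contains an external entity declaration" means. It is not a substitute for a real scanner and can be evaded by encoding tricks.

```php
<?php
// A classic XXE payload: the DTD declares an external entity pointing at a
// local file, and the document body references that entity.
$maliciousXml = <<<'PAYLOAD'
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<foo>&xxe;</foo>
PAYLOAD;

// An ordinary document with no DTD and no entity declarations.
$benignXml = '<?xml version="1.0"?><foo>bar</foo>';

/**
 * Hypothetical helper (illustration only): returns true when the text contains
 * an <!ENTITY ...> declaration that uses a SYSTEM or PUBLIC identifier, which
 * is how general and parameter external entities are declared.
 */
function declaresExternalEntity(string $xml): bool
{
    return preg_match('/<!ENTITY\s+(?:%\s*)?[^\s>]+\s+(?:SYSTEM|PUBLIC)\s/', $xml) === 1;
}

var_dump(declaresExternalEntity($maliciousXml));
var_dump(declaresExternalEntity($benignXml));
```

If a parser with external entity loading enabled processed the first document, the contents of the referenced file could end up in the parsed output, which is the file disclosure scenario described above.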

Thankfully, with the right solution in place, we can flag XML files containing external entities at the point they enter our system and prevent them from moving forward into a vulnerable server location.

Using the code below, we can call a free API that verifies XML file content and identifies whether external entities are present within the document. To block these files, we just need to set one request parameter, $allow_xml_external_entities, to false.

We’ll get more than just an XXE scan out of this, too. The underlying service will check files for millions of virus and malware signatures, and it’ll allow us to block other types of threats including executables, macros, invalid files, and more.

We can structure our API call in a few quick steps. First, let’s install the PHP client with Composer by executing the following command from our command line:

composer require cloudmersive/cloudmersive_virusscan_api_client

Next, let’s grab a free Cloudmersive API key to authorize our requests. This will allow us to make up to 800 API calls per month with zero commitments (once we reach that limit, our total will reset the following month).
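Rather than pasting the key into source code, we may prefer to read it from the environment. The following is a minimal sketch under that assumption: the helper loadApiKey and the variable name CLOUDMERSIVE_API_KEY are hypothetical choices for illustration, not an official convention of the client library.

```php
<?php
/**
 * Hypothetical helper: read the API key from an environment variable.
 * The default variable name is an assumption made for this example.
 */
function loadApiKey(string $envName = 'CLOUDMERSIVE_API_KEY'): string
{
    $key = getenv($envName);

    // getenv() returns false when the variable is not set.
    if ($key === false || trim($key) === '') {
        throw new RuntimeException("Environment variable {$envName} is not set.");
    }

    return trim($key);
}
```

The returned value can then be passed wherever the literal 'YOUR_API_KEY' string would otherwise appear, which keeps the key out of version control.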

Let’s now use the PHP example below to call the function. We can set our $allow_xml_external_entities parameter to false, and we can provide our API key in the $config line:

<?php
require_once(__DIR__ . '/vendor/autoload.php');

// Configure API key authorization: Apikey
$config = Swagger\Client\Configuration::getDefaultConfiguration()->setApiKey('Apikey', 'YOUR_API_KEY');

$apiInstance = new Swagger\Client\Api\ScanApi(
    new GuzzleHttp\Client(),
    $config
);
$input_file = "/path/to/inputfile"; // \SplFileObject | Input file to perform the operation on.
$allow_executables = true; // bool | Set to false to block executable files (program code) from being allowed in the input file. Default is false (recommended).
$allow_invalid_files = true; // bool | Set to false to block invalid files, such as a PDF file that is not really a valid PDF file, or a Word Document that is not a valid Word Document. Default is false (recommended).
$allow_scripts = true; // bool | Set to false to block script files, such as PHP files, Python scripts, and other malicious content or security threats that can be embedded in the file. Set to true to allow these file types. Default is false (recommended).
$allow_password_protected_files = true; // bool | Set to false to block password protected and encrypted files, such as encrypted zip and rar files, and other files that seek to circumvent scanning through passwords. Set to true to allow these file types. Default is false (recommended).
$allow_macros = true; // bool | Set to false to block macros and other threats embedded in document files, such as Word, Excel and PowerPoint embedded Macros, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
$allow_xml_external_entities = false; // bool | Set to false to block XML External Entities and other threats embedded in XML files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
$allow_insecure_deserialization = true; // bool | Set to false to block Insecure Deserialization and other threats embedded in JSON and other object serialization files, and other files that contain embedded content threats. Set to true to allow these file types. Default is false (recommended).
$allow_html = true; // bool | Set to false to block HTML input in the top level file; HTML can contain XSS, scripts, local file accesses and other threats. Set to true to allow these file types. Default is false (recommended) [for API keys created prior to the release of this feature default is true for backward compatibility].
$restrict_file_types = ""; // string | Comma-separated list of file formats to allow as clean; for example, .pdf,.docx,.png would allow only PDF, Word document and PNG files. All files must pass content verification against this list; if they do not, the result is returned as CleanResult=false. Set to null or an empty string to disable (the default).

try {
    $result = $apiInstance->scanFileAdvanced($input_file, $allow_executables, $allow_invalid_files, $allow_scripts, $allow_password_protected_files, $allow_macros, $allow_xml_external_entities, $allow_insecure_deserialization, $allow_html, $restrict_file_types);
    print_r($result);
} catch (Exception $e) {
    echo 'Exception when calling ScanApi->scanFileAdvanced: ', $e->getMessage(), PHP_EOL;
}
?>
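Printing the result is enough for a demo, but in practice we would act on it. The sketch below shows one way to turn a scan response into an accept/reject decision. It assumes the response has been converted to an associative array, and that it carries a boolean CleanResult field (mentioned in the parameter comments above) plus a ContainsXmlExternalEntities field. The second field name and the helper shouldRejectUpload are assumptions for illustration, so check them against the API's response documentation.

```php
<?php
/**
 * Hypothetical helper: decide whether a scanned upload should be rejected.
 *
 * Fails closed: a missing or non-boolean CleanResult is treated as unsafe.
 * The key names are assumptions about the shape of the scan response.
 */
function shouldRejectUpload(array $scan): bool
{
    $isClean = ($scan['CleanResult'] ?? false) === true;
    $hasXxe  = ($scan['ContainsXmlExternalEntities'] ?? false) === true;

    return !$isClean || $hasXxe;
}

// Example responses, written by hand for illustration (not real API output).
$cleanScan = ['CleanResult' => true, 'ContainsXmlExternalEntities' => false];
$xxeScan   = ['CleanResult' => false, 'ContainsXmlExternalEntities' => true];

echo shouldRejectUpload($cleanScan) ? 'reject' : 'accept', PHP_EOL;
echo shouldRejectUpload($xxeScan) ? 'reject' : 'accept', PHP_EOL;
```

Treating anything other than an explicit clean result as a rejection means that an unexpected or partial response cannot let a file through by accident.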

That’s all there is to it. We can now check XML files for XXE threats and keep a wide range of other potential threats from entering our system.
