How to Sanitize Files with CDR Technology in Python

2 min read · Oct 10, 2025

File sanitization is exactly what it sounds like — the process of “cleaning” files by removing their “dirty” (malicious) components. In the context of CDR, however, “sanitization” takes on a slightly different meaning.

Content Disarm and Reconstruction (CDR) technology sanitizes files by rebuilding them from scratch with certain components left out. Strictly speaking, this isn’t “sanitization,” because the output is a brand-new file rather than a cleaned-up original, but the effect is the same.

The components missing from the new file are the parts of the original that didn’t strictly conform to the formatting specification for that file type. Such components are potential carriers for zero-day exploits, which might otherwise be used to launch malware attacks.
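To make the rebuild-from-scratch idea concrete, here is a toy sketch in Python. It is not how any real CDR engine works internally: the "document" is just a list of dictionaries, and the `ALLOWED_COMPONENTS` allowlist and `rebuild` function are hypothetical names made up for illustration. The point is only that a new object is constructed from the parts that pass a strict check, rather than the original being edited in place.

```python
# Toy illustration only: a real CDR engine parses actual file formats.
# ALLOWED_COMPONENTS and rebuild() are hypothetical names for this sketch.
ALLOWED_COMPONENTS = {"text", "image"}


def rebuild(components):
    """Build a brand-new component list from the parts that pass a strict check."""
    rebuilt = []
    for part in components:
        kind = part.get("type")
        data = part.get("data")
        # Copy only allowlisted, well-formed parts into the new document.
        if kind in ALLOWED_COMPONENTS and isinstance(data, str):
            rebuilt.append({"type": kind, "data": data})
    return rebuilt


original = [
    {"type": "text", "data": "Quarterly report"},
    {"type": "macro", "data": "AutoOpen()"},  # not on the allowlist
    {"type": "image", "data": "logo.png"},
]
clean = rebuild(original)
```

The original list is left untouched, and `clean` is a separate object that simply never contains the macro entry.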

In effect, CDR is a zero-trust sanitization technology akin to normalization (a process used to sanitize text inputs and prevent threats like SQL injection).

Implementing CDR for our Python application

CDR is a complex technology, but that doesn’t mean it’s necessarily difficult to implement in our Python application. Using the code below, we can take advantage of a CDR API that abstracts the entire CDR process away from our application. This API reduces the CDR workflow to simple file I/O in our code, and we’ll just need a free API key, which allows up to 800 API calls per month.

First, we can add the following imports to the top of our file:

from __future__ import print_function
import time
import cloudmersive_cdr_api_client
from cloudmersive_cdr_api_client.rest import ApiException
from pprint import pprint

And after that, we can use the below code snippet to configure our API key, create an instance of the API class, and call the API instance in a try/except block:

# Configure API key authorization: Apikey
configuration = cloudmersive_cdr_api_client.Configuration()
configuration.api_key['Apikey'] = 'YOUR_API_KEY'
# Uncomment below to setup prefix (e.g. Bearer) for API key, if needed
# configuration.api_key_prefix['Apikey'] = 'Bearer'

# create an instance of the API class
api_instance = cloudmersive_cdr_api_client.FileSanitizationApi(cloudmersive_cdr_api_client.ApiClient(configuration))
input_file = '/path/to/file.txt' # file | Input file to sanitize

try:
    # Complete Content Disarm and Reconstruction on an Input File, and output in same file format
    api_instance.file(input_file=input_file)
except ApiException as e:
    print("Exception when calling FileSanitizationApi->file: %s\n" % e)

We can simply save the result of the api_instance.file() method to a new file variable in our code, and we’re done handling our CDR process. Easy, right?
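As a minimal sketch of that last step, the helper below writes a result to disk. It assumes the value returned by api_instance.file() is the sanitized file's content as bytes, or a string that can be encoded as UTF-8; that is an assumption, so check the client's documentation for the actual return type. The `save_sanitized` helper and the stand-in `fake_result` value are hypothetical and used here only so the example runs without an API key.

```python
import os
import tempfile


def save_sanitized(result, output_path):
    """Write sanitized content to output_path and return the number of bytes written."""
    # Assumption: result is bytes, or a str that can be encoded as UTF-8.
    data = result if isinstance(result, bytes) else str(result).encode("utf-8")
    with open(output_path, "wb") as f:
        f.write(data)
    return len(data)


# Stand-in for the value returned by api_instance.file(input_file=input_file).
fake_result = b"sanitized file contents"
out_path = os.path.join(tempfile.mkdtemp(), "sanitized.txt")
written = save_sanitized(fake_result, out_path)
```

In a real application, the stand-in value would be replaced by the return value captured inside the try block.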

Written by Cloudmersive