How to Convert a Rasterized (Image-Based) PDF to DOCX Format in Node.js
Rasterized PDFs display their contents as flat 2D images rather than as layered vector text and graphics. Creating a raster PDF from a DOCX file is simple enough; reversing that process, however, takes considerably more programmatic work. Thankfully, the code below handles this conversion for you, producing a new DOCX document from a raster PDF input with high fidelity. All you need to do is copy and paste the ready-to-run Node.js examples into your project, and you’re ready to make your conversion.
First things first — let’s install the SDK by running the below command:
npm install cloudmersive-convert-api-client --save
Or we can accomplish the same by adding this snippet to our package.json and then running npm install:
"dependencies": {
"cloudmersive-convert-api-client": "^2.6.3"
}
Now let’s call the function below, supplying the path to our raster PDF file and our Cloudmersive API key (obtainable by registering a free account on our website) in their respective fields:
var fs = require('fs'); // Needed to read the input file from disk
var CloudmersiveConvertApiClient = require('cloudmersive-convert-api-client');
var defaultClient = CloudmersiveConvertApiClient.ApiClient.instance;
// Configure API key authorization: Apikey
var Apikey = defaultClient.authentications['Apikey'];
Apikey.apiKey = 'YOUR API KEY';
var apiInstance = new CloudmersiveConvertApiClient.ConvertDocumentApi();
// readFileSync already returns a Buffer containing exactly the file's bytes
var inputFile = fs.readFileSync("C:\\temp\\inputfile"); // File | Input file to perform the operation on.
// Runs once the API responds; data holds the converted output
var callback = function(error, data, response) {
  if (error) {
    console.error(error);
  } else {
    console.log('API called successfully. Returned data: ' + data);
  }
};
apiInstance.convertDocumentPdfToDocxRasterize(inputFile, callback);
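Logging the returned data is rarely useful for a binary document, so in practice you may prefer to write the result to disk. The sketch below is one possible way to do that, not part of the SDK: makeSaveCallback is a hypothetical helper name, the output path is a placeholder, and it assumes the callback's data argument arrives as a Buffer or byte array containing the DOCX contents.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of the Cloudmersive SDK): builds a callback
// that saves the converted document to outputPath instead of logging it.
function makeSaveCallback(outputPath) {
  return function (error, data, response) {
    if (error) {
      console.error(error);
      return;
    }
    // Assumption: data holds the DOCX bytes as a Buffer or byte array.
    fs.writeFileSync(outputPath, Buffer.from(data));
    console.log('Saved converted document to ' + outputPath);
  };
}

// Usage with the apiInstance and inputFile from the example above
// (the output path is a placeholder):
// apiInstance.convertDocumentPdfToDocxRasterize(
//   inputFile, makeSaveCallback('C:\\temp\\output.docx'));
```

Passing the output path into a small factory keeps the callback signature identical to the one the SDK expects, so it can be dropped in wherever the original callback was used.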
That’s all there is to it — no more code required!