How to Convert a PDF Document to Text using Node.js

Cloudmersive
2 min read · Dec 16, 2022

Extracting text from a PDF can be done in two basic ways: either via OCR (optical character recognition), or by pulling the plain text content directly out of a vector-based PDF. The code below performs the latter operation, returning a plain text string with customizable whitespace formatting (you can choose to preserve or minimize whitespace; whitespace is preserved by default).

To use this API, first install the SDK. Either run the command below:

npm install cloudmersive-convert-api-client --save

Or add the snippet below to your package.json and then run npm install (both options are equally valid):

  "dependencies": {
    "cloudmersive-convert-api-client": "^2.6.3"
  }

After installation is complete, add the below code to structure your API call:

var fs = require('fs');
var CloudmersiveConvertApiClient = require('cloudmersive-convert-api-client');
var defaultClient = CloudmersiveConvertApiClient.ApiClient.instance;

// Configure API key authorization: Apikey
var Apikey = defaultClient.authentications['Apikey'];
Apikey.apiKey = 'YOUR API KEY';

var apiInstance = new CloudmersiveConvertApiClient.ConvertDocumentApi();

// Read the input PDF into a Buffer (readFileSync already returns one).
var inputFile = fs.readFileSync("C:\\temp\\inputfile"); // File | Input file to perform the operation on.

// Optional; specify how whitespace should be handled when converting PDF to text.
// 'preserveWhitespace' (the default) attempts to preserve whitespace and the relative positioning of text within the document.
// 'minimizeWhitespace' will not insert additional spaces into the document in most cases.
var opts = {
  'textFormattingMode': 'preserveWhitespace'
};

// Log the error if the call fails; otherwise log the returned data.
var callback = function(error, data, response) {
  if (error) {
    console.error(error);
  } else {
    console.log('API called successfully. Returned data:', data);
  }
};
apiInstance.convertDocumentPdfToTxt(inputFile, opts, callback);
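If you prefer Promises to callbacks, the call above can be wrapped in a small helper. The sketch below is only an illustration: pdfToText and WHITESPACE_MODES are hypothetical names, not part of the SDK, and the helper assumes it is handed the apiInstance and inputFile created in the snippet above. It also rejects any textFormattingMode other than the two documented values before the request is sent.

```javascript
// Documented values for textFormattingMode (see the option comment above).
var WHITESPACE_MODES = ['preserveWhitespace', 'minimizeWhitespace'];

// Hypothetical helper (not part of the SDK): validates the whitespace mode,
// then wraps the callback-style convertDocumentPdfToTxt call in a Promise.
function pdfToText(apiInstance, inputFile, mode) {
  var opts = {};
  if (mode !== undefined) {
    if (WHITESPACE_MODES.indexOf(mode) === -1) {
      throw new Error('Unknown textFormattingMode: ' + mode);
    }
    opts.textFormattingMode = mode;
  }
  return new Promise(function (resolve, reject) {
    apiInstance.convertDocumentPdfToTxt(inputFile, opts, function (error, data) {
      if (error) {
        reject(error);
      } else {
        resolve(data);
      }
    });
  });
}
```

A call might then look like pdfToText(apiInstance, inputFile, 'minimizeWhitespace').then(console.log).catch(console.error); leaving out the third argument sends no option, so the API default applies.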

In the code example above, you’ll need to supply a valid Cloudmersive API key to authenticate the request. You can get one for free on our website: just register a free account (this allows up to 800 API calls per month with no commitments), then copy your key into the Apikey.apiKey field, and you’re all done.
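Rather than pasting the key directly into your source file, one option is to read it from an environment variable. This is a minimal sketch under assumptions: the variable name CLOUDMERSIVE_API_KEY and the getApiKey helper are hypothetical and not defined by the SDK.

```javascript
// Hypothetical helper: reads the API key from an environment-like object.
// The variable name CLOUDMERSIVE_API_KEY is an assumption, not an SDK convention.
function getApiKey(env) {
  var key = (env.CLOUDMERSIVE_API_KEY || '').trim();
  if (!key) {
    throw new Error('Set the CLOUDMERSIVE_API_KEY environment variable');
  }
  return key;
}
```

You could then assign it with Apikey.apiKey = getApiKey(process.env); so the key stays out of version control.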
