How to Split a Word DOCX Document into Separate Documents (per page) using Java
Splitting a DOCX document into one document per page is a common file processing problem. For a handful of files, features built into MS Word can handle it. Doing it efficiently at scale (with large files or a large number of documents), however, requires a programmatic solution, and thankfully there’s an API for exactly that purpose.
Our DOCX Split API will quickly create a new file from each page of your original DOCX file, returning either the contents of each new file or a URL for each one, depending on your preference. To help you take advantage of this API, I’ve provided complimentary code examples in Java below which you can use to structure your API call. You can use this API for free by registering a free account on our website; the account comes with a free-tier API key, which you can plug into the code examples below to authenticate your requests.
Before we call the API, we first need to install the API client. We can do so with Maven by first adding a reference to the repository in pom.xml:
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
With the JitPack repository registered, we can now add a reference to the dependency:
<dependencies>
    <dependency>
        <groupId>com.github.Cloudmersive</groupId>
        <artifactId>Cloudmersive.APIClient.Java</artifactId>
        <version>v4.25</version>
    </dependency>
</dependencies>
Now moving on to our controller, let’s add the imports below at the top of our file:
// Import classes:
import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.model.SplitDocxDocumentResult;
import com.cloudmersive.client.SplitDocumentApi;
import java.io.File;
After that, we can structure our API call with the example below. The parameter returnDocumentContents defaults to true, which returns the contents of each new document directly; if we set it to false instead, the API returns only a URL for each new file:
ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

SplitDocumentApi apiInstance = new SplitDocumentApi();
File inputFile = new File("/path/to/inputfile"); // File | Input DOCX file to perform the operation on.
Boolean returnDocumentContents = true; // Boolean | Set to true to return the contents of each document directly, set to false to only return URLs to each resulting document. Default is true.
try {
    SplitDocxDocumentResult result = apiInstance.splitDocumentDocx(inputFile, returnDocumentContents);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling SplitDocumentApi#splitDocumentDocx");
    e.printStackTrace();
}
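Printing the result is enough to confirm the call worked, but in practice you will probably want each page saved as its own file. The sketch below shows one way to do that once you have the bytes of each page in hand. It is a minimal illustration rather than part of the Cloudmersive client: the class name SplitPageWriter, the writePages method and the file naming scheme are all hypothetical, and how you pull the byte arrays out of the result object depends on the client’s model classes, which this article does not cover.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the Cloudmersive client):
// saves each split page as its own .docx file.
class SplitPageWriter {

    // Writes pages.get(i) to <outputDir>/<baseName>-page-<i+1>.docx
    // and returns the paths that were written, in page order.
    static List<Path> writePages(List<byte[]> pages, Path outputDir, String baseName) {
        List<Path> written = new ArrayList<>();
        try {
            // Create the output directory (and any missing parents) if needed.
            Files.createDirectories(outputDir);
            for (int i = 0; i < pages.size(); i++) {
                Path target = outputDir.resolve(baseName + "-page-" + (i + 1) + ".docx");
                Files.write(target, pages.get(i));
                written.add(target);
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to handle IOException.
            throw new UncheckedIOException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for the page contents returned by the API.
        List<byte[]> pages = Arrays.asList(new byte[] {1, 2, 3}, new byte[] {4, 5});
        Path outputDir = Paths.get(System.getProperty("java.io.tmpdir"), "docx-split-demo");
        for (Path path : writePages(pages, outputDir, "report")) {
            System.out.println(path);
        }
    }
}
```

Numbering the files from 1 keeps the file names aligned with the page numbers a reader would see in Word.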
After that, you’re all set, with no more coding required. Your free-tier API key allows up to 800 API calls per month with no additional commitments, and you can upgrade at any time if you need to scale your operation further.
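One optional refinement: since the API key is tied to your account, you may prefer to keep it out of source control rather than pasting it into the code. The sketch below reads it from an environment variable instead. Everything in it is an assumption made for illustration: the ApiKeyResolver class is hypothetical, and CLOUDMERSIVE_API_KEY is simply a variable name chosen here, not one the client library defines.

```java
import java.util.Map;

// Hypothetical helper: looks up the API key in a map of environment
// variables instead of hardcoding it in the source file.
class ApiKeyResolver {

    // Assumed variable name; the client library does not define one.
    static final String DEFAULT_VARIABLE = "CLOUDMERSIVE_API_KEY";

    // Returns the trimmed key, or throws if the variable is missing or blank.
    static String resolve(Map<String, String> env, String variableName) {
        String key = env.get(variableName);
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalStateException(
                "Missing API key: set the " + variableName + " environment variable");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        try {
            String apiKey = resolve(System.getenv(), DEFAULT_VARIABLE);
            // Avoid printing the key itself; report only that it was found.
            System.out.println("Loaded an API key of length " + apiKey.length());
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

The returned value can then be passed to Apikey.setApiKey(...) in place of the "YOUR API KEY" literal.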