In this blog post, we walk through a step-by-step solution for dynamically changing the template of a PDF document using the open source library Apache PDFBox.
Often a business needs to create a copy of a PDF document and add a company-specific header or footer to it. This extra information or styling does not exist in the predefined template, so the company must make the changes manually, which can lead to errors if the process is not standardised.
For example, a company may decide to add a specific message for a specific situation (like COVID-19) to the header of a PDF document with a predefined template. This action can be achieved by manually printing the PDF on paper that already has the header on it.
However, this approach has two fundamental issues:
- Staff have to know exactly which printing paper to use for each document, which can lead to errors. It also adds extra effort while printing.
- It works only for the hard copy of the document. This approach can’t be used if you want to send a soft copy over email instead.
Because of these restrictions, the manual approach can’t always be used. We therefore need an automatic solution, so that changes to predefined templates can be applied dynamically when required.
Assumptions:
Since "document" is a very generic term and documents can be stored in various formats, we make the following assumptions in working out the solution.
- The document is stored in PDF format.
- The document is stored as binary in a database (file-based storage should work with slight modification).
- Each document has a unique identifier, and the content of both the original document and the template document can be retrieved by using that identifier.
- The company software stack is Java compatible.
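To make the storage assumptions concrete, here is a minimal JDBC sketch of how the document and its template might be read. The class name DocumentRepository, the table name document, and the columns id, content and template_id are all hypothetical, as is the use of a string identifier; adjust them to your own schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical repository: table and column names are assumptions, not from a real schema.
class DocumentRepository {

    // Returns the PDF bytes and the ID of the template attached to the document.
    static final String SELECT_DOCUMENT =
            "SELECT content, template_id FROM document WHERE id = ?";

    // Templates are assumed to live in the same table, keyed by their own ID.
    static final String SELECT_TEMPLATE =
            "SELECT content FROM document WHERE id = ?";

    // Simple holder for the result of the first query.
    static final class StoredDocument {
        final byte[] content;
        final String templateId;

        StoredDocument(byte[] content, String templateId) {
            this.content = content;
            this.templateId = templateId;
        }
    }

    // Looks up the document; returns null when no row matches the ID.
    static StoredDocument findDocument(Connection connection, String documentId)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(SELECT_DOCUMENT)) {
            statement.setString(1, documentId);
            try (ResultSet rows = statement.executeQuery()) {
                if (!rows.next()) {
                    return null;
                }
                return new StoredDocument(rows.getBytes("content"),
                        rows.getString("template_id"));
            }
        }
    }

    // Looks up the template bytes; returns null when there is no template.
    static byte[] findTemplateContent(Connection connection, String templateId)
            throws SQLException {
        if (templateId == null) {
            return null;
        }
        try (PreparedStatement statement = connection.prepareStatement(SELECT_TEMPLATE)) {
            statement.setString(1, templateId);
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next() ? rows.getBytes("content") : null;
            }
        }
    }
}
```

Both methods return null rather than throwing when nothing is found, so a caller can treat "no template" as "leave the document unchanged".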
Existing Packages:
There are quite a few packages available that can provide this solution:
- Qoppa (https://www.qoppa.com): commercial Java library
- PDFBox (https://pdfbox.apache.org): open source Java library
- iText (https://itextpdf.com/en): Java library
All of these libraries are capable of general PDF manipulation. For our particular problem, however, PDFBox stands out: it has dedicated overlay functionality that does exactly what we are looking for, and it is free.
Dependencies
This is a Java solution, and the required Java library can be obtained by adding the following Maven dependency:
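A minimal sketch of that dependency, using the Maven coordinates under which Apache PDFBox is published. The version shown is an assumption; the calls discussed in this post (such as setAllPagesOverlayPDF) match the 2.0.x API, so a 2.0.x release is assumed throughout.

```xml
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <!-- Assumed version: pick whichever 2.0.x release suits your project -->
    <version>2.0.27</version>
</dependency>
```

If you are on PDFBox 3.x instead, note that document loading moved to a separate Loader class, so code written against 2.0.x needs adjusting.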
Implementation
Follow the steps below to implement the solution:
- Retrieve the document content and the template ID for a given document ID using standard SQL.
- Retrieve the template content by using the template ID from the first step. Please note that any query used here is only a sample; change the table and column names to match your own database.
- Create a Java class (PDFOverlay.java) with a function, say createFinalDocument, which takes the original document, the physical path of an initial template PDF, and the final template PDF as a byte array.
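The class described in the last step could be sketched as follows, assuming PDFBox 2.0.x. The class and method names come from this post; the parameter names, the choice to fill the guide for every page, and the use of try-with-resources are assumptions. How PDFBox resolves guide entries against an all-pages overlay is taken from the description in this post rather than verified independently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.pdfbox.multipdf.Overlay;
import org.apache.pdfbox.pdmodel.PDDocument;

class PDFOverlay {

    public byte[] createFinalDocument(byte[] originalDocument,
                                      String initialTemplatePath,
                                      byte[] finalTemplate) throws IOException {
        // Validation: with no template there is nothing to apply.
        if (finalTemplate == null || finalTemplate.length == 0) {
            return originalDocument;
        }

        // Load both PDFs; everything opened here is closed automatically.
        try (PDDocument original = PDDocument.load(originalDocument);
             PDDocument template = PDDocument.load(finalTemplate);
             Overlay overlay = new Overlay();
             ByteArrayOutputStream output = new ByteArrayOutputStream()) {

            // Overlay guide: page number (starting at 1) -> path of a template PDF.
            // Assumption: the initial template, when given, is listed for every page.
            Map<Integer, String> overlayGuide = new HashMap<>();
            if (initialTemplatePath != null && !initialTemplatePath.isEmpty()) {
                for (int page = 1; page <= original.getNumberOfPages(); page++) {
                    overlayGuide.put(page, initialTemplatePath);
                }
            }

            overlay.setInputPDF(original);
            // Draw the template on top of the existing page content.
            overlay.setOverlayPosition(Overlay.Position.FOREGROUND);
            // Apply the in-memory template to all pages (overrides the guide).
            overlay.setAllPagesOverlayPDF(template);

            // overlay(...) modifies the input document, which must be saved
            // before the Overlay object is closed.
            overlay.overlay(overlayGuide);
            original.save(output);
            return output.toByteArray();
        }
    }
}
```

Passing null as initialTemplatePath leaves the guide empty, which is the simplest case: the in-memory template alone is stamped onto every page.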
OK, let's try to understand what is going on in the createFinalDocument code:
- The first section is for validation. If the template document is null or empty, we return the original document, since there is no template to apply.
- Next, we load both PDF contents to create two PDDocument objects.
- As a next step, we create a HashMap object to serve as the overlay guide, which lets us apply a different template to each page. The key of this HashMap is the page number and the value is the physical path of the template PDF to apply. The guide is required in order to apply any sort of overlay. We can also specify whether the overlay goes in the foreground or the background by passing Overlay.Position.FOREGROUND or Overlay.Position.BACKGROUND to setOverlayPosition.
- At the same time, we can override the overlay guide by calling the setAllPagesOverlayPDF function and passing our template PDF as the argument.
- Finally, we save the output PDF content to a ByteArrayOutputStream and then convert it to a byte array before returning it.
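The overlay guide on its own, without the all-pages override, can be sketched like this, again assuming PDFBox 2.0.x. The class name PerPageOverlay and the method applyPerPage are hypothetical; the guide maps page numbers (starting at 1) to template files on disk, and the position is left to the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;

import org.apache.pdfbox.multipdf.Overlay;
import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper: applies a possibly different template file to each listed page.
class PerPageOverlay {

    static byte[] applyPerPage(byte[] documentBytes,
                               Map<Integer, String> templatePathByPage,
                               Overlay.Position position) throws IOException {
        try (PDDocument document = PDDocument.load(documentBytes);
             Overlay overlay = new Overlay();
             ByteArrayOutputStream output = new ByteArrayOutputStream()) {

            overlay.setInputPDF(document);
            // FOREGROUND draws the template over the page content,
            // BACKGROUND draws it underneath.
            overlay.setOverlayPosition(position);

            // Each path in the guide must point to a readable PDF file.
            overlay.overlay(templatePathByPage);

            // Save before the Overlay is closed at the end of the try block.
            document.save(output);
            return output.toByteArray();
        }
    }
}
```

For example, a guide containing only the entry 1 -> "/templates/cover.pdf" (a placeholder path) would target the first page only, which suits a header that belongs on a cover page.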
Testing
To test the above code, the following sample code can be used:
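A minimal file-based driver, offered as a sketch: it reads an original PDF and a template PDF from disk, runs an overlay step on the two byte arrays, and writes the result to a third file. The class OverlayFileDriver, the OverlayStep interface and the method applyToFiles are hypothetical names; the overlay step is passed in as a function so that the driver compiles on its own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical test driver: file in, overlay step, file out.
class OverlayFileDriver {

    // The overlay step takes the original and template bytes and returns the final PDF.
    interface OverlayStep {
        byte[] apply(byte[] originalDocument, byte[] template) throws IOException;
    }

    // Returns the size of the written file so a test can make a quick sanity check.
    static long applyToFiles(Path originalPdf, Path templatePdf, Path outputPdf,
                             OverlayStep overlayStep) throws IOException {
        byte[] original = Files.readAllBytes(originalPdf);
        byte[] template = Files.readAllBytes(templatePdf);

        byte[] finalDocument = overlayStep.apply(original, template);

        // Write the result next to the inputs so it can be opened and inspected.
        Files.write(outputPdf, finalDocument);
        return finalDocument.length;
    }
}
```

A call might look like `OverlayFileDriver.applyToFiles(Paths.get("original.pdf"), Paths.get("template.pdf"), Paths.get("final.pdf"), (doc, tpl) -> new PDFOverlay().createFinalDocument(doc, null, tpl))`, where the file names are placeholders. Opening the output file in a PDF viewer is the quickest way to confirm that the header or footer landed where expected.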
You should now be able to apply specific overlays to PDFs with predefined templates.