How to Manipulate, Split and Concatenate PDF Files Server-Side
The Portable Document Format (PDF) was first developed by Adobe Systems in 1993, with the aim of creating a file format that could be easily shared and printed across different computer systems, software applications, and devices. At the time, documents were typically created in proprietary file formats that were specific to the software application used to create them, making it difficult to share and view documents across different platforms.
PDFs were designed to be a universal file format that could preserve the formatting, fonts, images, and other elements of a document, regardless of the software used to create it or the device used to view it. The format quickly gained popularity and became a standard for sharing documents online, particularly for academic journals, government reports, and other professional publications.
Serving PDF files from a web server is commonplace, and paid PDF server software exists, but it is quite pricey. In this article, I'll show you a free and easy way to manipulate, split and concatenate PDF documents on your web server.
The software library we are going to use is called qpdf, a very powerful C++ library for PDF manipulation. It also comes with a command-line binary tool which you can invoke via a system call from Node, NGINX, Apache, or whatever your weapon of choice may be.
You can download and compile the source code yourself, or you can install a prebuilt package on Debian/Ubuntu or macOS. Prebuilt Windows binaries are also published with the qpdf releases on GitHub, so compiling manually on Windows is optional.
Install on Ubuntu:
apt -y install qpdf
Install on macOS:
brew install qpdf
If you don't have Homebrew installed, run this command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
OK, now that it's installed, here are some examples of what you might want to do with QPDF.
Let's say you have a large PDF with hundreds or even thousands of pages and you want to be able to serve only one page, or a range of pages to the user. Here's how:
qpdf --empty --pages input_file.pdf 50,60-69 -- output_file.pdf
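On a web server the page range will usually come from a request, so it is worth checking it before it reaches the command line. Here is a minimal sketch of that idea in POSIX shell. The function name `extract_pages` and its argument order are my own invention for illustration, and it only accepts the simple `N` and `N-M` forms used above, not every range syntax qpdf understands.

```shell
#!/bin/sh
# extract_pages INPUT RANGE OUTPUT
# Hypothetical helper: validates RANGE, then runs the same qpdf
# command shown above. Returns 2 if the range looks suspicious.
extract_pages() {
    input=$1
    range=$2
    output=$3

    # Reject empty input and anything that is not a digit, comma, or hyphen.
    case $range in
        ''|*[!0-9,-]*)
            echo "invalid page range: $range" >&2
            return 2
            ;;
    esac

    # Require the shape N or N-M, with items separated by commas.
    if ! printf '%s\n' "$range" | grep -Eq '^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$'; then
        echo "invalid page range: $range" >&2
        return 2
    fi

    qpdf --empty --pages "$input" "$range" -- "$output"
}
```

You would call it like `extract_pages input_file.pdf 50,60-69 output_file.pdf`. The file names still need their own checks if they also come from user input.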
Or let's say you want to concatenate (merge) two PDF files as one:
qpdf --empty --pages input_file1.pdf input_file2.pdf -- output_file.pdf
You can even specify page ranges for multiple files:
qpdf --empty --pages input_file1.pdf 50,60-69 input_file2.pdf 1-10 -- output_file.pdf
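If the list of files to merge is not known ahead of time, the same command can be wrapped so that it takes any number of inputs. This is only a sketch, and `merge_pdfs` is a hypothetical name of mine rather than anything that ships with qpdf. The arguments after the output name are passed straight through, so a page range can follow any file just as in the examples above.

```shell
#!/bin/sh
# merge_pdfs OUTPUT INPUT [RANGE] [INPUT [RANGE]]...
# Hypothetical helper: merges every input (with optional page ranges)
# into OUTPUT using qpdf's --empty --pages form.
merge_pdfs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_pdfs OUTPUT INPUT [RANGE]..." >&2
        return 2
    fi

    output=$1
    shift   # the remaining arguments are the inputs and their ranges

    qpdf --empty --pages "$@" -- "$output"
}
```

For example, `merge_pdfs output_file.pdf input_file1.pdf 50,60-69 input_file2.pdf 1-10` runs the same command as the last example.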
Pretty neat, huh? Let's say you have a password-protected file and you want to make a decrypted copy of it to send to the client:
qpdf --password=password --decrypt secure.pdf unsecure.pdf
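One caveat on a shared server: a password given on the command line can show up in the process list. Recent qpdf versions have a `--password-file` option, and as far as I know a file name of `-` makes qpdf read the password from standard input. The sketch below assumes your qpdf build supports that (check `qpdf --help` if unsure). The function name `decrypt_pdf` is hypothetical.

```shell
#!/bin/sh
# decrypt_pdf INPUT OUTPUT
# Hypothetical helper: reads the password from standard input so it
# never appears in the qpdf argument list.
# Assumes a qpdf version that supports --password-file=-.
decrypt_pdf() {
    input=$1
    output=$2

    qpdf --password-file=- --decrypt "$input" "$output"
}
```

A call might look like `printf '%s\n' "$PDF_PASSWORD" | decrypt_pdf secure.pdf unsecure.pdf`, where `PDF_PASSWORD` is an environment variable I made up for the example.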
As you can see, QPDF is an amazing tool, and these examples merely scratch the surface of what it can do. Not only is QPDF great for automated manipulation of PDF files, but it is also worth installing on your workstation for everyday PDF editing. Here are two more examples.
Rotate specific pages to a specified angle (clockwise):
qpdf --rotate=90:2,4,6 --rotate=180:7-8 input.pdf output.pdf
Split a PDF into individual numbered pages (use --split-pages=n instead to put n pages in each output file):
qpdf --split-pages input.pdf out_%d.pdf
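The value given to `--split-pages` is the number of pages per output file, and leaving it off gives one page per file. If that number comes from a request, it should be checked first. Here is a rough sketch, where `split_pdf` and its parameters are hypothetical names of mine.

```shell
#!/bin/sh
# split_pdf INPUT PAGES_PER_FILE PREFIX
# Hypothetical helper: splits INPUT into files of PAGES_PER_FILE pages
# named PREFIX_<pages>.pdf (qpdf fills in the %d part).
split_pdf() {
    input=$1
    per_file=$2
    prefix=$3

    # Accept only a positive integer without leading zeros.
    case $per_file in
        ''|*[!0-9]*|0*)
            echo "invalid pages-per-file value: $per_file" >&2
            return 2
            ;;
    esac

    qpdf --split-pages="$per_file" "$input" "${prefix}_%d.pdf"
}
```

So `split_pdf input.pdf 1 out` splits into single pages, and `split_pdf input.pdf 10 out` produces ten-page chunks.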
Well, I hope you enjoyed this article. If so, please give it a like and/or leave a comment. I'd love to hear your feedback. What PDF solutions do you use on your web server? Will you be utilizing this solution?
For more great information about web dev, systems administration and more, please read the Designly Blog.
About The Author
Jay is a full-stack developer, electrical engineer, writer and music producer. He currently resides in the Madison, WI area.