I need to concatenate a set of PDFs, so I will take you through my standard-issue Python development approach for doing something I’ve never done before in Python.

My first instinct was to google for pyPDF. Success! So I’ll forgo reading any docs and just give the old easy_install a try.

$ sudo easy_install pypdf

Another success! OK, a couple of help() calls later and I am ready to go. The end result is surprisingly small and seems to run fast enough even for PDFs with 50+ pages.

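from pyPdf import PdfFileWriter, PdfFileReader

def append_pdf(reader, writer):
    # Copy every page of the source PDF into the output writer.
    for page_num in range(reader.numPages):
        writer.addPage(reader.getPage(page_num))

output = PdfFileWriter()
# The same sample file is appended twice to demonstrate concatenation.
append_pdf(PdfFileReader(open("sample.pdf", "rb")), output)
append_pdf(PdfFileReader(open("sample.pdf", "rb")), output)

# Write the combined pages out; the with block closes the file afterwards.
with open("combined.pdf", "wb") as combined:
    output.write(combined)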
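The same idea extends to an arbitrary list of files. What follows is only a sketch under a few assumptions: the names append_pages and concatenate_pdfs are made up for illustration, and the input files are kept open until after the write because pyPdf appears to read page data from the source stream lazily. The pyPdf import sits inside the function so the page-copying helper can be exercised without the library installed.

```python
def append_pages(reader, writer):
    """Copy every page from reader into writer and return how many were copied.

    Works with any objects exposing numPages/getPage and addPage,
    which is the pyPdf interface used in the post.
    """
    count = reader.numPages
    for page_num in range(count):
        writer.addPage(reader.getPage(page_num))
    return count


def concatenate_pdfs(input_paths, output_path):
    """Merge the PDFs at input_paths, in order, into output_path.

    Hypothetical helper; assumes pyPdf is installed.
    """
    from pyPdf import PdfFileReader, PdfFileWriter

    writer = PdfFileWriter()
    handles = []
    try:
        for path in input_paths:
            handle = open(path, "rb")
            handles.append(handle)
            append_pages(PdfFileReader(handle), writer)
        # Assumption: inputs must stay open until the output is written.
        with open(output_path, "wb") as out_file:
            writer.write(out_file)
    finally:
        for handle in handles:
            handle.close()
```

With that in place, something like concatenate_pdfs(["sample1.pdf", "sample2.pdf"], "combined.pdf") should produce the combined file, where the file names are placeholders.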

3 Comments

  1. John says:

    I don’t know if you need further programmatic control of this process, but if not, I propose an alternative (‘cat’ is the operation, and for some reason you must specify the output file with ‘output’):

    $ sudo apt-get install pdftk
    $ pdftk sample1.pdf sample2.pdf cat output combined.pdf

  2. Wayne says:

    Yeah, this is just a basic example; the entire process is more involved, and using pdftk wasn’t really an option. I tried your example and it worked really well and very fast, which will come in handy if I have to do any one-off concatenation of PDFs.

  3. Ben Atkin says:

    Wow. It’s impressive how simple it was to complete this task in Python.
