I need to concatenate a set of PDFs, so I will take you through my standard-issue Python development approach for doing something I’ve never done before in Python.
My first instinct was to google for pyPDF. Success! So, forgo reading any docs and just give the old easy_install a try.
$ sudo easy_install pypdf
Another success! OK, a couple of help() calls later and I am ready to go. The end result is surprisingly small and seems to run fast enough even for PDFs with 50+ pages.
from pyPdf import PdfFileWriter, PdfFileReader

def append_pdf(input, output):
    # Append every page of the input reader to the output writer.
    for page_num in range(input.numPages):
        output.addPage(input.getPage(page_num))

output = PdfFileWriter()
# Append sample.pdf twice, then write the combined document.
append_pdf(PdfFileReader(open("sample.pdf", "rb")), output)
append_pdf(PdfFileReader(open("sample.pdf", "rb")), output)
output.write(open("combined.pdf", "wb"))
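The two append_pdf calls above generalize to any number of inputs. Here is a minimal sketch of that idea, assuming only the reader and writer interface already used above (numPages, getPage, addPage). The name concat_pages is made up for this illustration and is not part of pyPdf.

```python
def concat_pages(readers, writer):
    """Append every page of each reader to writer, in order.

    Assumes each reader exposes numPages and getPage(n), and the writer
    exposes addPage(page), as the pyPdf objects in the post do.
    concat_pages itself is a hypothetical helper, not a pyPdf function.
    """
    for reader in readers:
        for page_num in range(reader.numPages):
            writer.addPage(reader.getPage(page_num))
    # Return the writer so the caller can write it out afterwards.
    return writer
```

With pyPdf this would presumably be called with a list of PdfFileReader objects and a fresh PdfFileWriter, followed by the same output.write(...) call as before.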
John says:
I don’t know if you need further programmatic control of this process, but if not, I propose an alternative (‘cat’ is the operation, and for some reason you must specify the output file with ‘output’):
$ sudo apt-get install pdftk
$ pdftk sample1.pdf sample2.pdf cat output combined.pdf
05 Mar 2009, 17:20
Wayne says:
Yeah, this is just a basic example; the entire process is more involved, and using pdftk wasn’t really an option. I tried your example and it worked really well and very fast, which will come in handy if I ever have to do a one-off concatenation of PDFs.
05 Mar 2009, 18:26
Ben Atkin says:
Wow. It’s impressive how simple it was to complete this task in Python.
16 Mar 2009, 18:02