authored by Wayne Witzel III

Concatenating PDF with Python

On March 05, 2009 In python, code Permalink

I needed to concatenate a set of PDFs, so I'll take you through my standard-issue Python development approach for doing something I've never done before in Python.

My first instinct was to google for pyPdf. Success! So, forgo reading any docs and just give the old easy_install a try.

$ sudo easy_install pypdf

Another success! Ok, a couple of help() calls later and I am ready to go. The end result is surprisingly small and seems to run fast enough, even for PDFs with 50+ pages.

from pyPdf import PdfFileWriter, PdfFileReader

def append_pdf(input, output):
    for page_num in range(input.numPages):
        output.addPage(input.getPage(page_num))

output = PdfFileWriter()

# file names here are just placeholders for your own PDFs
append_pdf(PdfFileReader(open("doc1.pdf", "rb")), output)
append_pdf(PdfFileReader(open("doc2.pdf", "rb")), output)

output.write(open("combined.pdf", "wb"))


Pylons and long-lived AJAX requests

On February 04, 2009 In python, code, pylons Permalink
So I am playing around in Firefox with XMLHttpRequest, looking into a way to facilitate server updates to a client without having to refresh the page or use Javascript timers. The long-lived HTTP request seems the way to go. This little app will have at most 20-30 connections at once, so I am not worried about the open connection per client.

The data it calculates is rather large and intensive to gather, so I paired it with the cache decorator snippet found on ActiveState and used in Expert Python Programming. This example feeds a cached datetime string. The caching lets different clients receive the same data during the cache window. There is some lag between the updates since they all start their sleep at different points; there may be a way around this, though.

So here is my basic index.html.

[sourcecode language="html"]
This will push data from the server to you every 5 seconds .. enjoy!
[/sourcecode]

Now the controller code itself.

[sourcecode language="python"]
class ServerController(BaseController):
    def index(self):
        response.headers['Content-type'] = 'multipart/x-mixed-replace;boundary=test'
        return data_stream()

def data_stream(stream=True):
    yield datetime_string()
    while stream:
        time.sleep(5)
        yield datetime_string()

@memorize(duration=15)
def datetime_string():
    content = '--test\nContent-type: application/xml\n\n'
    content += str(datetime.datetime.now()) + '\n'
    content += '--test\n'
    return content
[/sourcecode]

Also the decorator code for good measure.

[sourcecode language="python"]
cache = {}

def is_old(entry, duration):
    return time.time() - entry['time'] > duration

def compute_key(function, args, kw):
    key = pickle.dumps((function.func_name, args, kw))
    return hashlib.sha1(key).hexdigest()

def memorize(duration=10):
    def _memorize(function):
        def __memorize(*args, **kw):
            key = compute_key(function, args, kw)
            if key in cache and not is_old(cache[key], duration):
                return cache[key]['value']
            result = function(*args, **kw)
            cache[key] = {'value': result, 'time': time.time()}
            return result
        return __memorize
    return _memorize
[/sourcecode]

Full working demo will be available in the HG repos shortly.
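If you want to kick the tires on the memoization pattern outside of Pylons, here is a minimal self-contained Python 3 sketch of the same decorator (using `function.__name__` where Python 2 had `func_name`); the `slow_square` function and the `calls` list are just illustrative stand-ins for the "large and intensive" data gathering.

```python
import hashlib
import pickle
import time

cache = {}

def is_old(entry, duration):
    return time.time() - entry['time'] > duration

def compute_key(function, args, kw):
    # Python 3 spelling of the Python 2 func_name attribute
    key = pickle.dumps((function.__name__, args, kw))
    return hashlib.sha1(key).hexdigest()

def memorize(duration=10):
    def _memorize(function):
        def __memorize(*args, **kw):
            key = compute_key(function, args, kw)
            if key in cache and not is_old(cache[key], duration):
                return cache[key]['value']
            result = function(*args, **kw)
            cache[key] = {'value': result, 'time': time.time()}
            return result
        return __memorize
    return _memorize

calls = []  # records every real (uncached) invocation

@memorize(duration=60)
def slow_square(n):
    calls.append(n)  # only runs on a cache miss
    return n * n

print(slow_square(4))  # computed: 16
print(slow_square(4))  # served from cache: 16, body not re-run
print(len(calls))      # 1
```

Same shape as the snippet above: one shared dict keyed by a hash of the function name and arguments, with a timestamp deciding when an entry has gone stale.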


Code: Buffer CGI file uploads in Windows

On August 19, 2008 In python, code Permalink
Note to self: when handling CGI file uploads on a Windows machine, you need the following boilerplate to properly handle binary files.

[sourcecode language='python']
try:
    # Windows needs stdio set for binary mode.
    import os
    import msvcrt
    msvcrt.setmode(0, os.O_BINARY)  # stdin = 0
    msvcrt.setmode(1, os.O_BINARY)  # stdout = 1
except ImportError:
    pass
[/sourcecode]

Also, if you're handling very large files and don't want to eat up all your memory saving them with the copyfileobj method, you can use a generator to buffer the reads and writes.

[sourcecode language='python']
def buffer(f, sz=1024):
    while True:
        chunk = f.read(sz)
        if not chunk:
            break
        yield chunk

# then use it like this ...
for chunk in buffer(fp.file):
    output_file.write(chunk)  # output_file is wherever you're saving to
[/sourcecode]
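You can exercise the generator without a real CGI upload; here is a small self-contained sketch where `io.BytesIO` stands in for the uploaded file object (the names and sizes are purely illustrative).

```python
import io

def buffer(f, sz=1024):
    # read the file object in sz-byte chunks
    while True:
        chunk = f.read(sz)
        if not chunk:
            break
        yield chunk

# BytesIO stands in for the CGI upload's file object
src = io.BytesIO(b"x" * 2500)
dst = io.BytesIO()

# 2500 bytes arrive as chunks of 1024, 1024, and 452
for chunk in buffer(src, sz=1024):
    dst.write(chunk)

print(len(dst.getvalue()))  # 2500
```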

Code: Saving in memory file to disk

On July 16, 2008 In python, code Permalink
Okay, I discovered this today when looking to increase the speed at which uploaded documents were saved to disk. Now, I can't explain the inner workings of why it is faster; all I know is that with the exact same form-upload test, run 100 times with a 25MB file over a 100Mbit/s network, this method was on average a whole 2.3 seconds faster than traditional methods like write, writelines, etc. How does this extend to real-world production usage over external networks? No idea, though I plan to find out. So you all will be the first to know as soon as I find some guinea-pig site that does enough file uploads to implement this on.

[sourcecode language='python']
# Minus some boiler plate for validity and variable setup.
import os
import shutil

memory_file = request.POST['upload']
disk_file = open(os.path.join(save_folder, save_name), 'wb')  # 'wb' so binary uploads survive intact
shutil.copyfileobj(memory_file.file, disk_file)
disk_file.close()
[/sourcecode]
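The copyfileobj call itself can be tried in isolation; here is a minimal sketch with in-memory stand-ins for both the upload and the destination file (the names and the fake payload are purely illustrative).

```python
import io
import shutil

# stand-in for the uploaded file object (request.POST['upload'].file)
memory_file = io.BytesIO(b"%PDF-fake-bytes " * 1000)

# stand-in for the file on disk; a real handler opens it with 'wb'
disk_file = io.BytesIO()

# copyfileobj streams fixed-size chunks from source to destination
# instead of materializing the whole upload as one string first
shutil.copyfileobj(memory_file, disk_file)

print(len(disk_file.getvalue()))  # 16000
```

That chunked read/write loop inside copyfileobj is presumably where the speedup over a single big write comes from, but as noted above, I haven't dug into the internals.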