Archive for the ‘python’ Category

I had built my Python interpreter from source because I wanted a 64-bit build to use with a boost-python based generic algorithm. So for the last couple of months I have not had readline support, which means no arrow-key support in the console. It also means neat shortcuts like _ would not work for reusing the last result of an expression.

I finally got annoyed enough today to fix it. After a ton of failed attempts at rebuilding it with readline support, I found a post in the Ruby world where someone was having similar issues. Their solution was rlwrap. Having Homebrew installed, I figured I’d give that a try.


wwitzel:~ brew install rlwrap
wwitzel:~ alias python='rlwrap python'

And presto, everything functioned as it should. I added the alias to my .bash_profile and made this blog post for when I forget about this in the future. Hope this helps someone else who might be wrestling with the same issue.
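
If you want to check whether your own interpreter build has the same problem, here is a minimal sketch. The helper name has_readline is made up for illustration; it only reports whether the readline module can be imported, which is the piece a source build without readline support is missing.

```python
def has_readline():
    """Return True if this interpreter can import the readline module."""
    try:
        import readline  # noqa: F401 -- only the import itself matters here
    except ImportError:
        return False
    return True


if __name__ == '__main__':
    if has_readline():
        print("readline is available")
    else:
        print("no readline support; wrapping the interpreter with rlwrap may help")
```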

Note: WordPress likes to convert my < > inside [ code ] blocks to &gt; and &lt;, so sorry about the formatting.

After Beazley’s talk at PyCon, "Understanding the Python GIL", I realized I had never done any work that released the GIL, spawned threads, did some work, and then restored the GIL. So I wanted to see if I could do something like that with Boost::Python and Boost::Thread, and what kind of performance I’d get from it with an empty while loop as the baseline. So I hacked up some quick and dirty C++ code and a quick bit of runnable Python to test out the resulting module, and away I went. Below are the code snippets, links to bitbucket, and the results of the Python script.

#include <cmath>
#include <iostream>
#include <vector>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/thread.hpp>
#include <boost/python.hpp>

class ScopedGILRelease {
public:
	inline ScopedGILRelease() { m_thread_state = PyEval_SaveThread(); }
	inline ~ScopedGILRelease() { PyEval_RestoreThread(m_thread_state); m_thread_state = NULL; }
private:
	PyThreadState* m_thread_state;
};

void loop(long count)
{
	while (count != 0) {
		count -= 1;
	}
	return;
}

void nogil(int threads, long count)
{
	if (threads <= 0 || count <= 0)
		return;

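	// Direct initialization, so no temporary is created whose destructor could restore the GIL early.
	ScopedGILRelease release_gil;
	// Integer ceiling division, so the threads together cover at least `count` iterations.
	long thread_count = (count + threads - 1) / threads;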

	std::vector<boost::shared_ptr<boost::thread> > v_threads;
	for (int i=0; i != threads; i++) {
		boost::shared_ptr<boost::thread> m_thread = boost::shared_ptr<boost::thread>(new boost::thread(boost::bind(loop,thread_count)));
		v_threads.push_back(m_thread);
	}

	for (int i=0; i != v_threads.size(); i++)
		v_threads[i]->join();

	return;
}

BOOST_PYTHON_MODULE(nogil)
{
	using namespace boost::python;
	def("nogil", nogil);
}

Then I used the following Python script to run some quick tests.

import time
import nogil

def timer(func):
	def wrapper(*arg):
		t1 = time.time()
		func(*arg)
		t2 = time.time()
		print "%s took %0.3f ms" % (func.func_name, (t2-t1)*1000.0)
	return wrapper

@timer
def loopone():
	count = 5000000
	while count != 0:
		count -= 1

@timer
def looptwo():
	count = 5000000
	nogil.nogil(1,count)

@timer
def loopthree():
	count = 5000000
	nogil.nogil(2,count)

@timer
def loopfour():
	count = 5000000
	nogil.nogil(4,count)

@timer
def loopfive():
	count = 5000000
	nogil.nogil(6,count)

def main():
	loopone()
	looptwo()
	loopthree()
	loopfour()
	loopfive()

if __name__ == '__main__':
	main()

The results I got were quite interesting and very consistent on my MacBook Pro. I ran the script about 1,000 times and got roughly the same results every time.

loopone took 364.159 ms (pure python)
looptwo took 15.295 ms (c++, no GIL, single thread)
loopthree took 7.763 ms (c++, no GIL, two threads)
loopfour took 8.119 ms (c++, no GIL, four threads)
loopfive took 11.102 ms (c++, no GIL, six threads)
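
To put those numbers in perspective, here is a small sketch that turns the timings above into speedup factors relative to the pure Python loop. The dictionary simply restates the results listed above; the variable names are only for illustration.

```python
# Timings copied from the results above, in milliseconds.
timings_ms = {
    'loopone': 364.159,    # pure python
    'looptwo': 15.295,     # c++, no GIL, single thread
    'loopthree': 7.763,    # c++, no GIL, two threads
    'loopfour': 8.119,     # c++, no GIL, four threads
    'loopfive': 11.102,    # c++, no GIL, six threads
}

baseline = timings_ms['loopone']

# Speedup of each run relative to the pure Python baseline.
speedups = dict((name, baseline / ms) for name, ms in timings_ms.items())

if __name__ == '__main__':
    for name in sorted(speedups, key=speedups.get):
        print("%-10s %6.1fx faster than pure python" % (name, speedups[name]))
```

Two threads come out best in this data set, and adding more threads beyond that did not help for this particular loop.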

Anyway, that’s all really. Nothing profound here, no super insightful ending. Just hey look and stuff is faster and I might use this. All the code for this is available in my bitbucket repo. http://bitbucket.org/wwitzel3/code/src/tip/nogil/

You will need the Boost libraries, including Boost Python and Boost Thread, as well as the Python libraries and includes to build this. For Boost, bjam --with-python --with-thread variant=release toolset=gcc is all I did on my Mac. Then I added the resulting libs as Framework dependencies in Xcode along with the Python.framework.

Today I was having a chat about Pylons vs. Django and for the most part it was pretty diplomatic. We got to talking about the Admin interface that Django has, which you don’t have to write any extra boilerplate for; it is just there for you. With Pylons you have to use something like FormAlchemy or use TurboGears to get a similar style admin interface for your models and data.

Since we were sitting at a computer, I went ahead and brought up a quick project and did a little demo of the paster shell. Sure, it involves typing and it isn’t as pretty or “fast” as an admin panel, but he didn’t even know it existed. The common things he mentioned were “if I want to change the menus that are dynamically defined” or “if a username needs to be changed”. When the application itself doesn’t have a custom admin panel, with Pylons he had been resorting to raw SQL.

$ paster shell pylons_config.ini

All objects from demo.lib.base are available
Additional Objects:
   mapper     -  Routes mapper object
   wsgiapp    -  This project's WSGI App instance
   app        -  paste.fixture wrapped around wsgiapp

>>> error_user = meta.Session.query(model.User).filter_by(username='wwitzel 3').one()
>>> # nice thing about .one() is that it also raises an exception if more than one record exists
>>> error_user.username
u'wwitzel 3'
>>> error_user.username = 'wwitzel3'
>>> meta.Session.commit()
>>> menu_typo = meta.Session.query(model.Menu).filter_by(id=1).one()
>>> menu_typo.value
u'Abuot'
>>> menu_typo.value = 'About'
>>> meta.Session.commit()

So that is a very simple example of how one would use the paster shell to update some bad data in the database while ensuring integrity of your custom model and extension code. After I showed this to my friend he wasn’t as concerned about the lack of a web interface for administration within Pylons.

I asked the question on Stack Overflow and maybe it was too generic for the site, since it just got trolled with “Google keyword” by some d-bag. So I deleted it and figured I’d throw it up on my blog and see about getting some feedback from the people who read this. The reason I ask is mainly because I am preparing to do some updated screencasts for Pylons.

I’ve seen multiple ways referenced in official docs and I have done it a few different ways myself. I am using Pylons and I am curious what the best practices are for this common scenario: converting SQLAlchemy objects to JSON.

I have used something similar to this for auto-magically making the conversion happen.

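# The auto-magic version
# I pulled this off a blog, forget the source.
from json import dumps  # stdlib since Python 2.6; simplejson's dumps works the same way

def _sa_to_dict(obj):
    for item in obj.__dict__.items():
        # skip private attributes such as SQLAlchemy's _sa_instance_state
        if item[0][0] == '_':
            continue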
        if isinstance(item[1], str):
            yield [item[0], item[1].decode()]
        else:
            yield item

def json(obj):
    if isinstance(obj, list):
        return dumps(map(dict, map(_sa_to_dict, obj)))
    else:
        return dumps(dict(_sa_to_dict(obj)))

# here is the controller
@jsonify
def index(self, format='html'):
    templates = Session.query(Template).all()
    if format == 'json':
        return json(templates)

I have also done the version where you use the jsonify decorator and build your dictionary manually, something like this, which is ok if I need to define some custom behavior for my JSON, but as the default behavior seems excessive.

@jsonify
def index(self, format='html'):
    if format == 'json':
        q = Session.query
        templates = [{'id': t.id,
                      'title': t.title,
                      'body': t.body} for t in q(Template)]
        return templates

I’ve also created an inherited SA class which defines a json method and have used that on all my objects to convert them to JSON, similar to the Fedora extensions.
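
As a rough illustration of that base class idea, here is a minimal sketch. The names JsonMixin and to_dict are hypothetical, and Template below is a plain stand-in rather than a real mapped SQLAlchemy class; the sketch assumes every public attribute on the instance is JSON serializable.

```python
import json


class JsonMixin(object):
    """Hypothetical base class: serialize public instance attributes to JSON."""

    def to_dict(self):
        # Skip private attributes such as SQLAlchemy's _sa_instance_state.
        return dict((key, value) for key, value in vars(self).items()
                    if not key.startswith('_'))

    def json(self):
        return json.dumps(self.to_dict())


class Template(JsonMixin):
    # Plain stand-in for a mapped class, for illustration only.
    def __init__(self, id, title, body):
        self.id = id
        self.title = title
        self.body = body
        self._sa_instance_state = object()  # mimics SQLAlchemy's bookkeeping attribute


if __name__ == '__main__':
    print(Template(1, 'Welcome', 'Hello there').json())
```

The upside of this approach is that every model gets the same default behavior, and a single model can still override to_dict when it needs custom JSON.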

Maybe I missed some obvious library out there or some obvious helper in the Pylons packages, but I feel like this is a very common task being done a dozen different ways between docs, source, and my own personal projects. Curious what others are doing / using.

There is an ongoing discussion at the office with a team member who refuses to use dynamic languages. He claims that most of his errors are typographical errors and that they are caught by the compiler. So for him, since these errors are not caught until runtime, he throws an entire group of languages out the window. He also claims that to ensure that same level of checking with a dynamic language you would have to create more unit tests than normal to prevent introducing unhandled runtime exceptions.

So I decided to do a little test over the weekend. I created a very simple Number class in Python and C++. Using the exact same TDD development process, I implemented some very basic operations including division, addition, subtraction, etc. I ended up with 12 tests, exactly the same for both the C++ and Python implementations, resulting in 100% of the execution paths being covered. I decided that compilation (in the case of C++) and passing of the tests determined a success.

Then I went back and inserted common typographical errors: mistypes, extra = signs, not enough = signs, miseplled_varaibles, etc. The end result was that I was unable to keep my unit tests passing while introducing a typo that would induce an unhandled runtime exception in Python. Granted, in C++ the compiler did catch a lot of things for me, but the point here is I didn’t have to create any extra tests to ensure that same level of confidence in my Python code.

Morning Tutorial

In the AM I attended “py.Test: rapid testing with minimal effort”. I was planning to attend Python 401, but that filled up before I registered. I learned some new things about py.test that I didn’t know about. Having never read the docs for it, that wasn’t hard.

I learned about the -f switch (loop on failing). Basically this continually reruns the tests as source files change. It only reruns failing tests or new tests.

The generative tests using yield were something I had known about but never used, and now I know exactly where I will be applying them. I have a program that takes a dozen or so different command line arguments and switches. I will generate a text file with all possible combinations. Then I can use that file to run through the command line tests.
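
Here is a rough sketch of how that could look. The switches, the all_argument_combinations helper, and the check_args function are all made up for illustration, and the combinations are generated in memory rather than read from a text file. The yield-style test function is the generative form py.test supported at the time; newer pytest releases dropped it in favor of parametrization.

```python
import itertools

# Hypothetical switches for an imaginary command line program.
FLAGS = ["--verbose", "--dry-run"]
MODES = ["fast", "safe"]


def all_argument_combinations(flags, modes):
    """Yield every on/off combination of flags paired with each mode."""
    for mode in modes:
        for size in range(len(flags) + 1):
            for subset in itertools.combinations(flags, size):
                yield list(subset) + ["--mode", mode]


def check_args(args):
    # Placeholder check; a real test would run the program with these args.
    assert "--mode" in args


def test_command_line():
    # Generative test: yield one (check function, arguments) pair per case.
    for args in all_argument_combinations(FLAGS, MODES):
        yield check_args, args


if __name__ == '__main__':
    for args in all_argument_combinations(FLAGS, MODES):
        print(" ".join(args))
```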

Afternoon Tutorial

In the afternoon I attended the “Advanced SQLAlchemy” tutorial. I am not sure if I was the target audience for this tutorial or not. I have been using SQLAlchemy for a while now. The list of topics showed promise. Maybe I built it up too much in my head. A tutorial by Bayer himself, the guy who wrote it. I should be blown away here. I wanted to leave this tutorial thinking, “Wow, look how stupid I was, look at how much easier this is when you use FOO or BAR,” or “Wow, I never knew you could do that.” Sadly, I have to say I had neither of those moments.

First let me say this: this isn’t a critique of either of the tutorial instructors or of the content at an academic level. Both were quality.

I have to say the best thing I saw was a nice shorthand way of doing some things with declarative_base. The coverage of inheritance mapping wasn’t really mind-blowing: single-table inheritance was a sparse-matrix approach using exclude, and joined-table inheritance was just a Strategy pattern.

Transactions were covered for what seems like people who had never worked with transactions. The deadlock example that was given using SessionExtension was nice and practical and really the only thing that made me go “ahhhh” as I knew I could refactor the current way I was dealing with concurrency and databases with SQLAlchemy.

Summary

Overall, it was a good day. The coverage in the tutorials was very good. The dialog with people who were attending the tutorial was the best part really. Helping people work through the examples and answering some questions that people were either too shy or too embarrassed to ask to the whole class.

Just like with OOPSLA 2008, I like to keep a personal log of what I did at the conference each day. Today was my first day at PyCon 2009. We arrived early this morning (around 9 AM). I managed to get registered and pick up my fun bag, which included a PyCon shirt, a Launchpad shirt, and a CD for OpenSolaris, among other things.

We ate at the convention center restaurant across the street from the Hyatt Regency O’Hare .. yeah don’t go there. After that, we took a nap, we had been up since the day before (our flight in to Chicago left Florida at 5 AM).

We stopped at Red Bar, the bar inside the hotel and had a few drinks and spoke with John Moulder (spelling?) who was also attending Pycon. We laughed a little since he works for the government and his last name was Moulder.

I have 2 tutorials tomorrow: an AM tutorial about py.test and an afternoon tutorial on SQLAlchemy. Looking forward to both. Even though I am disappointed the Python 401 tutorial was full, I am sure the py.test tutorial will be a fine substitute and equally informative.

UPDATE / 13 March 2009: snakefight 0.3 now has an --include-jar option, prefer that to using my hack.

After reading P. Jenvey’s blog post about Deploying Pylons Apps to Java Servlet Containers I immediately downloaded the Jython 2.5 beta and installed snakefight to give it a try. One of our services where I work is a Pylons based application. It is deployed using paster and Apache ProxyPass. Our main application is written in Java and is deployed as a war under Jetty. So if I can get my Pylons application built as a war and deployed that way, it would greatly simplify our deployment process.

$ sudo /opt/jython25/bin/easy_install snakefight
$ /opt/jython25/bin/jython setup.py develop
$ /opt/jython25/bin/jython setup.py bdist_war --paste-config dev_r2.ini
... output of success and stuff ...
$ cp dist/project-0.6.8dev.war /opt/jetty/webapps

Now I visit my local server and hit the project context. I get some database errors, which I kind of expected. So for the time being, I’ll be running this directly using Jython to speed up the debugging process. A quick googling of my DB issues turns up zxoracle for SQLAlchemy, which uses Jython’s zxJDBC. I install that into sqlalchemy/databases as zxoracle.py, change the oracle:// lines in my .ini file to read zxoracle://, and give it another go. Now it can’t find the third-party Oracle library (ojdbc.jar).

$ cd ./dist
$ jar xf project-0.6.8dev.war
$ cd WEB-INF/lib
$ ls
# no ojdbc.jar as expected ...
$ cd ~/project
$ export CLASSPATH=/opt/jython25/jython.jar:/usr/lib/jvm/java/jre/lib/ext/ojdbc.jar
$ /opt/jython25/bin/jython /opt/jython25/bin/paster serve --reload dev_r2.ini

Now it is looking a little better and it is able to find the jar, but there is still a DB issue, now within the SQLAlchemy library. Not having a ton of time to investigate, I decide to try rolling back my SQLAlchemy version for Jython. Turns out rolling back to 0.5.0 fixed the issue. I’ll be investigating why it was breaking with 0.5.2 soon ™. So now I rerun it, and get a new error.

AttributeError: 'ZXOracleDialect' object has no attribute 'optimize_limits'

I decide I am just going to go into zxoracle.py and add optimize_limits = False to the ZXOracleDialect. No idea what this breaks or harms, but I do it anyway and rerun the application. Success! Everything is working now. Not liking the idea of having to manually insert the Oracle jar into WEB-INF/lib, and not really wanting to muck around with environment variables, I also implemented a quick and dirty include-java-libs option for snakefight. This allows me to pass in a : separated list of jars to include in WEB-INF/lib. EDIT: The diff I originally posted for command.py isn’t needed since I put it on my hg repo. You can grab it from here.
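
For the curious, the idea behind that option boils down to something like the sketch below. This is not the actual snakefight patch; the function name include_java_libs and its arguments are made up here purely to illustrate splitting a : separated list and copying each jar into the lib directory.

```python
import os
import shutil


def include_java_libs(jar_paths, lib_dir):
    """Copy each jar named in a ':' separated string into lib_dir.

    Returns the file names that were copied, in order.
    """
    if not os.path.isdir(lib_dir):
        os.makedirs(lib_dir)
    copied = []
    for jar in jar_paths.split(":"):
        jar = jar.strip()
        if not jar:
            continue  # tolerate empty entries such as a trailing ':'
        shutil.copy(jar, lib_dir)
        copied.append(os.path.basename(jar))
    return copied
```

Called with something like include_java_libs("/opt/jython25/extlibs/ojdbc.jar", "build/war/WEB-INF/lib"), it would leave a copy of the jar in the lib directory; the second path in that call is only an example.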

So now I am back to building my war. Just as before.

$ /opt/jython25/bin/jython setup.py bdist_war --paste-config dev_r2.ini --include-java-libs /opt/jython25/extlibs/ojdbc.jar
running bdist_war
creating build/bdist.java1.6.0_12
creating build/bdist.java1.6.0_12/war
creating build/bdist.java1.6.0_12/war/WEB-INF
creating build/bdist.java1.6.0_12/war/WEB-INF/lib-python
running easy_install project
adding eggs (to WEB-INF/lib-python)
adding jars (to WEB-INF/lib)
adding WEB-INF/lib/jython.jar
adding Paste ini file (to dev_r2.ini)
adding Paste app loader (to WEB-INF/lib-python/____loadapp.py)
generating deployment descriptor
adding deployment descriptor (WEB-INF/web.xml)
created dist/project-0.6.8dev-py2.5.war
$ cp dist/project-0.6.8dev-py2.5.war /opt/jetty/webapps
$ sudo /sbin/service jetty restart

And presto! I am in business. My Pylons application is deployed under Jetty and all the Selenium functional tests are passing. I am sure there is probably an easier, neater, or cleaner way to do all this, but this was my first iteration through and also my first time ever deploying a WAR to a Java servlet container, so all in all I am happy with the results. Performance seems about the same as when running the application with paster serve, but Jetty does use a little more memory than before (expected, I guess).

Heading to PyCon this year. Looking forward to the tutorials and the great lineup of keynotes. I highly recommend attending this year; it looks like one of the best PyCons in a while. I’ll be attending the Advanced SQLAlchemy tutorial and the py.test tutorial. I was hoping to get into the Python 401 tutorial, but I registered late and it was already full.

The keynotes I am looking forward to:

  • Building tests for large, untested codebases by C. Titus Brown
  • Metaprogramming with Decorators and Metaclasses by Bruce Eckel
  • Topics of Interest by Ian Bicking

So if you are a Python hacker, get over to http://us.pycon.org, sign up, and get yourself there! It is gonna be a great conference this year.

I need to concatenate a set of PDFs, so I will take you through my standard issue Python development approach when doing something I’ve never done before in Python.

My first instinct was to google for pyPDF. Success! So, I forgo reading any docs and just give the old easy_install a try.

$ sudo easy_install pypdf

Another success! Ok, a couple help() calls later and I am ready to go. The end result is surprisingly small and seems to run fast enough even for PDFs with 50+ pages.

from pyPdf import PdfFileWriter, PdfFileReader

def append_pdf(reader, writer):
    # copy every page of the input PDF onto the end of the output PDF
    for page_num in range(reader.numPages):
        writer.addPage(reader.getPage(page_num))

output = PdfFileWriter()
append_pdf(PdfFileReader(open("sample.pdf", "rb")), output)
append_pdf(PdfFileReader(open("sample.pdf", "rb")), output)

# write the combined document to disk
output.write(open("combined.pdf", "wb"))