You see lots of examples on the net for SQLalchemy. Implementing a blog, implementing a wiki, even other articles on implementing tags. Some are good, some are pretty poor, and some are just plain out of date. After some researching on best practices for implementing a Tag system with SQLalchemy I’ve come up with the solution you are about to read.
I’ve pulled these examples from real world production code. Just renamed them and shortened them up a little for the blog post. I pulled the naming convention right from SimpleSite example for Pylons. Here is the the table layout. Simple. A page, tag, and relation table.
page_table = sa.Table("page", meta.metadata,
sa.Column("id", sa.types.Integer, sa.schema.Sequence('page_seq_id', optional=True), primary_key=True),
sa.Column("name", sa.types.Unicode(100), nullable=False),
)
tag_table = sa.Table("tag", meta.metadata,
sa.Column("id", sa.types.Integer, sa.schema.Sequence('taq_seq_id', optional=True), primary_key=True),
sa.Column("name", sa.types.Unicode(50), nullable=False, unique=True),
)
pagetag_table = sa.Table("pagetag", meta.metadata,
sa.Column("id", sa.types.Integer, sa.schema.Sequence('pagetag_seq_id', optional=True), primary_key=True),
sa.Column("pageid", sa.types.Integer, sa.schema.ForeignKey('page.id')),
sa.Column("tagid", sa.types.Integer, sa.schema.ForeignKey('tag.id')),
)
Now the important part, the mapper. The mapper is what is going to tell sqlalchemy what you are trying to do and how to handle and relate those ForeignKeys. It does the heavy lifting so you don’t have to.
class Tag(object):
pass
class Page(object):
pass
orm.mapper(Tag, tag_table)
orm.mapper(Page, page_table, properties = {
'tags':orm.relation(Tag, secondary=pagetag_table, cascade="all,delete-orphan"),
})
This does two things. It setups the relationship and also uses the built-in cascade rule from SQLalchemy to ensure that no orphan tags are left in the database.
So now we can use this model setup like so. Here, I’ve just started up my paster shell so I could work through some quick usage examples.
page = model.Page()
page.name = "Example Page"
tag = model.Tag()
name = "tag"
page.tags.append(tag)
meta.Session.save(page)
meta.Session.commit()
tag_q = meta.Session.query(model.Tag)
tags = tag_q.all()
len(tags)
# filter pages by tag(s)
page_q = meta.Session.query(model.Page)
pages = page_q.join('tags').filter_by(name="tag").all()
# delete-orphans does the work for us here...
meta.Session.delete(pages[0])
meta.Session.commit()
tags = tag_q.all()
len(tags)
# tag cloud anyone?
# see the source code linked below for a properly weighted tag cloud.
tag_q = meta.Session.query(func.count("*").label("tagcount"), model.Tag)
tag_r = tag_q.filter(model.Tag.id==model.pagetag_table.c.tagid).group_by(model.Tag.id).all()
# what about pages with related tags?
page_q = meta.Session.query(model.Page)
taglist = ["tag1", "tag2"]
tagcount = len(taglist)
page_q.join(model.Page.tags).filter(model.Tag.name.in_(taglist)).\
group_by(model.Page.id).having(func.count(model.Page.id) == tagcount).all()
Ok, now the fun part, what about all related tags? An intersection between an arbitrary number of many-to-many relationships? For that I added a static method to my tag class. Something like this.
class Tag(object):
@staticmethod
def get_related(tags=[]):
tag_count = len(tags)
inner_q = select([pagetag_table.c.pageid])
inner_w = inner_q.where(
and_(pagetag_table.c.tagid == Tag.id,Tag.name.in_(tags))
).group_by(pagetag_table.c.pageid).having(func.count(pagetag_table.c.pageid) == tag_count).correlate(None)
outer_q = select([Tag.id, Tag.name, func.count(pagetag_table.c.shipid)])
outer_w = outer_q.where(
and_(pagetag_table.c.pageid.in_(inner_w),
not_(Tag.name.in_(tags)),
Tag.id == pagetag_table.c.tagid)
).group_by(pagetag_table.c.tagid)
related_tags = meta.Session.execute(outer_w).fetchall()
return related_tags
A big thanks to PHP-Cake and TagSchema for the ideas, concepts, and implementation examples.
You can find the actual code that this blog was the basis for at:
http://trac.pieceofpy.com/pieceofpy/browser/tags-sqlalchemy
rholmes says:
If I could make one suggestion, if you’re going to change the background colour on your web pages you should also change the foreground colour. Without images enabled we get black text on blue background.
Great article though, and definitely worth the images
11 Oct 2008, 09:34Wayne says:
Thanks. When it comes to design, I just plugin a Wordpress theme, but I disabled imagines and was able to view what you were talking about.
I’ll get that fixed up.
11 Oct 2008, 18:08Wayne says:
Trying this new theme, looks ok to me when images are off. Enjoy.
11 Oct 2008, 18:30Empty says:
Great post. Thank you. This kind of thing is extremely helpful. I plan to get into this on a project I’m working on so it will come in quite useful.
14 Oct 2008, 02:41Walter Cruz says:
I’ve published a translation on pt_BR here: http://artigos.waltercruz.com/tags-com-sqlalchemy
20 Oct 2008, 16:44Wayne says:
Thank you for this translation. Much appreciated.
21 Oct 2008, 04:29Matt Bowen says:
Thanks for writing such thoughtful tutorial that shows off modern SQLAlchemy features. One question — why have the “id” column in pagetag_table? Couldn’t you just have a composite primary key instead? You know that the combination of page and tag IDs will be unique. MySQL Forge actually has an interesting discussion of tag schemas that talks about why to avoid having a surrogate key like pagetag_table’s id (http://forge.mysql.com/wiki/TagSchema)
20 Dec 2008, 20:05Peter Panski says:
Thank you very much for sharing! I would love to see this done without the inefficient HAVING. Matt’s Forge link gives a hint how to do this (see the “Toxi-Schema”), but sadly i just don’t know enough SA to figure it out on my own.
02 Mar 2009, 09:33