Falling between two stools

Tuesday, May 13 2008

I love a good bit of language lawyering as much as the next programmer, but perhaps not so much when I am the victim, as opposed to the smug bastard who remembers all the footnotes and can do a good Nelson Muntz-style “ha ha!”

It turns out that in SQL, the BETWEEN ... AND comparison does not work quite the way you would expect. Quoting the Postgres manual:

a BETWEEN x AND y

is equivalent to

a >= x AND a <= y

Similarly,

a NOT BETWEEN x AND y

is equivalent to

a < x OR a > y

Now, if perchance x is greater than y (for example, if you write SELECT foo FROM bar WHERE foo BETWEEN 200 AND 100), SQL will treat that as if you had written SELECT foo FROM bar WHERE foo >= 200 AND foo <= 100, and return nothing, thus doing the complete opposite of what you might expect from the normal English meaning of “between”.
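
A minimal sketch of the gotcha, using Python's built-in sqlite3 module as a stand-in for Postgres (SQLite defines BETWEEN the same way). The table contents and the helper names foo_between and foo_between_any_order are made up for illustration. Sorting the bounds before they reach the query is one possible workaround; Postgres also documents BETWEEN SYMMETRIC, which sorts the bounds for you.

```python
import sqlite3

# In-memory database with a few illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER)")
conn.executemany("INSERT INTO bar (foo) VALUES (?)", [(50,), (150,), (250,)])


def foo_between(lower, upper):
    """Plain BETWEEN: the bounds are used in exactly the order given."""
    rows = conn.execute(
        "SELECT foo FROM bar WHERE foo BETWEEN ? AND ? ORDER BY foo",
        (lower, upper),
    )
    return [row[0] for row in rows]


def foo_between_any_order(x, y):
    """Sort the bounds first, so the argument order no longer matters."""
    return foo_between(min(x, y), max(x, y))


in_order = foo_between(100, 200)                # bounds the right way round
reversed_bounds = foo_between(200, 100)         # foo >= 200 AND foo <= 100
either_order = foo_between_any_order(200, 100)  # bounds sorted before querying
```

Only the reversed call comes back empty; the other two find the row that sits between the bounds.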

This has tripped up both me and a colleague in the last couple of weeks.

Not so much syntactic sugar as syntactic Nutrasweet…

Tags: sql ~ programming ~ annoyance

Sanitising smelly text

Tuesday, December 04 2007

At work we are migrating an old site to a new CMS.

Unfortunately the content is a mess. Owing to people pasting text in from Word and various other accidents, one fragment of HTML can be a mixture of UTF-8 and Latin-1 and cp1252 and goodness knows what else. When you’ve been a good boy and coded all your templates to declare “I am UTF-8, honest guv” it’s a bit trying. Especially when the client complains.

The markup is pretty broken too. It’s littered with weird markup from Word and generally non-compliant.

So far I’m having good results from a pipeline of various tricks.

  1. Python’s unicode function. unicode takes a byte string and decodes it into a Unicode string. You can optionally force it to treat input as a particular encoding, and you can tell it how to handle errors.
  2. Beautiful Soup. It finds tag soup delicious. It also does a best-effort to detect encodings and transcode to Unicode. (You have to love software with a module called UnicodeDammit).
  3. htmltidy, in its utidylib manifestation. Does beautiful cleanup. It’s not super-robust though; I can make it segfault and dump core by feeding it the crap we have. Which is why I clean up with BeautifulSoup first.
  4. I butchered Josh Goldfoot’s marvellous XSS-defense script to strip out some of the more outrageous markup that I know we won’t use.

The only downside is that over thousands of items, this is pretty slow. But it’s the price you pay to be beautiful, I guess.
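
As a rough illustration of the first step only, here is a sketch in Python 3, where the job of unicode() is done by bytes.decode(). It assumes, purely for the example, that anything that is not valid UTF-8 is a stray cp1252 byte; the handler name cp1252_fallback and the function to_text are hypothetical, not part of the pipeline described above.

```python
import codecs


def _cp1252_fallback(err):
    """Error handler: reinterpret the undecodable bytes as cp1252."""
    bad_bytes = err.object[err.start:err.end]
    # Bytes that cp1252 leaves undefined become U+FFFD instead of raising.
    return bad_bytes.decode("cp1252", errors="replace"), err.end


# Register the handler under a made-up name so decode() can refer to it.
codecs.register_error("cp1252_fallback", _cp1252_fallback)


def to_text(raw):
    """Decode as UTF-8, falling back to cp1252 for any invalid byte runs."""
    return raw.decode("utf-8", errors="cp1252_fallback")


# One fragment mixing valid UTF-8 with Word-style cp1252 curly quotes.
sample = b"caf\xc3\xa9 \x93quoted\x94"
text = to_text(sample)
```

This handles mixed fragments byte run by byte run, but it is still a guess about what the stray bytes were meant to be, which is why the later cleanup steps are still needed.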

Tags: python ~ unicode ~ markup ~ programming ~ html tidy ~ beautiful soup

The tale of the teddy bear

Thursday, October 25 2007

Years ago someone told me about the teddy bear, and I have never forgotten. It’s a story I tell people — or the programming subset thereof — and that they in turn pass on.

Imagine a university computer lab, full of students working on their programming assignments. (You can tell this is an old story, because these days no doubt all the students work on their own computers wherever they like). There are only two teaching assistants for the whole lab. They’re pretty busy. At their desk is a large teddy bear.

One of the students gets stuck, goes to the free teaching assistant and says “help! My program won’t compile!”

And the teaching assistant says “I’m happy to help you. But first, you must explain your problem to this teddy bear.”

The student hesitates, then starts to tell the teddy bear “well my code creates a pointer for the beginning of the list, and then …” (Blah blah blah).

And after about 30 seconds of explanation, the student says “oh shit! I know what it is!”, goes back to their console, and fixes their own problem.

Which is why my number one debugging strategy for intractable problems is to find someone else to explain it to. Not because I expect them to solve my problem for me, but because telling a coherent story about the problem, with all the information needed to explain it to someone else, helps you understand it yourself. And strategy number two, if I can’t find someone else to talk to, is to write things down as though I were sending a detailed bug report to someone else.

Tags: debugging ~ programming

A tiny WSGI framework in an hour or two

Sunday, October 14 2007

I’m not sure whether the first story here is meant to encourage or dissuade, but I am writing my own WSGI framework to support Burble. Colubrid is holding me back, and its replacement Werkzeug is overkill for what I want. (To be fair, Colubrid has been a great help to me in getting started.) I’m really getting into the educational aspect of doing things from scratch where I can.

It turns out that putting together a very lightweight WSGI framework is very easy indeed, especially having made a small compromise by using a few pre-built things from Ian Bicking’s Paste. (Yeah, I contradict myself. I am large, I contain multitudes.)

It’s so easy that I’m almost done, so I present a tiny, noddy framework for your reading pleasure. It implements a regex-based URL dispatcher a la Web.py.


#!/usr/bin/python
from paste.request import parse_formvars
from paste.response import HeaderDict
import re

def attrsfromdict(d):
    """From Python cookbook s6.18 p 280"""
    self = d.pop('self')
    for n, v in d.iteritems():
        setattr(self, n, v)

def simplerepr(obj):
    d = obj.__dict__
    members = ', '.join([n + '=' + repr(v) for n, v in d.iteritems()])
    return '%s(%s)' % (obj.__class__.__name__, members)

class NoMatchingControllerException(Exception):
    pass

class Request(object):
    def __init__(self, environ):
        self.environ = environ
        self.fields = parse_formvars(environ)

class Response(object):
    def __init__(self,
                 status_code='200',
                 response_phrase="OK", body="",
                 headers=None
                 ):
        # Build the default headers per instance: a HeaderDict used as a
        # default argument would be shared by every Response.
        if headers is None:
            headers = HeaderDict({'content-type': 'text/html'})
        attrsfromdict(locals())
    def __str__(self):
        return simplerepr(self)
    def __repr__(self):
        return self.__str__()
    def status(self):
        return ' '.join([self.status_code, self.response_phrase])

class Dispatcher(object):
    """
    The Dispatcher maintains an internal list of regexes and controllers.
    The Dispatcher accepts strings, and tries to match in turn against
    the regexes. As soon as a match is found, the corresponding controller is
    invoked.
    """
    def __init__(self, regex_app_tuples):
        self.dispatch_list = []
        for k, v in regex_app_tuples:
            p = re.compile(k)
            self.dispatch_list.append((p, v))

    def dispatch(self, request):
        """
        Expects to call a Controller's instance method GET or POST
        with the request and the groups obtained from the regex
        as arguments.
        """
        path_info = request.environ.get('PATH_INFO', '')
        method = request.environ['REQUEST_METHOD']

        for pat, app in self.dispatch_list:
            mo = pat.match(path_info)
            if mo is not None:
                # The request comes first, then one argument per regex group.
                args = [request]
                args.extend(mo.groups())
                if method == 'GET':
                    return app.GET(*args)
                elif method == 'POST':
                    return app.POST(*args)
        raise NoMatchingControllerException("No match for %s" % path_info)

class WhiskyApp(object):
    def __init__(self, dispatcher):
        self.dispatcher = dispatcher
    def __call__(self, environ, start_response):
        request = Request(environ)
        response = self.dispatcher.dispatch(request)
        start_response(response.status(), response.headers.items())
        return [response.body]

class NoddyController(object):
    def GET(self, request, id):
        r = Response(body="Noddy got %s" % id)
        return r

class BigEarsController(object):
    def __init__(self):
        print "in init"
    def GET(self, request, arg1, arg2):
        r = Response(body="Big Ears got %s and %s" % (arg1, arg2))
        return r

if __name__ == '__main__':
    from paste import httpserver
    dispatch_list = [
        (r'/noddy/(\d+)/?$', NoddyController()),
        (r'/bigears/(.*?)/(\d+)/?$', BigEarsController())
    ]
    dispatcher = Dispatcher(dispatch_list)
    app = WhiskyApp(dispatcher)
    httpserver.serve(app, host='127.0.0.1', port='8080')

That’s pretty much all I need, to be honest. I’m happy using Beaker for sessions, and I’ll probably pull in cookie stuff from Paste. I bodged up an Etag cache manager for Burble, which I want to integrate. I want to write a nice base class for controllers. And that’s it. Whee!
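
For a feel of what the dispatcher hands to a controller, here is a small standalone sketch (Python 3, standard library only) that reuses the two URL patterns from the listing. The ROUTES table and the resolve function are hypothetical stand-ins for Dispatcher, returning a name and the captured groups rather than calling a controller.

```python
import re

# The same two patterns as the dispatch list, paired with plain names
# instead of controller instances.
ROUTES = [
    (re.compile(r'/noddy/(\d+)/?$'), 'noddy'),
    (re.compile(r'/bigears/(.*?)/(\d+)/?$'), 'bigears'),
]


def resolve(path_info):
    """Return (name, groups) for the first matching pattern, else None."""
    for pattern, name in ROUTES:
        mo = pattern.match(path_info)
        if mo is not None:
            return name, mo.groups()
    return None


noddy = resolve('/noddy/42')
bigears = resolve('/bigears/foo/7/')
nested = resolve('/bigears/a/b/7')
missing = resolve('/noddy/abc')
```

Note that the captured groups are always strings, and that the lazy (.*?) group will happily swallow extra slashes, so '/bigears/a/b/7' still matches with 'a/b' as the first argument.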

Tags: python ~ burble ~ wsgi ~ paste ~ programming ~ web development

Shelley Powers: the Parable of the Languages

Thursday, October 11 2007

I love the ending.

Tags: programming ~ xml ~ linky ~ funny

On “helpful” frameworks

Thursday, October 11 2007

My colleague Christine just emitted this gem:

“Layers of cake are better than layers of abstraction.”

Indeed.

Tags: software ~ programming

Making life easy on yourself in Python with quick and dirty __repr__

Wednesday, October 10 2007

When I write Python, sadly my code often has bugs. One way or another I always end up dumping out variables to see what’s in them. If those variables refer to objects, Python’s default representation is not very helpful:

>>> class C:
...     def __init__(self, arg):
...         self.member = arg
...
>>> obj = C('foo')
>>> print obj
<__main__.C instance at 0xb7d363ec>

It would be nice to have something that lets you know what’s inside that object.

Python lets you help yourself. If you define __str__ and __repr__ methods on your classes, then they will have nice string representations when you want to print them, or when a debugger inspects them for you. This can be a bit laborious though, especially if you want to meet the requirement (see the docs) that __repr__ should return “a valid Python expression that could be used to recreate an object with the same value.” And as a good Python citizen, of course you want to do that.

I have a solution that won’t always be the right thing, but saves a lot of typing for many simple classes. I find that often I write classes that are really glorified dictionaries with a few helper methods. In my day job, where we mostly write Java, we work where possible with POJOs or beans. This seems like a Pythonic way to emulate and improve on that idiom.

def simplerepr(obj):
    d = obj.__dict__
    members = ', '.join([n + '=' + repr(v) for n, v in d.iteritems()])
    return '%s(%s)' % (obj.__class__.__name__, members)

def attrsfromdict(d):
    """From Python cookbook s6.18 p 280"""
    self = d.pop('self')
    for n, v in d.iteritems():
        setattr(self, n, v)

class Foo(object):
    def __init__(self, arg1="wstfgl", arg2="sneeb!"):
        attrsfromdict(locals())
    def __str__(self):
        return simplerepr(self)
    def __repr__(self):
        return self.__str__()

>>> f = Foo('quux')
>>> f
Foo(arg1='quux', arg2='sneeb!')
>>> g = Foo(arg1='quux', arg2='sneeb!')
>>> g
Foo(arg1='quux', arg2='sneeb!')

This was inspired by and builds on a recipe in the Python Cookbook.
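
To check the “valid Python expression” claim, here is the same idea as a Python 3 sketch (items() instead of iteritems()), with a round trip through eval. It only works under the assumption that every attribute is also a constructor keyword argument with an eval-able repr; the round-trip check itself is an addition for illustration.

```python
def attrsfromdict(d):
    """Copy every local of the calling __init__ onto self."""
    self = d.pop('self')
    for name, value in d.items():
        setattr(self, name, value)


def simplerepr(obj):
    """Render obj as ClassName(attr=value, ...) from its instance dict."""
    members = ', '.join('%s=%r' % (n, v) for n, v in vars(obj).items())
    return '%s(%s)' % (type(obj).__name__, members)


class Foo:
    def __init__(self, arg1="wstfgl", arg2="sneeb!"):
        attrsfromdict(locals())

    def __repr__(self):
        return simplerepr(self)


f = Foo('quux')
text = repr(f)
# Evaluating the repr builds a new object with the same attribute values.
clone = eval(text)
```

Here clone is a distinct Foo whose instance dict matches that of f, which is the property the docs ask of __repr__.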

Tags: python ~ programming ~ good practice

Required reading for programmers

Wednesday, October 10 2007

Today’s xkcd is particularly good.

Tags: programming ~ funny

Rendered at 2010-08-01 22:24:28