Elements or attributes, the eternal question
Sunday, May 10 2009
Nice take on when to use attributes in XML here.
Tags: xml ~ linky
Perils of late binding in Python
Thursday, April 30 2009
So I haven’t been doing a lot of Python recently, and I got tripped up by something that in retrospect should have been obvious.
You can write code that references an undefined thingy, and Python won’t complain until you actually run the code and try to access the thingy.
Eg:
>>> def f():
...     z()
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
NameError: global name 'z' is not defined
Which kind of sucks when you forgot to write a test for that code, and you get a runtime exception.
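A minimal sketch of the cheapest possible guard, using nothing beyond the language itself: even a test that does no more than call the function turns the deferred NameError into something you see before production does. The names `f`, `z` and `smoke_test` are purely illustrative.

```python
def f():
    # z is looked up only when this line executes, not when f is defined
    z()


def smoke_test(func):
    """Call func once; return the NameError it raises, or None if it runs cleanly."""
    try:
        func()
    except NameError as err:
        return err
    return None


if __name__ == "__main__":
    err = smoke_test(f)
    if err is not None:
        print("caught: %s" % err)
    else:
        print("ok")
```

Defining `f` succeeds without complaint; only the call inside `smoke_test` trips over the missing name.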
Tags: python
Wednesday, March 25 2009
Dear WhitePages,
Your “suggested locations” dropdown and region searching are broken.
Eg, I want to find “City GPs” in Wellington. As I type “Wellington”, the first suggested location is “Wellington Central”. Aha! I think. They are indeed in Central Wellington.
However, this search yields no results. Neither does “Wellington City” or “Wellington CBD”, even though they are at the city end of Willis St and definitely would be in both those regions. Only plain “Wellington” gives me a result.
To add insult to injury, the suggestion when there are no results is to “refine my search”. But “Wellington CBD” IS more refined than “Wellington”.
At this point, your suggested locations feature is actually more useless and more annoying than if it didn’t exist at all. Either get rid of it, or make your subcategories work as expected.
Yours sincerely
Stephen
Tags: usability ~ catalyst ~ misfeature
Sunday, March 22 2009
The other day I was reading Ryan Tomayko’s blog and I got inspired.
Ryan wrote the Kid templating library which drives this blog, and is quite the Python/Ruby hacker. He also has a very minimalist design. Its principles are outlined here.
With hypertext, the information itself is the interface. The content takes center stage while the chrome and tool areas are placed in the back-seat. This inversion of priorities has created as big a leap in interface innovation as the first graphical user interfaces did to the terminal based applications before them.
And yet, these fine attributes of hypertext are regularly subverted. Since the web’s inception and subsequent boom, people have been trying to get around hypertext’s “limitations” as an interface medium: first with Java Applets and Active X controls, later with Flash sites, and today with Rich Internet Application (RIA) platforms. There was a time when sites were authored with the goal of preventing the vertical scroll-bar from ever appearing! The goal is always the same: invert the web’s superior content-oriented interface back to the GUI era and allow for the types of administrative debris so common and accepted in desktop applications.
I have applied them over on my other channel. (I also made a bunch of other improvements, like per-tag RSS feeds, and better 404 handling.)
I often have rude things to say about other people’s usability, so it feels good to get my own house in order. I am interested though in whether there is such a thing as best practice design for blogs. For example, are “recent comments” widgets useful? Should you have whole articles rather than excerpts on your home page, and if so, how many? I don’t know, but I’d like to.
Naturally, this blog is still untouched and looks like pus; in fact owing to changes made for the other channel, it’s worse than before. This will not be the case for long.
Tags: usability ~ burble ~ ryan tomayko ~ catalyst
Tuesday, March 03 2009
Had a burst of hacking over the weekend, and one of the outcomes was the realisation that I have a few practices that could be usefully put into a template for new scripts.
So: here is my current starting point for any new script.
#!/usr/bin/python
# -*- coding: utf-8 -*-
from optparse import OptionParser

def _test():
    # run any doctests embedded in this module's docstrings
    import doctest
    doctest.testmod()

def _profile_main(filename):
    # run _main under cProfile and print the ten most expensive calls
    import cProfile, pstats
    prof = cProfile.Profile()
    ctx = """_main(filename)"""
    prof = prof.runctx(ctx, globals(), locals())
    stats = pstats.Stats(prof)
    stats.sort_stats("time")
    stats.print_stats(10)

def _blurt(s):
    # debugging output; a no-op unless --verbose or --test is given
    pass

def _main(filename):
    pass

if __name__ == "__main__":
    usage = "usage: %prog [options]"
    parser = OptionParser(usage=usage)
    parser.add_option('--profile', '-P',
                      help='Print out profiling stats',
                      action='store_true')
    parser.add_option('--test', '-t',
                      help='Run doctests',
                      action='store_true')
    parser.add_option('--verbose', '-v',
                      help='print debugging output',
                      action='store_true')
    (options, args) = parser.parse_args()
    # assign non-flag arguments here
    filename = args[0] if args else None
    def really_blurt(s):
        print s
    if options.verbose:
        _blurt = really_blurt
    if options.profile:
        _profile_main(filename)
        exit()
    if options.test:
        _blurt = really_blurt
        _test()
        exit()
    _main(filename)
Tags: python
Using Gnome Do’s Docky view with dual monitors
Tuesday, March 03 2009
Gnome Do offers a thing called “Docky” which is somewhat like the Mac OS X Dock. I’ve become quite fond of it.
Docky has an autohide mode, so that it will only appear when your mouse goes below the bottom edge of the screen.
I have dual monitors at home and at work, and I’m afraid that if auto-hide is on, Docky disappears and won’t come back, except for the odd flicker. This is a problem, because the only easy way to toggle auto-hide mode is by right-clicking on Docky.
I realised that this setting was probably in gconf. It is. You can use gconf-editor to find Gnome Do’s settings and tweak Docky autohide there. Problem solved.
Also, bug reported.
Tags: gnome do ~ docky ~ autohide ~ dual monitors
Kiwibank’s KeepSafe feature, and ETAOIN SHRDLU
Friday, January 30 2009
Kiwibank have added a new step to their login process, called KeepSafe.
In this step, the user knows the answers to a small set of questions they have selected, like “Where were you born?” or “What’s your pet’s name?” When they log in, they are prompted with the questions and asked to select random letters from the answer (eg the 1st and 5th letters).
The aim is to defeat keyloggers. The user uses their mouse to select letters from a display of the alphabet, and they never type the whole answer, so an attacker who logged mouse clicks would have to capture multiple logins.
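To make the mechanics concrete, here is a rough sketch of that kind of partial-answer challenge. It is only my reading of the flow described above, not Kiwibank’s implementation; `make_challenge`, `check_response` and the sample answer are all made up for illustration.

```python
import random


def make_challenge(answer, how_many=2, rng=random):
    """Pick distinct 1-based letter positions to ask the user for."""
    return sorted(rng.sample(range(1, len(answer) + 1), how_many))


def check_response(answer, positions, letters):
    """True if letters match the answer at the requested positions (case-insensitive)."""
    expected = [answer[p - 1].lower() for p in positions]
    return expected == [c.lower() for c in letters]


if __name__ == "__main__":
    answer = "Wellington"  # hypothetical answer to "Where were you born?"
    positions = make_challenge(answer)
    print("Please select letters %s of your answer" % positions)
    # a legitimate user clicks the right letters; a logger sees only these
    clicked = [answer[p - 1] for p in positions]
    print(check_response(answer, positions, clicked))
```

Because each login reveals only a couple of positions, someone logging the clicks has to watch several logins before they hold the whole answer.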
My guess is that password-stealing malware is common enough now that it poses a significant risk to banks.
Unfortunately for users, this system is quite inconvenient. It involves an unaccustomed degree of mental and physical dexterity to select the correct letters. It is also inaccessible for people with text-only browsers, or who have Javascript turned off (ironically, the very people least likely to be vulnerable to malware).
A friend suggested that their Keepsafe answer would be “Keepsafe is bloody annoying”. This inspired me. I realise now that the savvier user will set all their Keepsafe answers to AAAAAAAAAAAA.
I also wonder whether it wouldn’t be reasonably easy to guess Keepsafe answers. If I were a wily hacker, I’d use my dictionary to compile stats of the most common letters in English words, by word length and position in the word. Let’s see.
#!/usr/bin/python
import string

f = file('/usr/share/dict/words')
counts = [{'all':0},{'all':0},{'all':0},{'all':0},{'all':0},{'all':0}]
# snag all 6 letter words (a length of 7 includes the trailing newline)
for line in [l.lower().strip() for l in f.readlines() if len(l) == 7]:
    for i in range(6):
        # count the letters in position i
        letter = line[i]
        counts[i][letter] = counts[i].get(letter, 0) + 1
        # keep a total so we can compute a percentage easily
        counts[i]['all'] = counts[i]['all'] + 1
for pos in range(6):
    print "Position %d" % (pos + 1)
    tops = {}
    for letter in string.lowercase:
        # integer division, so percentages come out as whole numbers
        tops[letter] = counts[pos].get(letter,0)*100/counts[pos]['all']
    # take the nine most frequent letters
    for pair in sorted(tops.iteritems(), key=lambda(k,v):(v,k), reverse=True)[0:9]:
        print "%s %02.2f%%" % (pair[0], pair[1]),
    # end the line before the next position's heading
    print
Results:
Position 1
s 11.00% c 7.00% b 7.00% p 6.00% m 6.00% t 5.00% r 5.00% d 5.00% a 5.00%
Position 2
a 18.00% o 15.00% e 13.00% i 10.00% u 9.00% r 7.00% l 5.00% n 3.00% h 3.00%
Position 3
r 10.00% a 9.00% n 8.00% l 7.00% s 6.00% o 6.00% i 6.00% t 5.00% e 5.00%
Position 4
i 10.00% e 10.00% t 8.00% a 7.00% n 6.00% l 6.00% o 5.00% s 4.00% r 4.00%
Position 5
e 27.00% n 7.00% l 6.00% a 5.00% t 4.00% r 4.00% o 4.00% i 4.00% u 2.00%
Position 6
s 36.00% d 11.00% e 9.00% r 8.00% y 6.00% n 5.00% t 4.00% g 3.00% a 3.00%
The distribution of letters is quite skewed, and you get three goes with Keepsafe, so a patient intruder could probably guess a substantial minority of answers.
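As a back-of-envelope check, here is a small sketch that adds up the top three letters for each position from the results above. It assumes the answer is a six-letter dictionary word and that the intruder spends their three goes on the three most common letters for a single position; the percentages are the truncated whole numbers printed above, so treat the totals as rough.

```python
# Top three letters per position, copied from the results above (percent).
TOP_THREE = {
    1: [("s", 11), ("c", 7), ("b", 7)],
    2: [("a", 18), ("o", 15), ("e", 13)],
    3: [("r", 10), ("a", 9), ("n", 8)],
    4: [("i", 10), ("e", 10), ("t", 8)],
    5: [("e", 27), ("n", 7), ("l", 6)],
    6: [("s", 36), ("d", 11), ("e", 9)],
}


def chance_in_three_goes(position):
    """Percent chance of hitting the letter at `position` within three guesses."""
    return sum(pct for _letter, pct in TOP_THREE[position])


if __name__ == "__main__":
    for position in sorted(TOP_THREE):
        print("Position %d: %d%%" % (position, chance_in_three_goes(position)))
```

On those numbers the last letter falls within three guesses a bit over half the time, and the first letter about a quarter of the time. A real challenge asks for more than one letter at once, so the chance of getting the whole challenge right is lower than any single figure here.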
I’m not sure what the end of this arms race will be.
Tags: security ~ kiwibank ~ python
A letter to Steven Joyce about S92A of the Copyright Amendment Act
Wednesday, January 28 2009
Dear Mr Joyce
I am writing to you in the hope that you will take action to prevent S92A of the Copyright Amendment Act from taking effect.
The law in question suffers from the following problems:
– it reverses the normal presumption of innocence
– it imposes no penalty for improper accusations
– it provides no easy remedy for people wrongly accused to have their access to an essential service restored
– it is likely to punish people who have done no wrong (for example, parents of teenagers, managers of organisations with careless employees, victims of viruses, flatmates who share an internet connection, etc).
In other jurisdictions, especially the US, recording industry bodies have been both aggressive and inaccurate in their attempts to pursue file sharers. In Australia they are suing ISPs who ask them to verify their accusations. In the UK, a parallel law has already been ruled out as being unworkable from the get-go.
Our government officials are on record as saying that laws against fraud will be sufficient to deter false accusations. This is clearly not so. The recording industry, unlike the typical citizen, is well-funded and well-advised by lawyers. It will be difficult for the police or for a private citizen to prove criminal intent for an incorrect takedown notice.
This law is ill-conceived, attacks the rights of ordinary citizens, and poses a real threat to the livelihood of anyone who depends on a working internet connection.
I look forward to hearing that this legislation from the previous government will be reviewed by the current one in the common sense manner prized by the National party.
Yours sincerely
Stephen Judd
Painless html parsing with lxml
Wednesday, January 14 2009
I am working on a Ruminator 2.0. I intend to parse full stories, not just the summaries that appear in RSS.
So I’ve been investigating my options for HTML parsing. There are quite a few options for Python, with varying degrees of speed, flexibility, and tolerance for broken markup.
After a rapturous writeup from Ian Bicking, I thought I’d try lxml, which is a Pythonic wrapper around Gnome’s libxml2 and libxslt libraries. I’m sold. You can even use CSS selectors, just like jQuery! (I like not having too much loaded into my head at once).
Suppose you want to scrape a news story (for statistical analysis, not copyright infringement) from the NZ Herald:
>>> from lxml.html import parse
>>> doc = parse('http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=10551829&ref=rss&pnum=0').getroot()
>>> paras = doc.cssselect('div.article-holder p')
>>> for p in paras:
...     print p.text_content()
Easy peasy.
Tags: python ~ lxml ~ the ruminator
Issues in authentication systems
Friday, November 14 2008
I have my own issues with biometric authentication systems, but this is not one I had foreseen.
To Whom it May concern: It has come to the attention of Recognition Systems that some people have a particular concern about using our hand scanners which relates to their religious beliefs. The concern revolves around the detection or placement of what is described in the Scriptures as “the mark of the Beast.”
Tags: security ~ authentication ~ biometrics
Rendered at 2009-07-04 20:19:29