The Elements

I was home sick yesterday. So annoying. Getting up and doing something in the darkroom was out of the question. I had already spent three feverish hours the day before, confusedly producing unspeakable garbage. I decided that if I was going to produce any more, at least it must be harmlessly and inexpensively perpetrated on the computer, in the comfort of this chair by the fire, with the cat on my lap.

Well, for one thing, my daughter (seven years old) walks around singing Tom Lehrer songs. She’s a great fan of Poisoning Pigeons in the Park (awesome) and The Masochism Tango (awesome?). She also likes The Elements, but the words are a bit harder to master. Some weeks ago she began by singing, “There’s arsenic, aluminum, selenium, and bubble gum!”, and her interest (and accuracy) has only grown since then. I can hear her in the other room when she sings about thulium and thallium, or gold and protactinium. It warms the heart on these cold winter days.

And as for me, well, at my age, I still don’t know the lyrics for a song I’ve loved since I was not much older than she. When I was a kid, my best friend Steven and I even went trick-or-treating at the Tom Lehrer’s house. (Lehrer has Steven’s neighbor.) I remember how we stood dumb on his front porch when the Great Man opened the door, so in awe were we of this genius whose records we listened to over and over and over, lying on the floor in front of the record player, surrounded by legos and mineral specimens. I remember him just looking at us, smiling through his thick black glasses, while we stood there, clutching our halloween bags, wanting to say something, but unable to speak. He was just too awesome, in any sense of the word you like.

So there it is. Clearly, we need a way to learn the Elements song. So here’s my feeble attempt: The Elements

It plays Sir Arthur Sullivan’s music while flashing Tom Lehrer’s lyrics and highlighting the elements in the periodic table as they are mentioned. It looks like this as you go:

For pedagogical purposes, you can use the following key controls:

  • Left arrow: Back up a bit
  • Right arrow: Go forward a bit
  • Up arrow: Go faster!
  • Down arrow: Go slower. Oh well.
  • Space bar: pause to breathe / resume the madness

This was my first try at in-browser audio, which I’ve been neglecting for way too long. I found the brilliant supersynth “game”, from which I have cribbed unapologetically.

The core of the program uses audiolib.js and music.js.

Please fork the code and make it better. And if you are an audio person, special thanks in advance for digging in and telling me what to do better!

I’ll try to get this hosted somewhere soon so we can all sing along.

Juanita's Magic Blenders. Function Composition with a 7-Year-Old.

Emmie marched into my room on Saturday morning, refuting my theory that I’d be able to sleep in, and bringing me a number of arithmetic problems to solve. Actually, there was a first course of spelling problems. She’s very particular about dictionary spelling. But after several series of words with weird Anglo consonant clusters in them, she switched to arithmetic. Most of her problems involved addition and subtraction. Her last which included a bit of multiplication and was phrased a bit like her second-grade homework problems are.

This got me wondering what would happen if we tried functions? I wanted to introduce her to functions, but I didn’t want to come across as trying to “teach” her anything. Kids are suspicious of people who try to tell them or show them things in a scholastic way, and well they should be. Why should they want to be told to learn what somebody else already figured out? Where’s the fun in that? No, if we were going to embark on new territory, she was going to get to discover it all herself. I chose to play to her love of narrative and art, avoid anything that looked like “instruction,” and see what would happen.

I said to her, “Ok, now that it’s my turn. Now, I’m going to take this in a slightly different direction. I want to introduce you to someone I don’t think you’ve met. Let me introduce Juanita. Juanita the cat. Juanita is a very clever cat. In fact, she has started her own business making blenders. But they’re not just any old blenders; they are magic blenders.” As I speak, I am drawing a cat head next to some blenders. Emmie likes the cat head, but says my blenders look like toilets.

“Thank you. Yes, Juanita’s blenders,” I continue, “aren’t for food; they’re for numbers. You put one number in and you get a different number out. But the interesting thing is that each blender always does the same thing to any number. So for instance, this blender is her Model ‘f’. You put in a 2 and you get out a 4; you put in a five and you get out a 10; you – ” but she interrupted me. “Wait! Stop!” she said, “I know what it does. You put in a number and you get it back times two.” And she proceeded to put in a 6 to prove it.

“Exactly. But you know what? Juanita wasn’t going to stop there. She made a blender called ‘g’. And what do you think that one did? Well, you put in, say, 16, and you get back 8.” Again, she interrupted me. “Och. That’s so easy. ‘g’ gives you half of what you put in.” And she put 15 and to my surprise arrived correctly at 7 ½. She qickly put in a nice even 10 in to reassure herself. Emmie is generally a bit suspicious of odd numbers and their fractional appendages.

This was fun. We were actually writing functions together. I asked Emmie what blender she thought Juanita would make next. She said, “She makes a blender called ‘h’.” “What does ‘h’ do?” I asked. “When you put a number into ‘h’, you get half back plus ½.” (I guess she wanted to deal with those weirdo fractions after all.) And she proceeded to draw a much more expensive-looking blender. I said I wanted to put in a 9. Emmie worked it out and was pleasantly surprised to find that she could put an odd number into a function involving division and still get a whole number as a result. And she was amused that even numbers, once so stolidly coherent, now looked very silly and awkward with their fractions sticking out. She computed h(9), h(8), h(10), and h(14).

At this point a hand wielding an axe threatened to decapitate the snow-kitty. I had to produce a bird carrying a steel slab in its beak to intervene.

I continued with my story about Juanita. “You know, Juanita, like most cats, is very curious. She was wondering what would happen if she put her blenders side-by-side so that when she puts a number in one, it would get chewed up, fly out into the next blender, get chewed up again, and come out again. So here, you see, she has put her model ‘f’ next to the ‘g’. What do you think will happen if she tosses a 5 into the ‘f’?” And so as I lay there with a small pyramid of stuffed bears and cats that Emmie had absent-mindedly been piling on my head, she dove into function composition, both with f of g, and g of f. And as a bonus, once she figured out that f of g and g of f (using Juanita’s models) cancel each other out, she stopped bothering to compute the intermediate values. She objected irritably that it was “so easy” to see that if you put a number into ‘f’, and that goes into ‘g’, you get the same number you started with.

Now for the problem of a notational system. “So now Juanita wants to sell her blenders. She wants to make a catalogue that explains what each blender does. But she can’t just give a lot of examples and hope her buyers figure it out; she wants to have a clear way of showing what will happen when she puts any number into it. So she needs to invent some way of saying ‘any number’. What do you think she should do?” “I don’t know.” “Well, Juanita like froggies. So she is going to use a froggie to stand for any number. Remember, they’re not real froggies; a froggy means ‘any number.’” And we started to draw the catalog. I think I ruined it already at ‘h’, because I didn’t want the poor froggie’s head to get severed, so I drew two frogs going in. Which works, but it complicates things. I guess emotional batrachophilia and math don’t mix. I should have picked cupcakes.

At this point, Emmie took over again and started making up problems for me using Juanita’s blenders. She drew Mrs. B., a cranky-looking, school-marmish old cat with dress unexpectedly covered with hearts. “If you use Juanita’s blender f and put in 80 something very interesting happens,” encouraged the stoic Mrs. B. So I followed her instructions and put 80 into f. “Now try putting 80 into g!” exhorted the aged feline. I obeyed, carefully working it out and checking my work with Emmie.

Now there came Mathy (which I was told is pronounced Matty, short for Matthew), husband of Mrs. B. He was important enough to land on his own new sheet of paper, whence he proposed, “If you put fifty in Juanita’s blender g you will have to work with halves again! try it!” 50 is such a big number. I had to ask Emmie for her help on that one.

Lith Printing

Having read Tim Rudman’s article on lith printing, and seen the author’s images, I had to try it.

The main difference between regular processes and lith is that, whereas in normal B&W development the developer becomes exhausted sooner in the blacks, giving the midtones and highlights a chance to come out, lith accelerates into a kind of chain reaction in the shadows; the more the developer has to work, the more exicted it gets, and the faster the darks come out. It’s nice to work with a process that’s simple and yet completely backwards from what you have been practicing.

To develop a lith print, then, you have to give the highlights a fighting chance to catch up with the rapidly developing blacks. You do this on two fronts. First, you overexpose the print by two or three stops (do a regular test print to find out and then go from there). Second, you use a highly diluted developer to slow things down. Bonus points for re-using nearly spent developer from the day before (“old brown”) as, say, a quarter of your developer.

Don’t bother setting your timer for two minutes. Put on some music and pour a good drink instead. Watch the print develop under normal safelight, and when it’s ready (and it could be 15 or 20 minutes), quickly snatch it out and drop it in the stop. People say this process is repeatable, but I don’t see how. The developer gets exhausted so quickly, and it’s hard to know what the real potential of your “old brown” portion is. (And I really do recommend re-using some old brown; I don’t know what’s going on, but you’ll see the difference right away.)

The results are shimmering and glistening highlights with crushed blacks. It’s contrasty, but it glows like it’s on fire.

I’m using Fotospeed LD20 lith developer and Fomatone 131 glossy paper. If you get into this, you’ll quickly find yourself obsessing over papers. They all behave differently. There are some awesome Foma papers. Ilford work fine. The glory days of Kodak lith papers are long gone. I can only read about them and envy people who have some of that paper in their fridges.

Otherwise, these are normal photographs. Normally exposed on 400TX, developed in XTol diluted 1:3.

Above, toned in gold chloride to bring out some blues.

Learning to Write C++ Modules for Node.JS/V8

It’s what everyone’s been writing about for so long, but sometimes you have to do a thing yourself to really get the difference. Node.JS JavaScript running on V8 is about as fast as native C/C++, and Node.JS C++ modules running asynchronously on the V8 thread pool are really, really fast. You knew this; I knew this. But I had to prove it to myself anyway.

I was laid low by a flu a week ago, and in my delirium, I thought I should at least be doing something useful, like learning how to write native C++ modules for Node.JS. It turns out this isn’t as scary as I thought it would be. And I’m not even a C++ programmer. (Though this is making me want to learn more.)

I started with Paul Querna’s tutorial, which is the best introduction I have yet found to the workings of V8, and which provides template code for both synchronous and async methods. (Note to non-C++ programmers: Everything you need is in Paul’s post. Read and re-read until you get it.)

My project is node-rot13. It performs ROT13 substitution on an input text. Strap on your seatbelts.

I ensured that, although ROT13 is a simple ascii cipher, my library should nevertheless work on Unicode input. So even though the Unicode snowman enciphers to himself, you can still use him as a plaintext. I can hear your sigh or relief from here.

I wrote a simple CommonJS implementation next to the C++ sync and async implementation.

You can use it like so:

% node
> var rot13 = require("rot13")
> var r = new rot13.Rot13()
> r.rotate("Attack at dawn!")
'Nggnpx ng qnja!'

or:

> r.rotateAsync("Attack at dawn!", function(r) { console.log("rot13: " + r)} )
rot13: Nggnpx ng qnja!

Now we’re cooking with gas!

As part of the test suite, I encode the first book of Vergil’s Aeneid 100 times over with both the JS and the two C++ implementations. The results are instructive. (NB, I give the JavaScript version one preliminary run so it can JIT.)

On my 2GHz Core2 Duo MacBook:

  • Average js run: 2.77 milliseconds to encode Aeneid I 100 times
  • Average cpp run: 2.59
  • Average cpp async run: 1.23

Lessons learned:

  1. Write a quick and dirty JavaScript function, and it JITs to be about as fast as C/C++. Wow. V8 rules.
  2. Write a C++ module that does its work asynchronously on the V8 thread pool and it’s twice as fast. Wow. V8 rules.

If you have ROT13 needs are are feeling lazy, you can npm install rot13 to get this module.

Many thanks to my friends Adam Ferrall, Joe Ardent, and Nick Porcino for answering this C++ newbie’s questions.

Porter stemmer for node.js

Martin Porter’s stemming algorithm is a classic for information retrieval applications. Now you can use it easily in node.js applications.

His algorithm, along with a ton of translations, is here.

[Update: the code is now linked directly from Dr Porter’s site]

I’ve put a node-friendly version with all his test cases on github here:

https://github.com/jedp/porter-stemmer

And it’s available via npm:

npm install porter-stemmer

Example:

> var stemmer = require("./porter").memoizingStemmer;
> stemmer("winning");
'win'

For the most part, this is not my translation or the algorithm. I’ve tweaked the JS a little and added a memoizer, but overall the credit for the JS translation goes to Christopher McKenzie and “Andargor”.

A Python Logging Handler for Redis Pub/Sub

This is a little python log handler that lets you stream your log data over a redis pub/sub channel, so you can monitor your system in real time from any redis client.

You can get it here: https://github.com/jedp/python-redis-log

Update (May 28, 2011): You can now easy_install python-redis-log

Note that this doesn’t actually save anything to redis – it just uses redis as a publication system. So be sure you stash your logs some how so you can look at them later.

I like to use Andrei Savu’s MongoDB Handler.

If you haven’t used it before, you can read up on Redis pub/sub here.

Usage

Obviously, be sure redis-server is running.

To see what’s happening, you can run redis-client and issue the following command:

subscribe log

Now, run the python logger and watch the redis shell:

>>> from redislog import handlers, logger
>>> l = logger.RedisLogger("my.app")
>>> l.addHandler(handlers.RedisHandler.to("log"))
>>> l.info("I like pie!")

If you’re watching the redis-cli shell, you should see output immediately. If you don’t, something’s wrong. Check that you’re sending the log to() the exact channel you’re subscribed to.

Tracebacks can be sent using the exc_info flag:

>>> def crash():
...     try:
...         1/0
...     except Exception, e:
...         l.error(e, exc_info=True)
...
>>> crash()

If you get the json data that produced and loads() it, you should have something like the following:

{'args': [],
 'exc_info': 'Traceback (most recent call last):\n  File "", line 3, in crash\nZeroDivisionError: integer division or modulo by zero',
 'filename': '',
 'funcname': 'crash',
 'hostname': 'smoothie.local',
 'level': 'error',
 'line_no': 5,
 'msg': 'integer division or modulo by zero',
 'name': 'my.logger',
 'time': '2011-05-26T07:04:42.532842',
 'username': 'jed'}

Future Work

I intend to add an example node.js console to monitor the logs.

I am also interested in mapping the python logging names (foo.bar.baz) to redis channels that would enable you to listen to arbitrary applications, or subscribe to a collection using the redis wildcard subscriptions.

I hope this is useful. Please drop me a line if you have any suggestions.

Talking to Strange Computers with Node.js and Jison

At work, I’ve been using Node.JS to make a bridge between a proprietary, third-party application and our internal tools. This application has the cool ability to receive insructions in its native language over a network socket. Its language is however not widely known or used, and so while it would be awesome to use its network control feature, it would be preferable to do it in a more familiar langauge (like, say, Python).

Most of this problem is solved by using Node.JS to broker client connections to the application, and Jison to interpret the application’s output and convert it to javascript. From there, it’s a snap to convert the values to json, and rehydrate them into live Python objects on the other side.

This is my first experience with Jison, and it’s so awesome, I thought I should share my happy experiences. Jison is the work of Zach Carter. It will probably be familiar if I mention that it’s what compiles CoffeeScript to JavaScript. It is completely awesome, and if I ever meet Zach, I’m going to buy him a pizza.

I’m just going to focus on the part of translating the application’s expression language into JavaScript. There’s a lot more to the overall problem that I hope to be able to get into later.

And rather than continue calling the application “the application,” I’m henceforth going to call it Steve.

The Goal

We want to take the Steve’s output and convert it to javascript. For example, here’s how Steve gives me a list of floats:

float[] => float[] {1.0, 1.0, 2.0, 3.0, 5.0}

It’s saying, “Your function, calling for a list of floats, returned a list of floats, namely 1.0, 1.0, 2.0, 3.0, and 5.0.”

I just want to convert that to the following JavaScript list:

[1, 2, 3, 4, 5]

I don’t care about the type declarations in this case. I’m going to throw them away, because JavaScript figures out types dynamically.

The Process

To do this, I want to figure out the grammar that Steve uses. I don’t have access to the actual grammar rules of Steve’s language, so I’m going to have to guess.

I’m going to poke at Steve and get as many different responses as I can. Each time I find a response my program can’t interpret, I’m going to make a unit test case for it. Once all my tests pass, and I can’t find any other types of response Steve might give, I win.

Parsing with Jison

Jison is named after Bison, which is part of the GNU toolset (along with Flex), which are named after the Unix tools Lex and Yacc. Lex is for lexical analysis, i.e., figuring out where words begin and end, and how to break up the pieces of a sentence into their minimal components. Yacc, for Yet Another C Compiler, lets you declare a grammar, i.e., the rules for interpreting the meaning of a stream of tokens that Lex produced, and convert that grammar into a program in your language of choice. My language of choice is not actually C, but JavaScript, hence the use of Jison.

So like its Lex and Yacc antecendents, Jison lets you declare the lexer and the grammar.

Why not just dive in with a really simple example. The following lets you parse numerals into an integer object. That is, we can take the string “42” and turn it into 42. Not very exciting, but it exposes many of the important things we have to deal with.

Here’s a Jison file to do this:

The sections of the file are articulated by the characters %%. A single % begins a special keyword. The lexer section ends with /lex.

In the %lex section, we have some rules that explain how the program should break up the text it finds. These rules look like regular expression matches on the left. On the right is what to do when any of the rules matches.

Here, any number of digits comprise an INT. The - sign is the “–” sign. The special <> means end-of-file, and is converted to a token called EOF. I get to make these tokens up. I could just as well have called it INIGO_MONTOYA. Finally, any remaining characters, the . expression, are interpreted as TOTAL_DISASTER, which I don’t handle in any rules, and which will therefore crash the parser. In other words, no other characters should match than the ones I had previously specified.

Then there’s a precedence rule, %left UMINUS. If you were writing a calculator or something, you might have a bunch of precendence rules, so I wanted to include one. This says that the token I’m calling UMINUS has left-precedence. More on this below.

Then we have the grammar section. I happened to say %start grammar, but you could say %start winning or anything you want. Whatever you %start, that’s where the parser will begin. So here it looks at my grammar definition first.

These rules have the form name : pattern | pattern | pattern ... ;. Each pattern has an action associated with it. The parser will try to match the first pattern it can. So here, a successful grammar is one that matches either an integer followed by EOF, or jut EOF. What about integer? Where’d that come from? I made it up. I get to do that. I define it just below. You can probably guess what the return $integer and return null do.

Ok, integer. An integer is defined as either an INT token (as defined in the lexical analysis section, i.e., one or more numerals, [0-9]+), or a - sign followed by an integer with left-precendence. The super powerful thing here is that integer can refer to itself recursively.

What about the actions on the right? Those are all JavaScript with some special variables thrown in. $$ is the return value it passes along in the parser. $1, $2, $3, etc., are the first, second, third, etc., things that match. I can also match by name where I use one, like I did in $integer in the grammar section.

Finally, yytext is the text that the parser returned. So Number(yytext) means “Make a JavaScript number out of the text you found.”

Ok, that’s a simple parser of strings to numbers. Now how do you use it?

Using It

To compile the grammar, you save this file as something with the .jison suffix and run jison on it. If I save this as numfoo.jison, I do this:

$ jison numfoo.jison

This yields a numfoo.js module. It’s just sitting there, waiting to do your bidding. Use it like so:

$ node
> var parse = require('./numfoo').parse
> parse("42")
42
> parse("-42")
-42
> parse("41") + 1
42

w00t! String to integer. It’s that easy. Now, if you try to parse anything that doesn’t match the grammar, like “+42”, or something with whitespace, it will crash. We’ll deal with whitespace below.

A Fuller Example

To begin parsing the output from Steve, I want to do a few more things.

  • Ignore whitespace
  • Deal with floats, ints, strings, etc.
  • Deal with lists and extensible structures

I’m also going to ignore half of Steve’s output. Like I said, I don’t care about type declarations. (Well, not in the case of strings and ints. If Steve were returning other objects, I’d have to care and do something with them to convert them to some kind of JSON-stringifiable representation.)

Remember I said that Steve has a very particular way of expression lists of floats:

float[] => float[] { 1.0, 4.2 }

Ditto for string[], int[], etc.

Here’s a grammar that will handle those things. (Note that this is not actually the grammar I use to talk to Steve – it’s not even part of it.)

Here are a few important points by line:

6. Ignore whitespace. Find one or more whitespace characters and do nothing.

9. To find a quoted string, we discard the quotes and return the contents. Hence the use of yylen and yytext.substr. Also, as we have a compound command, squiggly brackets are needed.

22. I can put JS like console.log in there for debugging, yes.

32. In my simple type declaration, I can have either “list of type” or “type” itself. I need to match the type declarations for the parser, but I ignore the values. Hence there are no functions after the rules.

37. JS doesn’t really distinguish between int and float.

43. It’s useful to have a primitive type definition. I’ll use this to construct lists of things below.

44. This is cool. A list of primitives is one or more primitive separated by a ,. Note that the solitary primitive is returned as the single element in a list. A list of primitives is defined by pushing the last primitive onto a list of primitives. Recursively, this will result in making a list of a single primitive, and then pushing each successive one onto it. Useful pattern.

46. This is Steve-speak. All I do is return the result of primitives, which will be my comma-separated list of primitives.

47. Finally, the expression itself. It’s a list or a primitive.

Note that my grammar does nothing to test whether Steve’s output is valid Steve-speak. For instance, if Steve said string[] => int[] { 3.14 }, my grammar would not complain. Do I care? In this case, no. I trust Steve to say the right thing.

Test Cases

For a hacky, reverse-engineering job like this, having test cases is invaluable. As I go, I accumulate a test case for each result, because as my grammar grows in complexity, I need to be sure that everything still works.

You could have a module like this for use with nodeunit:

var testCase = require('nodeunit').testCase;
var parse = require('your-parser').parse;

module.exports = testCase({
  testFloatList: function(test) {
    result = parse("float[] => float[] {1.2, 2.3, 3.4}");
    test.ok(result[1] === 2.3);
    test.done();
  },

  // more tests ...

});

And so on.

Translating to Other Languages

Now that we have live JavaScript from Steve-speak, we can JSON.stringify our results and ship them off to our clients. A Python client can, say, simplejson.loads the result and think that Steve is just speaking Python. Awesome.

Further Reading

I hope this is useful. Jison has turned out to be a huge help to me. Many thanks to Zach Carter for this fantastic tool.

Auto-complete with Redis and Node

After making a Python implementation of the beautifully simple redis autocomplete code that Salvatore Sanfilippo has shared, I made this javascript version for use in Node.js.

I thought it would be fun to build this into a simple Node app to get a sense of how this approach would perform in a more realistic setting. This was so much fun, I thought I’d share it in the form of a little tutorial.

The result looks like this, with rows of tweets appearing and changing with every keystroke, and it’s responsiveness is amazing.

If you’re wondering whether Node.js and/or redis will make your work faster, this tutorial is for you. I started trying to get real work done with Node only a few months ago. In that time, my new motto has become: “If all you have is a Node.js hammer, everything just looks done.”

Looking at my git history, I can see that I put the code for this together in about five hours after work one evening, with a couple more hours of noodling the details. Five hours to write the core autocomplete module and build a real-time web app around it. Wow. Exhilarating. Thanks, Node.js.

(Edit – May 1, 2011: This is now available as an npm module: npm install redis-completer)

Text Sources

This isn’t really part of the app, so skip if you already have stuff to index.

The first thing I wanted was some data to index. Fortunately, I had already written a little tweet generator for the purpose of seeding a little microblogging app. The tweet generator (which I’ve put on github here) does a markov mashup on pairs of real twitter feeds. I’m using Kanye West + Victor Medvedev, and Lady Gaga + Martha Stewart. (Not surprisingly, Kanye West’s and Victor Medvedev’s language doesn’t overlap all that much. But when it does, the results are brilliant.)

To keep things simple, I just wrote all of these to a text file, and instructed my app to index the text file in redis if necessary. This makes it easy enough to add to the initial corpus without being bound to other applications.

Basic Structure

To build node apps, my formula is basically this:

  1. Use express for my server.

  2. Use backbone.js as my client-side MVC.

  3. Use now.js to let the two communicate in real-time.

  4. Use head.js for fast script loading.

Now.js wraps Socket.IO and provides some really handy utilities for building a message-passing system for the server and all its clients.

Design the Interface

I try to get my UI down before I write any code. I first make a jade template and css file to construct the interface. Once the basic interaction is right, I build out the backbone.js models and views, tying the frontend to the backend with now.js.

Here’s the jade template for the application (index.jade):

#application

  #search
    div Search:
    div 
      input(type='text', id='search-input')

  #tweets

When I hit a key in the search-input box, I want the client to ask the server for a bunch of matching tweets. I’ll then build a set of views in the client to display the tweets.

I’m going to attach the backbone.js views to the #application div there.

So I need a template to use to display the tweets. I put those in layout.jade. I’m being lazy by not combining layout.jade and index.jade into a single file, since this is just a single-page app. But whatever. So here’s my layout.jade. As you can see, it uses head.js to load all my javascripts, and it provides an invisible template for showing tweets.

!!!
html
  head
    title= title
    link(rel='stylesheet', href='/stylesheets/style.css')
    script(type='text/javascript', src='/javascripts/headjs/dist/head.js')
  body

    #tweet-template
      .tweet-content
        .tweet-row
          .tweet-text {{ username }}
        .tweet-row
          .tweet-text {{ text }}

    #content!= body

    script(type='text/javascript')
      head.js(
        "https://ajax.googleapis.com/ajax/libs/jquery/1.5.2/jquery.min.js",
        "/javascripts/underscore/underscore.js",
        "/javascripts/backbone/backbone.js",
        "/nowjs/now.js",
        "/javascripts/models.js"
      );

To design the css, I can take that tweet template, copy it into the index page a few times over, and fill it with dummy data.

Declare Necessary Models and Views

Fine. Now I want to get data from the server into the page. I use backbone.js to make views and to model the data. I’m going to have now.js provide the ability to load json data from the server in real time, which will be hydrated into backbone.js models on the client.

So it’s time to think about the architecture quickly.

I have an Application. The Application has a view to display it. The application displays a bunch of tweets. So there’s a Tweet model and a Tweet view. I could have a Collection of tweets to manage things, but since I’m not actually manipulating the tweets – I’m just showing them – I don’t think I need this feature. Ok then. Two models, each with a view.

Back in the index.jade page, I’ll put in some code to instantiate the application model and its view:

script(type='text/javascript')
  head.ready(function() {
    var app = new CompleterApp()
    var appView = new CompleterAppView({model: app});
  });

Now I go back to my models.js file and fill this out. I begin where I left off in the index.jade page, namely with the application model:

var CompleterApp = Backbone.Model.extend({});

That’s really it. I’m not even going to use this. I figure I might at some point have need to store application state (e.g., the name of the current user). But at this point, it’s really just boilerplate.

Now for the view. This is going to contain the logic for the interactivity of the page. I’ve got a search box the user will type into, and in response, I’ve got a list of tweets I want to show.

var CompleterAppView = Backbone.View.extend({
  model: CompleterApp,

  el: '#application',

  events: {
    'keydown #search': 'searchKeydown'
  },

  searchKeydown: function(event) {
    // Search for what the user has typed
    //
    // This event handler will get fired before the 
    // search-input box receive the character typed, 
    // so concatenate the current character onto the 
    // end of the contents of the search box.
    var text = $('#search-input').val();
    text += String.fromCharCode(event.which);

    var self = this;
    this.$('.tweet-content').remove();
    now.search(text, 10, function(err, results) {
      _.each(results, function(tuple) {
        var tweet = new Tweet(
          {username: tweetToHtml(tuple[0]),
           text: tweetToHtml(tuple[1])});
        self.addTweet(tweet);
      });

  addTweet: function(tweet) {
    var view = new TweetView({model: tweet});
    $(this.el).append(view.render().el);
  }

});

Here’s what’s going on: The view is associated with the CompleterApp model. It is bound to the element div#application. It listens for keydown events in the search div.

On search keydown, it calls searchKeyDown, which does the following:

  1. Get the existing search text + the new keystroke
  2. Remove existing content in the page
  3. Search, and for each returned tweet, add a tweet view.

Build the Server, Expose Methods to the Client

Search? Where did now.search come from? This is now.js exposing a function I wrote on the server up into the client javascript.

So it’s time to take a look at the server. In addition to the usual express.js lines, it has this at the top:

var nowjs = require('now');
var completer = require('./completer');

var app = module.exports = express.createServer();
var everyone = nowjs.initialize(app);

and this in the routing section:

// Routes

app.get('/', function(req, res){
  res.render('index', {
    title: 'node-redis autocomplete'
  });
});

// nowjs methods

everyone.now.search = function(text, count, callback) {
  completer.search(text, count, callback);
}

The completer module is my implementation of the redis autocomplete algorithm. nowjs, if you haven’t yet used it, binds to your server and gives you a variable (here called everyone) that lets you magically expose methods from the server up to the client, where they are available in the now namespace as seen above.

The function everybody.now.search is used in the ApplicationView as now.search.

Searching the Tweets

This was supposed to be about an autocompleter. But you could write your app to search for anything in any way. Here, I’m being really lazy – too lazy, actually – and returning a list of tuples. This gets sent up to the CompleterAppView, which knows how to map the tuples into Tweets.

This is not so good. The interpretation of those tuples should not be left to the view logic like that. If I ever do any more work on this, I will certainly fix that.

The full completer code is here.

Next Steps

Why yes, I have left out the harder stuff!

Now that we’re on the right track, there are some things we might want to add:

  1. A way to delete tweets so they are no longer indexed.

  2. A live search that inserts new matching elements in the result set in real time. This will require use of a Collection of Tweets after all.

  3. A way to safely cap the number of tweets that are indexed at any time so we don’t run out of memory. An ordered set of tweets scored by date could help.

And of course, we might want to bind this to a real microblogging app.

I hope this is useful to someone. Happy times with Node and Redis!

Auto-complete with Redis and Python

Salvatore Sanfilippo is always sharing cool things to do with redis. As with all things redis, it never ceases to amaze how much you can do with so little code. Thanks, Salvatore.

He posted this gist which implements an autocompleter in redis using ordered sets to represent trie data.

j4mie promptly translated the same into Python.

I’ve branched j4mie’s version and made it work for multi-word phrases. There’s a github gist here.

In his comments, j4mie extends the invitation to make his version more Pythonic. I’ve tried to do so in my fork. I hope I haven’t damaged too much in the process. I also put a command-line wrapper around it, so you can run it from the shell and play with redis interactively.

Original copies of duplicates

Digging through boxes of ancient junk, I just unearthed a collection of book return receipts from Doe Library at UC Berkeley. This brings back memories of a fun hack and an ironic lesson in the problem of authenticity and forgery.

I was a graduate student at Berkeley, working towards a doctorate in Latin (Ph.D. 2001 w00t). As a graduate student, I read, or pretended I read, lots of books. And when you return a book to the UC Berkeley library, there’s a not infinitesimal chance that they will misplace the book and tell you you’ve lost it and have now got to pay for it. For a graduate student to pay for anything is a burden; and “paying for” some rare 19th-century German volume is inconceivable.

To its credit, the library system acknowledged that it was unreliable and offered book return receipts. These were slips of paper on which you would write the call number of the book you were returning, your name, and the date. The librarian would sign and stamp the receipt as a token that you had indeed returned the book. That way, when the call came that you had failed to return a certain volume, you could say, “I’m sorry, but Mr Book Return Receipt here says you are wrong (again).”

As I said, as a graduate student, I filled out scores of these each week. It felt like a stupid pointless labor, since they already knew I had the book checked out. It was in the system. You could look it up on the VT101 terminals! I thought they should just be able to print a receipt for me right there with no signing involved by anyone. But the librarians said they had no such system and This Is The Way We Do Things.

Offended by the stupidity of this, I wrote a Perl script to dial into the library modem pool and query the database for information about all the books I had checked out. There was a little reverse engineering required for the results it sent back. I recall it was peppered with lots of non-printing characters whose purpose I didn’t understand. But pretty quickly I had something that made sense of the results and extracted the data I needed.

I then sought to write a PostScript program that would reproduce as exactly as possible the forms they provided. Here’s what one of the originals looked like:

The only important information is the call number and the patron’s name. It’s good to have the title and book author there as well in case there’s a typo in your call number, I guess.

I carefully measured this slip, trying to duplicate every detail (including the way the descenders in the p and y in ‘Copy’ cross the line of the box). Unfortunately, on my Linux box (I don’t remember what distro I was using in 1999), I did not have all the typefaces they were using, so I used what I had. Here’s what I came up with:

Awesome, I thought. All that time I would have wasted filling in these stupid forms I have now spent on something actually interesting. (I notice that these are receipts for books on metrics, an area of lingistics I find particularly awesome, in which I did some work when I should have been working on my more “marketable” dissertation instead. So this is a double transcript of my working on something I found more captivating than the things I was actually supposed to be doing.)

But to my surprise, when I presented the librarian with my auto-generated book return receipts, he did not want to accept them. He complained that they were not the real thing; that they were false copies of the true item. Furthermore, he objected that my name was printed by the laser printer, and had not been written by hand by me. Therefore, in his mind, this was a forgery and could not be used to prove with authenticity that I had returned the book I was returning, even if he signed it and stamped the date on it.

I replied that the receipts he wanted me to fill out were themselves copies of copies (which you could easily tell by the accumulation of photocopier artifacts). The library’s receipts were no closer to some Platonic ideal of a Book Return Receipt than mine. In fact, you could say they were farther away, because if anything was being reproduced, it was the data at the heart of the system. And my approach removed the possibility of human errors in transcription.

Furthermore, my name, filled in by the computer program so I wouldn’t have to do any stupid writing at all, wasn’t replacing a personal signature (and the imprint of a signature itself is a problematic event, as Derrida can tell you); instead, like everyone, I wrote my name in block capitals for legibility. The real transaction in these receipts takes place when the librarian signs and dates them. That’s the whole point.

Well, as you can see from the datestamps on the receipts above, the librarian eventually conceded to stamp my book as “returned,” but not without becoming frustrated with me and turning the conversation over to his supervisor. I don’t remember much of what she said, but don’t think she was so troubled by my copy of a copy of a copy. And I was able to use my book return receipt generator for the rest of my time at Berkeley.

I don’t know if they still use book return receipts at Berkeley. I’d be interested to hear from anyone about what they do now. I expect that 12 years later they can scan the bar code and give you a printed receipt without anyone signing anything – the physical receipt is what makes the system work – but I don’t know. I also miss those VT101 terminals.