Arcane Sentiment: It's not homoiconicity

Lisp, say innumerable introductions, is a homoiconic language: its programs are represented in the same data structure they operate on.

I'm not bringing this up to complain about how often introductory material is copied uncritically. (I suspect that's because introductions tend to be conservative, and what's more conservative than repeating something other people have said before?) I'm bringing it up because it is so often repeated, even though everyone knows it is wrong.

It's commonplace to point out that C is homoiconic, and so is Perl, and every language that can process text. They can generate and manipulate their own code. It is often overlooked that this is actually useful, even in C. It's possible for a C program to generate and compile C code - this is how Goo is implemented. And of course many languages have self-hosted REPLs and other tools. Homoiconicity matters.
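To make the text-based version concrete, here is a minimal sketch in Python rather than C, since the idea is the same in any language that can build strings and hand them back to its own compiler. The helper name make_power_source and the generated function cube are made up for illustration; compile and exec are the standard built-ins.

```python
# Hypothetical helper: build the source text of a function that
# multiplies x by itself `exponent` times (assumes exponent >= 1).
def make_power_source(name, exponent):
    body = " * ".join(["x"] * exponent)
    return f"def {name}(x):\n    return {body}\n"

# The program writes code as plain text...
source = make_power_source("cube", 3)

# ...then compiles and runs that text, much as a C program could
# emit a .c file and invoke the compiler on it.
namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)
cube = namespace["cube"]
```

This works, and it is useful, but everything interesting happens by pasting strings together. Nothing in the program ever sees the structure of the code it wrote.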

So why is Lisp better for metaprogramming? It's because the representation of code is a convenient one. Lisp trees are terse and closely reflect the abstract structure of programs, so they are easy to work with. This convenience isn't a simple, formalizable quality (although attempts to formalize it could produce some interesting papers). Like so many questions in programming languages, convenience is not a matter of adding some interesting new power, but of minimizing the amount of work spent doing uninteresting things.
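As a rough illustration of what that convenience buys, here is a sketch that stands in for S-expressions with nested Python lists, so that (+ x (* y 0)) becomes ["+", "x", ["*", "y", 0]]. The simplify function and the two rewrite rules are my own invention for the example, and they assume a pure arithmetic language where dropping a subexpression is harmless.

```python
def simplify(expr):
    """Rewrite (* ... 0 ...) to 0 and drop zero terms from (+ ...)."""
    if not isinstance(expr, list):
        return expr  # a number or a variable name: nothing to do
    op, *args = expr
    args = [simplify(a) for a in args]  # simplify subtrees first
    if op == "*" and 0 in args:
        return 0
    if op == "+":
        args = [a for a in args if a != 0]
        if not args:
            return 0
        if len(args) == 1:
            return args[0]
    return [op, *args]
```

For example, simplify(["+", "x", ["*", "y", 0]]) reduces to "x". The whole transformation is a dozen lines because the tree already has the shape of the program. Doing the same thing to source text would mean writing a parser and a printer before any of the interesting work could start.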

When abstract code is hard to work with, it loses its metaprogramming benefits. Scheme, for instance, virtually requires a more elaborate representation of code in order to implement syntax-rules. Many Schemes expose that representation, but hacking it hasn't caught on, because it's quite inconvenient. This isn't a necessary problem with richer abstract syntax; it's just that this one is done badly. (In retrospect, it's a bad sign when you have an important type called syntax-object.) I think there could be better alternatives to S-expressions that preserve their all-important convenience.

By the way, a language needn't be homoiconic to benefit from a good representation of programs. Imagine a language whose domain does not include its programs, but which is hosted in another language which does. The host language can manipulate the trees, so it can metaprogram the embedded language - easily, if the representation is a good one. This is the case for many DSLs embedded in Lisp. They benefit from their convenient representation, even though they themselves can't manipulate it.
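Here is a small sketch of that arrangement, again using nested Python lists as a stand-in for trees. The embedded language below only knows numbers and variables, so it has no way to talk about its own programs, but the host can rewrite them freely. The names expand, evaluate, and the square form are hypothetical, chosen just for this example.

```python
import operator

# The only operations the embedded language understands.
OPS = {"+": operator.add, "*": operator.mul}

def expand(expr):
    """Host-side macro: rewrite ["square", e] into ["*", e, e]."""
    if not isinstance(expr, list):
        return expr
    op, *args = expr
    args = [expand(a) for a in args]
    if op == "square":
        (arg,) = args  # square takes exactly one argument
        return ["*", arg, arg]
    return [op, *args]

def evaluate(expr, env):
    """Evaluate the embedded language: numbers, variables, + and *."""
    if isinstance(expr, str):
        return env[expr]  # variable lookup
    if not isinstance(expr, list):
        return expr  # numeric literal
    op, *args = expr
    values = [evaluate(a, env) for a in args]
    result = values[0]
    for value in values[1:]:
        result = OPS[op](result, value)
    return result

program = ["+", ["square", "x"], 1]
expanded = expand(program)
```

The evaluator never hears about square. The host expands it away first, so evaluate(expanded, {"x": 3}) gives 10.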

Please, Lispers, stop telling people that what's special about Lisp is homoiconicity! If you want another fancy Greek word, you could speak of euiconicity. But let's not. Much as text is an inconvenient representation for programs, these big words are inconvenient for simple concepts. Just say S-expressions make metaprogramming convenient because they match the abstract structure. That's easier to understand, and as a bonus it transfers well to other data structure questions.

3 comments:

Anonymous, 25 May 2009 at 14:26
"I think there could be better alternatives to S-expressions that preserve their all-important convenience."

That's why I don't think ignoring object orientation as a 'special case of closures' is a good idea. Because the answer is literally staring them in the face: make the syntax type a subclass of S-expressions. Let it carry more information, but let car, cdr, cons, list, quasiquote, etc. all work with it as transparently as they do with normal lists!
Sean B. Palmer, 7 February 2010 at 11:57
Or, you know, you could use the same source that you'd have to look up S-expressions, metaprogramming, and abstract structure in to look up homoiconicity...
Unknown, 11 December 2010 at 15:51
What many people fail to understand is that homoiconicity isn't about manipulating code, per se, but rather manipulating the syntax. The difference here is that code is typically thought of as a stream of characters (i.e., a string), whereas syntax is a higher level. The reason that Lisps are homoiconic is that the code, by nesting S-expressions, mirrors the abstract syntax tree of the program. In many other languages, you need to both scan and parse the code to form a syntax tree, but in Lisps you can get away with just scanning, since the program is already represented as a tree.

If the language's syntax is represented as a sequence of characters, and the language has a string type to manipulate strings, then it'd be homoiconic.

Languages like C and Perl can operate on strings, but the strings themselves don't represent the syntax of the program. They'd have to be parsed first, and that's why they're not homoiconic.

In short, Lisps are said to be homoiconic because their programs are represented as a syntax tree and Lisps feature those trees as a datatype.
