Arcane Sentiment

Purely algebraic abstractions

Any abstraction expresses something all its instances share. Usually this is semantic: all instances have some common meaning. They represent the same values, or perform the same computation, or support the same operations. When you know a construct is an instance of a certain abstraction, you know something about what it means.

Some abstractions are different. Their instances have no meaning in common, only algebraic properties. These are purely algebraic abstractions. Such an abstraction tells what transformations are valid on its instances' expressions, but says nothing about what they mean.

The classic algebraic abstractions are (of course) those of abstract algebra: groups and rings and fields and such. They abstract the properties necessary for algebraic transformations, and nothing else. If you know that your objects and their operators form a ring, you can manipulate formulae and even prove theorems about them, without knowing anything about what they mean.

In contrast, most abstractions in computing focus on meaning, and express algebraic properties only incidentally. If you have a java.util.Collection or a java.util.Map, you know you can add and remove items, test whether they're there, and iterate over them — but do you know any algebraic properties? Even the most basic properties are broken by unusual collections like caches or Bloom filters. They're semantically legitimate collections, and their algebraic properties are unreliable because they're irrelevant.

(Algebraic abstractions are not entirely reliable either, because most of their computational incarnations don't quite satisfy their axioms. Reflection and other debugging features can often detect differences between supposedly equivalent objects. Limitations of memory and time create edge cases where expressions equivalent in denotation are different in operation. Floating-point arithmetic makes a sport of breaking nearly every algebraic property that could be expected of it. But similar problems afflict nearly all attempts to reason about abstractions; they're not specific to algebraic abstractions, and they don't make them useless. The equivalences generally hold for the properties we care to preserve, so they're correct in practice though not in theory.)

Haskell typeclasses

Most Haskell typeclasses have semantic content: Show and Num are about operations with the same meaning for all their instances; Eq expects algebraic properties (reflexivity and transitivity) but still defines the meaning of ==. There are a small but increasing number of purely algebraic typeclasses: the Prelude has Monoid, Functor, Applicative and Monad, whose instances have nothing in common but algebraic equivalences.

This is why monads are so hard to learn. Each student of Haskell asks what monads mean, and invents a variety of wrong answers (typically semantic generalizations of IO actions), because they're sure that such an important abstraction must be meaningful, and have never heard of algebraic abstraction. Eventually they learn to use monads without asking what they mean, because monads don't mean anything.

This is a sore point among Haskellers. It will get more sore, because Haskell is gaining more algebraic abstractions. Applicative is in the Prelude now!

Haskell Prime numbers

Haskell Prime is a collection of ideas for future versions of Haskell, including a proposal to generalize the numeric typeclasses by removing their semantic content, replacing Num and most of its subclasses with purely algebraic classes:

(+) :: AbelianGroup a ⇒ a → a → a
(*) :: Ring a ⇒ a → a → a
(/) :: DivisionRing a ⇒ a → a → a
mod :: EuclideanDomain a ⇒ a → a → a

(A division ring is like a field except that multiplication is not necessarily commutative.)

This makes the numeric operations maximally general, at the cost of making them meaningless. It also gives mundane code types (and type errors) that make sense to mathematicians and no one else:

factorial :: (Ring a, Ord a) ⇒ a → a
sum :: AbelianGroup a ⇒ [a] → a

(I'm not sure why + is on AbelianGroup instead of something more general like Magma. Maybe it's to comply with users' expectation that + be associative and commutative.)

This proposal brings the straightforward clarity of Monad to arithmetic, an area where Haskell has long suffered from comprehensibility bordering on practicality.

I'm not sure algebraic abstractions are always a bad idea for programming languages, but the difficulty of Monad suggests they're hazardous.

A semantic abstraction tells you what its instances mean. An algebraic abstraction only tells you what transformations preserve that meaning. That's enough for optimization, but not for understanding.

Antedating “datatype” all the way to Plankalkül

Previously I speculated that the word “datatype” might have been used in computing before 1958. In response, dvt found a precedent from 1945! It's Konrad Zuse's very early language Plankalkül (Plan Calculus). Zuse's notes pervasively use the words Angabentyp and Angabenart, without bothering to define them. Modern German uses “Daten” instead of “Angaben”, but the terms are otherwise unchanged: “Datentyp” and “Datenart”.

Plankalkül was the world's first programming language, and it begins from first principles: the only primitive type is the bit, charmingly called a “Ja-Nein-Wert” (yes-no-value). It builds everything else out of arrays and tuples. The section on datatypes begins:

Angaben und ihre Darstellung [Data and its representation]

Die auftretenden Angaben können mannigfacher Art sein. Z.B. J.-N.-Werte, Zahlen, Listen usw. [The data given can be of various types, e.g. Y-N-values, numbers, lists etc.]
[...]
Die Unterscheidung der einzelnen Angabenarten soll nun wie folgt formalisiert werden [The distinction between the various datatypes will now be be formalized as follows]:
Angaben-Strukturen [Data structures]

Unter Struktur einer Angabe wird der komponentenmäßige Aufbau einer Angabe ohne Hinblick auf die Bedeutung der einzelnen Fälle und Komponenten verstanden. [The structure of a datum is the component composition of a datum without regard to the meaning of the individual instances and components.]
Wir haben Angaben von starrer und von variabler Struktur. Wir führen nun Angabenstrukturzeichen ein, welche jeder Angabe zugeordnet sind. Diese werden mit S und einer Kennzahl bezeichnet. Die Entwicklung der zusammengesetzten Strukturen erfolgt dann durch „Strukturgleichungen“ aus einfachen (bereits definierten) Strukturen. [We have data of fixed and of variable structure. We now introduce data structure symbols, which are assigned to each datum. These are denoted by S and an ID number. The development of composite structures then follows by “structure equations” from simple (already defined) structures.]
So wird dem einfachen Ja-Nein-Wert das Strukturzeichen S0 zugeordnet. Eine Folge von n J-N-Werten hat dann die Struktur S1.n. Es gilt die Strukturgleichung: [Thus the structure symbol S0 is assigned to the simple yes-no value. Then a sequence of n yes-no values has the structure S1.n. The structural equation applies:]
S1.n = n × S0
Durch Verfolgung der Strukturgleichungen ist es jederzeit möglich, den Aufbau einer Angabe zu ermitteln, auch wenn dieser sehr kompliziert ist. [By following the structure equations, it is possible at any time to determine the composition of a datum, even when it is very complex.]

Plankalkül was never implemented (well, not until 1975), but Zuse wrote enough code in it to discover the need for generics, and duly invented them:

Wir brauchen noch „unbestimmte“ Strukturzeichen. Wollen wir z.B. andeuten, daß eine Angabe aus einer Liste von n Gliedern besteht, ohne die Struktur des Gliedes im einzelnen festzulegen, so schreiben wir: n × σ. [We still need “undefined” structure symbols. Let us suppose, for example, that a datum consists of a list of n elements, without specifying the structure of the individual elements, so we write: n × σ.]
Für σ kann dann ein beliebiges Strukturzeichen eingesetzt werden. [For σ any structure symbol can be used.]
¤ × σ Ist das allgemeinste Strukturzeichen einer Liste. (Struktur der Glieder und Zahl der Glieder offen gelassen). Is the common structure symbol of a list. (Structure of elements and number of elements left open.)
¤ × 2σ Ist die Struktur einer Paarliste, bei der die Glieder der einzelnen Paare von gleicher Struktur σ sind. Is the structure of a pair-list where the elements of each pair are of the same structure σ.
¤ × (σ, τ) Ist die Struktur einer Paarliste bei der die Vorderglieder die Struktur σ, und die Hinterglieder die Struktur τ haben. Is the structure of a pair-list where the front elements have the structure σ and the back elements have the structure τ.
2 × n × σ Ist keine Paarliste, sondern ein Paar von Listen. Is not a pair-list, but a pair of lists.

Array indexes, incidentally, are zero-based:

Es sei noch darauf aufmerksam gemacht, daß bei einer aus n Gliedern bestehenden Angabe der höchste Index der Komponenten gleich n − 1 ist, da die Komponentennumerierung mit 0 beginnt. [It should be pointed out that for a datum consisting of n elements, the highest index of the components is equal to n − 1, as the component numbering begins with 0.]

Separately from data structures, Plankalkül supports constraints on which values can actually be used:

Eine Angaben-Beschränkung liegt vor, wenn die volle Variabilität der zu einer Angabenart gehörenden Struktur nicht voll ausgenutzt ist. Z.B. können Dezimalziffern durch 4 J.N.-Werte dargestellt werden. Es werden jedoch nur 10 von den 16 möglichen Variationen ausgenutzt. [A data-restriction is available when the full variability of the structure belonging to a datatype is not fully used. E.g. decimal digits can be represented by 4 bits. However, only 10 of the 16 possible variations are used.]
In solchen Fällen wird durch eine Beschränkungsformel angegeben, welche Fälle der Struktur in den Definitionsbereich der Angabenart fallen. Eine solche Formel wird mit B und einer Kennzahl bezeichnet. [In such cases, a restriction formula specifies which cases of the structure fall within the defined range of the datatype. Such a formula is denoted by B and an ID number.]

“Typ” and “Art” are synonyms, so they're ripe for distinction by anyone who wants words for two concepts. Zuse does: Angabentypen are optional annotations distinct from both structures and restrictions, while Angabenarten bundle all three together:

Angabentypen [Datatypes]

Den gleichen Strukturen und Beschränkungsformeln können Angaben verschiedener Bedeutung zugeordnet sein. (Z.B. x = und y = Koordinaten). Im allgemeinen ist es nicht nötig, diese zu unterscheiden. Ist dies jedoch vorteilhaft, so werden Typenbezeichnungen eingeführt. Z.B. T₁, T₇ usw. [The same structures and restriction-formulas can be assigned to data of different meaning. (E.g. x = and y = coordinates). In general it is not necessary to distinguish them. If it is advantageous, however, type-designations will be introduced. E.g. T₁, T₇ etc.]
Angabenart [Datatype]

Jeder Angabenart ist eine Struktur und evtl. eine Beschränkung bzw. eine Typenbezeichnung zugeordnet. Darüber hinaus kann eine Angabenart noch durch spezielle Bedeutungen der Komponenten gekennzeichnet sein. (Z.B. Zahlen in halblogarithmischer Form, vergl. Zahlenrechnungen S. 119 ff). [Each datatype is assigned a structure and possibly a restriction or type-designation. In addition, a datatype can be further characterized by specific meanings of the components. (E.g. numbers in semi-logarithmic [=floating-point] form, see Numerical Calculations, p.119 ff.)]
Alle diese Kennzeichnungen können dann unter einem Angabenzeichen A zusammengefaßt werden. Ist eine Angabe durch ein A-Zeichen z.B. A10 gekennzeichnet, so ist die besondere Kennzeichnung der Struktur usw. nicht erforderlich, da diese in A10 mit enthalten ist. [All these identifiers can be combined under one data symbol A. If a datum is marked with an A-symbol, e.g. A10, the specific identifier of the structure etc. is not required, as it is included in A10.]
Angabenart-Zeichen können jedoch auch einer Gruppe analoger Angabenarten verschiedener Struktur zugeordnet sein. Z.B. können Zahlen durch verschiedene Strukturen (z.B. Dual-Zahlen, Dez.-Zahlen) dargestellt werden. Jedoch kann ein allgemeines Zeichen (z.B. A8 vergl. Zahlenrechnen S. 121) eingeführt werden, welches lediglich besagt, daß es sich um eine Zahl handelt, ohne ihre Struktur im einzelnen festzulegen. [Datatype symbols can, however, also be assigned to a group of analogous datatypes of different structures. E.g. numbers can be represented by various structures (e.g. binary numbers, decimal numbers). However, a generic symbol (e.g. see A8, Numerical Calculations, p.121) can be introduced which only says that it is a number, without specifying its structure in detail.]
Wir führen entsprechend σ ein unbestimmtes Angabenartzeichen α ein. [We introduce an undefined datatype symbol α corresponding to σ.]

With abstract types in 1945, Plankalkül's type system is ahead of its time. So is its support for predicate calculus, which is worth a post of its own. Less exotically, it has the basic features of languages a decade later: (one-armed) conditionals, loops, function calls, and the assignment statement (written left-to-right).

One feature of Plankalkül is conspicuously primitive. All of the symbols for data structures, restrictions, constants, variables, and so on are not named but numbered. It's like Intercal but 27 years earlier!

Zuse noticed that it was confusing to so many numbers with so many different meanings, and tried to distinguish them with a unique two-dimensional syntax:

Die Zeilendarstellung [The line format]

Um die zu einer Angabe gehörenden verschiedenen Kennzeichnungen, wie Variablen-Index, Komponentenangabe, Angabenart bzw. Struktur usw. übersichtlich darstellen zu können, werden diese einzelnen Kennzeichnungen je verschiedenen Zeilen einer Formel zugeordnet. [To be able to show the various identifiers belonging to a datum, such as variable index, component data, datatype or structure etc., these individual identifiers are assigned to different lines of a formula.]
Wir haben zunächst die Hauptzeile, in welcher die Formel in der bisher üblichen Art dargestellt wird. [First we have the main line in which the formula is shown in the usual way.]
Die nächste Zeile dient der Unterscheidung der verscheidenen Variablen, welche durch den „Variablen-Index“ erfolgt. (V ). Eine weitere Zeile dient der Kennzeichnung der Komponenten der durch die Zeile 1 und 2 gekennzeichneten Variablen. (Komponentenindex K.) [The next line serves to distinguish the different variables, which is done by the “variable index” (V). Another line serves to identify the components of the variables indicated by lines 1 and 2. (Component index K.)]
Es wird also z.B. der Ausdruck [Thus e.g. the expression]
K1(V₃) Komponente 1 von V₃ [Component 1 of V₃]
wie folgt geschrieben [is written as follows]:
V
3
1
bzw. [or] K2.3(Z₄) =
Z
4
2.3

In modern notation, those are V₃[1] and Z₄[2, 3].

Weitere Zeilen können der Kennzeichnung der Struktur und Angabenart bzw. der Beschränkung und dem Typ dienen. [Further lines may be used to indicate the structure and type of data, or the restriction and the type.]
Im allgemeinen wird entweder die Angabe der Struktur oder der Angabenart genügen. (S = Index bzw. A = Index) [In general either the specification of the structure or of the datatype suffice. (S-index or A-index.)]
z.B. [e.g.]
Z
4
2.3
0
bedeutet: „Z4, Komponente 2.3”. Der Wert ist von der Struktur S0. [means: “Z4, component 2.3”. The value is of the structure S0.]
Die Strukturangabe bzw. Angabenart – Angabe bezieht sich dabei auf die Komponente. [The structure specification or datatype specification refers to the component.]
Die einzelnen Zeilen werden durch Vorsetzen der Buchstaben V, K, S bzw. A vor die Zeilen der Formel gekennzeichnet: [The individual lines are identified by prefixing the letters V, K, S or A before the lines of the formula:]
  | Z ^ Z
V | 4   2
K | 2.3
S | 0   0
Wird von einer Angabe keine Komponente gebildet, so bleibt der Komponenten-index frei. [If no component is established for a datum, the component index remains empty.]
Das Zeichen A kann stets an Stelle des Zeichens S gesetzt werden; aber im allgemeinen nicht umgekehrt. Die für Strukturen bereits definierten Kennzahlen dürfen dann nicht mehr für Angabenarten benutzt werden: (Z.B. gibt es nur eine Struktur S0, S1.n und die Zeichen A0, A1.n sind mit diesen Strukturzeichen identisch.) [The symbol A can always be used in place of S, but in general not vice versa. The ID numbers already defined for structures can thus no longer be used for datatypes: (E.g. there is only one structure S0, S1.n and the symbols A0, A1.n are identical to these structure symbols.]

If only Zuse had thought of giving them names! But he was trying to solve a different problem, of typography:

Mit Hilfe dieser Darstellung ist es leicht möglich, die einzelnen Angabenarten zu unterscheiden. Es ist nicht mehr wie bisher in der Mathematik nötig, verschiedene Zeichenarten für verschiedene Angabenarten heranzuziehen. (Z.B. deutsche Buchstaben für Vektoren.) Ein solches Verfahren wäre im allgemeinen Plankalkül nicht anwendbar, da die Zahl der verschiedenen Angabenarten innerhalb der gleichen Rechenpläne bzw. Plangruppen derartig mannigfaltig sein kann, daß die zur Verfügung stehenden Zeichenarten nicht ausreichen. [With the help of this representation it is easily possible to distinguish the individual datatypes. It is no longer necessary, as hitherto in mathematics, to draw up different types of symbols for different datatypes. (E.g. German letters for vectors.) Such a method would not be practical for general plan calculus, as the number of different datatypes in one program or program-group can be so many that the available types of symbols are not enough.]
Constanten [Constants]

Den einzelnen Angabenarten, Typen bzw. Strukturen können Constanten zugeordnet werden, denen spezielle Bedeutung zukommt. Eine Constante ist ein bestimmter Fall aus der Menge der möglichen Variationen einer Angabenart bzw. Struktur. Sie werden mit C und einer Kennzahl bezeichnet. [To the individual datatypes, types or structures constants can be assigned which have special significance. A constant is a particular case from the set of possible variations of a datatype or structure. They are denoted by C and an ID number.]

In addition to constants, Plankalkül distinguishes three kinds of variables (input, intermediate, and output). Since all four can be used in the same context, the symbols C, V, Z and R must appear on every variable reference to distinguish them, so the two-dimensional syntax is not helping much. It's also difficult to transcribe, so I'll stop here rather that trying to translate all 180 pages.

I don't know if Plankalkül was known to the designers of later programming languages, or if it had any influence. But its casual usage of the words “Angabentyp” and “Angabenart” suggests they were already established in 1945.

Don't abbreviate rare names

Some languages are too consistent about keeping their names short. Arc and Wart call their macro-defining operators mac instead of defmacro or define-macro.

I understand how a designer could see mac as an important operator. If you think of macros as a central, distinctive feature of your language, and if you use quite a few of them to bootstrap your standard library, they feel important enough to deserve a short name.

mac does almost nothing for the length of programs, though. Macro definitions, however fundamental, aren't common enough for it to matter. define-macro is short enough. I prefer defmacro, but only because it follows a naming convention that makes other, more common names shorter; it's not common enough itself to justify an irregularly short name.

Save the aggressive abbreviation for common operations like make-hash-table. Giving that a one-word name (or even {}) makes more difference.

It's a normative theory.

When a theory fails to usefully describe reality, one bad response is to demand that reality stop disobeying it. Cosma Shalizi illustrates:

A: Hey, you over there, the one walking! You're doing it wrong.
B: Excuse me?
A: You're only using two feet! You should keep at least three of your six in contact with the ground at all times.
B: ...
A: Look, it's easily proved that's the optimal way to walk. Otherwise you'd be unstable, and if you were walking past a Dutchman he could kick one of your legs with his clogs and knock you over and then lecture you on how to make pancakes.
B: What? Why a Dutchman?
A: You can't trust the Dutch, they're everywhere! Besides, every time you walk it's really just like running the gauntlet at Schiphol.
B: It is?
A: Don't change the subject! Walking like that you're actually sessile!
B: I don't seem to be rooted in place...
A: It's a technical term. Look, it's very simple, these are all implications of the axioms of the theory of optimal walking and you're breaking them all. I can't get over how immobile you are, walking like that.
B: "Immobile"?
A: Well, you're not walking properly, are you?
B: Your theory seems to assume I have six legs.
A: Yes, exactly!
B: I only have two legs. It doesn't describe what I do at all.
A: It's a normative theory.
B: For something with six legs.
A: Yes.
B: I have two legs. Does your theory have any advice about how to walk on two legs?
A: Could you try crawling on your hands and knees?

Cosma is thinking of Bayesian statistics, but I sometimes feel the same way about type theory.

In both cases the problem is not with the theory, but with the movement that insists that the theory should be used for everything, whether it works or not.

Many happy returns

Some languages can return multiple values from a function, just as they can pass multiple arguments to a function. I used to think this was an important language feature. Functions that return more than one result are common, after all, so every language has a way to express them — by returning structures, or side-effecting reference parameters, or CPS — or, instead of these workarounds, by supporting multiple return values directly. It complicates the language kernel a little, but it makes code cleaner, right?

No, it doesn't. There are several reasons for a function to return multiple values. None of them require direct language support, and for most of them, it's not even helpful.

Secondary values

Some functions return one useful value, plus some secondary values that aren't interesting to most callers. For instance, Common Lisp's two-argument floor returns a quotient and a remainder, but usually you just want the quotient. This is where built-in multiple values shine: they can automatically ignore the extra value, so the common case Just Works, with no effort from either the caller or the multivalued function.

CL-USER> (format nil "π is about ~S" (floor 355 113))
"π is about 3"
CL-USER> (floor 355 113)
3
16

Returning and destructuring a tuple doesn't handle secondary values well. When the caller wants all the values, it's fine (and equivalent to built-in multiple values), but in the more common case where the caller wants only the primary value, it forces them to explicitly extract one component. This can often be done with a convenience function like left (l, r) = l, but it still adds noise.

If your language supports overloading on return type, you can make the function return either the primary value or the structure — whichever the caller wants. This is a rare and difficult feature, though.

Returning secondary values by side effect works surprisingly well. If the caller doesn't want the secondary values, it's no trouble, since the arguments to which to write them can be optional. When the caller does want them, it requires binding a variable, which typically forces the expression into a block, which is occasionally a problem for functional code. It's not very verbose, though:

int floor(int a, int b, int *remainder = NULL);

int remainder;
int quotient = floor(a, b, &remainder);

Now for my favorite: continuation-passing style. It has a bad reputation because it's associated with total CPS transformation, in which every continuation becomes an explicit lambda, which is onerous and unreadable. That is indeed bad. If you pass explicit continuations only where needed, however, it's no worse than other uses of λ.

CPS provides a way to handle secondary values at least as well as built-in multiple values do, but without language support. With no explicit continuation, the function returns the primary value, but if a continuation is provided, it receives all the values:

imaginary-lisp> (floor 355 113)
3
imaginary-lisp> (floor 355 113 (λ (quot rem) rem))
16

Success and failure

Often a secondary value encodes success or failure, as in Common Lisp's gethash, read, or macroexpand-1, or any of the many Go functions that return an error as their second value. This means callers who don't care about errors can simply ignore the extra value, while those who do can still get it.

(multiple-value-bind (val present?) (gethash vars varname)
  (if present? val (error "Unbound variable: ~S" varname)))

file, err := os.Open("/some/path")
if err != nil {
    panic(err)
}

Returning a structure + pattern matching handles this cleanly and safely: you simply return a different structure for each continuation. Usually this is something like Haskell's Maybe or Either:

case (Data.Map.Strict.lookup vars varname) of
  Nothing → error "Unbound variable: ~S" varname
  Just val → val

case foo of
  Left err → error err
  Right x → x

This can be a little verbose, but the verbosity can sometimes be eliminated by operators that automatically propagate the failure case, such as Haskell's various monad operators.

CPS has an even cleaner way to handle this: success and failure are two different continuations, so the function can simply take an optional continuation for handling errors, or two explicit continuations (of which the success continuation is often the identity function):

(slurp filename
  (fn (err) (error "Unable to open ~S: ~S" filename err)))

(gethash vars varname i (fn () (error "Unbound variable")))

This multiple-continuation style is often used in Smalltalk, where it's particularly convenient because of Smalltalk's terse lambdas. Toward Leakage Containment (Julia Lawall and Dan Friedman, 1992) recommended it for Scheme, but got little attention, perhaps because it used the nonsensical name “continuation-constructing style” (and didn't mention it in the title). I call it multi-CPS and find it very convenient — often more so than catching exceptions.

Complex returns

Some functions really do have multiple values to return — sometimes lots of them, like Common Lisp's get-setf-expansion, with five values, or get-decoded-time, with nine. These functions tend to be awkward to use regardless of how you receive the return values, but the least awkward way is to return a structure, because then at least you're not forced to name each result individually. That's why most languages do this for times:

time_t now = time(NULL);
tm *decoded = localtime(&now);
printf("The year is %d.\n", tm.tm_year + 1900);

This is less painful than Common Lisp's multiple-return-value approach, which often forces you to bind more values than you care about:

(multiple-value-bind (sec min hour day month year)
                     (get-decoded-time)
  (format t "The year is ~S.~%" year))

setf expanders are a similarly complex result that ought to be a structure.

(multiple-value-bind (temps forms svars writeform readform)
                     (get-setf-expansion x e)
  ...)

Simpler cases, like partitioning a collection, are adequately handled by tuples (if you have destructuring) or CPS (if you don't).

Why so much trouble?

Expression languages get much of their elegance by encoding dataflow in program structure. Each expression has one parent, and can therefore easily specify what to do with one return value. When there is more than one, there's not enough room in the structure to express what to do with each value, so we have to specify it in some more other way.

I still think multiple return values are important, but I no longer think they require special language support. There are plenty of good alternatives, and improving a language's support for those alternatives (by e.g. optimizing destructuring or λ) is easier than complicating the core semantics, and more likely to be useful for other purposes.

A brief history of “type”

The word “type” has a variety of meanings in programming languages, which are often a focus of confusion and contention. Here's a history of its use, focusing on particularly influential languages and papers.

1956: Fortran “modes”

The term “type” was apparently not yet established in 1956, because the Fortran manual speaks of integer and floating-point “modes” instead. It has something called “statement types”, but those are what are now called syntactic forms: assignment, conditional, do-loop, etc.

The 1963 Fortran II manual speaks of "two types of constants" (integer and floating-point), but this seems to be just the English word. When it talks about these types in more detail, it calls them “modes”, e.g. “arguments presented by the CALL statement must agree in number, order, mode, and array size with the corresponding arguments in the SUBROUTINE statement”. (Evidently the terms “formal” and “actual” parameters weren't established yet either.)

1958-63: Algol

Algol is one of the most influential languages in history. It introduced if ... then ... else, the int n declaration syntax, and semicolons. It also popularized the term “type”. The Algol 58 report defines type declarations on variables in terms of the “type” and “class” of values:

Type declarations serve to declare certain variables, or functions, to represent quantities of a given class, such as the class of integers or class of Boolean values. [...] Throughout the program, the variables, or functions named by the identifiers I, are constrained to refer only to quantities of the type indicated by the declarator.

The Algol 60 report is more consistent:

The various “types” (integer, real, Boolean) basically denote properties of values. The types associated with syntactic units refer to the values of these units.

Note that types are explicitly a property of values, not variables or expressions. But does “basically” mean someone thought otherwise, or just that this isn't a formal definition?

1967: Strachey's Fundamental Concepts

Chris Strachey's Fundamental Concepts in Programming Languages was an influential set of lecture notes that established a bunch of common terms. It defines types thus:

Most programming languages deal with more than one sort of object—for example with integers and floating point numbers and labels and procedures. We shall call each of these a different type and spend a little time examining the concept of type and trying to clarify it.

Strachey takes it for granted that types can be static or dynamic, and prefers static typing only for reasons of efficiency (which was, after all, of overwhelming importance in 1967):

It is natural to ask whether type is an attribute of an L-value or of an R-value—of a location or of its content. The answer to this question turns out to be a matter of language design, and the choice affects the amount of work, which can be done when a program is compiled as opposed to that which must be postponed until it is run.

Strachey does not mention type theory, because no one had yet realized that it could be applied to programs. That changed in the next year.

1968: type theory

James Morris was the first to apply type theory to programming languages, in his 1968 Lambda-calculus models of programming languages. “A system of types and type declarations is developed for the lambda-calculus and its semantic assumptions are identified. The system is shown to be adequate in the sense that it permits a preprocessor to check formulae prior to evaluation to prevent type errors.”

He begins by explaining what types are and why they matter, using the term in the usual programming-languages sense:

In general, the type system of a programming language calls for a partitioning of the universe of values presumed for the language. Each subset of this partition is called a type.
From a purely formal viewpoint, types constitute something of a complication. One would feel freer with a system in which there was only one type of object. Certain subclasses of the universe may have distinctive properties, but that does not necessiate an a priori classification into types. If types have no official status in a programming language, the user need not bother with declarations or type checking. To be sure, he must know what sorts of objects he is talking about, but it is unlikely that their critical properties can be summarized by a simple type system (e.g., prime numbers, ordered lists of numbers, ages, dates, etc.).
Nevertheless, there are good, pragmatic reasons for including a type system in the specifications of a language. The basic fact is that people believe in types. A number is a different kind of thing from a pair of numbers; notwithstanding the fact that pairs can be represented by numbers. It is unlikely that we would be interested in the second component of 3 or the square root of < 2,5 >. Given such predispositions of human language users, it behooves the language designer to incorporate distinctions between types into his language. Doing so permits an implementer of the language to choose different representations for different types of objects, taking advantage of the limited contexts in which they will be used.
Even though a type system is presumably derived from the natural prejudices of a general user community, there is no guarantee that the tenets of the type system will be natural to individual programmers. Therefore it is important that the type restrictions be simple to explain and learn. Furthermore, it is helpful if the processors of the language detect and report on violations of the type restrictions in programs submitted to them. This activity is called type-checking.

Then he switches without explanation to taking about static checkers, e.g:

We shall now introduce a type system which, in effect, singles out a decidable subset of those wfes that are safe; i.e., cannot given rise to ERRORs. This will disqualify certain wfes which do not, in fact, cause ERRORS and thus reduce the expressive power of the language.

So the confusion between programming-language and type-theory senses of the word began with the very first paper to use the latter.

1968: APL

APL-360 was the most popular dialect of APL. Its manual doesn't use the word “type”; it speaks of “representations” of numbers. But it considers these an implementation detail, not an important part of its semantics.

APL has a lot of unique terminology — monad and dyad for unary and binary operators, adverb and conjunction for high-order operators, and so on — so it's not surprising that it has its own word for types too.

1970: Pascal

Wirth's 1970 definition of Pascal is, as usual, plain-spoken: “The type of a variable essentially defines the set of values that may be assumed by that variable.” (But there's that “essentially”, like Algol's “basically”.)

1970-73: Lisp belatedly adopts the term

Like Fortran, early Lisps used the word “type”, but only in its ordinary English sense, never as a technical term. AIM-19, from 1960 or 1961, speaks of “each type of LISP quantity”, but doesn't use “type” unqualified. Similarly, the 1962 Lisp 1.5 Manual uses the word for various purposes, but not as an unqualified term for datatypes. The most common use is for function types (subr vs. fsubr); there are “types of variables” (normal, special, common), but datatypes were not, apparently, considered important enough to talk about. They might not have even been seen as a single concept — there are awkward phrases like “bits in the tag which specify that it is a number and what type it is”, which would be simpler with a concept of datatypes.

This changed in the early 1970s. The 1967 AIM-116a and 1970 AIM-190 still don't use “type”, but the 1973 Maclisp manual and 1974 Moonual do, and it consistently means “data type”. Most tellingly, they have typep, so the term was solidly ensconced in the name of a fundamental operator.

1973: Types are not (just) sets

By 1973, the definition of types as sets of values was standard enough that James Morris wrote a paper arguing against it: “Types are not sets”. Well, not just sets. He was talking about static typechecking, and argued that checking for abstraction-safety is an important use of static typechecking. The abstract explains:

The title is not a statement of fact, of course, but an opinion about how language designers should think about types. There has been a natural tendency to look to mathematics for a consistent, precise notion of what types are. The point of view there is extensional: a type is a subset of the universe of values. While this approach may have served its purpose quite adequately in mathematics, defining programming language types in this way ignores some vital ideas. Some interesting developments following the extensional approach are the ALGOL-68 type system, Scott's theory, and Reynolds' system. While each of these lend valuable insight to programming languages, I feel they miss an important aspect of types. Rather than worry about what types are I shall focus on the role of type checking. Type checking seems to serve two distinct purposes: authentication and secrecy. Both are useful when a programmer undertakes to implement a class of abstract objects to be used by many other programmers. He usually proceeds by choosing a representation for the objects in terms of other objects and then writes the required operations to manipulate them.

1977: ML and modern static typing

ML acquired its type system in about 1975 and was published in 1977. Until this point, the application of type theory to programming languages had been theoretical, and therefore had little influence. ML made it practical, which has probably contributed a lot to the terminological confusion.

ML's theoretical support (along with the misleading slogan “well-typed expressions do not go wrong”) came out in the 1978 paper A Theory of Type Polymorphism in Programming, which despite being about type theory, speaks of types containing values:

Some values have many types, and some have no type at all. In fact “wrong” has no type. But if a functional value has a type, then as long as it is applied to the right kind (type) of argument it will produce the right kind (type) of result—which cannot be “wrong”!
Now we wish to be able to show that—roughly speaking—an Exp expression evaluates (in an appropriate environment) to a value which has a type, and so cannot be wrong. In fact, we can give a sufficient syntactic condition that an expression has this robust quality; the condition is just that the expression has a “well-typing” with respect to the environment, which means that we can assign types to it and all its subexpressions in a way which satisfies certain laws.

The short version

So here's the very brief history of “type” in programming languages:

It wasn't used at all until 1958.
Types as sets of values: Algol-58.
The type-theory sense: Morris 1968.

These may not be the earliest uses. I got most of the old manuals from Paul McJones' collection, which is a good place to look for more. I welcome antedatings.

I'm also curious about the term “datatype”, which might plausibly be ancestral to “type”. I could find no uses of it older than “type”, but I may be looking in the wrong field. Statistical data processing is much older than computing, and has dealt with datatypes for a long time. Might the terms “datatype” and “type” have originated there?

Update August 2015: Jamie Andrews said much the same seven months earlier.

Update June 2017: In HN comments, dvt found “datatype” in 1945, in Plankalkül.

Incorrect optimization in 1963

Floating-point users today are accustomed (or resigned, sometimes) to compilers that make invalid optimizations by assuming all arithmetic is mathematically correct instead of rounding. The situation used to be worse. A 1963 IBM Fortran II manual warns that it did this for integers too:

FORTRAN assumes that mathematically equivalent expressions are computationally equivalent. Hence, a sequence of consecutive multiplications, consecutive divisions, consecutive additions, or consecutive subtractions, not grouped by parentheses will be reordered, if necessary, to minimize the number of storage accesses in the object program.
Although the assumption concerning mathematical and computational equivalence is virtually true for floating point expressions, special care must be taken to indicate the order of fixed point multiplication and division, since fixed point arithmetic in FORTRAN is “greatest integer” arithmetic (i.e., truncated or remainderless). Thus, the expression
5*4/2
which by convention is taken to mean [(5 × 4)/2], is computed in a FORTRAN object program as
((5/2)*4
i.e., it is computed from left to right after permutation of the operands to minimize storage accesses.
The result of a FORTRAN computation in this case would be 8. On the other hand, the result of the expression (5 × 4)/2 is 10. Therefore, to insure accuracy of fixed point multiplication and division, it is suggested that parentheses be inserted into the expression involved.

(Reordering “to minimize the number of storage accesses” is pointless in a constant expression, but apparently the optimizer did it anyway.)

If this reordering can be prevented by redundant parentheses, then parentheses don't only affect parsing; they change semantics by introducing a barrier against algebraic transformations!

Giving parentheses this additional meaning has an unfortunate effect: other optimizations can no longer ignore them. The manual continues by describing one such problem:

One important type of optimization, involving common subexpressions, takes place only if the expression is suitably written. For example, the arithmetic statement
Y = A*B*C + SINF (A*B)
will cause the object program to compute the product A*B twice. An efficient object program would compute the product A*B only once. The statement is correctly written
Y = (A*B) * C + SINF (A*B)
By parenthesizing the common subexpression, A*B will be computed only once in the object program.
In general, when common subexpressions occur within a expression, they should be parenthesized.
There is one case in which it is not necessary to write the parentheses, because FORTRAN will assume them to be present. These are the type discussed in “Hierarchy of operations,” and need not be given. Thus
Y = A*B+C+SINF (A*B)
is, for optimization purposes, as suitable as
Y = (A*B)+C+SINF (A*B)

I'm not sure whether the problem is simply that A*B*C does not contain the subexpression A*B, or that the CSE lifter sees it but can't merge it with (A*B) because they're not equivalent in all contexts.

Optimizers today still have limitations, and still make invalid transformations, but they've become much more subtle!

`¤ × σ`	Ist das allgemeinste Strukturzeichen einer Liste. (Struktur der Glieder und Zahl der Glieder offen gelassen).	Is the common structure symbol of a list. (Structure of elements and number of elements left open.)
`¤ × 2σ`	Ist die Struktur einer Paarliste, bei der die Glieder der einzelnen Paare von gleicher Struktur `σ` sind.	Is the structure of a pair-list where the elements of each pair are of the same structure `σ`.
`¤ × (σ, τ)`	Ist die Struktur einer Paarliste bei der die Vorderglieder die Struktur `σ`, und die Hinterglieder die Struktur `τ` haben.	Is the structure of a pair-list where the front elements have the structure `σ` and the back elements have the structure `τ`.
`2 × n × σ`	Ist keine Paarliste, sondern ein Paar von Listen.	Is not a pair-list, but a pair of lists.