Arbitrary overloading isn't always confusing

I have always accepted the conventional wisdom about overloading: that while it's very nice for operations that are conceptually identical, overloading two unrelated operations, like using + for catenation as well as addition, is just an invitation to mistakes. But I have to revise this opinion, because there's a conspicuous counterexample in C++: it uses << and >> for both shifting and I/O, and I've never found this confusing. Why?

I suspect what's going on here is that the shifting and I/O operators are seldom used together, or even in similar contexts. I/O code doesn't look anything like bit-crunching code, so opportunities for confusion are few and far between. The problem with + may be that addition and concatenation have a vague semantic similarity (which is what inspired the overloading in the first place), which leads them to appear in similar contexts. (It seems to me that it's much easier to be uncertain about whether some expression is a number or a string than whether it's an integer or a stream.) It probably doesn't help that they're both very common operations found in all sorts of code. And programmers rely on the associativity of both operations, so the non-associativity of the overloaded combination is not just a theoretical problem. But C++'s shift-or-stream operators show that overloading unrelated operations isn't intrinsically confusing. It depends on how they're used.
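
To make the two meanings concrete, here is a minimal sketch that puts both in one expression. The function names are made up for illustration, and a string stream stands in for std::cout so the result can be inspected.

```cpp
#include <sstream>
#include <string>

// Both << are stream insertions: 1 is written, then 3, giving "13".
std::string stream_twice() {
    std::ostringstream out;
    out << 1 << 3;
    return out.str();
}

// The parenthesized << is an integer left shift (1 shifted by 3 bits is 8);
// only the outer << is a stream insertion, giving "8".
std::string shift_then_stream() {
    std::ostringstream out;
    out << (1 << 3);
    return out.str();
}
```

The type of the left operand picks the meaning: a stream on the left means insertion, an integer on the left means shifting. Code that needs the parentheses is rare, which fits the guess that the two uses seldom meet.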

Not that I'm advocating arbitrary overloading. But maybe we shouldn't be too paranoid about it - which is good because it turns up surprisingly often. Some familiar operations are traditionally overloaded in semantically dubious ways, including good old +. Integer and (fixed-precision) floating-point arithmetic have different mathematical and practical properties, because one rounds and the other doesn't. Theoretically they should be separate, as they are in OCaml, but most languages unify them anyway. Is this a problem? Maybe a little, when someone forgets they're doing floating-point. But for the most part the two (or more) +es coexist fine.
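
As a rough illustration of "one rounds and the other doesn't", the sketch below regroups the same three-term sum with integers and with doubles. The function names are hypothetical, and the behavior described assumes ordinary IEEE 754 double arithmetic without aggressive floating-point optimization flags.

```cpp
// Integer addition is exact (as long as it does not overflow),
// so regrouping the terms never changes the sum.
bool int_sum_regroups(long long a, long long b, long long c) {
    return (a + b) + c == a + (b + c);
}

// Floating-point addition rounds after every step,
// so regrouping the terms can change the sum.
bool double_sum_regroups(double a, double b, double c) {
    const double left = (a + b) + c;
    const double right = a + (b + c);
    return left == right;
}
```

With a = 10^16 and b = c = 1, the integer version regroups safely. The double version does not: adjacent doubles near 10^16 are 2 apart, so each lone 1 is rounded away, while 1 + 1 = 2 survives.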

But addition and catenation don't. But shifting and I/O do. What's the pattern here?

4 comments:

  1. Counter-example for C++ I/O: http://barrkel.blogspot.com/2008/04/c-evaluation-order-gotcha.html

    For my part, I've never been confused, or even ever surprised, by + overloaded for addition versus concatenation. On the other hand, I have been surprised by / as used for integer division in C (but not, of course, in Pascal, which I had learned previously).
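
    A small sketch of the '/' surprise, in C++ rather than C, with made-up function names: the same operator gives different answers depending only on the operand types.

```cpp
// Both operands are int, so / is integer division and the fraction is dropped.
int int_quotient(int a, int b) {
    return a / b;
}

// Both operands are double, so the same / is floating-point division.
double double_quotient(double a, double b) {
    return a / b;
}
```

    Here int_quotient(7, 2) is 3, while double_quotient(7.0, 2.0) is 3.5.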

    There are probably at least two things going on here: first, how adept you are as a programmer at recognizing the type of any given subexpression (i.e. not just a factor), and thus how certain you are of the selected overload of any given operator; and second, your knowledge of the operators themselves, as they apply to values of different types.

    The first skill takes some time to learn, as one gets used to the evaluation order etc. I think the skill is built up more quickly in a statically and strongly typed language. I would strongly suspect that a weakly and dynamically typed language, e.g. one which coerces strings like "42" to integers like 42 depending on the operator, would be the corresponding worst-case scenario for learning this.

    On the operators, I think this has to do with one's preconceived notions of what effect each operator has on values of a particular type, if any. If you don't have a prior notion, it's very easy to learn, like C++ streams. If you do have prior notions, like different '/' overloads having particular return types, you'll be more likely to be tripped up.

    So, in the specific case of '+', I would expect users of dynamically typed languages, and of languages that rely heavily on type inference, to be more easily tripped up. Lack of type annotations and reliance on invariants that are in the programmer's mind, rather than in the program text, would probably lead to less accuracy in knowing subexpression types. In the type-inferred case, though, the static type system should warn the programmer earlier.

    That '+' covers such different operations as addition and concatenation is, I think, a lesser problem. I think '/' as integer division and floating-point division should cause a comparable number of mistakes, assuming that the programmer doesn't have prior notions of either (i.e. knows all the overloads of '+' and '/' equally well).

    Finally, one last thing about stream operators: these have a specific idiom. Stream on extreme left, then a '>>' or '<<' separated list of things to extract or insert. The fact that they don't usually appear in arbitrary subexpressions is another strong indicator, to me, that the "operators" need scarcely even be recognized as such. The insertion / extraction idiom looks like a statement, not an expression.

  2. I find integer / as floor/ confusing too, especially since I'm used to languages that have correct division.

    The confusion I get from arbitrary overloading isn't that I can't easily figure out the types of expressions, but that I can't recognize code at a glance so easily. Usually arithmetic looks different from string processing or I/O, largely because it uses different operations. So when I'm looking for a particular piece of code, I can filter out almost everything with only the most superficial reading, without considering types at all. Overloading can break this useful tool by making different sorts of code look more alike. It's not conscious confusion (except in annoying cases like Java string catenation) but a loss of obviousness.

    The stereotypical form of C++ iostream expressions makes them easy to recognize, which may be a reason their overloading isn't confusing.

    I also worry about overloading interfering with extensibility, e.g. in an autoconverting language you can't catenate numbers because they get added instead, or with error detection, because adding strings isn't an error. I'm not sure I've encountered either of these, though.

    The C++ evaluation order example doesn't have anything to do with overloading, does it?

  3. Another good example of arbitrary overloading used correctly is in the Boost filesystem library, where you can use / to concatenate paths, e.g.

    newpath = path / filename;

    and it will join the path and filename using the appropriate separator for the running OS.
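
    A self-contained version of the same idea, as a sketch: it uses std::filesystem from C++17, which provides a similar operator/, rather than Boost itself, and join_path is a hypothetical helper name.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Joins a directory and a file name using the path operator/.
fs::path join_path(const fs::path& dir, const fs::path& filename) {
    return dir / filename;
}
```

    For example, join_path("usr/local", "notes.txt") yields a path whose generic_string() is "usr/local/notes.txt"; the native form uses whatever separator the platform prefers.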

  4. In fact, integer / is not floor/, it is either floor/ or truncate/, and you can't rely on arguments outside the intersection of these two functions.
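
    The two candidates differ only when the quotient is negative and inexact. Older C and C++ standards left the choice to the implementation; C99 and C++11 fixed it as truncation toward zero. A sketch of both, with hypothetical function names and assuming a C++11 or later compiler:

```cpp
// Built-in / on ints: since C++11 the quotient is truncated toward zero.
int truncating_div(int a, int b) {
    return a / b;
}

// Floor division: round the quotient toward negative infinity instead.
int floor_div(int a, int b) {
    int q = a / b;
    // Step down by one when there is a remainder and the operands' signs differ.
    if (a % b != 0 && ((a < 0) != (b < 0))) {
        --q;
    }
    return q;
}
```

    For 7 and 2 both give 3, but for -7 and 2 truncation gives -3 and floor gives -4.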

