Saturday, January 26, 2013

When is a programming feature 'helpful'?

There was a kerfuffle in my Twitter feed this morning which highlighted some issues about language design. Don Stewart retweeted this conversation about funny JavaScript behavior. Reproduced in a JavaScript console, it looks like this:
    > ['10','10','10','10','10'].map(parseInt)
    [10, NaN, 2, 3, 4]
It looks completely mysterious to a Haskeller, and by "mysterious" I mean "punishable by life in prison." I'd bet there's many a functional head listing on a sprained neck out there in Haskell-land, and that co-functional parents are explaining to their children what adult words are. I set about looking for an explanation, because I could not imagine how map could behave this way. I got the exact same answer in my browser, which made it unlikely that it was due to uninitialized memory. So first I tried using the identity function:
    > ['10','10','10','10','10'].map(function(x) {x;})
    [undefined, undefined, undefined, undefined, undefined]
Oh, right, JavaScript requires a return:
    > ['10','10','10','10','10'].map(function(x) {return x;})
    ["10", "10", "10", "10", "10"]
Great! Sensible behavior. Next, I looked up parseInt and found that it takes two arguments: parseInt(string, radix). JavaScript code is full of helpful functions that take variable numbers of arguments and guess what you mean based on their type, so this seemed a likely candidate for trouble.
    > ['10','10','10','10','10'].map(function(x) {return parseInt(x,10);} );
    [10, 10, 10, 10, 10]
Much better. Presumably then, the broken version supplies a second argument to parseInt that it pulled out of its ass. The tail of ..2,3,4] made me suspect there was some accumulation happening, so I tried this:
    > [1,2,3,4,5].map(function(x,y) {return (x+y);} )
    [1, 3, 5, 7, 9]
Prison's too good for them, I thought. Apparently this sums adjacent elements, except...wait, there would have to be a zero prepended for that to work. Since addition is commutative, the sum was hiding which value came from where, so I tried something a little more revealing:
    > [1,2,3,4,5].map(function(x,y) {return ('('+x+','+y+')');} )
    ["(1,0)", "(2,1)", "(3,2)", "(4,3)", "(5,4)"]
Still looks like adjacent elements with a magical zero. At this point I had some coffee and realized I was confused about the types. I had two lists of small integers, one of them from a mysterious source, rather than one list being zipped with itself.
    > [1,2,3,4,5].map(function(x) {return x*10;}).map(function(x,y) {return ('('+x+','+y+')');} )
    ["(10,0)", "(20,1)", "(30,2)", "(40,3)", "(50,4)"]
Okay. The second argument is clearly not a function of the array contents and now looks like the array index of the current argument. Going back to the original problem:
    > parseInt('10',0)
    10
    > parseInt('10',1)
    NaN
    > parseInt('10',2)
    2
Great, this sort of makes sense now. A radix of zero is treated as unspecified, which for '10' means base 10, a very C-like idiom, and the second example fails correctly because 1 is not a legal radix (and '10' is not a unary number anyway). So, parseInt is not the villain, nor really is map. It is perfectly reasonable to have a variant for arrays that would operate on an element and its index. For me, the villain here is the handling of optional arguments. map silently passes extra arguments to its callback, parseInt silently accepts them, and that's the kind of help I don't want.
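To make the diagnosis concrete, here is a small sketch that spells out the calls map makes on our behalf, plus one way to opt out of the extra arguments. The helper name `unary` is hypothetical, something I made up for illustration rather than a built-in.

```javascript
var input = ['10', '10', '10', '10', '10'];

// What map effectively does: callback(element, index, array).
// parseInt ignores the third argument but reads the second as a radix.
var spelledOut = input.map(function (x, i, arr) {
  return parseInt(x, i, arr);
});

// Hypothetical helper: wrap a function so it only ever sees one argument.
function unary(f) {
  return function (x) {
    return f(x);
  };
}

var parsed = input.map(unary(parseInt));

console.log(spelledOut); // same result as input.map(parseInt)
console.log(parsed);     // every element parsed with the default radix
```

Passing an explicit radix, as in the earlier `parseInt(x,10)` version, is still the more defensive choice; the wrapper only stops the index from leaking in.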

'Helpful' means different things to different people. To a Haskeller, the definition is something like: publish your promises and keep them. Explicit type signatures, referential transparency, static type-checking are all there to make those promises come true. In JavaScript, optional arguments are considered helpful, because they let you reuse the same function name in slightly different contexts.

Steadily, monotonically, over the years, I have moved away from convenience in the specification of a problem whenever it introduces uncertainty. When I first learned C, I memorized the precedence of operators so that I could write expressions with a minimum number of parentheses. Then I learned other things and forgot the precedences and just started parenthesizing everything until it was unambiguous. A friend in graduate school used Modula-3, which had no automatic casting between numeric types. C programmers were horrified, but I felt it was a breath of fresh air.

These conveniences should be part of interactive programming environments, not programming language specifications. Fuzziness, forgetfulness, disorder are part of the human condition and should be acknowledged, but they should not make it into the code. Say what you mean to the programming environment, by all means, but have the resulting program be completely unambiguous.

Which reminds me, the JavaScript map actually accepts a function of three arguments...
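For the curious, a quick sketch that records everything the callback receives. The variable name `calls` is just for illustration.

```javascript
// Record every argument that map hands to the callback.
var calls = [];
['a', 'b'].map(function () {
  // `arguments` holds everything the caller passed, declared or not.
  calls.push(Array.prototype.slice.call(arguments));
  return null;
});

console.log(calls.length);    // number of callback invocations
console.log(calls[0].length); // number of arguments per invocation
console.log(calls[0]);        // element, index, and the array itself
```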


Fredrik Carlén said...

Thanks for that clarification. Somebody still ought to be punished.

Anonymous said...

yes someone ought to be punished...for wasting my time. this is a great example of what happens to a programming language x bigot when he starts imagining things about language y without actually looking it up. not impressed.

Anonymous said...

map calls the function with 3 params (value, index, array)