snapsvg

2011-09-06

Einstein's Constraint: Booleans

Everything should be kept as simple as possible, but no simpler.
Albert Einstein
Perl is a language that combines ideas from many other languages. It is a language designed by a linguist, and hence it uses principles from natural language. The design of Perl is therefore a combination of two things: convention set out by its muses and Larry Wall's desires.

This meant that Perl was free to pick and choose from different conventions or invent new techniques that solved problems inherent in others'. Application of Einstein's Constraint is clear in Perl 5 (as well as a few instances of failure to apply it!), and today we will look at booleans.

True and False

The most obvious starting point when deciding how to implement booleans is to decide what will be true and what will be false. Probably the most common convention is that zero is false and anything non-zero is true, a convention popularised, if not invented, by C. In some dialects of Lisp, an empty list is false, and in many languages, True and False are their own types.

Many languages, JavaScript, PHP and Python among them, have explicit types for true and false: global, singleton values that always represent trueness and falseness. This allows truth to be explicitly defined. But many languages these days, including PHP, JavaScript, Python and Perl, also employ a concept called coercion to switch between different types implicitly, i.e. converting without you having to ask for it. When this is available, we also have to consider which other values have truth or falsehood.

Is it simpler to a) have True and False as separate values and coerce into them, or b) use existing values, and a rule?

Let's ask Einstein.

To answer, we have to consider usage. Perl draws a lot on the do-what-I-mean, or DWIM, philosophy of programming, a philosophy which naturally leads to Perl automatically and transparently switching between data types where possible (and in fact it is always possible, thanks to various operators). Strings and numbers are interchangeable; objects can be converted to strings; arrays to scalars or lists. Anything can be used anywhere and a rule is applied, consistently, so that the programmer knows what to expect perl [1] to do.

Observing this principle it seems like we're already tending towards b. Perl is, after all, already designed with rules in mind: a consistent (and concise) set of ground rules is the easiest way to understand what to expect in a given situation.

But if we follow the thought further we realise that once there is type coercion, the difference is moot. If we introduce a separate value for true and a separate one for false, this means we have to create a whole new set of rules for how to coerce other values into these two whenever there is a boolean test. If we already have to make that decision, it then follows that it doesn't matter what we coerce into: what matters is what values are false and what values are true.

Einstein's Constraint, then, says that since we have to implement b) anyway, and know the rules, it is tautologous to implement a) as well. Indeed, there may or may not be boolean data types internally to the Perl interpreter, but this makes no difference to Perl as a language. So, we can decide that we will simply choose values, rather than types, to be true or false. This is sensible, because Perl doesn't have types in that respect. True, false, 0 and 1 are all scalar values.

Deciding which values should be true and which should be false is the next logical step. Convention from C tells us that zero should be false; but Perl has list types, and Common Lisp suggests an empty list should be false. This nicely follows existing rules: since boolean values are scalar, and an array in scalar context is its length, then a zero-length array is naturally false because it is treated as zero. A list—however it is constructed—in scalar context returns its last item; an empty list will return no item. Nothingness, then, is falsehood.
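The array rule can be checked directly. The following is a minimal sketch, not anything from perl's own source; the variable names are invented for the illustration.

```perl
use strict;
use warnings;

my @empty = ();
my @three = ('a', 'b', 'c');

# An array evaluated in scalar context yields its length.
my $empty_len = scalar @empty;
my $three_len = scalar @three;

# A boolean test supplies scalar context, so the length decides the outcome.
my $empty_is_true = @empty ? 1 : 0;    # a zero-length array is false
my $three_is_true = @three ? 1 : 0;    # any non-empty array is true

print "empty: length $empty_len, true? $empty_is_true\n";   # empty: length 0, true? 0
print "three: length $three_len, true? $three_is_true\n";   # three: length 3, true? 1
```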

Type coercion says that the string '0' is equal to 0, and hence false. We might consider that the strings '00' and '0E0' (0 × 10^0) also numify as 0. But note that 0 will not stringify as either of these. Only the string '0' is fully equivalent to 0, because the conversion between these two will never produce a different value.
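To see the asymmetry, here is a small sketch that forces each conversion by hand. Adding zero and interpolating into a string are just two convenient ways of asking for numeric and string context; the variable names are made up for the example.

```perl
use strict;
use warnings;

# Adding zero forces numeric context. All three strings are well-formed
# numbers, so each numifies to zero without a warning.
my @numified = map { $_ + 0 } ('0', '00', '0E0');

# Interpolation forces string context. The number zero has only one
# string form.
my $zero        = 0;
my $stringified = "$zero";

print "numified: @numified\n";               # numified: 0 0 0
print "stringified: ", $stringified, "\n";   # stringified: 0
```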

Furthermore, we are coercing to booleans, not integers: that is to say, we want to know whether it is true or not, and we are not using any specific type to represent truthiness. You may spot inconsistency. In this respect, you might say that Perl does have a boolean type, and you'd be right in a sense. However, there is no true and false; there is merely a state of trueness that a value can have. The value only has a value of truth when it is used as a truth value; in other languages, true and false always represent truth values. Truth is contextual. How recondite.

Perl therefore defines '0' to be false for consistency with 0, but any other string to be true, including '00' and the common zero-but-true value '0E0'. The exception is the empty string '', which falls into the "nothingness" category, since there's nothing in it, and is also false.

Finally, the undefined value is conceptually equivalent to an empty list: it is a scalar with no value. That, too, is a false value.

Removing the tautology has made this as simple as possible: 0, '0', the empty string, the empty list and the undefined value are false, and everything else is true.
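The whole rule fits in a few lines. This sketch runs a handful of scalars through a plain boolean test; truth_of is a helper name invented here, and the sample values are only illustrations of the rule described above.

```perl
use strict;
use warnings;

# Label any scalar 'true' or 'false' using nothing but Perl's own
# boolean test. (truth_of is a hypothetical helper for this sketch.)
sub truth_of {
    my ($value) = @_;
    return $value ? 'true' : 'false';
}

# The false scalars: zero, the string '0', the empty string and undef.
my @false_values = (0, '0', '', undef);

# Everything else is true, including strings that would numify as zero.
my @true_values = (1, -1, '00', '0E0', '0.0', ' ', 'false');

my @false_results = map { truth_of($_) } @false_values;
my @true_results  = map { truth_of($_) } @true_values;

print "expected false: @false_results\n";
print "expected true:  @true_results\n";
```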

Comparison Operators

Einstein's Constraint removed the need for extra values in Perl to represent truth and falsehood. The same reasoning in fact leads Perl to extend its collection of comparison operators.

To explain why, we should take a look at languages that don't extend theirs. JavaScript and PHP both use == to compare all things: strings, integers and objects. They both also have === to compare without coercion.

As mentioned, coercion is the practice of treating a value as another type by silently converting it from its current type. Using == to compare two values in JavaScript or PHP will coerce both, one or neither operand to a different type, based on rules documented somewhere.

Using == to compare two values in Perl will coerce both operands to numbers, and compare the results. Using eq will coerce both operands to strings, and compare the results.
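A short sketch of the difference, using a pair of strings chosen so that the two operators disagree. The variable names are invented for the example.

```perl
use strict;
use warnings;

my $left  = '1.0';
my $right = '1';

# == numifies both operands: 1.0 and 1 are the same number.
my $numeric_equal = ($left == $right) ? 1 : 0;

# eq stringifies both operands: '1.0' and '1' are different strings.
my $string_equal = ($left eq $right) ? 1 : 0;

print "==: $numeric_equal, eq: $string_equal\n";   # ==: 1, eq: 0
```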

Why?

This becomes easy to explain when we consider the types of comparison available to us. The two obvious ones are numerical and string comparison. Two numbers are equal if they represent the same platonic value—020, 16 and 0x10 are all equal. Two strings are equal when they contain the same characters in the same order.
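As a quick illustration, the three spellings can be written as Perl source literals and compared. This is only a sketch; the variable names are invented.

```perl
use strict;
use warnings;

# Three spellings of the same number in Perl source: octal, decimal, hex.
my @spellings = (020, 16, 0x10);
my $all_equal = (020 == 16 && 16 == 0x10) ? 1 : 0;

# String equality asks for the same characters in the same order.
my $same_chars  = ('abc' eq 'abc') ? 1 : 0;
my $other_order = ('abc' eq 'acb') ? 1 : 0;

print "@spellings\n";                              # 16 16 16
print "$all_equal $same_chars $other_order\n";     # 1 1 0
```

Note that this applies to literals in source code. The string '020' numifies as 20, because strings are always read as decimal.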

Then you might suggest that two arrays are equal if they are the same length, which works for Perl. Or that they are equivalent: the same keys and the same values, which works for PHP. Or that they are the same actual array, which works for PHP and JavaScript.
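The Perl case follows from the rules already described: a numeric operator puts each array in scalar context, where it is its length. A minimal sketch, with array names made up for the example:

```perl
use strict;
use warnings;

my @first  = (1, 2, 3);
my @second = ('x', 'y', 'z');
my @longer = (1, 2, 3, 4);

# == puts both arrays in scalar context, so only the lengths are compared;
# the contents play no part.
my $same_length = (@first == @second) ? 1 : 0;   # both have 3 items
my $shorter     = (@first <  @longer) ? 1 : 0;   # 3 is less than 4

print "$same_length $shorter\n";   # 1 1
```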

What about two objects? Perl's objects are necessarily references, so referential equality seems reasonable, but Perl also has operator overloading, so the decision could be given to the objects themselves. PHP has true objects, so you might suggest that operator overloading would be good, but PHP doesn't think you can be trusted with that, so it compares them by comparing their attributes instead. JavaScript also uses referential equality, but doesn't allow for operator overloading either.
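Here is a sketch of what handing the decision to the objects might look like in Perl, using the core overload pragma. The Point class and its equals method are hypothetical, invented purely for this example; refaddr from the core Scalar::Util module is used to show the referential comparison alongside.

```perl
use strict;
use warnings;

package Point;    # hypothetical class, invented for this sketch

# Ask Perl to call equals() whenever == is applied to a Point.
use overload '==' => \&equals;

sub new {
    my ($class, $x, $y) = @_;
    return bless { x => $x, y => $y }, $class;
}

# Two points are equal when both coordinates match.
sub equals {
    my ($self, $other) = @_;
    return $self->{x} == $other->{x} && $self->{y} == $other->{y};
}

package main;

use Scalar::Util qw(refaddr);

my $p = Point->new(1, 2);
my $q = Point->new(1, 2);

# Referential equality: are these the very same object?
my $same_reference = (refaddr($p) == refaddr($q)) ? 1 : 0;

# Overloaded equality: the objects decide for themselves.
my $same_point = ($p == $q) ? 1 : 0;

print "same reference: $same_reference, same point: $same_point\n";
# same reference: 0, same point: 1
```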

For PHP and JavaScript, == is actually an equivalence operator, and hence numerous rules are needed to determine what is equivalent to what. === is also an equivalence operator: it is still not an equality operator. It just happens that the equivalence has fewer rules, and in many cases equality is the only satisfactory state.

Also, we've been focusing on equality. What about the other operators, < and >? Strings can be compared lexically: there are rules for what is "less than" and what is "greater than". Objects, well, who knows? PHP's manual commits the fatal flaw of calling == the "comparison operator", whereas it is in fact a comparison operator called the equality (or equivalence!) operator—a mistake which allows the manual to conveniently omit the rules for the other ones. JavaScript takes a better approach and simply decides that objects are not comparable and returns false when you try a magnitude test.

But what do you do when the strings could be integers? Do you compare lexically, or numerically? Neither is incorrect. You would be upset if you were trying to sort a list lexically only to find that your language was assuming the items were numbers and treating them numerically, and likewise the reverse. Is the string "10" less than or greater than the string "011"? If we weren't type-coercing we would know instantly: it's a string, so it's greater. But we are, so we don't.
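In Perl the question is answered by the operator you pick rather than by the operands. A small sketch with the two strings from the paragraph above (lt is Perl's string version of <; the variable names are invented):

```perl
use strict;
use warnings;

my $left  = '10';
my $right = '011';

# < compares as numbers: 10 is less than 11.
my $numeric_less = ($left <  $right) ? 1 : 0;

# lt compares as strings: the first character '1' sorts after '0',
# so '10' is the greater string.
my $lexical_less = ($left lt $right) ? 1 : 0;

print "numeric: $numeric_less, lexical: $lexical_less\n";   # numeric: 1, lexical: 0
```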

Here is a generalised table over all of ==, <, <=, > and >= operators, showing you what coercion you can expect from the languages we've mentioned, on various operands. The result column is the result of the operator <, for reference. I chose that one because it performs the most erratically. In the example, [] are used to refer to real arrays in Perl, not arrayrefs.

Operands                 Treated as                                       Result of <
L          R             PHP                      JS         Perl         PHP  JS  Perl
0          1             Numbers                  Numbers    Numbers      1    1   1
"0"        "1"           Numbers                  Numbers    Numbers      1    1   1
"a"        "b"           Strings                  Strings    Numbers      1    1   0*
"10"       11            Numbers                  Numbers    Numbers      1    1   1
"10"       "011"         ???                      Strings    Numbers      0    0   1
"10b"      "11a"         Strings                  Numbers    Numbers      0    1   1
[1, 2, 3]  [1, 2, 3, 4]  ???                      Objects**  Numbers      1    0   1
[1, 2, 3]  4             Array is always greater  Objects**  Numbers      0    0   1
false      true          Booleans                 Numbers    Numbers [2]  1    1   -

* Non-numeric strings numify as zero, and (if warnings are enabled) a warning is issued that you numified a non-numeric string.
** < and > are defined always to return false; otherwise, true is returned if they refer to the same thing.

This is Einstein's Constraint again. Perl has made it as simple as possible, but no simpler. In PHP's case it is not defined generally over the five operators how the arguments will be treated. In JavaScript's case, each pair of operands is consistent across the five operators, but the language is inconsistent as the operands change. There are rules, but why should you have to remember them? In Perl's case they are always treated as numbers. It cannot be simpler without being more complex elsewhere.

Perl sidesteps the whole issue simply by stating that if you use any of <, <=, ==, >=, > or the special Perl-only <=> [3] then the operands are treated as numbers; and if you use any of lt, le, eq, ge, gt or cmp, then they are treated as strings. The mnemonic is simple: the mathematical operators are used on numbers, and the letter operators are used on letters.
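The two families are easiest to tell apart when sorting. This sketch sorts the same number-like strings both ways, then shows the three values <=> and cmp can return; the sample data and variable names are invented for the example.

```perl
use strict;
use warnings;

my @values = ('10', '9', '011', '2');

# <=> compares numerically: '011' numifies as 11, so it sorts last.
my @numeric = sort { $a <=> $b } @values;

# cmp compares character by character: '0' sorts before '1', and so on.
my @lexical = sort { $a cmp $b } @values;

# Both operators return -1, 0 or 1 for less, equal or greater.
my @spaceship = (1 <=> 2, 2 <=> 2, 3 <=> 2);
my @compare   = ('a' cmp 'b', 'b' cmp 'b', 'c' cmp 'b');

print "numeric: @numeric\n";     # numeric: 2 9 10 011
print "lexical: @lexical\n";     # lexical: 011 10 2 9
print "<=>: @spaceship\n";       # <=>: -1 0 1
print "cmp: @compare\n";         # cmp: -1 0 1
```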

Triple Equals

A hue and a cry! What audacity to not mention that PHP and JavaScript have the triple-equals operator, ===, which enforces type checking as well. With this magical operator, we solve the problem a different way. We can, in all cases, avoid the problem of type coercion by simply demanding that it not take place.

All cases? No. Since both languages have false as well as an undefined (null) value and zero, how do you test a string, read from standard input, for falsity? Or how do you compare a variable that exists, but is not defined or is false, and differentiate it from zero? And how many more rules and exceptions are there to this new operator, that can compare types as well? Are we forgetting the principle that we should be able to implicitly treat any type as any other type? Didn't we learn a lesson from the true/false thought experiment?

Perl's use of two types of operators for two types of comparison remains simpler, and the main reason is that all things are supposed to be coerced into all other things. That is a sound principle in Perl, but without these extra operators, other languages find a barrier preventing them from seamlessly implementing the philosophy.

That aside, there is not a triple-equals version of <= or >=, is there? Those are the troublemakers, after all. Those are the ones that force us to sort our number-like strings the way they want to, not the way we want to. How do we prevent this behaviour on these other operators? Oh sod it all, let's just have separate comparisons for strings and numbers.

[1] By convention, Perl is the language and perl is the interpreter.
[2] The Constraint explained earlier shows why we got rid of the boolean type for Perl. While this row is correct for PHP's and JavaScript's two boolean values, all three languages will come a cropper if you try to compare a trueish value with a falsish one. Perl, again, simplifies it by not doing this, and therefore we can't say what Perl will treat true and false as because they don't exist. But it would be numbers.
[3] The spaceship operator returns -1 if A is less than B; zero if they are equal; and 1 if A is greater than B. The same test is three lines of code, or two chained ternary conditional operators, in other languages: sort { $a <=> $b } ... is better than sort { $a == $b ? 0 : $a < $b ? -1 : 1 } ... because it is legible. cmp does the same, but for strings.
