snapsvg

2011-11-10

In Context


Natural language is contextual. A famous quotation demonstrates this in a way we all find familiar:

Time flies like an arrow; fruit flies like a banana.

Usually we interpret this as "the manner in which time flies (verb - to fly) is akin to an arrow" and "the creatures called fruit flies (noun pl. - fly) like (to find agreeable) a banana". However, we can also interpret the former as "Time (measure chronologically) flies (noun pl. - fly) the way an arrow would", except of course this is nonsense and we don't consider it.

Perl is the same. The three data structures covered in previous posts can be used in all sorts of ways, and Perl will attempt to do the right thing based on context. This is done by having rules about context, and what to do in those contexts.

Time Flies

When we say "time flies" in English we can either mean a) imperative "time", noun "flies" or b) noun "time" verb "flies". In Perl we can differentiate between nouns and verbs because nouns have sigils and verbs don't.

$scalar variables have $ and a singular name because they represent a single thing. @array variables use the @ sigil and have plural names because they refer to more than one thing. %hash variables refer to a set of named things; their names are often singular because either we refer to the set ("The option set %option") or a thing from the set ("The debug option $option{debug}").

Verbs, of course, are subroutines and so have no sigil. That means in Perl we can't misinterpret that:

time @flies;

Notwithstanding the fact that time is already a Perl function for finding the time, we can understand this because flies has to be a noun.

Flies Like A Banana

Context in Perl is not a grammatical thing in the natural-language sense explained above, but it is used to determine what we do with the nouns that we pass around.

A common example is to check for an array to have content:

if (@flies) {
    ...
}

In list context, an array represents the list of scalars it contains, as we saw demonstrated in this post, but in scalar context the array is treated as the number of items in it; i.e. its length.

The test in an if statement is in boolean context. Booleans are truey and falsey values: they are scalar. An empty array in scalar context is zero, which is false; hence an empty array is false.

The while loop also uses boolean, and hence scalar, context. However, the for(each) loop uses list context:

for my $fly (@flies) {
    ...
}

This is sensible because if it used scalar context it would always loop over one thing, which is the number of things in the array. A scalar in list context doesn't really do anything special; it remains that scalar:

for my $fly ($one_fly) {
    ...
}

but it does help with concatenation of several things:

for my $fly ($bluebottle, $mayfly, @various_flies) {
    ...
}

Fruit Flies Like

Surprises crop up when dealing with context once operators and functions get involved. For example:

my $banana = map { $_->fruit_i_like } @flies;

You might think, maybe this will take the first fruit that a fly likes, or the last, or some data structure representing the result of the map. Well the latter is the closest: the data structure returned is in fact an integer representing the number of fruits the flies, in total, liked. The context in this statement is in the = operator: it is determined by the thing on the left and it is enforced on the thing on the right. perldoc says that map in scalar context returns the length of the constructed list.

We can take the first fruit liked by a fly by putting the = operator in list context. In this case, that is done using parentheses:

my ($banana) = map { $_->fruit_i_like } @flies;

Parentheses only create list context on the left of =; as we learned in this post, parentheses on the right only serve to override the precedence of operators.

Of course, assigning to an aggregate type is also list context:

my @fruits = map { $_->fruit_i_like } @flies;
my %likes = map { $_->name => $_->fruit_i_like } @flies;

(In this latter example we assume fruit_i_like returns only one item; otherwise the constructed list will break the expected hash construction.)

Another common confusion is when the operator or function itself enforces a context:

my $random_fly = $flies[rand @flies];

The rand function returns a random number; if it is provided with an integer it will return a random number between 0 and that integer, including 0 and excluding the integer. Normal behaviour for a function is to use an array as a parameter list, but rand overrides this by enforcing scalar context on its arguments.

That means that the array given to rand is evaluated in scalar context, giving its length. The output is used in [] to index the array; array indices are required to be integers so the decimal return value from rand is truncated. Since rand can never return the integer provided to it, the maximum return value from this is the last index of the array, and the minimum is zero.

Enough With The Rearranging Of The Quotation You Cited

The context of any particular part of Perl can be subtle. I will describe a few common situations where context is key to the operation of the script.

if

As explained, if enforces scalar context.

if ($fly) { ...} 
if (@flies){ ... }
if (%like) { ... }
if (grep { $_->likes($banana) } @flies) { ... }
if (my ($fly) = grep { $_->flies } @times) { ... }

All of these are in scalar context. What about the last one though? We know that () on the left of = creates list context; but if() is scalar context.

Context happens in stages, naturally, just as expressions are evaluated sequentially. The = operator is in list context as defined by its LHS. This allows $fly to contain the first result from the grep. The cunning thing is that the = operator itself returns the value on its LHS after assignment: this is evaluated in scalar context and used as the boolean value for the if.

Consider:

if (my @times_that_fly = grep { $_->flies } @times) { ... }

You can expect @times_that_fly to be set and non-empty in the if block, and also for it to contain the result of the grep.

<>

The operation of the <> "diamond" operator is context-sensitive. That means it has different behaviour in list and scalar contexts. In scalar context, it returns one line of input. In list context, it returns all lines of input.

my $line = <>;
my @lines = <>;

As we know, for is a list-context control structure; hence it will read all lines into memory before iterating them:

for my $line (<>)

The while control construct, like if, tests a boolean value, except it does it repeatedly and stops running when it is false. Hence the expression in the parentheses is evaluated in scalar context. So it is idiomatic to see while used instead of for when iterating over input:

while (<>) { ... }
while (my $line = <>) { ... }

But I hear you cry that the line of input "0" is false! But it should be treated fairly! Well for a start it will probably be "0\n", but apart from that, while(<>) is actually magic and comes out as:

while (defined($_ = <>))

and even if you set your own variable:

while (my $line = <>) { ... }
while (defined(my $line = <>)) { ... }

but this is only true inside a while().

$array[rand @array]

We've covered this one but it's still for this list. The rand function enforces scalar context on its operand; hence, instead of the array providing a list of arguments to the function, it is itself the argument, but in scalar context, hence its length is returned. The array subscript [] then truncates the return value of rand to an integer.

=()=

Called the goatse operator by those with a sense of humour that compels them to do so, this is not really an operator at all but a trick of context. You see, not all operators return a count in scalar context.

For example, the match operator m/.../ or just /.../ merely returns true or false in scalar context. With the /g modifier it returns the next match! In what universe, then, can a mere Perl hacker get m/../g to return the number of matches?

The universe with the goatse operator of course.

First we know that () on the LHS of = creates list context. That is all that is required for m/../g to return a full list of all matches. So the match is run in list context and that's what we want. The output of our goatse operator is of course going to be a scalar since we want an integer; this enforces scalar context on the leftmost =, which has the effect of putting the return value of the rightmost = (which is the list created by ()) in scalar context, hence counting it.

my $count =()= /cats/g
my $count = ( () = /cats/g )

@array[LIST], @hash{LIST}

A subtle use of context is the one in array and hash subscripts. The context inside { } or [ ] when accessing elements is determined by the sigil you used on the identifier in the first place. Thus:

$array[SCALAR]
@array[LIST]
$hash{SCALAR}
@hash{LIST}

This can have unexpected effects; a warning is generated if you use the @ form but only provide a single value, but in some cases the context can propagate, for example if you are calling a function or something.

Context and Subroutines

A subroutine is called with the context of the place where you called it. This is often a useful tool, because a subroutine can also interrogate the calling context with the wantarray operator.

When a subroutine is called its return operator inherits the calling context. If it doesn't have a return operator, the usual rules apply about an implicit one.

The main reason this happens is that any basic function call - i.e. one with no special behaviour based on context - should act as though the return value were in the code in place of the function call. Consider:

rand @flies;

sub get_flies {
    # ... generate @flies
    return @flies;
}

rand get_flies();

It is reasonable to assume that, because the function returned an array, the behaviour of the latter rand will be the same as the former, and it is: the return of the subroutine is evaluated in scalar context because rand enforces scalar context.

Usually you want to save the return value of a function into some local variable, though; otherwise you have to run the function twice to use it twice, which is wasteful:

(get_flies())[rand get_flies()]

It is important to be aware of the context in which you are calling your function, especially if the function has different behaviour in different contexts. For example, here we have list context:

localtime(flies());

and here we have scalar context:

rand(flies());

and here we have a syntax error:

time(flies());

because time doesn't take any arguments. By default, all subroutines give list context when called, so:

sub context_sensitive {
    return wantarray ? 'STRING' : qw(TWO STRINGS);
}

sub print_things {
    print $_, "\n" for @_;
}

print_things context_sensitive;

Here the context_sensitive sub is in list context, so it returns qw(TWO STRINGS), which is indeed two strings, as noted by the print_things function, which will print each thing with a new line after it so we can see what's going on. However, this sort of thing can lead to confusion when you don't realise the function is context sensitive:

my $string = context_sensitive;
print_things $string;

The above code exhibits different behaviour from the first, because context_sensitive was called in scalar context and the result of that was given to print_things. This can be surprising, because it means that context-sensitive subroutines are not always drop-in replacements for the variable they were saved to.

The scalar keyword can be used here to enforce scalar context on the subroutine.

print_things scalar context_sensitive;

The scalar keyword can be used in the general case when you want to either override or be explicit about the context of anything.

scalar @array;
scalar $scalar;
scalar sub_call();
scalar grep {...} @array;

There is no real list-context equivalent because in general list context is a superset of scalar context, but as with the goatse operator, there are exceptions. Hence the construct () = can be used to evaluate something in list context when normally it would be in scalar context. The reason this is not common is the return value of this assignment will still be evaluated in scalar context, which in most cases is the same as simply not doing this at all.


Void Context

Sometimes you might get a warning along the lines of "useless use of constant in void context". What does this mean?

The difference between a statement and an expression is action. An expression is anything that returns a value: 5+5, sub_call(), @array ...

A statement is one that performs an action: $foo = 5+5, sub_call(), if(@array).

Not all legal lines of code in Perl are statements:

my $foo = 5+5;   # statement
5+5;  # not a statement

It is this latter line of code that will cause the warning about void context. It should be at least slightly apparent why it is in void context: there is no operator or function call enforcing context on the expression. There is no context at this point! This is called void context; neither scalar nor list context is enforced on the simple line of code 5+5;

Void context is not always incorrect. For example, we know that the = operator returns the value on the left hand side. That means that whenever you perform an assignment, the whole assignment itself is performed in void context. It all works because of the layered nature of expressions and sub-expressions. Consider:

$foo = @bar = baz()

baz() is performed in list context because of the assignment to @bar, but that assignment itself is performed in scalar context because of the assignment to $foo. But the assignment to $foo doesn't cause baz() to be run in scalar context. @bar = baz() happens first, and baz() is in list context. This whole expression returns @bar because that is the behaviour of =. That means the next thing that happens is $foo = @bar, which puts @bar in scalar context and populates $foo. This expression returns $foo, but there is no other expression in this statement. So this assignment happens in void context, which does nothing.

This is also void context:

baz();

You don't get a warning for the simple reason that it is perfectly legitimate to have a function that doesn't return anything. However, you could generate your own warning if your function is useless in void context:

sub baz {
  warn "baz() called in void context" if not defined wantarray;
}

The wantarray operator is undefined in void context.

In Summary and/or Conclusion

The context in which you are working defines the way Perl's nouns are treated. You should always be aware of what context you are in at any one time, and how to recognise context.

An assignment is in the context of the left-hand side.

Tests (while, if, ?:) are scalar context, and for loops are list context.

Array and hash subscripts are in the context of the sigil you use.

Operators may enforce context, and some operators also enforce coercion, in order to make the operation possible. The scalar operator is solely intended to enforce scalar context.

The argument list to functions is in list context. Overriding this is possible with prototypes but not recommended in the majority of cases.

With nothing enforcing context, the expression is in void context. Void context will throw a warning if the expression is a constant.

Functions inherit the calling context for their return statement. The nullary operator wantarray is used to inspect that context.

No comments:

Post a Comment