map
and grep
are useful tools in Perl. To understand them needs only the ability to follow some logic, but to use them requires an understanding of lists. You should read this post on lists to understand how lists work in Perl. The main thing you should understand from that is how lists exist in memory, and how they are used with relation to arguments to functions.map
and grep
are actually operators, but they behave so much like functions that you'll find them in perlfunc. What they do is they take a list of arguments, like functions do, and convert that list into another list. map
transforms the input list, and grep
filters the input list.Remember that lists with zero or one item in them are still lists.
Understanding
map
To understand
map
you need to be able to understand the following:y = f(x)
If you don't, let's briefly explain it. It means that if you do something (
f
) to a value (x
) you will get another value (y
). You've probably seen this before in graphs - the straight line formula is y = mx + c
; we can see that all of mx + c
is a function using x
(m
and c
are constants), meaning that if you feed an x
coordinate in, you will get a y
coordinate back. We say that y
is a function of x
.In the case of map,
x
will always be a scalar, because of how lists in Perl are lists of scalars. y
could be more than one value, this being how your output list can be longer than your input list. f
is provided to map
.map
is therefore a transformation. Its purpose is to consistently transform your input list into an output list, by providing each item of your input list to your f
, and collecting the y
s that you get out of it.map
turns this:map { f } x, x1, x2, x3, x4...
into this:
f(x), f(x1), f(x2), f(x3), f(x4)...
A simple example is mathematical: even numbers. The nth even number is
2n
—you learned this in school. If you didn't go to school then you'd better catch up.That means, for
y
to be the x
th even number, our f
is 2x
.f(x) = 2x f(1) = 2×1 = 2 f(2) = 2×2 = 4 ...
That means that, given the numbers 1 to 10 we can find the first 10 even numbers that are greater than 0. If we provide our
f
to map
, and the list 1 .. 10
, we should get that.Defining
f(x)
for map
is a simple case of using the special variable $_
in place of x
. For now, we will define our f
using {}
.print join ', ', map { $_ * 2 } 1 .. 10 2, 4, 6, 8, 10, 12, 14, 16, 18, 20
This is a good example of what we were learning about lists in the previous post. The
join
operator joins a list of values using the specified string, in this case a comma and a space. There is no requirement in Perl that this list be either an array or a hard-coded, comma-separated list: it can be anything that, in list context, returns a list1. That could be a function, or another operator like map
or grep
.This example serves to explain in simple terms how some functionality can be routinely applied to a set of values to return a new set of values. There is no requirement that
y
is different from x
after the execution of f(x)
. For example, you might wish to search a dictionary for a set of words, but the words themselves might be suffering from grammar, and hence have different letter cases. The dictionary itself would be entirely in lowercase, and hence you would want a lowercase edition of your set of words.my @lc_words = map { lc } @words
In this case, the
f()
that we're applying to all members of the array is the operator lc
, which operates on $_
by default and returns a lowercase version of its operand. If the input word happened to be already lowercase, the output will equal the input. No matter.List Return Values
If you recall, lists are lists, in Perl. Arrays and hashes are things constructed from lists, and both of these are treated as lists when used where a list is expected. You can see this principle in action in these examples: our first example used the range operator
..
to construct the list of integers between 1 and 10, and fed that to map
. The second example used the array @words
, which we assumed to exist for the example, as the input list.And then the output list was used in the first example as the argument list to
join
, and in the second example it was used as the constructor for a new array. So far we have been returning only one value, but the
f
we provide to map
can return any number values because it is evaluated in list context. By this mechanism you can turn an array into a hash, which is probably the most common use of that.my @words = qw(cat dog cow dog monkey); my %words = map { $_ => 1 } @words;
The
map
block here returns the input word, $_
, and 1. Remember that in this post we learned that =>
is semantically equivalent to ,
except with the quoting of the word to its left. Remember also that in the same post we learned that a hash is constructed from any even-sized list (or an odd-sized list, with a warning). So this block is returning two items for every one we put in, doubling the length of the list.The use of the fat comma in a construct such as this is idiomatic: since it has no actual effect on the output of the
map
block it is really only there to hearken to the usual construction of a hash, which is a list that uses =>
a lot. The image conjured is of taking each element of @words
and turning it into word => 1
, joining it all together until we have an even-sized list thus: my %words = ( cat => 1, dog => 1, cow => 1, dog => 1, monkey => 1 );
Knowing the intermediate steps to get to that may help:
my %words = map { $_ => 1 } qw(cat dog cow dog monkey); my %words = map { $_, 1 } qw(cat dog cow dog monkey); my %words = ('cat', 1, 'dog', 1, 'cow', 1, 'dog', 1, 'monkey', 1);
Of course, in a true application, you would not set
@words
in this manner in the first place. @words
will be computed, perhaps from a file passed in by name on the command line, which you have processed and turned into this array. A little knowledge about hash construction is relevant here to know the actual purpose of this
map
block. If you understand that a repeated key in the constructing list is simply overwritten by the latest occurrence of that key, you will see that this has the effect of making the input list unique. Although the output list is exactly double the length of the input list, that list is immediately used to construct a hash. The hash itself then undertakes its usual behaviour, and in this example the repetition of "dog" in the original array is overwritten by its last appearance by the construction of that hash - which is fine in this instance because all the keys are associated with the value 1
. Thus any repeated word is not counted.You can also skip over part of the list as well. Perl makes no assumptions on the size of your returned list when running
map
. You can use this to omit elements from your output list while doubling the rest.To do that, simply test
$_
and return an empty list if you don't want the element:map { test $_ ? ($_ => 1) : () } @things
You might return the empty list if the test returns true, for example if you are trying to transform elements you have not already transformed and remove those you have:
my %to_cache = map { in_cache $_ ? () : ($_ => transform $_) } @stuff;
For some test
in_cache
, which returns true if its argument is in the cache, and some function transform
, which returns a transformed version of its argument, you can thus construct a hash of transformed stuff ready for caching. Of course, you don't have to double the list; you can use this simply to filter it. But if you're not going to transform $_
then it is more sensible to use grep
When Not To Use
map
Now that you have a new toy you might be tempted to use it. It is simple nature. Well as with every tool, it is unwise to use it outside its purpose. There are four principles that you should follow when playing with
map
:The Principle of Transformation. You can use
map
to test each item and conditionally return the empty list, thus shrinking the size of the list. But unless your map
block transforms $_
in some other way (by returning an altered version, or a longer list), what you have actually got is grep
. In some cases it may be easier to read if you use both map
and grep
, using the output of grep
as the input to map
.The Principle of Immutability. You should never transform the input list. That seems contradictory, but by this I mean never alter
$_
directly; you should always return an altered copy. It is not, in all cases, possible to change $_
. Only when the input list is an array is it possible to overwrite $_ for all values you get from the list. When it is a list constructed by some other means, even a hash, the input values are often immutable, which means you can't change them so don't try. Always return a copy of $_
, with transformations applied to the copy.The Principle of Brevity. If your transformation is quite long-winded you should either a) put it in a sub or b) use a
for
loop and push onto another array. map
is a construct of brevity; this should be honoured.The Principle of Containment. If you need to do anything else while iterating over your list other than transforming
x
into y
then you should use a for
loop for that. Your map
block should have no side effects, which means when the map
has finished executing, everything should be the same as it was when it started, except now there is a new list in memory.The general principle is that you are getting back from
map
a new list, and leaving the old one alone. Don't run map
without using the result!Understanding grep
grep
is an operator that has many similarities to map
. It takes an input list, and returns another list. You provide some function f
, akin to that described for map
earlier, and it is run with each successive item from the list.The difference is simply that
grep
returns a list of equal size or shorter than the input list.The
f
that you give to grep
is a filter, not a transformation. It tests each $_
and returns trueness or falseness:f(x) ∈ (1,0)
We are not looking in this case to return a new value,
y
, from our function, but rather any value that is true or not true. Reasonably one could point out that if you are outputting a truth value you are running a transformation. You are: but the result of grep
itself is part of the input list. The result of map
is the output list. grep
can be written in terms of map
:@filtered = grep { test } @things; @filtered = map { test ? $_ : () } @things;
A common adage is "spelt grep, pronounced filter". That's a mnemonic, of course; it's pronounced grep.
A traditional use of
grep
is to find the defined items in an array:grep { defined } @array
It is not uncommon to have produced an array or list with undefined elements, especially if that list came from a
map
block that could return undef in some cases. Another common idiom is to map
and grep
at the same time:my @phone_numbers = map { $_->{phone} } grep { $_->{phone} } @people;
Here
@people
is assumed to be an array of hashrefs representing people. The grep
filters out those for whom the key 'phone' returns a true value, and the map
then returns the actual value.We can thus draw this distinction between the two:
map
collects output values, and grep
collects input values. You can see this is the case: the map
block is assuming that the input list is a list of hashrefs, and extracts strings from them. We collect the list of things that map
outputs. For that to work, it means that the output of grep
has to be hashrefs; and since the input to grep
is hashrefs but the grep
block only returns a value from that hashref, we see that the output of grep
is just a subset of the input.When Not To Use
grep
grep
doesn't have the principle of transformation because you're not supposed to transform the list.The Principle of Immutability -
grep
should not alter the input values; it should merely test them.The Principle of Containment - nothing in the
grep
block should affect anything outside of the expression. Everything should be the same after the grep
has finished running as it was beforehand, except now there is a new list in memory of (copies of) some or all of the input list.The Principle of Brevity - if you need to do a long-winded process to find out whether or not you want to keep a particular
$_
, take it somewhere else, because it doesn't belong here. Make a sub and put it in that.Similar Things
There is a simple concept here: You take a list, you apply a function to each element, and you get another list.
map
is the simplest example of this because the list you create is exactly the list that map
returns. grep
is arguably more complex because the list you create with this process (a list of true and false values) is itself used to alter the input list, so you get back a different list from the one you build.sort
sort
is another list operator. Like grep
, the list it returns comprises elements of the input list. Unlike grep
, the list it returns is the same length as the input list. Clearly, it sorts the input list, and returns a new one with the items in order.An important thing to have already realised by now is that if you are sorting you have to do two variables at once, instead of one. Perl sorts, haha, this out for you by providing you
$a
and $b
instead of $_
. Your task is to tell Perl whether $a
is greater than, less than, or equal to $b
, by returning -1, 0 or 1 respectively from the block.The operators
<=>
for numbers and cmp
for strings do this easily for you, but this generalisation of the sorting process allows for you to run $a
and $b
through any algorithm in order to sort the list.my @sorted = sort { length $a <=> length $b } @strings;
for
The
for
loop is much more than it is in other languages, especially when used as a statement modifier. That's when you put it at the end instead of putting it first and using braces. for
is the operator you should use when you want to modify the array itself, rather than create a modified copy. Note you can only fully modify arrays and only the values of hashes.$_++ for @array; # increment every value $_++ for values %hash; # same $_ .= "-old" for @filename_backups; # append "-old" to each filename
map
is equivalent to doing a for
loop over the old array, and pushing things onto a new array:my @output; push @output, $_ * 2 for @input;
Presumably
map
is more efficient, but part of Perl is that you have the option of being more expressive with your code, which is to say that you should write what you mean. If you mean to perform a map operation, use map;
otherwise when someone else reads your code (i.e. you, next month) there will be no head-scratching wondering why you used a for loop.Other functions in List::Util, List::MoreUtils and List::UtilsBy serve to run a function across all items of a list in the same general way, and return either a list or a single value.
List::Util
first
- Find the first item in a list; usually used to find any item that matches the criterion. Often therefore used to find whether any item matches the criterion.
my $caps_key = first { uc $_ eq $_ } keys %hash;
reduce
- Collapse a list into one value, by repeatedly applying the function. This is called reduction, as in "map-reduce". Note the use of $a and $b, like sort: we are trying to reduce a list into one value so we have to do two values at once, rather than one.
my $sum = reduce { $a + $b } @numbers;
List::MoreUtils
This module has many more than List::Util so I won't list them all.
any
,all
,notall
,none
- Test the whole list.any
is similar tofirst
from List::Util; the difference being that it will return a true value if your search succeeds, whereasfirst
could return a false value if you're looking for false values. These are essentiallygrep
, except they will stop looking if they find the answer -grep
will process the whole list in all cases.
if (any { defined } @things) {} # if any is defined
if (all { defined } @things) {} # if all are defined
;
if (notall { defined } @things) {} # if not all are defined
if (none { defined } @things) {} # if no thing is defined
pairwise
- This only works on arrays, but it takes one value from the first array and one value from the second, and gives them both to your function. Then it collects the outputs. It's likemap
, except it does two things at a time.
my %hash = pairwise { $a => $b } @keys, @values;
my @totals = pairwise { $a * $b } @prices, @quantities
This is a convenience module for all those cases where normally you'd do the same thing to both
$a
and $b
. I will stick to a couple of examples.sort_by
- Applies the procedure to both$a
and$b
and compares the results as strings (nsort_by
for numbers). Saves typing.
my @sorted = nsort_by { length } @strings;
my @sorted = nsort_by { $_->mother->mother->mother->age } @people;
uniq_by
- Makes the list unique based on the function.
my @unique_by_colour = uniq_by { $_->colour } @fruit;
I hope this has given you, the newcomer to Perl, a good idea of what
map
and grep
do. I also hope it has given you an insight into the general concept of applying a function to a sequence of values, and doing something with the result. It is an operation that is much more common than you'd think, even in real life.It is the basis of mail merge, for example, where you take a template and a list of names and you put each name in the template and, with the resulting list of letters, print and send them. That's
map
.It is the basis of searching, where you have a quantity of items and you're looking for all of those that match a certain criterion. You apply your criterion to each and keep those that match. That's
grep
.It is the basis of sorting things by height, or alphabetically, or by the number of times they've won the World Cup: you take the list of things, and two at a time you find out how many times they've won the World Cup and you sort based on that. That's
sort
(or sort_by
or nsort_by
).Examining or modifying a list is an awfully common operation. One thing I cannot teach you, however, is how to recognise when you should use one. Feel free to ask!
1 All right smart arse. Everything in list context is a list, even if it is only one scalar, because that's just a list with one item in it. The point is that this statement is true regardless of the length of the returned list.
No comments:
Post a Comment