
2014-04-27

Changing OpenElec's /tmp size


OpenElec has a limited /tmp partition. Very limited, i.e. 10MiB. Many things fall over because they occasionally need more than this - especially if they're not the only thing using the tmpfs.

In order to change this you either have to hack around with automatically-created symlinks in startup scripts, or change it yourself.

The size of the /tmp partition is set in /etc/init.d/01_mount-filesystem:

mount -n -t tmpfs -o size=10m tmpfs /var

The problem is, that file is read-only. The reason it's read-only is that the entire root filesystem is stored in a squashfs partition.

To amend it, it is simply a case of unsquashing it, fixing it, and resquashing it.

Fix it

Pull the SD card out of your RPi (I'm assuming that's where you have it) and put it into your card reader. Let your system mount it.

You should have a SYSTEM drive somewhere on your computer. Lubuntu mounts it at /media/altreus/SYSTEM, so let's go with that.

$ mkdir squash
$ cd squash
$ cp /media/altreus/SYSTEM/SYSTEM SYSTEM.bak
$ unsquashfs SYSTEM.bak

Now we have a copy of the OpenElec root filesystem in a .bak file so we can undo it when we screw it up later. We also have the files themselves unpacked into squashfs-root. This is the default place unsquashfs puts them.

$ vi squashfs-root/etc/init.d/01_mount-filesystem

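Change the file to give /tmp a better size. I used 500m because my SD card is 8GB, although tmpfs is actually backed by RAM (plus any swap) rather than the card, so the Pi's memory is the real limit. Ignore the first instance of tmpfs in the file; we want to change the size=10m one.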

$ sudo mksquashfs ./squashfs-root SYSTEM

It's important that you do this with sudo. The file /etc/shadow has permissions 000, making it only accessible by root. This is how we got it when we unsquashed it, so this is how we want to keep it. My /etc/shadow is 600, but they presumably wanted theirs to be 000. If we want to do the above step without root, we'd have to change the permissions so our user can see it - we can't change the permissions after it's squashed, so the only way to get a 000 file into the filesystem is to squash it with root.

Anyway, done.

$ cp SYSTEM /media/altreus/SYSTEM

Your new squashfs file will be mounted by OpenElec and your tmpfs will now be mounted with the size you gave it.

I'm not 100% certain this is stable. My Pi has started rebooting occasionally, but I might be giving it more than it can handle. It is an old model, but if I've introduced a bug because 500m is too much, or something, I'm sure I'll get to the bottom of it and update the post.

2014-02-27

Code review time!

Look! A horrible piece of code in a horrible language in a horrible frame for a sickeningly twee ceremony that should have been made obsolete along with the Inquisition!

Let's review it.

Here's the code, with line numbers.

01    <?
02      function do_wed() {
03        if ($objections != true) {
04          function do_vow() {
05            $vow = 1;
06            do {
07              if ($richer === 1
08                  && $poorer === 1
09                  && $sickness === 1
10                  && $health === 1) {
11                function have_hold($a,$b) {
12                  ini_set('session.gc_maxlifetime','forever');
13              }
14              have_hold('husband','wife');
15              define('friend', true);
16              define('partner', true);
17              define('faithful', true);
18              if ($i = 'do') {
19                   $f = 'finger';
20                   $r = 'ring;
21                   $f = $f + $r;
22                   }
23               }
24               $vow = $vow + 1;
25              } while ($vow != 2);
26            }
27            do_vow();
28            $register = array_fill($details);
29            print_r($register)
30            return $kiss;
31            }
32          }
33        do_wed();
34    ?>

Let's go!

line 1

We use long tags here. <?php

line 3

Undefined variable $objections.

$objections != true is better written !$objections. But this is not what you meant; you meant count($objections) == 0, since it will be an array of them.

line 4

Don't define functions inside other functions.

lines 6, 25

You know how many vows you want. Use a for loop. Better, use an array of vows and populate it with two Vow objects, which represent the conditions each person agrees to. This means you can marry more than 2 people. The do_wed() function should take the people to wed as arguments. Use func_get_args() to loop over all of them, or (...$parties) in the next version of PHP.

Useless loop anyway. do_vow() should be called twice with the person currently vowing.

"Twice" is a western concept. This code is not internationalised.

lines 7-10

Undefined variables. None of these equals 1. It is unlikely that all four of these things would equal 1 at the same time. You want to test the party's agreement to these concepts, not the value of these variables. You need Person objects.

line 11

A function in a function in a function? This function takes two parameters and uses neither. Get rid of them.

line 12

This ini parameter takes an integer. 'forever' is not an integer.

line 13

This closing brace does not line up with the function definition on line 11. It does line up with the if on line 7, which implies you've forgotten to close the function, but scrutiny shows that you've misaligned the brace.

line 14

have_hold does not take any parameters any more.

This is exclusivist. Not all marriages are between a husband and a wife. These should be parameters to do_wed().

This function is run twice, both times with the same parameters. It should swap over for the second iteration.

line 16

'partner' is presumably the person we are not currently dealing with.

line 17

'faithful' is not a boolean value and should be configured per app. It needs to be a data structure containing parameters of faithfulness, i.e. boundaries.

line 18

This is always true. Remove this condition. $i is never used, so remove the assignment too.

lines 19, 20

Useless variables. Either accept them as parameters or use the literal strings directly.

line 21

If you'd not used these useless variables you'd realise you're trying to numerically add strings. . is the concatenation operator. What is a 'fingerring'?

$f is discarded. Just omit this entire block.

line 22

What is this supposed to line up with?

line 23

This closes the if that looks like it is closed on line 13. But it does not line up with it.

line 24

Better written $vow++, but we've replaced this with an array of Vow objects containing agreement parameters, so don't do this any more.

line 25

The only reason this would be a while loop is if you're just going to keep asking until both (all) parties agree. This is not how one should enter into a marriage.

line 26

This closes do_vow() but does not line up with it.

line 27

This is what should be run n times, once per party in the agreement.

line 28

array_fill takes three parameters. Register should be an object.

line 29

Syntax error - missing semicolon.

print_r is not the best thing to use here. Serialise this properly, perhaps with JSON so it can be consumed by an API or HTML so it can be styled and displayed properly.

line 30

Undefined variable $kiss. Kiss is a verb and should be a function.

lines 31, 32

These braces should line up with what they close.

line 33

Don't run a function when it is defined - that's not how you create a library.

This function could at least be parameterised with the names of the people being married. Isn't Etsy about crafts and hence personalisation?

2014-02-06

Model student

Models! Model trains, model students, model aeroplanes, model citizens. Fashion model, data model, business model. Ford Model T. Model number.

All these different uses of the word model have a commonality, the understanding of which is important to the understanding of what it is we mean when we talk about models in computing. This commonality may be considered the abstract meaning of "model": the meaning that exists behind all the real-world uses of it.

This concept is that of representation. Physical models are scaled-down representations of the things they model. A fashion model is really the representation of real people who would wear clothes (showing quite how divorced from reality fashion really is). A business model is a wordy representation of how the business will operate. Even the term "Ford Model T" is actually referring to the blueprint of all cars of that type: "Model" is referring to the type, not the car itself.

In computing, then, a model is a representation, a blueprint, a prototype that encapsulates the important details about the thing it is modelling. A good model will be a minimal but sufficient representation of the system it is modelling.

An easy example is the rolling of dice.

1d6

Dice are a familiar system to everyone, I hope. They neatly encapsulate our idea of randomness, at least that one we're taught in primary school, whereby the outcome of the system is not predictable from the input.

When we roll a d6 we expect to see one of its six faces pointing upwards but we don't know which one until it does so. Indeed on most dice we see the number represented as a pattern of dots; the number of dots being the number it shows.

This, if you're not used to thinking in these terms, is very specific. There are many extra features of a d6 that have nothing to do with the randomness of the d6. Every feature of the die except its shape (and mass distribution) can be altered and it would still exhibit the same properties of randomness.

Modelling systems, therefore, requires a keen eye for which parts are the underlying mechanics that allow the system to work, and which are the superficial parts that happen to be the case in this particular instance.

At its barest, a d6 is a system that, when run, produces a random integer from 1 to 6. The random distribution is even across all numbers: which is to say, the more times it is rolled, the more we expect to see the counts for each result become equal.

To model a d6, therefore, we simply need a system that can produce the same result.

Math.ceil(Math.random() * 6)

This piece of JavaScript models a 6-sided die. Run it in your browser's console if you don't believe me. Run it lots. Here's what happened when I ran it 50 times[1]:

[2, 2, 6, 3, 5, 4, 3, 3, 2, 4, 
 1, 5, 3, 4, 6, 1, 6, 6, 4, 5,
 3, 1, 6, 5, 2, 4, 6, 6, 6, 5,
 3, 6, 1, 2, 3, 2, 3, 3, 1, 5,
 2, 5, 3, 2, 4, 3, 5, 6, 6, 5]

And sorted:

[1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
 2, 2, 2, 3, 3, 3, 3, 3, 3, 3,
 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
 5, 5, 5, 5, 5, 5, 5, 5, 5, 6,
 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]

At this level, JavaScript's RNG[2] should be roughly uniform in distribution, and with true randomness we should not expect uniform results at such small quantities. This distribution certainly seems random and within parameters for uniform distribution, so we've simplified the concept of a d6 into a minimal and sufficient algorithm.
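If you'd rather count than eyeball, a tally makes the point quicker. This is only a sketch: tallyRolls is a name made up for illustration, and it uses Math.floor(...) + 1 rather than Math.ceil(...) purely to sidestep the very rare case where Math.random() returns exactly 0.

```javascript
// Roll a d6 `rolls` times and count how often each face appears.
// tallyRolls is a hypothetical helper, not part of the model itself.
function tallyRolls(rolls) {
  const counts = { 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0 };
  for (let i = 0; i < rolls; i++) {
    // floor + 1 maps [0, 1) onto 1..6 and can never yield 0
    const face = Math.floor(Math.random() * 6) + 1;
    counts[face]++;
  }
  return counts;
}

console.log(tallyRolls(50));     // lumpy, like the sample above
console.log(tallyRolls(600000)); // each count should sit near 100000
```

The small run should look uneven and the large one should flatten out, which is what we expect from an even distribution.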

dn

Not all modelling is about functionality. Much of data modelling is about just that: data!

A model like a d6 is fundamentally fairly useless. Indeed the idea of a d6 is just a very tight constraint on a very useful concept - randomness. It serves little purpose to model a d6 specifically, because the number of uses for a d6 is, in the grand scheme of things, small.

In the real world, we use models in computing for two basic purposes: retrieval and prediction. The first one is used to store representations of things that exist, such as people or products. Those are data models. We store these data models to let people log into a system, or to display a list of the products to customers. The second is used to try to work out what would happen in certain situations, based on the understanding that we have about the system in the first place - such as weather. These are functional models, of which the d6 above is one example.

In both situations the model is useless without the things being modelled having data. Properties of the objects store information about the objects and supply parameters to the algorithms we've devised.

We have hit upon the idea of parameterising algorithms. As noted, the d6 algorithm is somewhat useless because all it does is model a d6, which is of limited utility.

We can increase the utility by modelling the algorithm of any die. This is the second thing to be aware of when learning to abstract away the fundamentals from the real-world example. Earlier, we learned that we can turn a gazillion atoms' worth of die into a few electrons' worth of RNG by simply taking a number between 1 and 6 - this is the fundamental behaviour of a d6.

Now, we can look at other real-world dice and see how their behaviour relates to the d6:

  • A d4 picks a number between 1 and 4
  • A d6 picks a number between 1 and 6
  • A d12 picks a number between 1 and 12
  • A d20 picks a number between 1 and 20
  • A d100 picks a number between 1 and 100

It doesn't take a complex neural network to see the pattern here. A dn picks a random number between 1 and n.

If we wanted to model a d4 we could amend our d6 model:

Math.ceil(Math.random() * 4)

And we're done. Well done! You've invented job security. Now we've got two models for two different scenarios, and we know how to repeat the process for any die we like.

You should at least by now have the feeling I'm leading you to a point; and if you haven't guessed it yet I'll make the point.

We haven't modelled the pattern.

You can model dice until you're blue in the face but a good model captures the fundamental principles. The d6 model captured the fundamental principles of a d6, but we want a model that captures the fundamental principles of all dice. We need to model the abstract; the pattern that we spotted when we listed our dice.

Abstraction

"Abstract" is another one of those words that no one understands until they're faced with it, and then it confuses them until they understand it, and then they realise why it's been used all along. Most people know abstract as a form of art, and therefore associate it with meaningless shapes and random colours or something.

The abstract of something is those features about the thing that remain behind when you take the actual thing away. The abstracts are those conceptual things that mean you can describe it without actually having one; but which, if you had never seen one, would mean you may recreate a different thing.

This is what we did with the d6. We took the abstract concept of a d6, which is to randomly generate a number between 1 and 6, and then we recreated it in an algorithm that looks nothing like a die. It's a string of characters on a screen, now. It doesn't even roll. Or bounce.

Abstracting across many things is an art form in itself. For a start, the things have to be related, or else there's no real abstraction to make. Secondly, the degree to which things are actually related to one another can vary wildly, so knowing what level of abstraction to make is also a challenge. Thirdly, abstractions themselves may be similar; in which case you can start relating things that look the same in the abstract but are entirely unrelated in real life.

Now that I've thoroughly lost you, let me bring you back to earth. When we laid out all the dice we know and examined how they work we saw a pattern, which is that a die with n sides is an RNG between 1 and n. A pattern is something we can model; we model it with parameterisation.

Parameterisation is when you take a series of concrete examples and you remove one of the things from it and replace it with a variable; in this case, we replaced all the numbers with n[3]. The multiple types of die have been reduced to a single type, whose number of faces is now variable.

The number of faces the die has is now a property of the die. We have a model with data!

How do we represent it? Well, in JavaScript terms, parameters are given to functions, and objects have properties. We can divide the model into the two parts, functionality and data, by using a function to represent rolling a die and an object to represent an actual die.

function rollDie(die) {
    return Math.ceil(Math.random() * die.sides);
}

var d6 = { sides: 6 };
var d12 = { sides: 12 };

Here we have one function that will roll a die and return the result. Then we have two dice, each of which is a simple object with the property sides. Inside the rollDie function we use the sides property of something called die, which we can see is mentioned in the parentheses in the function definition. This together means that whatever is given to rollDie is assumed to be a model of a die, and to have a property sides that represents the number of sides it has.

rollDie(d6);
rollDie(d12);

If we provide a die model as a parameter to the rolling function, the rolling function can inspect the property of the model, extract the data, and use the data in the original algorithm. The algorithm has not, fundamentally, changed. It is simply the case that now it is parameterised; which is to say that instead of duplicating the function for every possible invocation, we can create data models that represent the thing we are dealing with, and provide the data to the function. We have abstracted the pattern (1dn returns a number between 1 and n) by making the variable, n, well—variable!
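The function assumes that whatever it is given looks like a die. A minimal sketch of making that assumption explicit follows; the validation rule (a positive whole number of sides) and the choice of TypeError are my own assumptions, and floor + 1 is used so that a result of 0 is impossible.

```javascript
// Same idea as rollDie above, but checking the assumption about `die` first.
// The rule "sides must be a positive whole number" is an assumption of this sketch.
function rollDie(die) {
  if (!die || !Number.isInteger(die.sides) || die.sides < 1) {
    throw new TypeError('Expected a die with a positive whole number of sides');
  }
  return Math.floor(Math.random() * die.sides) + 1;
}

console.log(rollDie({ sides: 6 })); // a number from 1 to 6

try {
  rollDie({ faces: 6 });            // wrong property name, so not a die
} catch (e) {
  console.log(e.message);
}
```

Whether a model should police its inputs like this, or simply trust them, is a design choice; the point is only that "has a sides property" is part of the model.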

Verbs and nouns

The world is made of verbs and nouns. Systems verb nouns. People roll dice. People buy products. Computers authenticate passwords. Ecommerce systems suggest related products. Search engines search documents. URLs refer to resources.

Our data models therefore comprise verbs and nouns. Our d6 model was a verb[4], but the noun was hard-coded. Hard-coding is the failure to parameterise. Instead of accepting a parameter, the noun - d6 - was assumed by the verb, because the verb was the whole of "roll a d6".

Our later model had a verb, rollDie, which could roll any noun that looked like a die. It had two dice, d6 and d12, which represented 6- and 12-sided dice, respectively. But the rollDie verb did not rely on those dice. The verb was abstracted from the nouns because with the new verb, anyone can create a die of any size and roll it:

var d27 = { sides: 27 };
rollDie(d27);

... so long as they have access to the verb part - the functionality - of our model.

By parameterisation we can turn a verb into a verb and a noun - "roll a d6" turns into "roll" and "a d6". By doing the opposite, we can turn a separate verb and noun into a single verb. Good modelling comes from learning when it is right to include the noun in the verb, and when the noun is a parameter. In some cases, the noun is fetched from somewhere else - a different verb (to fetch) and a different part of the model, with its own nouns.

In the real world, computer modelling is much more involved than this. Data are often linked to other data, such that if one changes another must reflect it. A shopping basket, for example: if you add an item to the basket, the total must increase. If you change the quantity of an item, the subtotal for that item must change, and so must the basket total.

In that example, we already introduced nouns and verbs that we can model. Basket; item; total; subtotal; quantity. Some of these are things, and some of them are properties. Some are both! Items are real things, but the list of items is a property of the basket. The total is a property of the basket, and the subtotal is a property of the item when in context of a basket and having a quantity!

Sometimes we replace nouns with verbs: instead of storing the total, we may choose to calculate the total on demand based on the items.

Sometimes we replace verbs with nouns: when you roll a die, its value remains the same until you roll it again, but you should be able to ask it what value it shows. Our model could not do this. Alas! Our simple and sufficient model is no longer sufficient.

Sometimes we separate a verb into a verb and a noun: we turn rolling a d6 into rolling, and create a d6 to roll. This allows us to either roll a different die, or do something different to the die.

Sometimes we combine a verb and noun into a single verb: when we get the total of a basket, we don't separate it into "get" and "total"; if you change the noun here, the verb makes no sense!
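The basket can be sketched in the same style as the dice. Everything here is hypothetical: the names subtotal and basketTotal, the shape of the basket object, and the prices (kept in pence so the sums stay in whole numbers). The totals are verbs, calculated on demand rather than stored.

```javascript
// A hypothetical basket model: items and lines are nouns, totals are verbs.
function subtotal(line) {
  return line.item.price * line.quantity;
}

// "Get the total" stays a single verb; it only makes sense for a basket.
function basketTotal(basket) {
  return basket.lines.reduce(function (sum, line) {
    return sum + subtotal(line);
  }, 0);
}

const tea = { name: 'tea', price: 250 };
const mug = { name: 'mug', price: 600 };

const basket = {
  lines: [
    { item: tea, quantity: 2 },
    { item: mug, quantity: 1 },
  ],
};

console.log(basketTotal(basket)); // 1100

basket.lines[0].quantity = 1;     // one tea fewer
console.log(basketTotal(basket)); // 850
```

Because nothing is stored, changing a quantity cannot leave a stale total behind; the price is that the sum is recalculated every time it is asked for.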

Even a simple example like a die can escalate, and it is easy to get overwhelmed by the interactions—imagine the complexity of a "simple but sufficient" model of an entire shop!—but ultimately we are modelling nouns and verbs; all we have to do is parameterise correctly and find the correct abstractions.
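As for the die that should be able to tell you what it shows, one possible repair is to give the noun another property. This is a sketch under my own assumptions: makeDie and the showing property are invented names, and null stands for "not rolled yet".

```javascript
// A hypothetical die that remembers the face it last showed.
function makeDie(sides) {
  return { sides: sides, showing: null }; // null: not rolled yet
}

// The verb now updates the noun as well as returning the result.
function rollDie(die) {
  die.showing = Math.floor(Math.random() * die.sides) + 1;
  return die.showing;
}

const d6 = makeDie(6);
const result = rollDie(d6);
console.log(result === d6.showing); // true: the die still shows what it rolled
```

The verb "roll" has gained a side effect, and "what does it show?" has become a noun we can read at any time.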

Modelling systems

Hopefully you will have, by means of a concrete example and a lot of nebulous ideas, some concept of what it is to model things in computer systems. Ultimately, you will need some way of defining functions - a programming language - and some way of storing data - maybe a database.

Modelling a system therefore involves a good eye for what is a verb and what is a noun. That is to say, if you want to "roll a d6", does this suffice as a verb? Or is "d6" a noun? What if you want to "calculate the total"?

There is no cheat sheet here. Experience is your best recourse. But perhaps we can jot down some things to consider when modelling a system.

  • How big is the system? The d6 system was small, but the shop system was large. Can it be split into smaller systems?
  • How big are the nouns? A d6 has 6 faces, but the number 6 is enough to model that. Meanwhile, a basket has many items, but more information is needed; items are separate things, but faces are not.
  • Can you de-noun your verb? Does the verb make sense on other things? Does it actually? You can roll anything with sides; but can you get something other than a total from a basket? Can you get a total from something other than a basket?
  • Can you combine a verb and noun? Have you gone too far parameterising? If your shop has only one basket, the basket is not a parameter: the verbs can assume it.
  • Can your verb fetch a parameter, instead of accepting or assuming it? When you roll a die, perhaps you can establish elsewhere which die you are rolling. Perhaps the items in a basket know they are items; and there is only one basket, so you can get the items when you need them.
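The last three questions come down to where the noun comes from: accepted, assumed or fetched. A small sketch of all three side by side, in which the dice registry and the names roll, rollD6 and rollNamed are hypothetical; a real system might fetch from a database instead of an object.

```javascript
// Shared mechanics: a random whole number from 1 to `sides`.
function roll(sides) {
  return Math.floor(Math.random() * sides) + 1;
}

// 1. Accept: the caller hands over the noun.
function rollDie(die) {
  return roll(die.sides);
}

// 2. Assume: the noun is baked into the verb.
function rollD6() {
  return roll(6);
}

// 3. Fetch: the verb looks the noun up somewhere else.
// `dice` is a hypothetical registry standing in for wherever dice live.
const dice = { d4: { sides: 4 }, d6: { sides: 6 }, d20: { sides: 20 } };

function rollNamed(name) {
  const die = dice[name];
  if (!die) {
    throw new Error('Unknown die: ' + name);
  }
  return rollDie(die);
}

console.log(rollDie({ sides: 12 }), rollD6(), rollNamed('d20'));
```

None of the three is the right answer in general; which one fits depends on how many nouns the system really has and who knows about them.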

That's all for now on models. In future posts we will take a look at how data get around inside these systems, how we store them, and the transient nature of data while the system is actually running.

[1] var a = [], i = 0; for (i = 0; i < 50; i++) { a.push(Math.ceil(Math.random() * 6)); } a;

[2] Random number generator

[3] Replacing all the ds with m may be a tempting thing to do here, but we shouldn't. That's because d has been constant across all of our examples; it simply serves to refer to the thing we are modelling in the first place. n is the new variable, because the thing it has replaced varies. d, being constant, is the thing our model is taking away entirely! It serves no purpose to know that we are rolling dice, any more; the d is therefore simply our reminder about what we are aiming for.

[4] Commonly one would not copy-paste an algorithm into a console and run it. Instead, the algorithm would be packaged in a function and the user would be told to run the function. We did this later, when we parameterised, but to simplify and save on explanations, we avoided using a function in the first examples.

2014-01-23

Declaring your intent

In Perl it is necessary to declare a variable with my (or our) before using it. This behaviour is enabled with the strict pragma, which is also switched on for you when you ask for a recent Perl with use v5.12 or later.

Why?

Today's theme explores the idea that, when writing code, there is meaning in every statement. A good portion of code will comprise statements that actually implement the logic that causes the program to do what it does; but often overlooked are the statements such as these my and our declarations, which explain your intention for the variable before it's ever even used.

We'll look at some of the simpler reasons behind it, and later on we shall look at the less apparent ones.

Requesting

In these cases the intention you are declaring is simple: "I want to use this symbol."

The humble typo is the most obvious reason espoused for requesting new variables: it stops you using something else. But in Perl this actually covers at least three separate types of typo, all of which are solved by declaring things before you use them.

Misspelling it later

Misspelling the variable later on is the most common failure.

my $hard_to_spell_name;
$hard_tp_spell_name = 'cats';
Global symbol "$hard_tp_spell_name" requires explicit package name at script.pl line 3.
Execution of script.pl aborted due to compilation errors.

Saying you want to use symbol A and then using symbol B is an error it is trivial to pick up on.

Misspelling it now

This is less common because you usually spell the variable name right when you create it because you've just spent ages trying to come up with the name in the first place. It's the same declaration, except you meant B and B, rather than A and A.

my $hard_tp_spell_name;
$hard_to_spell_name = 'cats';
Global symbol "$hard_to_spell_name" requires explicit package name at script.pl line 3.
Execution of script.pl aborted due to compilation errors.

Forgetting

This requires a module, but declaring your intent allows the warnings pragma to tell you when you didn't use a variable you asked for.

Install warnings::unused from CPAN in the usual way.

use warnings::unused;
use strict;
use warnings;

my $foo;
my $bar = 'cats';

print "$bar\n";
Unused variable my $foo at script.pl line 5.

Typing

By this I mean the type of the variable, not the typing you're doing when you make a typo.

In this case, you've declared an array and then accidentally used a scalar, or forgotten it's not an arrayref, or something along those lines. This is also the sort of protection you get from languages with a more C-style typing system, where you have to declare a variable by defining its symbol name and its type (int i;). Basically even though you spelled the symbol name right, you're using it wrongly.

my @array_of_cats;
push @$array_of_cats, 'cat';
Global symbol "$array_of_cats" requires explicit package name at script.pl line 3.
Execution of script.pl aborted due to compilation errors.

"You're using it wrongly" is a perfectly reasonable statement here. That's because you declared what "right" is: "wrongly" is directly determined by your own my statement.

Overwriting

Reuse

If you are required to declare your variables the first time you use them then you will always do so. This means that the keyword my is not only used to declare that a variable is supposed to be available, but also to declare that the variable is supposed to be new.

Hence, if you try to introduce a variable that already exists, it tells you off, and thus you avoid clobbering an existing variable.

This behaviour is actually only a warning, so comes from use warnings; rather than use strict;. However, it is still a result of declaring your intent.

use strict;
use warnings;
my $cats = 'cat';
my $cats = 'horse';
"my" variable $cats masks earlier declaration in same scope at script.pl line 4.

Clobbering

It is easy to forget that my and our both produce lexically scoped names. These are only visible within the block in which they are declared (treating a file as a block for this definition).

With my you simply cannot clobber this variable from anywhere else. It is either a compiler error, or a different variable.

# This sub is useless and does nothing
sub one {
  my @cats;
  push @cats, @_;
  return @cats;
}

# This sub can't see @cats from the other sub!
sub two {
  push @cats, @_; # line 10
  return @cats;
}
Global symbol "@cats" requires explicit package name at script.pl line 10.
Execution of script.pl aborted due to compilation errors.

Or:

# This compiles, but is a new, separate array of cats.
# It is fractionally more useful than sub one.
sub two {
  my @cats = ('default_cat');
  push @cats, @_;
  return @cats;
}

A bonus of my is that when the block has executed, the variable is tidied up. That is, it falls out of scope. This also works in loop bodies, allowing you to trash and recreate data in every iteration by putting a my line inside the loop.

package Cat {

  my @cats;

  # Both of these use the same @cats - the one above!
  sub one {
    push @cats, @_;
    return @cats;
  }

  sub two {
    @cats = ('default_cat'); # whups, overwrote the whole set!
    push @cats, @_;
    return @cats;
  }
}

@Cat::cats = ('cat_one', 'cat_two'); 

Here, @cats is available to be clobbered anywhere in the Cat package[1]. However, because it is lexical, it is only available within that block[2]. Line 18 appears to be altering the same variable (@cats within the package Cat), but in fact this is creating a new package variable in Cat[3].

The intent of using my to declare @cats therefore is to have a variable available throughout the package, but not to be available outside the package.

There is a subtler declaration of intent. The position of this my statement declares that this variable is intended to be used throughout the entire package; therefore it should be applicable to the majority of the behaviour in the package. Were this not the intention, the my statement could be put in a block that encapsulates the variable and any places it is supposed to be used.

our is a similar beast, but it adds the ability for outsiders to also alter the variable, so long as they do so explicitly. The following code differs only in the use of our:

package Cat {

  our @cats;

  sub one {
    push @cats, @_;
    return @cats;
  }

  sub two {
    @cats = ('default_cat');
    push @cats, @_;
    return @cats;
  }
}

@Cat::cats = ('cat_one', 'cat_two'); 

Now, the variable @cats inside the package's block can also be accessed as @Cat::cats from outside of it. This is the intent you declare when using our.

[1] Normally, the package would be defined in its own file, but this format is common for single-use packages, especially in tests.

[2] When the package is defined in its own file, the file itself is the scope for such variables.

[3] The reader should be aware that this is the reasoning behind the message Global symbol "$foo" requires explicit package name when strictures tells you off for an undeclared variable. Any variable name can be used, so long as it explicitly declares a package name like in this example. The difference between a lexical variable and a package variable is not in scope of this blog post.

2013-10-18

Fixing PHP

PHP is not a bad language.

Come back. Let me rephrase that.

PHP is a terrible implementation of what under the surface is a perfectly adequate, dynamic scripting language. Unfortunately it is implemented as a poorly-thought-out, logically bereft templating language, peppered with pitfalls and irritating inconsistencies.

But it can be fixed. It can be fixed with some simple, non-backwardly-compatible, sensible, welcome-to-the-real-world, feasible alterations. Let us begin.

1. Get rid of <?php ?>

The fact that PHP used to be a templating language is archaeologically apparent in this vestigial remnant from a bygone era. These tags are still all over the place because PHP is trying to be two things at once: both a templating language and a scripting language.

Once you grow up (or metastasise) and become a real language, you have to put away childish things.

These break-in-break-out tags were fine when PHP was designed to be parsed by a Perl script and run as a simple if-this, for-each-that dynamic HTML page generator. They remain fine, if you want to use PHP as the templating language it is. But if PHP wants to be taken seriously, the first thing it needs to do is stop hanging on to that I-can-do-templates-me attitude, and hand over to one of the many modern alternatives that have come along since the Internet was still finding its feet.

In fact there's no real reason PHP should not remain a templating language. After all, Mason (and indeed Template Toolkit) allow you to inject actual Perl into your web templates for those times when you simply can't be arsed to abstract your logic to where it's supposed to go. However, if PHP is going to behave like this, it needs to understand there is a difference between a PHP template and a PHP script.

Therefore I propose

1a. Create .php and .phpt file types

Or suchlike. .php files would naturally be PHP scripts and do away with that ridiculous <?php header that persists throughout PHP projects like a blight. .phpt or suchlike would be recognised as text files containing PHP segments, and they can use the old break-in-break-out paradigm to inject program logic into the template.

Of course it is not recommended in Mason or TT2 that you use actual Perl in your actual templates, because then the temptation is just to merge your views with your controller logic, and then you get into a Right Mess. Better would be simply to have a PHP port of TT2 or Mason, or use Twig or Smarty, and allow those to have their own this-bit-is-PHP-and-I'm-sorry directives.

1b. Make it a decent templating language too

It's a bit of an issue that PHP is stupid, as well. Modern templating languages offer myriad text processing options as part of the language itself. An example is the way Template::Toolkit allows you to filter output text through, e.g., the HTML filter, sanitising the data just before it's output.

PHP's best answer to this so far is user-written PHP classes that render PHP templates (two entirely different things written in the same language) by sanitising the data assigned to them at some time or other just before the template file itself is actually rendered.

That's just one example. PHP is not really a templating language any more either, because templating languages have evolved past the very basic output-string behaviour that PHP was originally tasked with. PHPT would need to catch up as well, and separate itself from PHP proper.
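For comparison, here is a minimal sketch of the nearest plain-PHP equivalent of that HTML filter. The helper name h() is an assumption for illustration and not something PHP provides; htmlspecialchars is the real built-in doing the work.

```php
<?php
// Hypothetical helper playing the role of Template Toolkit's html filter.
function h($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Pretend this came from a user.
$comment = '<script>alert("hi")</script>';

// The escaping has to be remembered at every single output site.
echo '<p>', h($comment), "</p>\n";
```

The point stands: nothing in the language applies this for you, so every template has to remember to call it.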

2. Stop pretending everything is an HTTP request

That PHP never left its template roots shows when you try to write command-line interfaces into your business software. You realise that you've been assuming throughout the code that the $_SERVER variable actually contains a URI of some description; that there's a protocol; that you're outputting HTML.

As soon as the first file that started with <?php and didn't contain a ?> was created, PHP was broken. As soon as you create a file that contains utility functions, or classes, you have a file that you can run without a webserver. As soon as you have that, you have a scripting language. That was the point at which people should have stood back, taken a look, and dived in to PHP 4 or whatever with the attitude that this time we're going to do it right.

No one did.

PHP still outputs HTML whenever it feels like it - see var_dump. It still has global, HTTP-centred variables. It doesn't do exit codes properly. The fact that exit and die are the same damn thing just shows that someone somewhere has completely misunderstood the point of these things. Heck, I don't even know whether error messages actually go on stderr.
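As a rough sketch of what behaving on the command line could look like in today's PHP: the main() function and its stream parameters below are assumptions for illustration. The only built-in behaviour relied on is that exit with an integer sets the process status, while exit with a string prints it and exits with status 0.

```php
<?php
// Hypothetical CLI entry point: it returns a status code instead of exiting,
// and is handed its output streams rather than assuming where output goes.
function main(array $argv, $stdout, $stderr)
{
    if (count($argv) < 2) {
        // Diagnostics go to the error stream, not mixed in with the output.
        fwrite($stderr, "usage: {$argv[0]} NAME\n");
        return 2;
    }
    fwrite($stdout, "hello, {$argv[1]}\n");
    return 0;
}

// In a real command-line script the last line would be:
//     exit(main($argv, STDOUT, STDERR));
// STDOUT and STDERR are only defined when PHP runs under the CLI.
```

Passing the streams in also means the function can be exercised without a terminal, for example with php://memory handles.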

At about the time PHP was swapping its soft teething toy for its first big-boy spoon, the rest of the world was discovering that if you interface your HTTP server with your scripting language via stdout, you can maintain a separation of concerns wherein your entire business logic is a collection of useful modules or classes or whatever, which when used in a web environment can be wrapped in an HTML layer and called a website - the layer being swappable for a CLI one that outputs the same information in a salient format. Or a JSON one, for public APIs, or even private, socket-based APIs that don't touch either HTTP or even TCP!
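That layering can be sketched in a few lines. Everything here (overdue_invoices, the render_* functions, the sample data) is made up for illustration; the only real pieces are htmlspecialchars, json_encode and the PHP_SAPI constant.

```php
<?php
// Hypothetical business logic: it returns plain data and knows nothing
// about HTTP, HTML or terminals.
function overdue_invoices()
{
    return [
        ['id' => 17, 'customer' => 'Acme'],
        ['id' => 23, 'customer' => 'Fish & Chips'],
    ];
}

// Web layer: wraps the same data in HTML, escaping on the way out.
function render_html(array $rows)
{
    $out = "<ul>\n";
    foreach ($rows as $row) {
        $out .= '<li>' . htmlspecialchars($row['customer']) . "</li>\n";
    }
    return $out . "</ul>\n";
}

// CLI layer: one tab-separated line per row.
function render_cli(array $rows)
{
    $out = '';
    foreach ($rows as $row) {
        $out .= $row['id'] . "\t" . $row['customer'] . "\n";
    }
    return $out;
}

// API layer: the same rows as JSON.
function render_json(array $rows)
{
    return json_encode($rows);
}

// Pick the layer from the environment rather than assuming a web request.
$render = PHP_SAPI === 'cli' ? 'render_cli' : 'render_html';
echo $render(overdue_invoices());
```

The business function never changes; only the wrapper does.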

Nope. In PHP's land of unicorns and rainbows the whole world is an HTTP request. The world springs into existence when the request begins and disappears when the response is sent, and if anything happens to be left around since the last universe's brief lifespan came and went then that's just something we have to deal with as part of our new one. Trying to leverage command-line support, or non-HTTP support, into this assembly of spit and chewing gum is baby's first steak knife to PHP.

3. Use your own exception mechanism

Nothing is as irritating while working with PHP as when it throws its toys out of the pram. Now, I'm quite happy to accept that a parsing error is completely unrecoverable, but that is it, and absolutely it. Anything and everything that happens at runtime should be tryable, and anything that ever goes wrong should be catchable.

This expected feature of the language should not be taken as a comment on the sense in doing so. Trying to call $app->run() and catching it when it fails is going to be a bit less useful than letting it fail and tell you what was wrong.

But being able to catch it - now that's a tool we need. Since the original error mechanism was put in place a new, superior nonlocal return is available, and one which puts control in the hands of the user (without horrible set_error_handler hacks). Might as well use it.
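For reference, the set_error_handler hack in question usually looks something like the sketch below. legacy_divide is a hypothetical function standing in for any code that reports problems through the old error mechanism; set_error_handler, trigger_error and ErrorException are real.

```php
<?php
// Promote every error the handler is allowed to see into an exception.
set_error_handler(function ($severity, $message, $file, $line) {
    throw new ErrorException($message, 0, $severity, $file, $line);
});

// Hypothetical function that complains the old-fashioned way.
function legacy_divide($a, $b)
{
    if ($b == 0) {
        trigger_error('division by zero requested', E_USER_WARNING);
        return null;
    }
    return $a / $b;
}

try {
    legacy_divide(1, 0);
    $result = 'no error';
} catch (ErrorException $e) {
    $result = 'caught: ' . $e->getMessage();
}

// Put the previous handler back so the rest of the program is unaffected.
restore_error_handler();
echo $result, "\n";
```

It works for warnings and notices, but a user handler is never offered E_ERROR, E_PARSE and the other core or compile-time errors, which is exactly the gap being complained about.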

4. Tidy up the root namespace

We get it. You like functions. Well, take stock and look around you. Not only have you implemented exceptions and then completely failed to use them, you've also implemented classes, interfaces, namespaces, closures and traits and failed to use those as well!

Right. For a start, having all those functions is confusing because there's no consistency in them. I'm not going to rewrite the entirety of A Fractal Of Bad Design, but I'm going to borrow from it here. Some of the functions have underscores, some don't (strpos / str_rot13). Some take arguments one way, some the other (array_filter($input, $callback) / array_map($callback, $input)). Every time we use a built-in function we have to look up how it's spelled and what order the arguments are in and there are so. Damn. Many.
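To see the inconsistency side by side, here is a short sketch using only built-in functions; the variable names are arbitrary.

```php
<?php
$numbers = [1, 2, 3, 4, 5, 6];

$is_even = function ($n) { return $n % 2 === 0; };
$square  = function ($n) { return $n * $n; };

// array_filter: array first, callback second.
// It also keeps the original keys, hence the array_values to reindex.
$evens = array_values(array_filter($numbers, $is_even));

// array_map: callback first, array second.
$squares = array_map($square, $numbers);

// Same story for the naming: no underscore here, an underscore there.
$position = strpos('haystack', 'st');
$rotated  = str_rot13('PHP');
```

Swap the arguments in either call and you get an error rather than a result, so the only defence is the manual.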

Secondly, certainly PHP has to look up every called symbol in both the user's own symbol tables and the language's. That sort of thing is surely expensive, especially if this language is aimed at beginner programmers who are only ever going to use 10% of the functions 90% of the time.

Thirdly, every single built-in function or class is just another name that the user can neither use for their own functions nor override to replace. Sure, PHP has modules that you can jump through hoops to install at the C level, but who needs that?

All of this might be forgivable if this overabundance of global functions covered literally every possible operation a user could conceivably want; but it doesn't! Worse still, a majority of them can trivially be abstracted into one generic function that takes a callable. All the array_* functions, for example: the sort functions are all just user sort with different sort procedures passed in. The filter functions are all the same with different identity functions passed in - and, for a specific example, recently I needed a version of array_search that took a custom identity function! How dare I want the key of a value that has a sub-value that matches my input! PHP says I may not do that and therefore I may not do that.
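The missing generic version is not hard to write. The function below is a hypothetical helper, not part of PHP: it behaves like array_search except that the caller supplies the matching rule (what the paragraph above calls a custom identity function).

```php
<?php
// Hypothetical helper: returns the first key whose value satisfies
// $matches, or false if nothing does.
function array_search_by(array $haystack, $matches)
{
    foreach ($haystack as $key => $value) {
        if (call_user_func($matches, $value, $key)) {
            return $key;
        }
    }
    return false;
}

$users = [
    'alice' => ['email' => 'alice@example.com', 'role' => 'admin'],
    'bob'   => ['email' => 'bob@example.com',   'role' => 'editor'],
];

// The key of a value that has a sub-value matching the input.
$key = array_search_by($users, function ($user) {
    return $user['role'] === 'editor';
});

echo $key, "\n";
```

Plain array_search then becomes the special case where the callback compares the value against a fixed needle.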

Ridiculous. The fact the PHP team haven't abstracted this stuff sensibly does not speak in favour of their ability to write the code behind PHP in the first place, does it? It doesn't take a genius to tidy all this up, and yet no one has - nor has anyone written the tidied version alongside. That attitude of constant implacability hurts the language and the community and the reputation of the people behind it, and damages confidence.

Hypothetical inefficiency aside it's just poor maintenance. The language has a mechanism by which to automatically find class files when a non-existent class is requested. So, put all the less-common functions in autoloaded classes and put those classes somewhere discoverable. Everyone else is modular these days. Is it stubbornness or incompetence that's leaving PHP behind?
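The mechanism referred to is spl_autoload_register. A minimal sketch, assuming a made-up layout in which a class such as Text\Rot13 lives in lib/Text/Rot13.php next to the script (the lib directory and the class_to_path helper are both assumptions):

```php
<?php
// Hypothetical mapping from a class name to the file that should define it.
function class_to_path($base, $class)
{
    $relative = str_replace('\\', '/', ltrim($class, '\\'));
    return rtrim($base, '/') . '/' . $relative . '.php';
}

// Called by PHP the first time an unknown class is mentioned.
spl_autoload_register(function ($class) {
    $path = class_to_path(__DIR__ . '/lib', $class);
    if (is_file($path)) {
        require $path;
    }
});
```

Rarely used functions moved into classes like this would cost nothing until the first time somebody actually asked for one.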

Also, quit adding useless prefixes or suffixes to your functions. I know you're going to push onto an array because you push onto arrays. So call it push, not array_push.

Also also, don't fob us off with mb_ crap. Fix your Unicode. There's no excuse whatsoever for a language prevalent in the 21st century to be coded by people who can't cope with Unicode, or its various representations. I know, it's hard. Writing a language is hard. If you can't, don't.

5. Expressions, for the love of god

PHP's compiler is apparently written by chimps. Do we still really believe that there is a difference between a statement and an expression? Do we really still have to have "language constructs" (PHP's term) that are parsed and treated differently from any other expression?

No. Maybe back in the stone age we did things that way but here in the age of enlightenment we have come to realise that the only real difference between a statement and an expression is that a statement actually has a persistent effect.

In PHP, for example, the x or y construct has become possible. Except when y is not an expression - which is 90% of the bloody language. return is not an expression. continue is not an expression. die is not an expression, but it is special-cased to work with or, and has been since before we even had the x or y construct in the first place. Because Perl did it. exit is not an expression and does not have the same special-casing in the language that die does, even though it is the exact same thing.

Another example. Normally, () is used to group things, i.e. to override precedence. I'm quite OK with the way it's required for function calls, conditions etc. In PHP, however, these seem to form a magical, ref-breaking construct that is parsed under its own rules. That is to say, in PHP, $a is not guaranteed to be the same as ($a). That's because PHP is a language whose every feature is a special case in the parser. If $a is a ref, ($a) is not any more.

So what's the point of all these examples? Well hopefully they all bring up the obvious question: why? Why are these things different? For a given X, why does the way you use X have to be allowed by the compiler?

A language built out of expressions is obvious - expressions are what make the operands to operators. And an operator is itself another, larger expression. Suddenly the parsing should seem trivial; you look at a line of code, decide which operators and expressions it contains and run them in a well-defined order. You can see it in the language that when you use an expression it behaves exactly like you'd expect any other expression to behave. At least, it compiles like that - runtime behaviour may be bizarre.

It's trivial to draw up a simple table of PHP's main features in terms of expressions; in all of this the reader is invited to consider in what situations these do not work in PHP's current implementation, and what it means about the compiler for that to be the case. In the table, X and Y mean any expression, i.e. literally anything that compiles.

| Construct | Meaning | Examples | Notes |
|---|---|---|---|
| ${X} | The value referred to by X | ${$foo} # $$foo | When X returns a string, look up that variable. Otherwise, treat it as a reference. When X is another variable, the {} can be omitted. |
| | | ${f()} | |
| | | $a = &$b; ${$a} | |
| X[Y] | Return the element Y from the array X | $array['foo'] | This implements the "feature" that is "special" in PHP 5.5 of array literal dereferencing (the last example). |
| | | f()['foo'] | |
| | | x()[y()] | |
| | | ['a', 'b', 'c'][0] | |
| X() | Run the closure X | f()() | Actual functions like f() are separate, since f is not a valid expression. |
| | | $x() | |
| | | ['a' => function() {}, ...][$x]($y) | |
| X or Y | If X is false, run Y | $type = $config['type'] or continue; | |
| X and Y | If X is true, run Y | $val = $config['x'] and return $val; | |

The reader should take away from this at least the awareness that all of the examples in this table would already work if PHP used a proper expression-based grammar; but instead we have been sold these things piecemeal over the past few versions as new features important enough to go on the front page of the release notes.

6. Complete the complement of magic methods

__toString is a pretty good method. It uses an established consistent convention that double-underscore means special-to-PHP. It uses dynamic dispatch so that if it exists it's used, and if it doesn't there's no "default" behaviour - it just complains.

There are also __isset, __set, __get etc. These do what you'd expect: test for setness, default setter, default getter...

Where's __toInt? __toFloat? __toArray? Why is __toString represented and not the others? Furthermore, if you can use a string as an integer and only complain after this conversion, why don't you use __toString first and then try to turn the result into an integer?
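Until that happens the two-step conversion has to be spelled out by hand. A small sketch, with a hypothetical Quantity class standing in for any object that defines __toString:

```php
<?php
// Hypothetical value object that knows how to become a string.
class Quantity
{
    private $amount;

    public function __construct($amount)
    {
        $this->amount = $amount;
    }

    public function __toString()
    {
        return (string) $this->amount;
    }
}

$q = new Quantity(42);

// String context uses __toString automatically.
$label = "Quantity: $q";

// Integer context does not consult __toString, so the chain
// object -> string -> int is written out explicitly.
$n = (int) (string) $q;

echo $label, ' / ', $n + 1, "\n";
```

A __toInt, or an automatic fallback through __toString, would make the second cast unnecessary.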

Consistency is paramount in a structured, logical world such as programming. Expectations being formed and then violated is the worst of things. It's the Principle of Least Astonishment. Use it.

7. Stop pretending you have types. Or: Have proper types.

What in god's name is this? (int) $val

"Casting," I hear you cry. "It is casting the type of $val to int!"

"Rollocks," I reply in a PG way. For casting is the act of converting a type through known mechanisms to another type. But we don't have __toInt to convert all possible $vals to int, and we don't have mechanisms to convert all possible types in place of int in the first place.

Nope, it is another special case in the PHP compiler, where someone saw another language doing something and implemented the same syntax without understanding what it was doing: they implemented the practice rather than the theory.

What about this? function foo(array $arg)

"Type hinting!" comes the call from the thousands-strong crowd. But if I ask them to explain this mechanism they roll out the usual approximately-right answers they read in the documentation but cannot explain the concept.

PHP is a dynamic language; that's one of its strengths. Dynamic means that PHP exhibits certain runtime features that static languages require at compile time. For the purposes of this section the dynamic features we are interested in are:

  • Runtime method lookup. If an object can perform a method, the method will be performed. If not, a runtime exception is thrown. Inheritance introduces methods from other classes into the object's symbol table, assisting DRY, but otherwise there is no reason every method could not simply be dynamically dispatched to a function somewhere using magic.
  • Automatic type conversion. If an operation requires a string and an integer is provided, or an integer and a string is provided, or a string and an object is provided, PHP will transparently perform the conversion at runtime and only complain if it didn't work.

Now apply your theories about type hinting to this. What can it do but cripple PHP's dynamicity? Duck typing is the principle by which, if you have dynamic method lookup, an object only has to be able to perform a task in order to be considered suitable for the task. That is, until runtime, until you actually try to run the method on the object, there is no way to know that the object cannot do it. If there were you would have sacrificed dynamic method lookup for static compilation already. Type hinting for classes is completely non-semantic if you have the option of duck typing, because there is literally nothing special about your particular class that makes it important that an object is of this type.

How about non-object type hinting? Well you can't actually do that, because int and string aren't types to hint about - probably because any scalar can be used as a string! And any string can be used as an integer! So why enforce the check? Or, from the other perspective, why aren't they types? I can cast to them; why can't I require them?

And why can I require classes but not cast to them?

If we look at the whole type system of PHP as a looser concept than PHP makes it, it makes a lot more sense.

Classes are not some promissory aspect of a piece of data that ensure the datum can perform tasks, but an organisational structure allowing you to introduce functionality from other classes into new ones by inheritance or merging traits. From this perspective, duck typing makes sense - you don't need a specific class to ensure an object can perform tasks; any class can theoretically do it, especially if it consumes a trait that provides it. Type hinting for classes, from this perspective, is logically inconsistent with traits - which are considerably more useful - because you can't test for what a class can do, which is the only thing that's important.
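A small sketch of that mismatch. The trait, the two classes and both functions are invented for illustration; method_exists is the only built-in involved, and it is a crude stand-in for a real capability test, since it knows nothing about methods provided through __call.

```php
<?php
// Hypothetical trait: anything that uses it can quack.
trait Quacks
{
    public function quack()
    {
        return get_class($this) . ' says quack';
    }
}

class Duck  { use Quacks; }
class Robot { use Quacks; }   // unrelated to Duck by inheritance

// Class-hinted version: cares what the object is.
function hinted(Duck $d)
{
    return $d->quack();
}

// Duck-typed version: cares what the object can do.
function ducked($thing)
{
    if (!method_exists($thing, 'quack')) {
        throw new InvalidArgumentException('cannot quack');
    }
    return $thing->quack();
}

echo ducked(new Duck), "\n";
echo ducked(new Robot), "\n";
// hinted(new Robot) is rejected, even though Robot quacks perfectly well.
```

The hint on hinted() adds nothing the trait has not already guaranteed, and it shuts out every other class that consumes the same trait.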

Similarly, basic types are not remotely based on reality either: even if you could ask for a string or an integer, assuming we get the rest of the family of magic methods, any object could have __toString or __toInt . And even if we don't get __toInt , a string can be an int . So if you ask for an int, you could give a string, and you won't know the data the string contains are bad until you try to use it as an int. And you should be able to give an object to a parameter that wants an int simply by casting it to a string and then an int - something PHP should be doing for us already.

Hopefully the reader has spotted the inconsistency between type hinting and a dynamic language: the language cares about what the datum can be, but the type hinting cares about what the datum is. There is absolutely no logical association between what the datum is and what it can do, because Dynamic Point 1 allows for any object - independently of class, thanks to traits and __call - to be able to perform any task; and Dynamic Point 2 allows any type - thanks to __toString and the proposed __toInt and __toArray - to be any other type.

If you're going to have type hinting, therefore, you have to have statically compiled types: you have to enforce the relationship between type and behaviour; otherwise, your type hints are just extra bytes in a file that are going to appear in a commit log at some point in the future deleted by some frustrated developer trying to implement a trait and use it in a method that doesn't expect it.

That's all

I'm sure I could find many more examples of things PHP can fix at a basic level and stop being so irritating about simple things. You'll note I didn't complain about the tiresome conflation of array and dictionary, despite it being the biggest misunderstanding in programming history.

But surely this is a start? We can keep most of the PHP grammar; the syntax doesn't change (much); and so many of the pitfalls and gotchas that a programmer falls into will be resolved in one fell swoop!

As with many things PHP has reached sufficient mass that nothing important will ever change, because the politics of the mailing lists drag everything down, with half-right people expressing their ill-informed opinions on stuff that really, actually matters.

And there's the rub; the alternative is to start again. Start a new, similar language, on the right foot. A language that doesn't have those tags; a language that interfaces with the standard streams properly; a language detached from the web server, that doesn't assume a web environment; a standalone, dynamic, modular language, easy to learn, easy to stick together, easy to run on any decent OS and the not-decent one.

But why? We already have Perl and Ruby and Python. The number of changes required to PHP means that literally the only reason to improve it at all is that it's associated with the name PHP. Installing it, upgrading it; these things would take the same amount of effort as simply using an alternative. It wouldn't be sufficiently backwardly compatible that existing PHP code would run, because all the crap you have to do in existing PHP code wouldn't be possible or necessary.

It can still be done, though. But it won't.

2013-07-11

Introducing Pod::Cats

You may notice the title of the blog has changed to Pod::Cats.

Pod::Cats is a module I wrote for the original incarnation of the blog at podcats.in (no longer a thing).

The module extends POD conceptually, allowing for arbitrary C<elements> and =commands, and adding new +begin and -end commands.

Check out the docs, and the GitHub repository if you want to help out.

2013-05-31

Pain-Based Learning initiative

Google will wrongly determine PBL to mean Problem-Based Learning.

I wish to promulgate the observation that pain is a much more effective teaching mechanism than mere adversity.

Pain-based learning has proven to be a great deterrent for git push -f and logging of stupid bugs, and is widely encouraged for any organisation finding itself with rogue behaviour.