Pod::Cats

Bootstrapping Perl

2017-02-10T13:16:00.001+00:00

This blog post shows a simple, hands-off, automated way to get yourself a Perl environment in user land. If you already know enough about all of this to do it the hard way, and you prefer that, then this post is not aimed at you.

Here's what we are going to achieve:

Set up a Perl 5.24 installation
Set up your environment so you can install modules
Set up your project so you can install its dependencies

These are the things people seem to struggle with a lot, and the instructions are piecemeal all over the internet. Here they are all standing in a row.

Perlbrew your Perl 5.24

As this blog post becomes older, that number will get bigger, so make sure to alter it if you copy this from the future.

Do this as root:

Debian

apt-get install perlbrew

FreeBSD

fetch -o- https://install.perlbrew.pl | sh

Whatever else

curl -L https://install.perlbrew.pl | bash

Windows

Haha, yeah, right.

Once you've installed perlbrew, log out of root and init it as your user. Then install a perl. This will take a while.

perlbrew init
perlbrew install perl-5.24.0

There, you now have a Perl 5.24.0 installation in your home folder. Running which perl will still report /usr/bin/perl, so switch to the new one:

perlbrew switch perl-5.24.0

It will have already told you that you need to alter your .bashrc or suchlike, with something like this:

source $HOME/perl5/perlbrew/etc/bashrc

You should do that.
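If you want to be sure the switch took, a tiny script like this one reports which perl is actually running it. This is only a sketch; the sub name perl_summary is mine, and the path you see depends on where perlbrew put things (normally somewhere under $HOME/perl5/perlbrew/perls).

```perl
use strict;
use warnings;

# Report the interpreter running this script and its version.
# $^X is the path to the perl binary; $^V is its version object.
sub perl_summary {
    return sprintf "Running %s (version %vd)", $^X, $^V;
}

print perl_summary(), "\n";
```

Run it with plain perl, not /usr/bin/perl, so that your $PATH decides which interpreter is used.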

Perlbrew does other stuff - see https://perlbrew.pl for details.

`cpanm`

You want to be able to install modules against your new perl.

You will have to reinstall modules under every perl you have if you want to use the same modules under different versions. This is because of reasons.¹

perlbrew install-cpanm

Now you can use cpanm to install modules. If you install a new Perl with perlbrew, you will have to

perlbrew switch $your_new_perl
perlbrew install-cpanm

All over again. If you're dealing with multiple Perl versions for a reason, you've probably already read the docs enough that you know which commands to use.

`cpanfile`

A cpanfile is a file in your project that lists the dependencies it requires. The purpose of this file is for when you are developing a project, and thus you haven't actually installed it. It looks like this.

requires "Moose";
requires "DBIx::Class" => "0.082840";
requires "perl" => "5.024";

test_requires "Test::More";

You use it like this

cpanm --installdeps .

The . refers to the current directory, of course, so you run this from a place that has a cpanfile in it.

The full syntax is on CPAN.
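One thing to watch when you pin a perl version in a cpanfile: decimal version strings are not read the way release names are written. The core version module shows how a string is interpreted. This is a small sketch; the helper name normalised is mine.

```perl
use strict;
use warnings;
use version;

# Show how a version string is normalised. Decimal forms are read in
# groups of three digits after the point, so "5.24" is not Perl 5.24.
sub normalised {
    my ($string) = @_;
    return version->parse($string)->normal;
}

print normalised("5.24"),    "\n";   # v5.240.0
print normalised("5.024"),   "\n";   # v5.24.0
print normalised("v5.24.0"), "\n";   # v5.24.0
```

So to ask for Perl 5.24 or later, write either "5.024" or "v5.24.0".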

Purpose of `cpanfile`

A "project" here refers to basically anything you might put on CPAN - a distribution. It might be a module, or just some scripts, or a whole suite of both of those things.

The point is it's a unit, and it has dependencies, and you can't run the code without satisfying those dependencies.

If you install this distribution with cpanm then it will automatically install the dependencies because the author set up the makefile correctly so that cpanm knew what the dependencies were. cpanm also puts the modules in $PERL5LIB and the scripts in $PATH so that you can use them.

If you have the source code, either you are the author, or at least you're a contributor; you don't want to run the makefile just to install the dependencies, because this will install the development version of the module too. Nor do you want to require your contributors to install the whole of dzil just to contribute to your module. So, you provide a cpanfile that lists the dependencies they require to run or develop your module or scripts.
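A cpanfile can also separate what is needed to run the code from what is only needed to work on it. Here is a sketch using the develop phase; the modules named are just examples I picked, not a recommendation.

```perl
# cpanfile

# Needed to run the code
requires "Moose";

# Needed only to run the tests
on 'test' => sub {
    requires "Test::More";
};

# Needed only by people hacking on the project (example modules)
on 'develop' => sub {
    requires "Perl::Tidy";
    requires "Test::Pod";
};
```

As far as I know, cpanm --installdeps . skips the develop phase by default, and cpanm --installdeps --with-develop . includes it.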

¹ The primary reason is that every Perl version has a slightly different set of development headers, so any modules written in C will be incompatible. It's too much effort to separate them and disk space is cheap; so we just keep separate libraries and avoid the problem.

Extending Catalyst Controllers

2015-07-29T10:53:00.001+01:00

Our API is versioned. Any change made to the API requires a new version at some level or another.

/api/v1/customers
/api/v1.1/customers
/api/v1.1.1/customers

Additionally, some of the URLs may want to be aliased

/api/v1.0.0/customers

When I got to the code we had Catalyst controllers based on Catalyst::Controller::REST, which looked somewhat like this:

package OurApp::Controller::API::v1::Customer;
use Moose;
BEGIN { extends 'Catalyst::Controller::REST'; };

sub index
    : Path('/api/v1/customer') 
    : Args(1)
    : ActionClass('REST')
{
    # ... fetch and stash customer
}

sub index_GET
    : Action
{
}

1;

In order to extend this API, well, I faffed around a bit. I needed to add a new v1.1 controller that had all the methods available to this v1 controller, plus a new one. It needed to be done quickly, and nothing really stood out as obvious to me at the time.

So I used detach.

package OurApp::Controller::API::v1_1::Customer;
use Moose;
BEGIN { extends 'Catalyst::Controller::REST'; };

sub index
    : Path('/api/v1.1/customer') 
    : Args(1)
    : ActionClass('REST')
{ }

sub index_GET
    : Action
{
    my ($self, $c) = @_;
    $c->detach('/api/v1/customer/index');
}

1;

This had the effect of creating new paths under /api/v1.1/ that simply detached to their counterparts.

The problem with this particular controller is that in v1.0 it only had GET defined. That meant it only had index defined, and so the customer object itself was fetched in the index method, ready for index_GET. I needed a second method that also used the customer object: this meant I had to refactor the index methods to use a chained loader, which the new method could also use.

sub get_customer
    : Chained('/')
    : PathPart('api/v1.1/customer') 
    : CaptureArgs(1)
{
    # ... fetch and stash the customer
}

sub index
    : PathPart('')
    : Chained('get_customer')
    : Args(0)
    : ActionClass('REST')
{ }

sub index_GET
    : Action
{
    my ($self, $c) = @_;
    $c->detach('/api/v1.1/customer/index');
}

sub address
    : PathPart('address')
    : Chained('get_customer')
    : Args(0)
    : ActionClass('REST')
{}

sub address_GET
    : Action
{
    # ... get address from stashed customer
}

The argument that used to terminate the URL is now in the middle of the URL for the address: /api/v1.1/customer/$ID/address. So it's gone from : Args(1) on the index action to : CaptureArgs(1) on the get_customer action.

The problem now is that I can't use detach in v1.1.1, because we'd be detaching mid-chain.

I had¹ to use goto.

package OurApp::Controller::API::v1_1_1::Customer;
use Moose;
BEGIN { extends 'Catalyst::Controller::REST'; };

sub get_customer
    : Chained('/')
    : PathPart('api/v1.1.1/customer') 
    : CaptureArgs(1)
{
    goto &OurApp::Controller::API::v1_1::Customer::get_customer;
}

#...
1;

This was fine, except I also introduced a validation method that was not an action; it was simply a method on the controller that validated customers for POST and PUT.

sub index_POST
    : Action
{
    my ($self, $c) = @_;
    my $data = $c->req->data;

    $self->_validate($c, $data);
}

sub _validate {
    # ...
}

In version 1.1.1, the only change was to augment validation; phone numbers were now constrained, where previously they were not.

It seemed like a ridiculous quantity of effort to clone the entire directory of controllers, change all the numbers to 1.1.1, and hack in a goto, just because I couldn't use Moose's after method modifier on _validate.

Why couldn't I? Because I couldn't use OurApp::Controller::API::v1_1::Customer as the base class for OurApp::Controller::API::v1_1_1::Customer.

Why? Because the paths were hard-coded in the Paths and PathParts!

This was the moment of clarity. That is not the correct way of specifying the paths.

To Every Controller, A Path

There is actually already a controller at every level of our API.

OurApp::Controller::API
OurApp::Controller::API::v1
OurApp::Controller::API::v1_1
OurApp::Controller::API::v1_1_1
OurApp::Controller::API::v1_1_1::Customer

This means we can add path information at every level. It's important to remember the controller namespace has nothing to do with Chained actions - the : Chained(path) and : PathPart(path) attributes can contain basically anything, allowing any path to be constructed from any controller.

In practice, this is a bad idea, because the first thing you want to know when you look at a path is how it's defined; and you don't want to have to pick apart the debug output when you could simply make assumptions based on a consistent association between controllers and paths.

But there is a way of associating the controller with the chained path, and that's by use of the path config setting and the : PathPrefix and : ChainedParent attributes. Both of these react to the current controller, meaning that if you subclass the controller, the result changes.

First I made the v1 controller have just the v1 path.

package OurApp::Controller::API::v1;
use Moose;
BEGIN { extends 'Catalyst::Controller'; };

__PACKAGE__->config
(
    path => 'v1',
);

sub api
    : ChainedParent
    : PathPrefix
    : CaptureArgs(0)
{}

1;

Then I gave the API controller the api path.

package OurApp::Controller::API;
use Moose;
BEGIN { extends 'Catalyst::Controller'; };

__PACKAGE__->config
(
    path => '/api',
);

sub api
    : Chained
    : PathPrefix
    : CaptureArgs(0)
{}

1;

This Tomato Is Not A Fruit

You may be wondering, why isn't ::v1 an extension of ::API itself? It's 100% to do with the number of path parts we need. The ::API controller defines a path => '/api' , while the ::API::v1 controller defines path => 'v1' . If the latter extended the former, it would inherit the methods rather than chaining them, i.e. v1 would override rather than extend /api.

So we have one controller per layer, but things in the same layer can inherit.

package OurApp::Controller::API::v1::Customer;
use Moose;
BEGIN { extends 'Catalyst::Controller::REST'; };

__PACKAGE__->config
(
    path => 'customer',
);

sub index
    : Chained('../api')
    : PathPrefix
    : Args(1)
    : ActionClass('REST')
{}

sub index_GET {}

1;

package OurApp::Controller::API::v1_1;

use Moose;
BEGIN { extends 'OurApp::Controller::API::v1'; };

__PACKAGE__->config
(
    path => 'v1.1',
);

1;

The reason we can inherit is that everything we've done is relative.
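To see why relative definitions make inheritance work, here is a plain-Perl analogy. None of this is Catalyst API; the Toy:: classes and the path and prefix methods are made up for illustration. One pair of classes asks the object for its path, the way path config and PathPrefix do; the other pair hard-codes it, the way Path('/api/v1/customer') did.

```perl
use strict;
use warnings;

# Relative: the prefix is built from whatever path the object reports.
package Toy::API::v1 {
    sub new    { my ($class) = @_; return bless {}, $class }
    sub path   { 'v1' }    # stands in for config(path => 'v1')
    sub prefix { my ($self) = @_; return '/api/' . $self->path }
}

package Toy::API::v1_1 {
    our @ISA = ('Toy::API::v1');
    sub path { 'v1.1' }    # the only thing the subclass has to change
}

# Hard-coded: the prefix is fixed when the parent class is written.
package Toy::Hardcoded::v1 {
    sub new    { my ($class) = @_; return bless {}, $class }
    sub prefix { '/api/v1' }
}

package Toy::Hardcoded::v1_1 {
    our @ISA = ('Toy::Hardcoded::v1');    # inherits the v1 prefix as well
}

package main;

print Toy::API::v1_1->new->prefix, "\n";         # /api/v1.1
print Toy::Hardcoded::v1_1->new->prefix, "\n";   # /api/v1
```

The hard-coded subclass ends up claiming the same path as its parent, which is the situation that made subclassing the original controllers impossible.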

ChainedParent

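This causes ::API::v1::api to be chained from ::API::api, the action of the same name in the parent controller. When inherited, ::API::v1_1::api is chained from ::API::api in just the same way, because the parent is worked out from the subclass's own namespace.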

Chained('../api')

This causes ::API::v1::Customer::index to be chained from ::API::v1::api, but when we inherit it, the new ::API::v1_1::Customer::index will be chained from ::API::v1_1::api.

PathPrefix

This causes these methods to have the PathPart of their controller's path_prefix. The most important example of this is in ::API::v1. Here, we see the api method configured with it:

sub api
    : ChainedParent
    : PathPrefix
    : CaptureArgs(0)
{}

This last is the central part of the whole deal. This means that the configuration path => 'v1' causes this chain to have the PathPart v1. When we inherit from this class, we simply redefine path, as we did in the v1.1 controller above:

__PACKAGE__->config( path => 'v1.1' );

The code above wasn't abbreviated. That was the entirety of the controller.

We can also create the relevant Customer controller in the same way:

package OurApp::Controller::API::v1_1::Customer;
use Moose;
BEGIN { extends 'OurApp::Controller::API::v1::Customer'; };
1;

This is even shorter because we don't have to even change the path! All we need to do is establish that there is a controller called ::API::v1_1::Customer and the standard path stuff will take care of the rest.

Equally, you can alias the same version with the same trick:

package OurApp::Controller::API::v1_0;
use Moose;
BEGIN { extends 'OurApp::Controller::API::v1'; };
__PACKAGE__->config( path => 'v1.0' );
1;

And of course the whole point of this is that now you can extend your API.

package OurApp::Controller::API::v1_1::Customer;
use Moose;
BEGIN { extends 'OurApp::Controller::API::v1::Customer'; };

sub index_PUT { }

sub _validate {}

1;

This is where I came in. Now I can extend v1.1 into v1.1.1 and use Moose's around or after to change the way _validate works only for v1.1.1, and thus I have extended my API in code as well as in principle.
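The shape of that last step, sketched in core Perl so it runs without Moose or Catalyst. The Toy:: classes, the error strings and the phone rule are all invented for the example; in the real controllers the override would be a Moose modifier, as noted in the comment.

```perl
use strict;
use warnings;

package Toy::v1_1::Customer {
    sub new { my ($class) = @_; return bless {}, $class }

    # Returns a list of error strings; an empty list means valid.
    sub _validate {
        my ($self, $data) = @_;
        my @errors;
        push @errors, 'name is required' unless length($data->{name} // '');
        return @errors;
    }
}

package Toy::v1_1_1::Customer {
    our @ISA = ('Toy::v1_1::Customer');

    # With Moose this would be written as
    #   around _validate => sub { my ($orig, $self, @args) = @_; ... };
    # Here SUPER:: does the same job: run the old checks, then add one.
    sub _validate {
        my ($self, $data) = @_;
        my @errors = $self->SUPER::_validate($data);
        push @errors, 'phone number is malformed'
            if defined $data->{phone} && $data->{phone} !~ /\A\+?[0-9 ]+\z/;
        return @errors;
    }
}

package main;

my $data = { name => 'Alice', phone => 'not a number' };
print scalar(my @old = Toy::v1_1::Customer->new->_validate($data)), "\n";     # 0
print scalar(my @new = Toy::v1_1_1::Customer->new->_validate($data)), "\n";   # 1
```

Only v1.1.1 rejects the bad phone number; v1.1 behaves exactly as before.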

CatalystX::AppBuilder

We're actually using CatalystX::AppBuilder. This makes subclassing the entire API tree even easier, because you can inject v1 controllers as v1.1 controllers.

after 'setup_components' => sub {
    my $class = shift;

    $class->add_paths(__PACKAGE__);

    CatalystX::InjectComponent->inject(
        into      => $class,
        component => 'OurApp::Controller::API',
        as        => 'Controller::API'
    );
    CatalystX::InjectComponent->inject(
        into      => $class,
        component => 'OurApp::Controller::API::v1',
        as        => 'Controller::API::v1'
    );
    CatalystX::InjectComponent->inject(
        into      => $class,
        component => 'OurApp::Controller::API::v1_1',
        as        => 'Controller::API::v1_1'
    );

    for my $version (qw/v1 v1_1/) {
        CatalystX::InjectComponent->inject(
            into      => $class,
            component => 'OurApp::Controller::API::' . $version . '::Customers',
            as        => 'Controller::API::' . $version . '::Customers'
        );

        for my $controller (qw/Addresses Products/) {
            CatalystX::InjectComponent->inject(
                into      => $class,
                component => 'OurApp::Controller::API::v1::' .  $controller, # sic!
                as        => 'Controller::API::' . $version . '::' .  $controller
            );
        }
    }
};

Now we've injected all controllers that weren't changed simply by using the v1 controller as both the v1 and the v1.1 controllers; and the Customer controller, which was subclassed, has had the v1.1 version added explicitly.

The only things we can't get away with injecting under different names are the subclassed controllers themselves. Obviously that includes the v1.1 Customer controller because that's the one with new functionality, but don't forget it is also necessary to have a v1_1 controller in the first place in order to override the path config of its parent.

We would also have to create subclasses if we wanted to alias v1 into v1.0 and v1.0.0. That is the limitation of this, and it's a few lines of boilerplate to do so; but it's considerably better than an entire suite of copy-pasted controllers using goto.

I expect there's a good way to perform this particular form of injection without CatalystX::AppBuilder, but I don't know it. Comments welcome.

¹ Chose.

CPAN installation order

2015-05-27T12:32:00.000+01:00

At work we use Catalyst. Catalyst apps can be (should be?) built up from multiple modules, in the sense of distribution. This allows them to be modular, which is kind of why they're called modules.

That means each project is a directory full of directories, most of which represent Perl modules, and most of which depend on each other. In order to deploy we throw this list at cpanm (http://cpanmin.us) and let cpanm install them all.

This works by accident, because they're all installed already, and so module X depending on module Y is normally OK because Y will be updated during the process.

For a fresh installation, cpanm will fail to install many of them because their prerequisites are in the installation list:

$ cpanm X Y
--> Working on X
...
-> FAIL Installing the dependencies failed: 'Y' is not installed
--> Working on Y
...
-> OK
Successfully installed Y

Now Y is installed, but not X.

I wrote a script to reorder them. https://gist.github.com/Altreus/26c33421c36cc1eee68c

$ installation-order X Y
Y X

$ cpanm $(installation-order X Y)
--> Working on Y
...
-> OK
Successfully installed Y
--> Working on X
...
-> OK
Successfully installed X

This uses the same information that cpanm used in the first place to complain that Y was not installed. That is to say, if a dependency is missing from a module's metadata then the script can't account for it, but the original cpanm invocation would not have failed over it anyway.
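The reordering itself is a depth-first topological sort. The sketch below is not the script from the gist; it assumes you already have a hash of module names to prerequisites (the %deps here is made up), where the real thing would read that from each distribution's metadata.

```perl
use strict;
use warnings;

# Return @wanted reordered so that every module comes after any of its
# prerequisites that are also in @wanted.
sub installation_order {
    my ($deps, @wanted) = @_;
    my %in_list = map { $_ => 1 } @wanted;
    my (%state, @order);

    my $visit;
    $visit = sub {
        my ($module) = @_;
        my $seen = $state{$module} // '';
        return if $seen eq 'done';
        die "Circular dependency involving $module\n" if $seen eq 'visiting';

        $state{$module} = 'visiting';
        # Prerequisites outside the list are left for cpanm to fetch.
        $visit->($_) for grep { $in_list{$_} } @{ $deps->{$module} // [] };
        $state{$module} = 'done';
        push @order, $module;
    };

    $visit->($_) for @wanted;
    return @order;
}

# Hypothetical metadata: X requires Y.
my %deps = ( X => ['Y'], Y => [] );
print join(' ', installation_order(\%deps, qw(X Y))), "\n";   # Y X
```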

Update

It is worth noting that cpanm can install from directories; and it will always try this if the module name starts with ./.

Therefore, X and Y above can be the result of a glob, so long as you include the ./ in the glob:

$ echo ./*
./Module1 ./Module2
$ installation-order ./*
./Module2 ./Module1

This also works with absolute paths.

Catalyst Models

2015-04-14T17:49:00.000+01:00

A Catalyst model is simply a package in the MyApp::Model namespace.

$c->model('DBIC')

simply returns

"MyApp::Model::DBIC"

I recently spent some time at work trying to work out quite how Catalyst models work with relation to, well, everything else.

Our app structure is based on CatalystX::AppBuilder, and I needed to add a model to one of the components, in order to provide a caching layer in the right place.

The mistake I'd been making was to treat the Schema subclass as the model, which it is not. Rather, the model is an interface into the Schema class. Essentially, I had one class too few.

You can determine that by creating a new Catalyst application and then running the helper script that creates a model from an existing schema. You get a class like this:

package MyApp::Model::FilmDB;

use strict;
use base 'Catalyst::Model::DBIC::Schema';

__PACKAGE__->config(
    schema_class => 'MyApp::Schema::FilmDB',

    connect_info => {
        dsn => 'dbi:mysql:filmdb',
        user => 'dbusername',
        password => 'dbpass',
    }
);

A Model class is created and it points to the Schema class, being your actual DBIC schema.

Once I'd realised the above rule it was easy enough to create MyApp::Extension::Model::DBIC to go alongside MyApp::Extension::Schema.

Further confusion arose with the configuration. There appeared to be no existing configuration that matched any of the extant classes in the application or its components. However, it was clear which was the DBIC model configuration because of the DSN.

I wanted to follow suit with the new module, which meant that somehow I had to map the real name to the config name.

<Model::DBIC>
</Model::DBIC>

This makes sense; if I do $c->model('DBIC') I'll get "MyApp::Model::DBIC", and that'll be configured with the Model::DBIC part of the config.

What I'd missed here was that we were mixing CatalystX::AppBuilder with CatalystX::InjectComponent:

package MyApp::Extension;
use CatalystX::InjectComponent;

after 'setup_components' => sub {
    my $class = shift;

    ...

    CatalystX::InjectComponent->inject(
        into      => $class,
        component => __PACKAGE__ . '::Model::DBIC',
        as        => 'Model::DBIC',
    );
};

This was the missing part - the stuff inside the CatalystX::AppBuilder component was itself built up out of other components, aliasing their namespace-specific models so that $c->model would return the appropriate class.

Now, Model::DBIC refers to MyApp::Extension::Model::DBIC, which is an interface into MyApp::Extension::Schema.

User groups in Odoo 8

2015-01-05T16:43:00.000+00:00

Odoo has a user group concept that, if you Google for errors, crops up all the time. Odd that when you first run Odoo, you can't assign users to groups.

The answer is you have to give the Administrator user the "Technical Features" feature in Usability. Navigate to Settings > Users, click Administrator, click Edit, check the relevant box, click Save, and finally refresh.

If you Google for it, there's hardly any information on the subject. However, Odoo is quite happy to occasionally tell you what groups you need to be a part of in order to access something.

User groups are access control, so it's common that you'd want to create levels of access and assign the user to them. I first discovered an issue with this when trying the Project Management module - trying which was the entire point of me running Odoo 8 in the first place. (I can't reproduce the problem now that it's a new year. Maybe Odoo's NYR is to be less whiny.)

You can run a Docker container with Odoo 8 in it from the tinyerp/odoo-docker GitHub repo; either the Debian or the Ubuntu version should work fine.¹

¹ I recommend the Debian version, since Ubuntu is just Debian with extra, irrelevant stuff bundled in, making it not entirely useful to have an Ubuntu version in the first place. Licensing is probably involved.

Day 22 - The nth Day Of Christmas

2014-12-22T23:24:00.000+00:00

How many presents were given, in total, on the 12th day of Christmas, by the so-called "true love"? How many for the nth day?

For each day we know that each other day was done again, so we have a shape like this:

Each column is as tall as the number of rows, and the number of rows is 12.

This means the 1 column is 12 tall, the 2 column 11, and so on.

This is 12 * 1 + 11 * 2 + 10 * 3 ...

That's boring. That's not what computers or maths are for. Let's generalise.

We can see that each section of the summed sequence follows a pattern of x * y, where x + y = 13.

It is common, when analysing sequences, to forget that the order matters, and the row number can be used as a variable. If we call that variable i then each section is (13 - i) * i, and the total is the sum over i from 1 to 12.

 12
  Σ (13 - i) * i
i=1

13 is suspiciously close to 12. What happens if we do this?

 12
  Σ (12 + 1 - i) * i
i=1

And then replace the 12 with our n to answer "What about the nth day?"

  n
  Σ (n + 1 - i) * i
i=1

Does it work? Let's Perl it up. Each value of (n + 1 - i) * i can be produced by a map over the range 1..$n, using $_ in place of i, since that's exactly what it is in this case.

sum0 map { $_ * ($n + 1 - $_) } 1 .. $n

sum0 comes from List::Util, and does a standard sum, except the list is assumed to start with zero in case the list is empty - this just avoids pesky warnings.

Try it. Using $ARGV[0] for $n we can give our n on the command line:

perl -MList::Util=sum0 -E'say sum0 map { $_ * ($ARGV[0] + 1 - $_) } 1 .. $ARGV[0]' 12

Vary the 12 to solve for different values of n.

The answer, incidentally, is 364.
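If you expand the sum by hand it collapses to n(n + 1)(n + 2) / 6. That closed form is my addition, not part of the working above, so here is a quick check that the two agree. The sub names are mine.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# The sum exactly as written in the one-liner.
sub presents_by_sum {
    my ($n) = @_;
    return sum0 map { $_ * ($n + 1 - $_) } 1 .. $n;
}

# The closed form: n(n + 1)(n + 2) / 6.
sub presents_closed_form {
    my ($n) = @_;
    return $n * ($n + 1) * ($n + 2) / 6;
}

for my $n (0 .. 12) {
    die "Mismatch at n = $n\n"
        unless presents_by_sum($n) == presents_closed_form($n);
}

print presents_closed_form(12), "\n";   # 364
```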

Day 18: The URI

2014-12-22T13:39:00.000+00:00

I've talked a lot about this resource-first way of dealing with the web, and really the internet in general, but it isn't a tool that fits all things. For instance, today I was looking at the point-of-sale module in Odoo, which is essentially an HTML representation of the index resource of the products in the system, but is actually more complicated than that, because it includes that resource, a numeric input box, the bill of items so far, a search box, and a few other twiddly bits to improve the cashier's use of the system. Plus, it is designed with tablets in mind.

This is quite different from the list of products you get when you look for the list of products in Odoo itself.

However, we must construct a URI that refers to this view of the data if we're to be able to access that view of the data in the first place. That means that we somehow have to shoehorn this not-a-resource idea into the everything-is-a-resource idea.

Today I'm going to deconstruct the URI and explain how each part can be used, in order to avoid too much in the way of special behaviour. Ideally we'd like every resource to be represented by a single URI, but that's clearly not going to work.

Allow me to state up front that I consider Odoo's URI scheme to be utterly shocking. But it appears to be a legacy from back in the old days when more people made web things than really understood what URIs were for.

The URI

The URI is made up of several parts. Here is what I consider to be the simplest URL that contains all common parts¹:

http://www.example.com:8080/resource/id?query=data#part-of-document

|_____|___|_______|___|____|________|__|__________|_______________|
   1    2     3     4   5      6     7      8             9

1. Scheme
2. Subdomain
3. SLD²
4. TLD
5. Port
6. Resource (type) name
7. Resource (instance) identifier
8. Query string
9. URI fragment

Together, 2, 3 and 4 comprise the hostname; 6 and 7 are the path.
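Here is a rough sketch of pulling those nine parts out of the example URL in core Perl. The first step uses the regular expression from RFC 3986 appendix B; the splitting of the hostname and path after that is my own simplification, and assumes exactly three dot-separated labels and a two-part path, as in the example. The sub name split_url is made up.

```perl
use strict;
use warnings;

sub split_url {
    my ($url) = @_;

    # RFC 3986 appendix B, with non-capturing groups for the delimiters.
    my ($scheme, $authority, $path, $query, $fragment) =
        $url =~ m{^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?};

    # Simplifications: no userinfo, no IPv6, three host labels, two path parts.
    my ($host, $port)           = split /:/,  ($authority // '');
    my ($subdomain, $sld, $tld) = split /\./, ($host // '');
    my (undef, $resource, $id)  = split m{/}, $path;

    return {
        scheme   => $scheme,   subdomain => $subdomain, sld      => $sld,
        tld      => $tld,      port      => $port,      resource => $resource,
        id       => $id,       query     => $query,     fragment => $fragment,
    };
}

my $parts = split_url(
    'http://www.example.com:8080/resource/id?query=data#part-of-document'
);
printf "%-10s %s\n", $_, $parts->{$_} // ''
    for qw(scheme subdomain sld tld port resource id query fragment);
```

For anything real, a proper URI-handling module from CPAN will deal with the cases this ignores.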

Breaking down the URI

Scheme

The scheme is the first place where you restrict yourself. Often referred to as the protocol, the scheme usually determines how the URI should be used. In this example http is the assumed protocol by which web requests are made. The http scheme tells the client to use the HTTP protocol to make the request.

This is very useful because it means we can immediately assume a large quantity of knowledge about the system that we wouldn't have without the scheme. Particularly useful is that we know what sort of programs can be used to actually access this URL³. This is, if you think about it, what the word protocol means: it is those things that are assumed to be the case, given a certain situation. When we all follow protocol, we don't need to explain why we're doing what we're doing.

Mostly we come across URLs specifying the HTTP scheme; in fact, it's assumed, in many cases, that a URI with no scheme is an HTTP URL, because if you click on it, it opens up in your browser. However, some places have started using their own schemes, such as the spotify: scheme, which opens URLs in the Spotify client, or the steam: scheme, which opens things with Steam.

It's worth noting that the entire hostname can also be omitted from a URI, but this usually means you get three slashes, not two. This is commonly seen with the file protocol, such as file:///home/user/documents/example.html; where the third / is actually part of the path. For this reason it can be observed that the steam: scheme does not quite follow the normal URI standards, since the part immediately following the scheme is an action - arguably a resource - and not a hostname.

By inventing our own schemes like this we can create entire applications with a new way of communicating, but we're focusing on the web here, which means we're going to use HTTP(S), like it or lump it.

Subdomain

The term "subdomain" is a bit of a colloquialism. Each section of the hostname is a subdomain for the part to the right. The host name is a hierarchy with, in this case, com at the top. We usually call this part the "subdomain" because it's the first subdivision that is really relevant to a human.

When we have a subdivided subdomain we sort of stop talking about them and start mumbling and saying "that bit" and pointing.

The subdomain is a tool we can use to do many things. Traditionally the web is in the www subdomain, but the http protocol is usually sufficient to assume web, these days. However, that's starting to change, as we start to send non-web things over HTTP. These non-web things are, e.g., the API, or the CDN.

Really consider using an api subdomain for your API. You'll find that if you have an api and a www, then your website can have, in the majority, the exact same URI structure as the API. This is more often the case than it appears to be, because people don't tend to think of their web pages as representing a resource in HTML format.

Domain

The SLD is the part of the domain that really, to a human, represents where the site is. This is usually your company or organisation name, or some other thing whose entire purpose is to say what this whole web site is about.

You can install a system under multiple domains and thus they would all have the exact same URI scheme, except that, because they're in different places, the records that you get would be different.

Because yoursite.com/user/1 is not the same person as mysite.com/user/1, except by coincidence.

I've lumped the TLD in here too, because the TLD is, to most people, part of your domain name - which is why we call the subdomain the subdomain regardless of where it appears on the actual hostname.

Port

When designing URI schemes it's helpful to drink a lot of port, for inspiration.

Commonly there are alternative services associated with your website, meaning they're on the same domain, and you can't use the subdomain because these other services need api and www subdomains of their own.

One trick is to mount these services under a part of the path, and consider them a big resource with sub-resources; but easier is to install them on a different port.

For example, your Elasticsearch instance - which communicates entirely via HTTP - can be running on the same hostname as your website, but a different port. Elasticsearch's default port is 9200, going up to 9300 as you add instances on the same machine.

Resource name

The first part of the path of the URL I'm calling the resource name. That's because this is where the actual resource you're requesting starts. Everything before the path is defining whose resource you are asking for, but once the path starts you're starting to get a handle on the actual information.

The resource name, when requested, can have multiple behaviours, depending on the purpose of the resource, but common is simply to be an index of all the items of that type. Since that can be cumbersome, it is perfectly legitimate to both paginate this list and summarise the entries. That sort of stuff is well out of scope of this article, though.

Other uses of the first part of the path are organisational, and may be handled better as a subdomain. For example, having an api part of the path here is not as useful as it would be to have an API subdomain, because if the paths to the resources can be consistent then we don't have to ask questions about what they should be.

https://www.example.com/resource
https://api.example.com/resource

Other times, you may want to use a different port. For example, if the web stuff is on port 80 then the administration part could be on port 8080. This also allows you to control access to the different parts of the site at the kernel level, using routing rather than soft authentication.

https://www.example.com/admin
https://www.example.com:8080

Doing this also means that it's harder to guess the correct path to the admin area, since you can use an obscure port. Denying access based on IP rules means you'd never report to unauthorised users when they guessed right in the first place.

But really, there's no exact reason why you would or would not add parts of the path to the URL in order to divide it up into separate logical zones. This can certainly help with human comprehension of the purpose of your URL. Sometimes you may even want to provide dummy paths - paths that refer to the same resource as other paths, but assist with conceptual compartmentalisation by having different subpaths.

https://www.example.com/shop/product/1
https://www.example.com/blog/post/1

In these examples, the first part of the path could be omitted, provided that post is always the blog post and product is always a shop product. Consider also that you could still use subdomains for these.

https://api.shop.example.com/product/1

The important part would be to ensure that your uses are consistent. Always have each part of the URL refer to the same logical division of your resource structure.

Item ID

Once you've decided at which point of the path to put the resource type, you should probably put the next part as an optional ID field.

The combination of a resource name and an item ID should be entirely sufficient to retrieve all the information about that specific instance of that type.

This is a reasonably central principle to the resource-first model of your system - all your things have a type and an ID and that's all you need to provide to retrieve it, or at least a representation of it. Everything else is your organisational whimsy and the system really shouldn't have to know.

More formally than dismissing it as whimsy, I should point out that even the type names and shapes can change, and that's difficult enough to deal with. Every level of organisation you add on top of this is another changeable shape of the system that at some point you're going to have to adapt. The fewer of those you have, the better.

The actual format of your identifier is up to you, but there's really nothing else you can put after the resource name that is relevant at this point.
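
As a rough sketch of that principle: if the last two path segments are the resource type and the optional ID, everything in front of them can be thrown away. The parse_resource_path name and the digits-only ID format are made up for this illustration, not anything the rest of the post depends on.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the resource type and optional ID off the
# end of a URL path. Anything before the type is treated as
# organisational prefix and ignored.
sub parse_resource_path {
    my ($path) = @_;
    my @segments = grep { length } split m{/}, $path;

    # Assumption for this sketch: IDs are purely numeric.
    my $id = (@segments && $segments[-1] =~ /^\d+$/) ? pop @segments : undef;
    my $type = pop @segments;

    return ($type, $id);
}

for my $path ('/shop/product/1', '/product/1', '/blog/post') {
    my ($type, $id) = parse_resource_path($path);
    printf "%-16s type=%s id=%s\n", $path, $type, $id // '(index)';
}
```

Both /shop/product/1 and /product/1 come out as the same type and ID, which is the point: the prefix is for humans.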

Query string

If I catch you using a query string to tell a dynamic resource to load a specific other resource I will murder you in your sleep.

https://example.com/index.php?type=resource&id=1234

Seriously, this sort of crap is all over the internet. Yes, it's usually PHP.

You are using a URI - at least put the resource identifier in the resource identifier.

It is important to note that the query string is not the same thing as the "GET parameters". A query string does not have to be in the format key=value&key=value - the web server passes the query string straight to the app, and it is the application that decodes it in its own way. It is common to use the key=value&key=value structure but not required.

The query string's most obvious purpose is to pass a query to a resource that expects one, or that at least accepts one. Often the index resources will allow for some sort of search or filter functionality, and if that's not the case then special resources designed to search and filter - and possibly concatenate - other resources will accept search parameters.

Further specialisation of resources would not even use the KVP format of "GET parameters", and simply take the query string as instruction. These types of resource are drifting away from the "object" type of resource and moving towards "function" resources, which are a separate discussion.
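
To make the distinction concrete, here is a minimal sketch in which the raw query string is extracted first, and the key=value decoding is a separate, optional step the application chooses to apply. The raw_query and as_pairs names and the example URLs are invented for the illustration, and percent-decoding is left out.

```perl
use strict;
use warnings;

# Return the raw query string of a URL: everything after the first "?"
# and before any "#fragment". No decoding is attempted here.
sub raw_query {
    my ($url) = @_;
    $url =~ s/#.*//s;    # drop the fragment, which is for the client only
    return $url =~ /\?(.*)/s ? $1 : undef;
}

# One possible decoding, chosen by the application: key=value&key=value.
sub as_pairs {
    my ($query) = @_;
    return map { [ split /=/, $_, 2 ] } split /&/, $query;
}

my $kvp   = raw_query('https://example.com/search?type=post&tag=perl');
my $plain = raw_query('https://example.com/search?perl+unicode');

print "$kvp\n";
print join(', ', map { "$_->[0] => $_->[1]" } as_pairs($kvp)), "\n";
print "$plain\n";    # a perfectly good query string with no pairs in it
```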

The thing about the query string is that it is usually only relevant to GET requests, which is why it is sometimes called the GET string. But GET is an HTTP verb and the query string is part of the URL; and URLs don't have to be http://, so the query string can really be used against any scheme.

It is often said the query string should not be used to send data to the server, but I'm really not sure that's the case. The server should not store data as the result of a read request (HTTP's GET), but it is welcome to store data as the result of a write request (HTTP's POST or PUT). In which case it is entirely up to the server the mechanism by which the data are provided to it.

These are why you should call it the query string, not the GET string.

Fragment

The part of the URL after the # is called the fragment. This is not actually part of the resource identifier, but is provided for the client's benefit.

If you click on any of the footnote marks in this document⁴, most browsers I give a toss about will jump to the footnote, and back again when you click on the number of that footnote.

No new page request is made. The browser is not being instructed to access a different resource. In the example earlier, the fragment is #part-of-document. The fragment is usually used to refer to a part of the document. In HTML and XML, this is either by the id or name attributes of the elements.

In this document, the a tags that jump around the page have name attributes that the browser uses to scroll to them when the URL fragment changes, i.e. in these blog-post resources, the parts-of-the-document that I refer to with URL fragments are the footnotes and the places the footnotes refer to.
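
A small sketch of that split, with a made-up URL and a made-up split_fragment helper: the part before the # is what identifies the resource, and the part after it is only of interest to the client.

```perl
use strict;
use warnings;

# Split a URL into the part that identifies the resource and the
# fragment, which only the client uses.
sub split_fragment {
    my ($url) = @_;
    my ($resource, $fragment) = split /#/, $url, 2;
    return ($resource, $fragment);
}

my ($resource, $fragment) =
    split_fragment('https://www.example.com/blog/post/1#part-of-document');

print "resource: $resource\n";    # what the resource identifier actually is
print "fragment: $fragment\n";    # what the browser looks for in the document
```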

Using the document fragment to refer to specific resources is a crime committed by many "JavaScript apps" today. The reason this is a crime is that it is not identifying the resource; it's identifying the resource proxy, which means the correct client must be used to actually access the resource itself. It's like having a proprietary browser that only understands a completely different URI format.

It's a crime because browsers are more than capable of intercepting URI requests inside an application and getting the application to update as necessary, and servers are more than capable of returning a javascript-app-with-resource-in-it as the HTML representation of the resource.

There is no reason besides lack of imagination to trample all over that URI system just to avoid reloading the page every so often.

TODO

Not mentioned is the idea of a "related resource". This can be a third part of the URI path whereby you request an index of a separate resource based on the current one:

https://www.example.com/blog/post/1/comments

This is, conceptually, the same as

https://www.example.com/blog/comments?post=1

but you may wish to return the results differently, e.g. with more expanded objects rather than just URLs to the results.

In upcoming posts we'll probably have a look at those "functional" resources I mentioned in passing. This post has been entirely about "object" resources, i.e. those resources that are simply some representation of a real-world object, or a fake-world object, but ultimately something that can be represented as a JSON object with fields and values. I will also try to discuss the resource-first view of website building using the aforementioned point-of-sale in Odoo as an example.

We also haven't discussed how it is that you would relate resources to one another in knowable ways. This ties in with the hyperlink concept and is the thinking behind Web::HyperMachine - HTML pages are already linked together with <a href="related-link">, but there are myriad other ways even those use hyperlinks to refer to other resources, and even more ways in HTTP itself.

¹ I've omitted from this the user:pass@ part that can be used before the hostname, because it's not very common.

² The "second-level domain" is colloquially the "company" part of the name, i.e. the first part that actually identifies at a human-readable level what it is the URI refers to. In some cases, such as .co.uk, the TLD is actually the SLD (co) and TLD (uk), and it is the third-level domain that is the company part. Colloquially, we can refer to .co.uk as a TLD, so that this remains the SLD.

³ A URL is basically a URI that you can actually use. That is, there exist URIs that refer to resources but that cannot actually be used to access that resource; for example the ISBN URI scheme cannot be used to get an actual book.

⁴ Like this one.

Day 17: A complex and detailed investigation into the various merits and faults of the assorted combinations of codepage, character set and byte encoding of human-readable text.

2014-12-17T14:45:00.001+00:00

There are 128 characters in ASCII and tens of thousands of characters in the real world. It is probably an interesting debate, trying to come up with the most efficient way of encoding non-ASCII characters without screwing everything up.

Don't waste your time. Use UTF-8 and Unicode.

"But what about UTF-16?" No.

"But what about--" NO.

ASCII is included in UTF-8 Unicode. So is everything else. Everyone understands it, everything's assuming it, and all the other encodings and charsets are more obscure and therefore harder to deal with.
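
If you want to see that for yourself, here is a minimal sketch using the core Encode module. The sample strings are arbitrary; the non-ASCII one is written with escapes so the snippet doesn't depend on how your editor saved it.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $ascii = 'hello';
my $text  = "h\x{e9}llo \x{2192} w\x{f6}rld";    # e-acute, right arrow, o-umlaut

# Pure ASCII text encodes to exactly the same bytes.
my $ascii_bytes = encode('UTF-8', $ascii);
print "ascii unchanged: ", ($ascii_bytes eq $ascii ? 'yes' : 'no'), "\n";

# Non-ASCII characters take more than one byte each.
my $bytes = encode('UTF-8', $text);
printf "%d characters, %d bytes\n", length $text, length $bytes;

# Decoding the bytes gives the original characters back.
my $round_trip = decode('UTF-8', $bytes);
print "round trip ok: ", ($round_trip eq $text ? 'yes' : 'no'), "\n";
```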

Everyone (except PHP) has UTF-8 Unicode built in to whatever programming language they're using.

Unless you're writing for devices with memory measured in bytes and a network connection measured in baud then you have time and space to use the bloating of UTF-8 Unicode. So suck it up, be inefficient, and accept the VHS of UTF-8 over the Betamax of whatever you're looking all cow-eyed at today.

And, in case you were wondering, ASCII is never the right answer.

Day 16: Web::Machine

2014-12-16T17:18:00.003+00:00

Web::Machine is pretty cool because it reorganises the way you think about your website's structure, focusing on the perspective you should really be starting with in the first place.

Web::Machine encourages you to construct several objects, each of which handles a URI by representing the resource to which that URI points.

Remember that URI is a Uniform Resource Identifier. We've had this discussion. The parts of the internet that use URIs are based on the assumption that they are sharing information about resources, and hence the focus is on the resource.

Web::Machine starts with the resource. You construct an object and mount it as Plack middleware to handle the URI to that resource. These objects are actually the machines. You construct a Web::Machine with a subclass of Web::Machine::Resource, and if that's all you want to do, you call ->to_app on it and plack it up.

Each Web::Machine so constructed is a Plack::Component. That means you can bring in a Plack::Builder and mount machines in it.

use Plack::Builder;
use Web::Machine;

my $builder = Plack::Builder->new;
# Mount the machine for MyApp::Resource at /resource
$builder->mount(
    '/resource' => Web::Machine->new(
        resource => 'MyApp::Resource'
    )
);

Alternatively, you might prefer to use something like Path::Router, providing subs that build Web::Machines based on arguments.

my $router = Path::Router->new;
$router->add_route('/resource/:id' => (
    # Path::Router takes the handler as the "target" option
    target => sub {
        my ($req, $id) = @_;
        Web::Machine->new(
            resource => 'MyApp::Resource',
            resource_args => [
                id => $id,
            ],
        )
        ->call($req->env);
    },
));

Two things are notable about this particular invocation. First, it is necessary to run call on the resulting machine manually. The second is that, now that we have actual args coming in, we're seeing how Web::Machine takes an array ref for these, not a hashref; i.e. it's an argument list and not required to be hash-shaped.

MyApp::Resource is what handles the actual magic: Web::Machine expects certain subroutines to be overridden from the base class Web::Machine::Resource that define what this resource can do.

The sensible ones to provide are content_types_provided and the to_* filters that define how to represent this resource as the various content types it supports.

The documentation lists all of the functions that can be overridden to provide behaviour specific to this class.
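
To show the shape of the idea without needing anything installed, here is a toy sketch. It does not use Web::Machine at all: Toy::Resource and represent are made up, and the "negotiation" is a plain exact match, nothing like the real decision process. Only the structure returned by content_types_provided follows what Web::Machine asks for.

```perl
use strict;
use warnings;

package Toy::Resource;

sub new { my ($class, %args) = @_; return bless {%args}, $class }

# Same shape as the list Web::Machine asks for: media type => method name.
sub content_types_provided {
    return [
        { 'text/html'  => 'to_html' },
        { 'text/plain' => 'to_text' },
    ];
}

sub to_html { my $self = shift; return "<h1>$self->{greeting} world</h1>" }
sub to_text { my $self = shift; return "$self->{greeting} world" }

package main;

# Naive stand-in for content negotiation: exact match on one media type,
# falling back to the first type the resource provides.
sub represent {
    my ($resource, $accept) = @_;
    my (%method_for, $default);
    for my $pair (@{ $resource->content_types_provided }) {
        my ($type, $method) = %$pair;
        $default //= $method;
        $method_for{$type} = $method;
    }
    my $method = $method_for{$accept} // $default;
    return $resource->$method;
}

my $resource = Toy::Resource->new(greeting => 'hello');
print represent($resource, 'text/plain'), "\n";
print represent($resource, 'text/html'),  "\n";
```

The resource only says what it can be represented as; something else decides which representation the client gets.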

RFPR: Web::HyperMachine

I've started taking this a step further. Resources are only part of what makes the interwebs work. The other part is the fact the resources are related to each other: hypermedia.

Up on the githubs is a start to the module Web::HyperMachine, which tries to wrap Web::Machine in an understanding of how the resources relate to one another. By adding a couple of DSL-like functions to the Resource class it is possible to automatically construct the URI schema for the system, using the declared names of resources and relationships within the resource classes themselves.

The user simply mounts those resources and the machine does the rest:

#!/usr/bin/perl
use strict;
use warnings;
use Web::HyperMachine;

my $app = Web::HyperMachine->new;
$app->with('MyApp::Resource');      
$app->to_app;

And the resource would be e.g.:

package MyApp::Resource;
use strict;
use warnings;

use parent 'Web::HyperMachine::Resource';

__PACKAGE__->uri('resource');

our @data = qw( hello hi hey howdy );

sub content_types_provided { [{ 'text/html' => 'to_html' }] }

sub fetch {
    my ($self, $id) = @_;
    return $data[$id];
}

sub to_html {
    my $self = shift;
    my $resource = $self->{resource};

    q{<h1>} . $resource . q{ world</h1>}
}

1;

If you plackup that script, you'll find that /resource/0¹ will return an HTML page with "hello world" in it; and other values will correspondingly index into the array.

Feedback on this concept is encouraged; it's not been worked on for some time, like most things I do, because I got bored of it, because I didn't have an actual use for it.

¹ If 0 doesn't appear to work, you may have an outdated version of Path::Router. The issue tracker says it is fixed on CPAN now.

Day 15: Crime and Punishment

2014-12-15T23:47:00.002+00:00

In today's post I'm going to try to convince you to think of the interfaces you make in terms of punishment, in order to find the path of least punishment.

Here's a perspective for you to consider: when someone uses your system, they are doing you a favour. Don't try to yes-but-what-if your way out of this; I'm not asserting that it is the case. I am saying that is how you should consider it to be. Assume that the user, given the option, will pick an alternative system. Design the interface from the point of view that it is the very fact people use the system that is the currency that measures its success. If people don't like using it, if you make it hard to do, they simply will stop doing so.

This is an important perspective if you are a business, because your system needs to get the user from state 1, wherein they have their money, to state 2, wherein you have their money. If you make that difficult to do, then they won't do it. You are not doing them a favour; don't treat them like you are.

Punishment

Punishment probably makes you think of unwanted tasks doled out to people for correction or restitution of some misdemeanour or other. This is a bit of a goal-oriented definition, because it implies a perpetrator in the first place; i.e. it expects that some misdeed has been undertaken for which recompense needs to be made.

People are, of course, falsely accused and given punitive action nevertheless. The focal point of the above definition is that of an unwanted task; some chore that must be gone through, which one is inconvenienced, perhaps embarrassed or humiliated, to do. The concept is one of a strong antipathy or disinclination to do the thing; hence it is considered punitive to require that the person do it.

Crime and Punishment

When you design an interaction between a human and a computer you are establishing a sequence of events that will allow the user to eventually find themselves in a situation whereby the thing they set out to do has been done. Within this highly abstracted scenario there are three players:

You (the entity with which the task is being performed)
The user (the entity trying to perform the task)
The task (the sequence of events by which the thing moves from not-done to done)

This set of three players has implied with it several types of tasks:

Expected but trivial; these things do not inconvenience
Expected but undesirable; the user has prepared for this
Unexpected but trivial; these things are minor inconveniences
Unexpected and undesirable; necessary evils
Unexpected and undesirable and avoidable; punishment

When you design an interface and you've added something to that interface, seriously consider whether that thing can be considered punishing the user for something they didn't do wrong.

Especially consider whether it is punishment for something out of their control. In many cases it is necessary to inform the user that there was a problem; this may seem like punishment, because it is quite undesirable to have to go through all that again.

Well, it is. Reduce the impact of problems by not discarding all the information the user has entered. If the problem is on your side, don't force the user to pick up the pieces, because they won't. If the problem is on their side, only require the re-entry of that information - not the entire thing.

And if there isn't a problem, why are you making one?

Amazon

Amazon punished me recently. They have this 1-Click registered-trademark button that allows you to find something you want and have it on its way to you just by pressing a button. That's a great feature - they are absolutely doing me a favour by having it. And they do me a second favour by letting me amend the order for up to 30 minutes after it's created.

Then they punish me for wanting to do that.

If you try to change the delivery address of such an order you are required to "confirm" your payment details. Why? They told me (on Twitter) that it was a security precaution to prevent others from accessing my personal information.

What utter, rotten bullshit. This is rubbish design, pure and simple. If I didn't change my delivery address, I would not have to confirm anything! This is unexpected, undesirable, and completely avoidable. It is punishment for wanting to have it delivered somewhere else. That is not a punishable offence.

SimplyBe

I get very upset sometimes. SimplyBe are absolutely not the sort of company that want me to give them any money. Every single step in between me selecting a product and me paying for the product was a pain in the arse.

Here are the necessary evils of buying something online:

Entering your payment details
Telling them where to send the product

That is it. Everything else beyond that is you not doing me a favour. Sometimes we accept certain things, like do you want to sign up for the newsletter? (No.) But there are really only two things a place needs to know about you in order to get your money from your pocket and into theirs. If they punish you for trying to do that, go somewhere else.

For the curious, my tirade can also be seen on Twitter, written live as I came across the problems with the checkout. Finding it is left as an exercise to the reader. Every single tweet in that set is about something I consider a punishment, and I consider myself as having been punished for wanting to give them money.

Metro 2033

I first started thinking about interfaces in terms of punishment while playing this game, Metro 2033, of which many readers may have heard. It was touted as one of the best games of whatever year I missed it in when it first came out. It's set in the subway of Moscow - the Metro - where humanity has retreated from whatever disaster has yet to be revealed.

The game goes, by stages, from stealth to survival to legging it to brawling to just wandering around in a township buying stuff. And it punishes you.

Progress in the game is saved by a checkpoint mechanic, although it doesn't tell you where the checkpoints are. All you know is that, if you die, you're going to be set back some arbitrary distance; although once you've failed once, you know where you're going to go back to.

The game is therefore, at the abstract level, a series of challenges that must be overcome in order to progress; failure in a particular challenge sets you back to, at best, the start of that challenge or, at worst, the start of the level. You don't know where until you fail a challenge, but when you've failed a challenge you have some idea of the new worst-case scenario.

The problem is that some challenges are more, well, challenging than others, but failing them causes you to have to repeat the less obnoxious ones in order to retry the difficult one. In a save-when-you-want game you would simply save before you reached the difficult challenge, in order to avoid repeating the easy ones more than once.

This reduces the easy challenges to chores, trivial tasks that you gradually become adept at and simply have to slog through to try the part you keep failing at, until eventually you find the secret to the difficult part. This quickly stops being entertaining.

Games should not be chores. Chores are punishment.

Incidentally, the game (so it calls itself) has another punishment mechanism: traps. Consider the welcome form of punishment, whereby you are set back for failing a challenge - this is the expected function of a game, since a game is supposed to be entertaining by presenting a challenge, and a challenge you can't fail is not a challenge at all. The trap I'm talking about is not a trap for the character in the game, but a trap for the player. In the game, traps are visible and have a disarming mechanism; but traps for the player are unexpected, random events. Unexpected, undesirable, but avoidable by the designer.

Twice, so far, the game has required me to be discreet, quiet, stealthy - this means light off - and then punished me by leaving traps in the dark. Things I cannot have avoided by using skill - points in the game where the only two approaches to the challenge would have caused me to fail. Damned if you do, and damned if you don't. The only way to beat the challenge is to have failed it at that point once already. How do I know there won't be another trap ahead? This challenge has become a chore.

Codec¹

Maintain flow. Most of the things I've listed as examples of punishment are flow-breaking. Most of the time, the user doesn't want to have to know how to perform the task; they need to be prompted to enter information, and as little information as possible. Every step along the way is a step further away from them achieving their goal, and the value of your system is entirely measured in how many people use it to achieve their goals.

Common punishments include:

Forcing the user to manually type information they use a computer to automate in the first place (breaking form autofill, or refusing to let me paste my generated passwords into the confirmation box).
Repetition of trivial tasks that shouldn't have to be done at all.
Requirement of information you don't strictly need.
Considering valid data to be invalid because your validation is broken (or vice versa).
Similarly, rejecting sensible input because you're scared of it (like most of my randomly-generated passwords).
Pretending to let you do something, and then moving the goalposts and not actually doing it.
Not providing sufficient information to help the user rectify the problem.
Fragmenting input forms across multiple pages.
Cramming a single page with too much input.
Discarding information because your fragile system shat itself.
Choosing fonts and colours that are difficult to read.
Making the user hunt for the next thing they have to do.
Related, leaving the user at the end of a process with no confirmation or failure message, so they don't know that they're done, or feeling that they have to do it all again.

I'm sure if I use the internet for another day I'll be able to double this list but you get the idea. For every action the user has to take, is it something they've prepared for, and do they actually have to do it?

¹ [sic]

Day 12ish: PERL

2014-12-15T14:43:00.003+00:00

PERL is wrong. It was invented at some point to mean Practical Extraction and Report(ing) Language but Perl was never called that originally.

Although I do quite like the interpretation Poor Excuse for a Real Language, which unfortunately doesn't initialise to PHP.

There's also a swathe of awful, ancient code written in Perl.

This legacy dogs Perl's steps, despite the recent rise of Perl like an X-Wing rising out of Dagobah swamps.

Thus I propose a naming convention: Anything that can be considered to be dragging Modern Perl down be referred to as PERL code. It's clear how PERL is indeed a pathetic excuse for a real language. Perl resembles PERL as much as Episode IV resembles Episode I.

PERL is dead. Long live Perl.

Day 11: List context and parentheses

2014-12-11T16:28:00.001+00:00

It's common to start off believing that () make a list, or create list context. That's because you normally see lists first explained as constructing arrays:

my @array = (1,2,3);

and therefore it looks like the parentheses are part of list context.

They aren't. Context in this statement is determined by the assignment operator. All the parentheses are doing is grouping up those elements, making sure that all the , operators are evaluated before the = is.
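
A minimal sketch of what the grouping buys you. Both assignments below are list assignments, because both have an array on the left; the only difference is which elements get grouped before the = happens.

```perl
use strict;
use warnings;

my @grouped = (1, 2, 3);    # the commas are evaluated first, then assigned

my @ungrouped;
{
    no warnings 'void';     # Perl would otherwise warn about the discarded 2 and 3
    @ungrouped = 1, 2, 3;   # parsed as (@ungrouped = 1), 2, 3
}

print scalar(@grouped),   "\n";    # 3
print scalar(@ungrouped), "\n";    # 1
```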

There is exactly one place in the whole of Perl where this common misconception is actually true.

LHS of `=`

On the left of an assignment, parentheses create list context. This is how the Saturn operator works.

$x = () = /regex/g;
#   |______________|

The marked section is an empty list on the left-hand side of an assignment operator: the global match operation is therefore in list context.
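
Here is the same thing as a small runnable sketch, using =~ against a made-up string rather than matching against $_:

```perl
use strict;
use warnings;

my $text = 'a1b22c333';

# The empty list on the left puts the match in list context, so it finds
# every match; assigning that list assignment to a scalar yields the
# number of elements on its right-hand side.
my $count = () = $text =~ /\d+/g;

# Without it, the match is in scalar context and only reports success.
my $matched = $text =~ /\d+/;

print "$count\n";      # 3
print "$matched\n";    # 1
```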

LHS of `x`

This is a strange one. The parentheses do construct a list, but the stuff inside the parentheses does not gain list context.

my @array = (CONSTANT) x $n;

In this case, CONSTANT - presumably sub CONSTANT {...} - is in list context; x gains list context from the =, and CONSTANT inherits it.

my $str = (CONSTANT) x $n;

Here we have x in scalar context because of $str, and CONSTANT in scalar context because of that. This is not really a whole lot of use, however.
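
For comparison, a short sketch of the more everyday uses of x, with plain values instead of a sub so that context doesn't come into it:

```perl
use strict;
use warnings;

my @row  = ('-') x 3;         # list repetition: ('-', '-', '-')
my @pair = ('a', 'b') x 2;    # ('a', 'b', 'a', 'b')
my $line = '-' x 3;           # no parentheses: string repetition, '---'

print scalar(@row), " ", scalar(@pair), " $line\n";    # 3 4 ---
```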

Various Contexts

This sub reports whether it's called in scalar, list or void context¹:

sub sayctx { say qw(scalar list void)[wantarray // 2] }

Now we can test a few constructs for context:

# void
sayctx;

# scalar
scalar sayctx;

# scalar
my $x = sayctx;

# list
my @x = sayctx;

# list
() = (sayctx) x 1;

# scalar
my $y = (sayctx) x 1;

# list
last for sayctx;

# scalar
while (sayctx) { last }

# scalar
1 if sayctx;

# scalar, void
sayctx if sayctx;

# scalar, scalar
sayctx > sayctx;
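
If you'd rather not do the exercise, here is a spelled-out sketch of the same check. note_ctx is a made-up name; it records the context instead of printing it so the results can be inspected afterwards.

```perl
use strict;
use warnings;

my @seen;

# wantarray is true in list context, false but defined in scalar context,
# and undef in void context.
sub note_ctx {
    my $want = wantarray;
    push @seen, defined $want ? ($want ? 'list' : 'scalar') : 'void';
    return 1;
}

note_ctx();                 # void
my $scalar = note_ctx();    # scalar
my @list   = note_ctx();    # list
last for note_ctx();        # list
1 if note_ctx();            # scalar

print "@seen\n";            # void scalar list list scalar
```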

¹ Understanding it is left as an exercise to the reader.

Day 10: Fixes to DBIx::Class::InflateColumn::Boolean

2014-12-10T17:35:00.002+00:00

I'm finding my new position at OpusVL ever more valuable. We like to put extra time into getting to the bottom of an issue because we rely so heavily on open-source software. Problems we discover in the modules we use are worth investigating for their own sake, simply because the amount of time already put into the modules by other people is years; years we didn't have to spend ourselves.

Today I discovered that, if I ran my Catalyst application under perl -d, it didn't actually run at all.

After much involvement from various IRC channels I came to the conclusion that the problem was in Contextual::Return; or rather, the problem was in the 5.14 debugger, since it seems OK in 5.20.

Anyway, Contextual::Return was employed by DBIx::Class::InflateColumn::Boolean, which I was using because SQLite doesn't have ALTER COLUMN. We test components of Catalyst applications as small PSGI applications with SQLite databases backing them, which has its own problems, but in this case the issue was the column in question being closed boolean NOT NULL DEFAULT false, and SQLite not translating "false" as anything other than the string "false", and then shoving it in a boolean column anyway.

So DBIC faithfully gave me "false" back when I accessed the row, and "false" is true, so everything broke.
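
In case that isn't obvious, a minimal sketch of Perl's idea of truth (is_true is just a name for the illustration):

```perl
use strict;
use warnings;

# Perl's false values are undef, the empty string, 0 and '0'.
# Any other string counts as true, including the word "false".
sub is_true { return $_[0] ? 1 : 0 }

for my $value ('false', '0', '', 'true', '0.0') {
    printf "%-8s is %s\n", "'$value'", is_true($value) ? 'true' : 'false';
}
```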

So I inflated the column.

This all resulted in a patch to DBIC:IC:Boolean, authored by haarg, removing the dependency on Contextual::Return entirely.

This may be a case of avoiding rather than fixing the problem, but since the problem appears to exist in the 5.14 debugger, the only way to fix that is to update to 5.20 - or whenever it was that it was fixed.

It also prompted me to rebuild the SQLite database to remove that default. Turns out DBIC doesn't fill in default values when creating rows.

Day 9: Scalar filehandles, or IO, IO, it's not to disk we go

2014-12-09T12:13:00.002+00:00

Did you know you can open a variable as a file handle?

This is a great trick that avoids temporary files. You can write to the filehandle, and the stuff written thereto is available in the other variable. I'm going to call the other variable the "buffer"; this is a common term for a-place-where-data-get-stuffed.
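
Stripped of everything else, the trick looks like this. It's a minimal sketch with made-up contents, and it shows that reading works the same way as writing.

```perl
use strict;
use warnings;

# Write to a scalar instead of a file.
open my $out, '>', \my $buffer or die "Cannot open in-memory handle: $!";
print {$out} "line one\n";
print {$out} "line two\n";
close $out;

print length($buffer), " bytes buffered\n";

# The same trick works for reading: treat a string as a file.
open my $in, '<', \$buffer or die "Cannot open in-memory handle: $!";
my @lines = <$in>;
close $in;

print scalar(@lines), " lines read back\n";    # 2
```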

Here's an example whereby I created an XLS spreadsheet entirely in memory and uploaded it using WWW::Mechanize. The template for the spreadsheet came from __DATA__, the special filehandle that reads stuff from the end of the script.

This allowed me to embed a simple CSV in my script, amend it slightly, and then upload it as an XLS, meaning I never had to have a binary XLS file committed to git, nor even written temporarily to disk.

In the example below, a vehicle, identified by its VRM (registration plate) is uploaded in an XLS spreadsheet with information about its sale. The $mech in the example is ready on the form where this file is uploaded.

The main problem this solves is that the VRM to put into the spreadsheet is generated by the script itself, meaning that we can't just have an XLS file waiting around to be uploaded. As noted, it is also preferable not to have to edit an XLS file for any reason, essentially because this can't be done on the command line - LibreOffice is required, or some Perl hijinks.

open my $spreadsheet_fh, ">", \my $spreadsheet_buf;       # [1]
my ($header, $line) = map { chomp; [split /,/] } <DATA>;  # [2]
my $xls = Spreadsheet::WriteExcel->new($spreadsheet_fh);  # [3]
my $sheet = $xls->add_worksheet();

# processing

$line->[0] = $vrm;

$sheet->write_col('A1', [ $header, $line ]);              # [4]
$xls->close;

$mech->submit_form(
  with_fields => {
      file => [ [ undef, 'whatever', 
          Content => $spreadsheet_buf ],                  # [5]
      1 ]
  },
  button => 'submit',
);

# [6]
__DATA__
VRM,Price,Fees,Collection,Valeting,Prep costs
,2333,10,0,10,0

The key to this example is in [1], which looks like a normal open call except for the last expression:

\my $spreadsheet_buf;

This is a valid shortcut to declaring the $spreadsheet_buf and then taking a reference to that:

my $spreadsheet_buf;
open my $spreadsheet_fh, ">", \$spreadsheet_buf;

The clever part is that now, $spreadsheet_fh is a normal filehandle that can be used just like any other; just as if we'd used a filename instead of a scalar reference. At [3] you can see a normal Spreadsheet::WriteExcel constructor, taking a filehandle as the argument, as documented.

At [2] you can see DATA in use, which reads from __DATA__ at [6]. This also acts like a normal filehandle; <DATA> reads linewise, and we have to chomp to remove the newlines.

We map over these lines, chomping them and using split /,/ to turn them into lists of strings; and this list is inside the arrayref constructor [...], meaning we get an arrayref for each line.
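
Here is that step on its own, as a sketch with an array standing in for <DATA> and a made-up VRM. Note that the empty field at the front of the second line survives the split, which is what leaves the gap to fill in.

```perl
use strict;
use warnings;

# Stand-in for <DATA>: two lines, each still carrying its newline.
my @raw = ("VRM,Price,Fees\n", ",2333,10\n");

my ($header, $line) = map { chomp; [ split /,/ ] } @raw;

print scalar(@$header), " columns\n";              # 3
print "first field is empty\n" if $line->[0] eq '';

$line->[0] = 'AB12CDE';                            # made-up VRM for the sketch
print join(',', @$line), "\n";                     # AB12CDE,2333,10
```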

At [4] we have processed sufficiently to have installed the VRM in the gap at the front of the second line, i.e. the zeroth element of $line, so write_col is employed to write both arrayrefs as rows (yes I know) into the spreadsheet.

When we call $xls->close, this writes the spreadsheet to the filehandle. But no file is created; instead, the data go to $spreadsheet_buf. If we were to print $spreadsheet_buf to a file now, we would get an XLS we can open.

Instead, at [5], we use the trick documented in submit_form (ether++ for reading everyone's mind) to use the file data we already have as the value of the form field.

This trick is remarkably useful. You can reopen STDOUT to write to your buffer:

{
    local *STDOUT;

    open STDOUT, ">", \my $buffer;

    do_stuff_that_prints();

    do_stuff_with($buffer);
}

but that's better written

my ($buffer) = capture { do_stuff_that_prints() };

from Capture::Tiny.

Day 8: Mindset

2014-12-08T18:16:00.000+00:00

It doesn't matter what language you start in. The language doesn't help. The problem is you; you're the new developer, the inexperienced green sapling; you're the one with no instinct, no sense of smell, and no idea where to begin. You probably don't even have a problem you want solving.

Whenever we solve a problem we draw on our knowledge and experience to solve it. Knowledge and experience differ like theory and practice do. Knowledge is the theory. You can know something because you were told it, and it stuck. Arguably, the best way to know something is to understand it; then you know why it is the case, and what you really know is more general, more applicable, and hence more useful. Experience is practice; you've done this before. Experience is the sort of knowledge you need in order to produce a good solution to a problem, because experience tells you what the next problem is, and how to avoid it now.

Experience alters your thought process.

Today's example comes from irc.freenode.com#perl, where we see a green programmer trying to solve a problem:

Report the powers of two that sum to produce a given integer

That is, break down an integer into the powers of two from which it is composed.

Scroll no further if you wish to solve it yourself. In Perl.

No language can provide you, up front, with the knowledge you need to answer this question. Most languages have for loops and while loops, and something that can raise 2 to a power. But that's all you know. You have a few bits of theory, but no experience to draw upon. So your thought process goes something like this:

I can take a number n and find the nth power of two: 2 ** $n
I can store a value and compare it to my target num: $total > $num
I can loop an indefinite number of times with while
The biggest power of two less than num is definitely part of it

You reach the conclusion, using knowledge, that you can subtract ever-decreasing numbers from your target, in a loop. Any number that leaves you with a positive number simply means you can repeat the process with the new number, having remembered that particular power of two.

use 5.010;
use strict;
use warnings;

my $num = shift;
my $power = 0;

$power++ until 2 ** $power > $num;
$power--;

while ( $power >= 0 ) {  # go all the way down to 2 ** 0, so odd numbers report 1
  if ($num - (2**$power) >= 0) {
    say "$power (" . (2**$power) . ")";
    $num -= 2 ** $power;
  }

  $power--;
}

4 (16)
2 (4)

Reasonable. Now here's my thought process:

They want all powers of two that come together to sum a number
That's how binary works
We can ask the binary representation of num for all the on bits
The positions of those on-bits are the answer.

So we write that.

say for grep { $_ } map { 2 ** $i++ * $_ } reverse split //, sprintf "%b", shift

This is a one-liner. Try it in perl -E'...' 20, in place of the ....

4
16

OK we'll break it down, but you'll see that each section maps roughly to each of the items in that list.

"They want all powers of two"

The answer is going to be a list. say for LIST, and we have to construct LIST. The powers of two have a test for validity, so there's probably a grep. say for grep { CONDITION } LIST.

We should really build an array for LIST, and use it at the end.

use 5.010;

my @bits;
...

say for @bits;

"That's how binary works"

Getting the binary representation of a number is easy; sprintf "%b", EXPR. In the one-liner we used shift to take the first command-line argument. We can put $num here and save the result of sprintf instead of using it directly.

my $num = shift;
my $binary = sprintf "%b", $num;

"We can ask the binary representation for all the on bits"

How? This is a two-parter. First you have to turn the string into bits. Then you have to find the on-bits.

Turning the string into bits is easy - you split it on the gap between characters:

my @bits = split //, $binary;

Not obvious is the finding the on-bits. See, we don't want the actual bits themselves; all the on-bits are 1, so finding them all would simply tell us how many there are. We actually want to know where they are.

Trouble is, sprintf gives us 10100 for 20. The first bit is the high bit, but that has the smallest offset, i.e. it's the 0th digit in that string. And the other 1 is the 2th digit. Knowledge tells us that our 20 working example should report 4 and 16; but 2 ** 0 is neither of those, even though 2 ** 2 is.

The answer to this is actually in the original solution: we have to work backwards, biggest number last. That's why we reverse it.

my @bits = reverse split //, $binary;
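To see why the reverse matters, it helps to print each position next to the bit it holds. This is just an illustrative sketch with 20 hard-coded as the input; the variable names are mine.

```perl
use strict;
use warnings;

my $num    = 20;
my $binary = sprintf "%b", $num;          # "10100"
my @bits   = reverse split //, $binary;   # (0, 0, 1, 0, 1)

# After the reverse, the array index is the exponent for that bit
my @lines;
for my $pos (0 .. $#bits) {
    push @lines, sprintf "position %d holds %s, worth 2 ** %d = %d",
        $pos, $bits[$pos], $pos, 2 ** $pos;
}
print "$_\n" for @lines;
```

The on-bits turn up at positions 2 and 4, which is exactly the 4 and 16 we were after.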

"The positions of those on-bits are the answer"

In the final solution I report the powers of two, not the numbers we raise two to, and the positions are the numbers to raise two to, not the power of two to that. Clear?

The positions of the on-bits are found using a bit of a naughty map, which uses a counter outside its scope. map should really not have side-effects. We can work around this in a proper script, however.

By iterating through the bits and incrementing a counter as we go, we can determine the value that this bit represents.

2 ** $i++

$i++ of course returns the value of $i before incrementing it, meaning it starts off undefined. We can't have that.

my $i = 0;

Now we can produce a list of all those values:

map { 2 ** $i++ } @bits;

Plug this into say for debugging purposes:

say for map { 2 ** $i++ } @bits;
1
2
4
8
16

We've lost information - what happened to the fact some of the bits were turned off? Although I had this in knowledge, it was experience that reminded me that I can multiply:

map { 2 ** $i++ * $_ } @bits;

That's better - we also should always use $_ in a map because map is supposed to transform $_.

Now we have something we can grep: $_ itself!

my @powers = map { 2 ** $i++ * $_ } @bits;
say for grep { $_ } @powers;

This collects all powers, but only reports those with a nonzero value.

We can fix the $i situation by using keys on @bits. From Perl 5.12 onwards, keys on an array returns the list of indices, even though they're not really keys.

map { 2 ** $_ * $bits[$_] } keys @bits

This uses $_ in place of $i (0 to 4), but now that $_ is the index, we have to get the actual bit value by looking it up in @bits.

Answers on a postcard, please

Here's the final script, then:

use 5.012;  # keys on an array needs 5.12
use strict;
use warnings;

my $num = shift;
my $binary = sprintf "%b", $num;
my @bits = reverse split //, $binary;

my @powers = map { 2 ** $_ * $bits[$_] } keys @bits;

say for grep { $_ } @powers;
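If you'd rather not go via a string at all, the same idea can be sketched with bitwise operators. This is a different technique from the sprintf/split version above, not a tidier copy of it, and powers_of_two is a name I've made up for the example.

```perl
use strict;
use warnings;

# Return the powers of two that sum to $num, smallest first
sub powers_of_two {
    my ($num) = @_;
    my @powers;
    my $bit = 1;
    while ($bit <= $num) {
        # A nonzero AND means this bit is switched on in $num
        push @powers, $bit if $num & $bit;
        $bit <<= 1;
    }
    return @powers;
}

print join(', ', powers_of_two(20)), "\n";   # prints "4, 16"
```

It assumes a non-negative integer; anything else is left as an exercise.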

Day 4: RFPR for daemonize.pl

2014-12-05T10:07:00.000+00:00

I've embarked on a new term, RFPR. An RFPR is a Request For Pull Requests: like an RFC, except for when you've already started writing code and you want people to add features or fix it, instead of bikeshedding about the spec for it.

This first one is for my daemonize script at https://github.com/Altreus/daemonize.pl. This script is a wrapper around Daemon::Control (https://github.com/symkat/Daemon-Control), which I wrote essentially so I could type

daemonize starman --something --etc webapp.psgi
^M^M^M^M^M^M^M^M^M

... and end up with an LSB script in init, because all the default answers to the questions were right.

Unfortunately the very first time I tried to use this somewhere else I discovered that it wasn't so straightforward, so now I'd like to collect either patches or issues on the repository for features or changes that would make this script that much more useful.

Essentially the goal is to automate as much of writing the Daemon::Control script as possible, and also to have an option to write it out as an init script instead of a Perl script.

Welp, just a brief one for day 4. They can't all be deep essays on the holistic nature of abstract data.

Day 3: Different shapes of data

2014-12-03T21:45:00.001+00:00

One of the main points of sufferance for PHP is the conflation of what the rest of the world consider to be separate data structures: the array and the hash/dictionary/map/object/etc. Everyone agrees on the name of the array; less so on the name of the hash. We'll stick with hash (but later I'll say object, just to troll you).

This conflation is vehemently defended by PHP programmers, but I sense a certain cart-before-the-horse expectation if you try to get a PHP programmer to realise the problem with it. Which is to say, a PHP programmer has only seen PHP do it, and has seen how PHP works around the limitations of doing it, and therefore doesn't have the experience of languages with separate types to be able to understand intuitively that they are fundamentally different.

I'm not going to directly attack the fact it clearly has limitations, because this is acknowledged and understood; and everything has limitations. If we didn't have limitations, we wouldn't really have things at all, would we?

It is not the limitations of the aforementioned conflation that make it a problem; it is a deeper-seated, fundamental difference; logical in nature. Almost mathematically different, like numbers and vectors are.

I'm going to try to formalise the difference. Properly explain it, and make it plain.

We can start to understand the difference by scrutinising those very workarounds that PHP does use - to cope with the limitations - and the inconsistencies that we expect from any PHP anything at all ever.

Consider the array_merge function:

If the input arrays have the same string keys, then the later value for that key will overwrite the previous one. If, however, the arrays contain numeric keys, the later value will not overwrite the original value, but will be appended.

And

Values in the input array with numeric keys will be renumbered with incrementing keys starting from zero in the result array.¹

Doublethink

It is being recognised that the structure is performing two functions; the first, with string keys, has unique properties. The same value cannot be repeated in the structure, because the identifying property of that piece of information is its string name: if the array were to have two keys of the same name, it would be impossible to distinguish between them on access. We can give this concept formal terminology: it doesn't make sense.

We say it does not make sense to have two keys with the same name. Looking at this under a semantic microscope we come to the realisation that we've accidentally used two different words for the same thing: "key" and "name". The key does not have a name; the key is a name. We can't restructure that sentence to avoid using both words, because whenever we try the thing we end up with doesn't make sense. We're forced to conclude that the reason we can't make the sentence make sense is that the concept we're trying to express cannot be formally expressed. Something that cannot be formally expressed can only be described as wrong, or nonsense, or such other dismissive words. The concept does not exist to be expressed.

The second concession this array_merge makes is that numeric keys are normally sequential. This, at first glance, appears to point to another uniqueness of key; two keys in an ordinal array will never be the same, for the exact same reason: the key is the key, and any access of that key will inevitably refer to the value associated with it.

Why, then, this acknowledgement that numeric keys are expected to be sequential? That is, why, if merging two arrays with numeric keys, do we concatenate, instead of overwrite?

This question starts to show the fundamental difference between the data structures. The principle is that of purpose.
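For contrast, here is roughly what the two behaviours look like in a language that keeps the structures apart. A small Perl sketch, with data invented for the example: merging hashes overwrites by key, while merging arrays simply concatenates.

```perl
use strict;
use warnings;

# Hashes: a later value for the same key replaces the earlier one
my %old = (name => 'Alice', height => 170);
my %new = (name => 'Bob');
my %merged_hash = (%old, %new);

# Arrays: there are no keys to clash, so the lists are joined end to end
my @first  = ('a', 'b');
my @second = ('c', 'd');
my @merged_array = (@first, @second);

print "name is $merged_hash{name}\n";     # name is Bob
print "array is @merged_array\n";         # array is a b c d
```

Nobody had to document a special case for numeric keys, because the array never had keys in the first place.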

Shape of a hash

String names are often called properties. This is because they:

Tend to refer to a real-world attribute of a real-world concept, such as a person's name or an item's weight.
Don't make sense independently of the item. A person's name isn't a person's name if the person isn't involved. "Name" is meaningless if you don't know what it's the name of.
Together, as a collection, sufficiently define the object being described.

Last things last, because that's important. All the properties of an object together define sufficient information about the object to perform all necessary tasks with that object, within the system. I'm saying object because that's a word we use both in the real world and in programming. An object in an object-oriented system has properties, or attributes. And observe that it is the set of attributes, not their names, that define the data structure.

A hash, or associative array, or whatever, is defining a single thing. The keys of this hash are the properties that are required to capture the important information about that item, just as the properties of an object are.

We will call the set of keys, or properties, that the hash has its shape. We can consider that formal terminology as well².
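As a rough illustration of the term, two hashes have the same shape when they have the same set of keys, whatever the values are. The shape helper below is hypothetical, and it assumes no key contains a comma.

```perl
use strict;
use warnings;

# Hypothetical helper: describe a hash's shape as its sorted key names
sub shape {
    my ($hash) = @_;
    return join ',', sort keys %$hash;
}

my %alice = (name   => 'Alice', height => 170);
my %bob   = (height => 185,     name   => 'Bob');
my %box   = (weight => 12);

print shape(\%alice), "\n";   # height,name
print shape(\%alice) eq shape(\%bob) ? "same shape\n" : "different shape\n";
print shape(\%alice) eq shape(\%box) ? "same shape\n" : "different shape\n";
```

Alice and Bob are the same sort of thing; the box is not.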

Shapes of arrays

It is not infeasible that an object can have a numerical property. This is often proscribed by programming languages, which won't let you start property names with numerical values when defining classes, but we're talking about hashes here. They can take any string value and use it as a property for this object.

For example, perhaps this object's keys are all identifiers into other things, and all values are boolean. It's an object representing associations between other things. A node on a graph, perhaps, storing other nodes' identifiers as keys, and boolean values determining whether there's a link to it.

A stretch, but not totally crap.

What of the ordinal array then? This is just it: the index you use to access an item in an array is not a property of the array.

We can actually see this best in a Java scenario: in Java, an array is an object that contains other objects. But the array has properties of its own; a length, a max length, a stored data type. It has functions that can be run on it: push, pop, splice, etc. It does not have a property called 0, a property called 1, etc. It is a completely different thing.

In C++ the same structure (an array with flexible size) is called a Vector. This is apt. Arrays are vector structures. The thing that PHP calls a "key" is actually an index; I already used the word, and so does PHP, interchangeably. But it is not a key! A key is a property of the data structure; an index is a position in the data structure, not a property of the data structure.

The array is a line; a mathematical, one-dimensional structure. At integer points along its length can be found data of arbitrary type. But these are not properties of the array, any more than the values described by a line on a graph are properties of the line. The fact these things are in order - 0, 1, 2, 3 - is a phenomenon that follows on from the fact we're sticking more things onto the end. The ordering of the items in the array is not defined by the indices; the indices are defined by the ordering. The data in the array defines the shape of the array.

The hash is a bag; a lookup table. There is no graph that can describe a hash, because there is no natural ordering to the keys in it. Strings don't have natural ordering: "a" is only before "b" because we invented "a" and "b" and put them in that order. We didn't invent 1 or 2 and we didn't make 2 bigger than 1.³ Is your name before or after your height? That doesn't make sense!

The fundamental difference is there, then. The keys to an array are defined by the data in it, but the keys to a hash define the data that goes in it.
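A quick way to watch this happen is to remove something from each structure. In this Perl sketch (data made up for the purpose), taking an item out of the array changes the index of everything after it, while deleting a key from the hash leaves the other keys exactly where they were.

```perl
use strict;
use warnings;

# Array: positions follow from the data that is currently in it
my @queue = ('first', 'second', 'third');
splice @queue, 0, 1;                          # remove the item at index 0
print "index 0 is now $queue[0]\n";           # index 0 is now second

# Hash: each key names its own value, independent of the others
my %person = (name => 'Alice', height => 170);
delete $person{name};
print "height is still $person{height}\n";    # height is still 170
```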

¹ A salient question at this point is: how do you know whether it is a string or not? Is "0010" a string? If not, is it the number 10 or the number 2 or the number 8? All four things are valid interpretations under commonly-used rules.

² As with all language, it doesn't matter what noises or letter-strings we use to define a concept. The important thing is that we all understand the same thing when we hear or see it. Let this word stand for the scope of this post; but you'll likely see the term "the shape of the data" referred to quite a lot in general.

³ We invented the symbols 1 and 2, but we didn't invent the platonic integers that 1 and 2 refer to. There was 1 earth before we evolved on it and used the symbol 1 to represent this number.

Day 2: Opt::Imistic

2014-12-03T10:51:00.000+00:00

Can't believe I've not made a post about this ancient module. Opt::Imistic is a module I wrote to facilitate the writing of command-line scripts that take options. It was inspired by the node module of the same(ish) name, Optimist (now deprecated).

All Opt::Imistic does is to parse @ARGV for things that look like options (using essentially the same rules as Getopt::Long does with gnu_compat options, i.e. the sensible way of doing it that doesn't cause too much ambiguity).

Long and short options are recognised by default, given GNU style. -xyz is three options and --xyz is one. Use whitespace or = to specify values to options. = can be used if the value looks like an option¹.

As the docs say, this is a 90% module - Getopt::Long is for the other 90%.

Hacky magic

Opt::Imistic relies on a piece of Perl magic the reader may not be aware of, which is that, for all of Perl's global variables, it appears to be the entire typeglob by that name that is global.

Simply put, this means that, because @ARGV exists, so does %ARGV. This is exploited by Opt::Imistic, by putting discovered arguments as the keys to the associated values, if any.
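To show the trick in isolation, here is a deliberately tiny sketch that stuffs long options into %ARGV under strict, with no declaration needed. It is not Opt::Imistic's actual implementation: it only understands --name and --name=value, and the pretend command line is invented for the example.

```perl
use 5.010;
use strict;
use warnings;

# Pretend these came from the command line
@ARGV = ('--config=app.conf', '--verbose', 'input.txt');

# Hypothetical, much-simplified parser: stop at the first non-option
while (@ARGV and $ARGV[0] =~ /^--([\w-]+)(?:=(.*))?$/) {
    my ($name, $value) = ($1, $2);
    shift @ARGV;
    # Options with no value just record a 1 each time they appear
    push @{ $ARGV{$name} }, $value // 1;
}

say "config:  $ARGV{config}[0]";                          # config:  app.conf
say "verbose: ", exists $ARGV{verbose} ? "yes" : "no";    # verbose: yes
say "left on \@ARGV: @ARGV";                              # left on @ARGV: input.txt
```

strict doesn't complain about %ARGV because ARGV is one of the handful of names Perl always treats as global.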

Overload magic

tm604 on IRC suggested that I can be even more magical if the discovered options were actually objects of a class that behaves correctly in different situations.

Since you can't prevent a person from multiply specifying a single-use option, instead of bailing horribly in this situation it's traditional to simply take the last instance of it. This implies the option needs a value; otherwise, it doesn't matter how many times you specify it. Think --config, for example.

Indeed, if the option doesn't take a value, it's usually expected that the script is going to count the number of times it's specified. Think -v, often "verbose", or -vvv, "extremely verbose".

Perl being Perl, the user doesn't have to care whether it was specified once or many times, if all the script cares about is whether it was specified at all. Zero is the false value here.

With a simple class², entirely designed to carry overload magic, we can gather all this information at once.

package Opt::Imistic::Option {
    use overload
        # As a string, the option is its last-specified value
        '""' => sub { $_[0]->[-1] },
        # If the option exists at all, it is true
        'bool' => sub { 1 };
}

This covers the common uses of command-line options:

One or more values - The objects are blessed array refs. Simply deref it for your values.
One value - Treat it as a string, and it'll stringify. This also works for numbers. The overload ensures the last value is taken; all options are arrayrefs with at least one thing in them, or absent entirely.
A countable option - Simply count your arrayref.
A boolean option - Just use it in boolean context. You'll get a 1 if it's there.
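Those four uses can be tried out without the module at all. The sketch below uses a stand-in class, My::Option, with the same two overloads; the class name and the option values are assumptions for illustration, and in real use the objects would be built for you during parsing.

```perl
use 5.014;
use strict;
use warnings;

# Hypothetical stand-in carrying the same overload magic
package My::Option {
    use overload
        '""'   => sub { $_[0]->[-1] },
        'bool' => sub { 1 };
}

# As if the user had typed --config a.conf --config b.conf -vvv
my $config  = bless ['a.conf', 'b.conf'], 'My::Option';
my $verbose = bless [1, 1, 1],           'My::Option';

say "all values: @$config";                 # all values: a.conf b.conf
say "one value: $config";                   # one value: b.conf
say "verbosity: ", scalar @$verbose;        # verbosity: 3
say "verbose is on" if $verbose;            # verbose is on
```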

Again, this is a 90% solution, but check the docs for the extra functionality I added. You can specify options are required, and specify that at least n arguments must be left on @ARGV at the end of parsing.

¹ I'm not sure whether I just came up with this or not. This might not (yet) be true.

² This package uses the package BLOCK syntax, introduced in 5.14. The module doesn't specify 5.14; this is an oversight.

Day 1: Pod::Cats

2014-12-01T22:01:00.001+00:00

Today is the first day of the advent calendar blog thing, so I thought I'd give it a whirl. Let's see how far I get.

I thought I'd do an easy one and put it out there how I actually do my blog. Well, I don't like writing HTML, and I don't like WYSIWYG editors, but I wanted something easy like blogger to actually do all the hard work for me.

I don't really like Markdown, primarily because it doesn't let me do certain things easily¹. Footnotes are something I do commonly when I'm writing²; they allow a certain second dimension to what would otherwise be a one-dimensional stream of words. In fact it's sort of a hyperlink, from before we had hypermedia.

You'll note, indeed, that my footnotes are hyperlinks. They link to their location on the page; and the footnotes at the bottom of the page link back to their marks. This is the sort of functionality I wanted from a blog markup language.

I decided that POD has a good balance of DWIM³ and expressiveness, so I took the concepts and generalised them.

This led to Pod::Cats being written. It really needs to be rewritten, now that it's something I actually use regularly. It's not my best code.

The name Pod::Cats came from a conversation I had quite some time ago in the #perl-cats channel on Freenode, wherein we thought it would be neat to have a community blog/podcast site called Podcats: the whole discussion started because someone typoed podcast.

Anyway, the module defines the grammar of Pod::Cats documents, but is intended to be extended to provide functionality. PodCats::Parser does just that. This module could also do with a refactor.

The Pod::Cats parser uses a subclass of String::Tagged::HTML (here) whose entire purpose is to just render when stringified. In fact the main module may do this now - I should check!

Bugs exist in String::Tagged::HTML whereby, because there is no inherent ordering to tags in the same place in the string, the order of render is at the mercy of Perl's hashing algorithm. LeoNerd is pawing at a solution to this, so with luck this will solve my footnote issues soon. I've been helping with moral support and distractions.

Anyway, I save my files with the .pc extension and use a reasonably consistent set of Pod::Cats commands to mark up my blog posts. The idea is to maintain semantic structure while minimising the amount of actual meta-stuff in the file itself: something I felt POD was good at, with a few amendments of my own.

Once done I simply run my script, which overwrites or creates the HTML for any .pc file with a later save date than the equivalent HTML, or missing HTML. Then I upload the HTML. This means I can fudge the HTML afterwards without worrying about it being overwritten the next time I run the script.
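The "later save date, or missing" check is the only clever bit, and it can be sketched in a few lines. This is an illustration of the rule rather than a copy of my script: needs_rebuild and the file names are made up, and the demo fakes the save dates with utime on throwaway files.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: true if the HTML is missing or older than the source
sub needs_rebuild {
    my ($pc, $html) = @_;
    return 1 unless -e $html;
    return (stat $pc)[9] > (stat $html)[9] ? 1 : 0;
}

# Demonstrate with throwaway files and explicit modification times
my $dir = tempdir(CLEANUP => 1);
my ($pc, $html) = ("$dir/post.pc", "$dir/post.html");

open my $fh, '>', $pc or die "Cannot write $pc: $!";
close $fh;
print "missing HTML: ", needs_rebuild($pc, $html), "\n";   # 1

open $fh, '>', $html or die "Cannot write $html: $!";
close $fh;
utime 1000, 1000, $html;    # HTML saved first
utime 2000, 2000, $pc;      # source edited afterwards
print "stale HTML:   ", needs_rebuild($pc, $html), "\n";   # 1

utime 3000, 3000, $html;    # HTML regenerated, or fudged by hand
print "fresh HTML:   ", needs_rebuild($pc, $html), "\n";   # 0
```

The last case is the one that lets me fudge the HTML by hand: touching the HTML makes it newer than the source, so it is left alone.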

Images

Currently I have no way of supporting images. I did try to; I looked into how Google uploads the images to Blogger. But there's no easy way of automating this, and I really couldn't be bothered working it out the hard way, so, currently, images are inserted in post-processing.

External images are supported with the =img command with the URL, however.

Sauce

What follows is the entire .pc file for this post up to the end of this paragraph, so you can have a taste of what it looks like⁴ ⁶

Today is the first day of the advent calendar blog thing, so I thought I'd give it a whirl. Let's see how far I get.

I don't really like L<http://daringfireball.net/projects/markdown/syntax|Markdown>, primarily because it doesn't let me do certain things easilyF<1>. Footnotes are something I do commonly when I'm writingF<2>; they allow a certain second dimension to what would otherwise be a one-dimensional stream of words. In fact it's sort of a hyperlink, from before we had hypermedia.

I decided that L<http://perldoc.perl.org/perlpod.html|POD> has a good balance of DWIMF<3> and expressiveness, so I took the concepts and generalised them.

This led to L<https://metacpan.org/pod/Pod::Cats|Pod::Cats> being written. It really needs to be rewritten, now that it's something I actually use regularly. It's not my best code.

Anyway, the module defines the grammar of Pod::Cats documents, but is intended to be extended to provide functionality. L<https://github.com/Altreus/altreus.blogspot.com/blob/master/lib/PodCats/Parser.pm|PodCats::Parser> does just that. This module could also do with a refactor.

The Pod::Cats parser uses a subclass of L<https://metacpan.org/pod/String::Tagged::HTML|String::Tagged::HTML> (L<https://github.com/Altreus/altreus.blogspot.com/blob/master/lib/PodCats/String/Tagged/HTML.pm|here>) whose entire purpose is to just render when stringified. In fact the main module may do this now - I should check!

Once done I simply run my L<https://github.com/Altreus/altreus.blogspot.com/blob/master/parse.pl|script>, which overwrites or creates the HTML for any .pc file with a later save date than the equivalent HTML, or missing HTML. Then I upload the HTML. This means I can fudge the HTML afterwards without worrying about it being overwritten the next time I run the script.

=h2 Images

External images are supported with the C<=img> command with the URL, however.

=h2 Sauce

What follows is the entire .pc file for this post up to the end of this paragraph, so you can have a taste of what it looks likeF<4> F<6>

=footnote 1 Like this

=footnote 2 Because I have a lot to say and I don't want to interrupt the flow of the sentence

=footnote 3 Do What I Mean

=footnote 4 I've artificially promoted the footnotes to this point, since they need to be the last thing in the file to render properly. This is something I need to fix; footnotes should be stored and rendered at the end irrespective of where they turn upF<5>.

=footnote 5 In fact an auto-numbering system came and went and shall come back again at some point.

=footnote 6 Also available L<https://github.com/Altreus/altreus.blogspot.com/blob/master/pod/2014-12-01-pod-cats.pc|here>

¹ Like this

² Because I have a lot to say and I don't want to interrupt the flow of the sentence

³ Do What I Mean

⁴ I've artificially promoted the footnotes to this point, since they need to be the last thing in the file to render properly. This is something I need to fix; footnotes should be stored and rendered at the end irrespective of where they turn up⁵.

⁵ In fact an auto-numbering system came and went and shall come back again at some point.

⁶ Also available here

What's wrong with JavaScript in the template?

2014-10-01T22:26:00.000+01:00

Those of you keeping score will know that I recently started a new job. This one is Perl, not PHP, and so a certain level of standards is expected from the code. What with Perl having all these neato features and excellent web frameworks, I at least consider it on a par with Python and Ruby in its utility.

Perusing the new-to-me codebase I of course discover some of the hysterical raisins that live there, much of which is easily forgiven because the original coder had the foresight to apologise in a comment for doing it in the first place. But one thing stood out to me as a prime candidate for refactoring: JavaScript in the templates.

I said as much and was surprised to be posed the question, "What's wrong with JavaScript in the templates?"

Surprised not at being asked the question, but because I didn't know what the answer was. I've worked enough on the front end of previous jobs to have enough experience in the matter that seeing JS in template code makes me flinch, but never have I been asked to actually introspect this reaction and explain it.

Questions like that are primo blog post material, and it's been a while since I properly got my teeth into one, so on my journey home I put my mind to formalising quite what it was about it that made me want to rip it out and refactor the life out of it.

What it's not

Some obvious answers come to mind, with varying validity.

Is it because it's hard to find? No. Everything's hard to find. ack for it - you'll find it soon enough.
Is it because it violates separation of concerns? No. In fact, you could argue that it improves it, by encapsulating JavaScript only useful to a template inside that very template.
Is it because the only reason most people put JS in a template is so they can use the templating language to build JS? Well yes, but that's just the same question. What's wrong with it?
Is it because it's not reusable? Well, yes and no. Most template JS is not intended to be reusable; it's quite specific to that particular template, and there's little use for it elsewhere. More on this point later.
Is it the same reason we don't put CSS in the template either? Or inline in the HTML? Yes! By Jupiter, yes! We find the answer in the template itself. It's the other, main part of the template that we've not mentioned yet - the HTML.

What lies beneath

To answer the question, we must deconstruct the web page itself and look at the parts. What are we really looking at when we look at a web page? What are we really providing when we build a template? What is the purpose of the HTML, the TT2 or Jade or Mustache code that wraps or creates it?

Most web pages follow a similar structure: There's the <html> with its <head> and <body>; the body has a <div class="header"> or, better yet, a <header>, and some sort of <div id="content">. Then at last there's a bunch of stuff that finally gets to the point, i.e. displays whatever it is the page is displaying.

Most template structures separate all the pre/postamble from the content itself. Even in the CGI days we, naively but with good intent, would have a header.html and a footer.html and we would render the header, then the body, then the footer, to STDOUT. More recently, we have a single file with the pre- and postamble in it, and we import the rendered content into that. We tend to also have a considerable number of satellite template files representing handy widgets and reusable code and all the other things that I've already said aren't really the reason why we don't do the title of this article.

We knew then, as we know now, something we always forget to talk about; something implicit in everything we do here. While we make all these templates rendering data in consistent ways we somehow lose sight of the simplest of notions: we are representing resources.

Resource and Framing

"Resource" is a fully-functional word, writ deep into the very clay with which we make our internets; vis-a-vis HTTP. HTTP works with a verb and a noun, i.e. it says "Do this to this". "Framing" is a word I've picked to describe what it is we website-makers do to resources to make them look nice for people using browsers that conform to the standards set out to allow us to do so.

HTTP's nouns are URIs. URI means Uniform Resource Identifier. The R in URI (or URL or IRI) means resource. It means thing; it's identifying the nouns of the internet. We respond to a (request to a) URI with a resource, represented in HTML format for the purposes of this discussion. We know this, but we never say this - and so whenever we get discussions, no one ever uses it as a basis for finding answers. But the concept of resource contains the answer to our question.

When we divide our templates up into separate files there is the tacit goal that the template we use to represent the actual, specific resource contain as little HTML as possible. Why? Well, mostly for consistency. We want to frame all our resources - at least those related to each other - in the same way. That means that if we put as little HTML as we can get away with into our resource templates, we can put as much as we can get away with into our framing templates, and thus have as little variation between the rendered resources as we can. A side effect, and therefore a second benefit, is that if we want to reuse or amend our framing, we can do this in one place - it's DRY.

We already recognise the difference between frame and resource: it's encoded right there in <div id="content">. How many of your templates resemble this structure?

<body>
  <stuff></stuff>
  <div id="content">
    <% content %>
  </div>
  <more stuff></more stuff>
</body>

That right there is the boundary between Alliance and Reaver space. Uh, I mean, the place where the framing goes away and the resource begins. The resource is all the data that change when you ask for a different ID, or a different resource type. The resource is that which, if you took all the HTML away, would still be what you asked for.
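The split can be sketched without any real template engine. In this toy Perl example the frame is a fixed string and the resource is whatever gets dropped into the content slot; the render sub, the header text and the sample resources are all invented, and a real setup would use TT2 or similar rather than a regex.

```perl
use strict;
use warnings;

# Hypothetical frame: everything that stays the same between resources
my $frame = <<'HTML';
<body>
  <header>My site</header>
  <div id="content">
    <% content %>
  </div>
</body>
HTML

# Hypothetical renderer: put one rendered resource into the frame
sub render {
    my ($content) = @_;
    (my $page = $frame) =~ s/<% content %>/$content/;
    return $page;
}

# Two different resources, one unchanged frame
print render('<h1>Post 42</h1>');
print render('<h1>Post 43</h1>');
```

Everything outside the content div is identical in both outputs; only the resource changed.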

I've nearly made my point

Not all resources are data. Some resources are forms. I'm choosing forms as an example for another resource type because we're all familiar with them doing stuff.

Forms contain no data, but instead prompt you for data, and allow you to create more resources. Nominally, they represent the structure of the resource type, but don't represent any particular record of that type. The form holds the key to the answer: behaviour.

Consider:

<form action="/upload_image" method="post" enctype="multipart/form-data">
  <label for="image">Upload image:
    <input id="image" name="image" type="file">
  </label>

  <input type="submit">
</form>

This is a form with a file control, as you well know. It renders as a box with a "Browse" button. This one renders with a label, "Upload image:".

If you click on the label, the text of the input, or the browse button, you get the same behaviour: a file browser pops up. When you select a file and confirm it, the name of the file appears in the text part of the input, unless some jackass has installed Uploadify or similar, and broken it.

It also renders a single submit button. The button looks like all the other buttons on your website because you don't put CSS in your templates. The reason for that is being explained as we speak. I mean, as you read. I mean now.

When you click the submit button, the browser composes an HTTP POST request to the URL /upload_image on the host that served this resource. This request contains the entirety of the selected file, encoded in such a way that the receiving server can understand it. Presumably, the resource at that URL knows what to do with it.

Now, kindly point out to me the part of the HTML snippet above that implements any of that behaviour.

It's not there.

Nouns and adjectives - that's what the HTML is made of. There is not a single verb in the entirety of that form, and yet those few lines perform, implicitly, functionality that you would probably have to look up on Wikipedia to implement yourself.

Not all resources are forms, either. Here's a video resource, shamelessly stolen from Wikipedia, and represented in HTML format:

<video src="/movie.webm" poster="/movie.jpg" controls> </video>

Here's a more familiar one:

<img src="/images/avatar.png" alt="avatar" title="Get your pointer off my face">

Noun-adjective-adjective-adjective. Noun-adjective-adjective-adjective. The <video> noun:

Fetches the resource at '/movie.jpg' of the host that served this HTML resource, and renders it at the place in the page concordant with the styling associated with it and the rest of the HTML.
Puts some sort of controls on this image, probably a play button, which, when clicked, causes the resource at '/movie.webm' to be fetched.
Renders the fetched video file in situ, replacing the still image, and plays any sound that comes with it.
Renders further controls, such as a scrubber, pause, volume slider.
Affects the right-click menu of the browser to provide appropriate options to a video: save video, get URL, get URL at this time, etc.

Plus anything else I've forgotten. The <img> noun has similar, albeit many fewer, effects: the image is fetched and rendered without user interaction. Indeed, if the image is an animated gif, it will animate! On its own!

This borderline-facetious set of examples serves to point out that the browser has already got verbs. The nouns (HTML elements) say which verbs you want to use (and where to put the visuals for the user's interaction), and the adjectives (the attributes of the elements) control the parameters that the verbs need. (Fetch which video? Play automatically?)

This is called semantics.

Semantics!

I'm going to define semantics as the use of nouns to imply verbs¹. Form fields come with behaviour, and you say which behaviour you want through nouns, i.e. the choice of which input you use. Semantics also covers those adjectives that fine-tune the noun's behaviour by describing it further.

Semantics tell things how to behave based on what the resource contains. An HTML resource often contains framing. Semantics go into the HTML to tell anyone who cares which bit they can ignore. Semantics is the way you phrase things; it's how you describe the resource.

Consider:

<div id="content">

A web scraper can use this sort of thing to know what to ignore. Ignore is a verb. The HTML doesn't say "ignore this"; that's for the client to decide.

The browser isn't going to ignore it - but the browser doesn't care about this particular piece of semantics². If the CSS says to do something to it then the browser will do that to it, but the browser doesn't do that by default.

The web scraper will skip anything outside this div - provided it knows what the 'content' ID means - and the browser will do nothing based on this ID because it hasn't been told to.
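To make the scraper's side of this concrete, here is a minimal sketch. It assumes the HTML has already been parsed into a plain tree of objects; the { tag, id, text, children } shape and the function names are made up for illustration, and a real scraper would get its tree from a proper HTML parser.

```javascript
// Hypothetical parsed-HTML tree: each node is { tag, id, text, children }.
// Find the first node with the given id, searching depth-first.
function findById(node, id) {
    if (node.id === id) {
        return node;
    }
    for (const child of node.children || []) {
        const found = findById(child, id);
        if (found) {
            return found;
        }
    }
    return null;
}

// Collect the text of a node and everything beneath it.
function textOf(node) {
    const own = node.text ? [node.text] : [];
    const below = (node.children || []).map(textOf);
    return own.concat(below).filter(Boolean).join(' ');
}

// "Ignore" is the client's verb: keep only what is inside div#content.
function scrape(page) {
    const content = findById(page, 'content');
    return content ? textOf(content) : '';
}

const page = {
    tag: 'body',
    children: [
        { tag: 'nav', text: 'Home | About' },
        { tag: 'div', id: 'content', children: [
            { tag: 'h1', text: 'Bootstrapping Perl' },
            { tag: 'p', text: 'Install perlbrew.' }
        ] },
        { tag: 'footer', text: 'Copyright' }
    ]
};

scrape(page); // 'Bootstrapping Perl Install perlbrew.'
```

Nothing in the page data says "ignore the nav"; the scraper brought that verb with it.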

That right there is the answer. There is a difference between all the things it is possible for a browser to do and all the things the browser can already do. You can stick together awesome websites entirely using HTML5 and CSS3, but often you want behaviour that is not already built-in to the browser. Maybe you want div#content to have special styling or behaviour, but browsers don't come with that built-in.

And indeed, styling is just a form of behaviour - CSS tells the browser how to behave when it renders certain elements in certain configurations. JavaScript tells the browser how to behave when the user does things.

This is the point where people start putting JavaScript into templates. A specific form needs special behaviour, so you add a <script> tag and then output the form.

Smash! go the semantics. Fie! cry the tortured frontenders.

None of the behaviour you ever write is useful only once. I told you I'd get back to the reusability point. The JavaScript doesn't go in the template because it's not reusable, sure, but why is that a problem?

The problem is the JavaScript defines verbs. Semantic HTML is that HTML which uses only nouns, and lets the browser select the correct verbs.

JavaScript, therefore, is correctly a separate resource that adds verbs to the browser, and defines the nouns to which they apply. That's why everything eventually ends up as a JavaScript plugin; and sometimes as core browser behaviour.

Essentially, we're saying that JavaScript is a CSS file that defines behaviour, not styling. Where CSS tells the browser how to interpret the semantics of your HTML in terms of colouring, positioning and so on, JavaScript tells the browser how to interpret the semantics in terms of direct functionality - behaviour.

Indeed, not only should JavaScript never go into the template, it should never go into <script> tags either. Just like CSS should never go into <style> tags.

The Related Resource

Resources have related resources. If you strip out all the framing of your HTML resource (e.g. you render it as JSON instead) you are still going to keep many of the hyperlinks - the contents of any <a> tag inside the content div, perhaps some of the image sources. That's because the HTML framing is just rendering the content in a human-readable way³. The relations between resources are actually part of the resource itself, or at least metadata to it.

This is important because it addresses one of the main reasons people put JavaScript in templates: so that they can use the template language on the JavaScript, and thus build resource-specific JS that renders, e.g., a list of related resources when you click some "See related" button.

If the resources are related they should already be in the page. I seriously cannot stress that enough. Either the related resources are, or are not, relevant to this representation of the resource.

If the HTML went away and you were returning JSON, would you, or would you not, list those related resources as metadata, one way or another?

They cannot be part of the framing: the framing is consistent across the whole site! They are unique to this resource; and the style of list that is invisible until a button is pressed is unique to this type of resource.

But is "style of list" not an adjective about this list? Is list not a noun? Cannot you use the noun-adjective semantics to say, "This is a list of related resources, and it is of type pop-up-on-button"? HTML is amply equipped to represent this semantically: we even have the rel attribute to let you specify which button should activate the list.

Related resources belong in the page. Either as a hyperlink, or directly in the HTML. If you want to save bandwidth, you don't put the whole list in, but you put in a hyperlink placeholder instead. The important thing is that the HTML is accurately representing the resource. Just like the JSON would. Don't force non-browser consumers of your HTML resource to figure out how to run the JavaScript just to get related data.

e.g.

This (http://harvesthq.github.io/chosen/) is Chosen. You've probably seen it before. You start typing in a form field, and it lists all matching options, filtering as you type.

Chosen can either use an existing set of options, such as from a select box, or a URL from which to fetch options that match the string.

Both of these can be in the HTML before the JS even runs. The list of options is a related resource; it is simply represented in different ways. The first way puts all of the related resources in with the main resource; the second way puts a hyperlink to a single other related resource, from which they can be fetched when it's appropriate to do so.

At no time is it necessary to put this data into the JavaScript. JavaScript can read. Hell, the JavaScript should work on the JSON representation and all you'd have to change would be how it finds the data.
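As a rough sketch of what "JavaScript can read" might look like: the function below decides where the options come from by looking at what the page already says. This is not Chosen's actual API. The field is a plain object standing in for a DOM element, and the optionsUrl property (think of a data-options-url attribute) is a name invented for this example.

```javascript
// `field` stands in for a DOM element; `optionsUrl` is a hypothetical name.
function describeOptions(field) {
    if (Array.isArray(field.options) && field.options.length > 0) {
        // First way: the related resources are already in the page.
        return { kind: 'inline', options: field.options.slice() };
    }
    if (field.optionsUrl) {
        // Second way: the page holds a hyperlink to fetch them from.
        return { kind: 'remote', url: field.optionsUrl };
    }
    return { kind: 'none' };
}

// Filter inline options as the user types, ignoring case.
function matchOptions(options, typed) {
    const needle = typed.toLowerCase();
    return options.filter(function (option) {
        return option.toLowerCase().indexOf(needle) !== -1;
    });
}

const inline = describeOptions({ options: ['Perl', 'PHP', 'Python'] });
const remote = describeOptions({ optionsUrl: '/languages?q=' });

matchOptions(inline.options, 'pe'); // ['Perl']
```

Either way the script only reads what the resource already describes; swapping the HTML for JSON would change describeOptions and nothing else.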

The Answer

The answer, then, is semantics. Of course it is. But it's what semantics means that turned out to be the difficult thing to define here.

Semantics is about saying what this resource is; it's metadata about the resource itself. Semantics allows the client to make the decisions about what parts of the resource are relevant and what parts are not.

It's exactly the same principle by which responsive web design works.

It's exactly the same reason you don't put inline CSS into your HTML.

It's exactly the same reason you've never written a video player, or had to decode the JPEG file format manually in JavaScript and blit the resulting bitstring onto a canvas element.

It's exactly the same reason you don't know how to launch a file browser dialogue box.⁴

It's exactly the same reason web components exist.

It's exactly the same reason JSON resources don't come with a stylesheet or JavaScript.

It's exactly the same reason we now have <nav> and <section> elements.

It's exactly the same reason we can produce screen-reader-friendly representations of HTML pages when the HTML page is correctly structured.

It's because you are describing what the resource is, and letting the client decide what it does.

*drops mic*

¹ A separate discussion

² Not all HTML is for the browser. HTML is a perfectly sensible representation format for machine use as well.

³ Perhaps better: the HTML framing is a machine-readable way of getting the browser to render the content in a human-readable way.

⁴ In principle. HTML5 advances in file handling mean it is more common for the file dialogue to be called directly from JS.

Changing OpenElec's /tmp size

2014-04-27T12:54:00.002+01:00

OpenElec has a limited /tmp partition. Very limited, i.e. 10MiB. Many things fall over because they need more than this on occasion - especially if they are not the only thing using the tmpfs.

In order to change this you either have to hack around with automatically-created symlinks in startup scripts, or change it yourself.

The size of the /tmp partition is set in /etc/init.d/01_mount-filesystem:

mount -n -t tmpfs -o size=10m tmpfs /var

The problem is, that file is read-only. The reason it's read-only is that the entire root filesystem is stored in a squashfs partition.

To amend it, it is simply a case of unsquashing it, fixing it, and resquashing it.

Fix it

Pull the SD card out of your RPi (I'm assuming that's where you have it) and put it into your card reader. Let your system mount it.

You should have a SYSTEM drive somewhere on your computer. Lubuntu mounts it at /media/altreus/SYSTEM, so let's go with that.

$ mkdir squash
$ cd squash
$ cp /media/altreus/SYSTEM/SYSTEM SYSTEM.bak
$ unsquashfs SYSTEM.bak

Now we have a copy of the OpenElec root filesystem in a .bak file so we can undo it when we screw it up later. We also have the files themselves unpacked into squashfs-root. This is the default place unsquashfs puts them.

$ vi squashfs-root/etc/init.d/01_mount-filesystem

Change the file to have a better size /tmp. I used 500mb because my SD card is 8GB. Ignore the first instance of tmpfs in the file; we want to change the 10mb one.

$ sudo mksquashfs ./squashfs-root SYSTEM

It's important that you do this with sudo. The file /etc/shadow has permissions 000, making it only accessible by root. This is how we got it when we unsquashed it, so this is how we want to keep it. My /etc/shadow is 600, but they presumably wanted theirs to be 000. If we want to do the above step without root, we'd have to change the permissions so our user can see it - we can't change the permissions after it's squashed, so the only way to get a 000 file into the filesystem is to squash it with root.

Anyway, done.

$ cp SYSTEM /media/altreus/SYSTEM

Your new squashfs file will be mounted by OpenElec and your tmpfs will now be mounted with the size you gave it.

I'm not 100% certain this is stable. My Pi has started rebooting occasionally; but I might be giving it more than it can handle. It is an old model, but if I've introduced a bug because 500mb is too much, or something, I'm sure I'll get to the bottom of it and update the post.

Code review time!

2014-02-27T13:06:00.000+00:00

Look! A horrible piece of code in a horrible language in a horrible frame for a sickeningly twee ceremony that should have been made obsolete along with the Inquisition!

Let's review it.

Here's the code, with line numbers.

01    <?
02      function do_wed() {
03        if ($objections != true) {
04          function do_vow() {
05            $vow = 1;
06            do {
07              if ($richer === 1
08                  && $poorer === 1
09                  && $sickness === 1
10                  && $health === 1) {
11                function have_hold($a,$b) {
12                  ini_set('session.gc_maxlifetime','forever');
13              }
14              have_hold('husband','wife');
15              define('friend', true);
16              define('partner', true);
17              define('faithful', true);
18              if ($i = 'do') {
19                   $f = 'finger';
20                   $r = 'ring;
21                   $f = $f + $r;
22                   }
23               }
24               $vow = $vow + 1;
25              } while ($vow != 2);
26            }
27            do_vow();
28            $register = array_fill($details);
29            print_r($register)
30            return $kiss;
31            }
32          }
33        do_wed();
34    ?>

Let's go!

line 1

We use long tags here. <?php

line 3

Undefined variable $objections.

$objections != true is better written !$objections. But this is not what you meant; you meant count($objections) == 0, since it will be an array of them.

line 4

Don't define functions inside other functions.

lines 6, 25

You know how many vows you want. Use a for loop. Better, use an array of vows and populate it with two Vow objects, which represent the conditions each person agrees to. This means you can marry more than 2 people. The do_wed() function should take the people to wed as arguments. Use func_get_args() to loop over all of them, or (...$parties) in the next version of PHP.

Useless loop anyway. do_vow() should be called twice with the person currently vowing.

"Twice" is a western concept. This code is not internationalised.

lines 7-10

Undefined variables. None of these equals 1. It is unlikely that all four of these things would equal 1 at the same time. You want to test the party's agreement to these concepts, not the value of these variables. You need Person objects.

line 11

A function in a function in a function? This function takes two parameters and uses neither. Get rid of them.

line 12

This ini parameter takes an integer. 'forever' is not an integer.

line 13

This closing brace does not line up with the function definition on line 11. It does line up with the if on line 7, which implies you've forgotten to close the function, but scrutiny shows that you've misaligned the brace.

line 14

have_hold does not take any parameters any more.

This is exclusivist. Not all marriages are between a husband and a wife. These should be parameters to do_wed().

This function is run twice, both times with the same parameters. It should swap over for the second iteration.

line 16

'partner' is presumably the person we are not currently dealing with.

line 17

'faithful' is not a boolean value and should be configured per app. It needs to be a data structure containing parameters of faithfulness, i.e. boundaries.

line 18

This is always true. Remove this condition. $i is never used, so remove the assignment too.

lines 19, 20

Useless variables. Either accept them as parameters or use the literal strings directly.

line 21

If you'd not used these useless variables you'd realise you're trying to numerically add strings. . is the concatenation operator. What is a 'fingerring'?

$f is discarded. Just omit this entire block.

line 22

What is this supposed to line up with?

line 23

This closes the if that looks like it is closed on line 13. But it does not line up with it.

line 24

Better written $vow++, but we've replaced this with an array of Vow objects containing agreement parameters, so don't do this any more.

line 25

The only reason this would be a while loop is if you're just going to keep asking until both (all) parties agree. This is not how one should enter into a marriage.

line 26

This closes do_vow() but does not line up with it.

line 27

This is what should be run n times, once per party in the agreement.

line 28

array_fill takes three parameters. Register should be an object.

line 29

Syntax error - missing semicolon.

print_r is not the best thing to use here. Serialise this properly, perhaps with JSON so it can be consumed by an API or HTML so it can be styled and displayed properly.

line 30

Undefined variable $kiss. Kiss is a verb and should be a function.

lines 31, 32

These braces should line up with what they close.

line 33

Don't run a function when it is defined - that's not how you create a library.

This function could at least be parameterised with the names of the people being married. Isn't Etsy about crafts and hence personalisation?

Model student

2014-02-06T13:16:00.001+00:00

Models! Model trains, model students, model aeroplanes, model citizens. Fashion model, data model, business model. Ford Model T. Model number.

All these different uses of the word model have a commonality, the understanding of which is important to the understanding of what it is we mean when we talk about models in computing. This commonality may be considered the abstract meaning of "model": the meaning that exists behind all the real-world uses of it.

This concept is that of representation. Physical models are scaled-down representations of the things they model. A fashion model is really the representation of real people who would wear clothes (showing quite how divorced from reality fashion really is). A business model is a wordy representation of how the business will operate. Even the term "Ford Model T" is actually referring to the blueprint of all cars of that type: "Model" is referring to the type, not the car itself.

In computing, then, a model is a representation, a blueprint, a prototype that encapsulates the important details about the thing it is modelling. A good model will be a minimal but sufficient representation of the system it is modelling.

An easy example is the rolling of dice.

1d6

Dice are a familiar system to everyone, I hope. They neatly encapsulate our idea of randomness, at least that one we're taught in primary school, whereby the outcome of the system is not predictable from the input.

When we roll a d6 we expect to see one of its six faces pointing upwards but we don't know which one until it does so. Indeed on most dice we see the number represented as a pattern of dots; the number of dots being the number it shows.

This, if you're not used to thinking in these terms, is very specific. There are many extra features of a d6 that have nothing to do with the randomness of the d6. Every feature of the die except its shape (and mass distribution) can be altered and it would still exhibit the same properties of randomness.

Modelling systems, therefore, requires a keen eye for which are the underlying mechanics that allow the system to work, and which are the superficial parts of it that happen to be the case in this particular instance.

At its barest, a d6 is a system that, when run, produces a random integer from 1 to 6. The random distribution is even across all numbers: which is to say, the more times it is rolled, the more we expect to see the counts for each result become equal.

To model a d6, therefore, we simply need a system that can produce the same result.

Math.ceil(Math.random() * 6)

This piece of JavaScript models a 6-sided die. Run it in your browser's console if you don't believe me. Run it lots. Here's what happened when I ran it 50 times¹:

[2, 2, 6, 3, 5, 4, 3, 3, 2, 4, 
 1, 5, 3, 4, 6, 1, 6, 6, 4, 5,
 3, 1, 6, 5, 2, 4, 6, 6, 6, 5,
 3, 6, 1, 2, 3, 2, 3, 3, 1, 5,
 2, 5, 3, 2, 4, 3, 5, 6, 6, 5]

And sorted:

[1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
 2, 2, 2, 3, 3, 3, 3, 3, 3, 3,
 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
 5, 5, 5, 5, 5, 5, 5, 5, 5, 6,
 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]

At this level, JavaScript's RNG² should be roughly uniform in distribution, and with true randomness we should not expect perfectly uniform results from so few rolls. This distribution certainly seems random and within parameters for a uniform distribution, so we've simplified the concept of a d6 into a minimal and sufficient algorithm.
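If you'd rather count than squint at the sorted list, a small tally does the job. This is only a sketch; the tally function and its sides parameter are names chosen for this example.

```javascript
// Count how many times each face came up in a list of rolls.
function tally(rolls, sides) {
    const counts = {};
    for (let face = 1; face <= sides; face++) {
        counts[face] = 0;
    }
    rolls.forEach(function (roll) {
        counts[roll] += 1;
    });
    return counts;
}

// The 50 rolls listed above.
const rolls = [
    2, 2, 6, 3, 5, 4, 3, 3, 2, 4,
    1, 5, 3, 4, 6, 1, 6, 6, 4, 5,
    3, 1, 6, 5, 2, 4, 6, 6, 6, 5,
    3, 6, 1, 2, 3, 2, 3, 3, 1, 5,
    2, 5, 3, 2, 4, 3, 5, 6, 6, 5
];

const counts = tally(rolls, 6);
// counts is { 1: 5, 2: 8, 3: 11, 4: 6, 5: 9, 6: 11 }
```

A perfectly even split of 50 rolls would be about 8.3 per face, so counts between 5 and 11 are nothing to be alarmed about at this sample size.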

dn

Not all modelling is about functionality. Much of data modelling is about just that: data!

A model like a d6 is fundamentally fairly useless. Indeed the idea of a d6 is just a very tight constraint on a very useful concept - randomness. It serves little purpose to model a d6 specifically, because the number of uses for a d6 is, in the grand scheme of things, small.

In the real world, we use models in computing for two basic purposes: retrieval and prediction. The first one is used to store representations of things that exist, such as people or products. Those are data models. We store these data models to let people log into a system, or to display a list of the products to customers. The second is used to try to work out what would happen in certain situations, based on the understanding that we have about the system in the first place - such as weather. These are functional models, of which the d6 above is one example.

In both situations the model is useless without the things being modelled having data. Properties of the objects store information about the objects and supply parameters to the algorithms we've devised.

We have hit upon the idea of parameterising algorithms. As noted, the d6 algorithm is somewhat useless because all it does is model a d6, which is of limited utility.

We can increase the utility by modelling the algorithm of any die. This is the second thing to be aware of when learning to abstract away the fundamentals from the real-world example. Earlier, we learned that we can turn a gazillion atoms' worth of die into a few electrons' worth of RNG by simply taking a number between 1 and 6 - this is the fundamental behaviour of a d6.

Now, we can look at other real-world dice and see how their behaviour relates to the d6:

A d4 picks a number between 1 and 4
A d6 picks a number between 1 and 6
A d12 picks a number between 1 and 12
A d20 picks a number between 1 and 20
A d100 picks a number between 1 and 100

It doesn't take a complex neural network to see the pattern here. A dn picks a random number between 1 and n.

If we wanted to model a d4 we could amend our d6 model:

Math.ceil(Math.random() * 4)

And we're done. Well done! You've invented job security. Now we've got two models for two different scenarios, and we know how to repeat the process for any die we like.

You should at least by now have the feeling I'm leading you to a point; and if you haven't guessed it yet I'll make the point.

We haven't modelled the pattern.

You can model dice until you're blue in the face but a good model captures the fundamental principles. The d6 model captured the fundamental principles of a d6, but we want a model that captures the fundamental principles of all dice. We need to model the abstract; the pattern that we spotted when we listed our dice.

Abstraction

"Abstract" is another one of those words that no one understands until they're faced with it, and then it confuses them until they understand it, and then they realise why it's been used all along. Most people know abstract as a form of art, and therefore associate it with meaningless shapes and random colours or something.

The abstract of something is the set of features that remain when you take the actual thing away. They are the conceptual things that let you describe it without actually having one; but if you had never seen one, working from them alone might lead you to recreate a quite different thing.

This is what we did with the d6. We took the abstract concept of a d6, which is to randomly generate a number between 1 and 6, and then we recreated it in an algorithm that looks nothing like a die. It's a string of characters on a screen, now. It doesn't even roll. Or bounce.

Abstracting across many things is an art form in itself. For a start, the things have to be related, or else there's no real abstraction to make. Secondly, the degree to which things are actually related to one another can vary wildly, so knowing what level of abstraction to make is also a challenge. Thirdly, abstractions themselves may be similar; in which case you can start relating things that look the same in the abstract but are entirely unrelated in real life.

Now that I've thoroughly lost you, let me bring you back to earth. When we laid out all the dice we know and examined how they work we saw a pattern, which is that a die with n sides is an RNG between 1 and n. A pattern is something we can model; we model it with parameterisation.

Parameterisation is when you take a series of concrete examples and you remove one of the things from it and replace it with a variable; in this case, we replaced all the numbers with n³. The multiple types of die have been reduced to a single type, whose number of faces is now variable.

The number of faces the die has is now a property of the die. We have a model with data!

How do we represent it? Well, in JavaScript terms, parameters are given to functions, and objects have properties. We can divide the model into the two parts, functionality and data, by using a function to represent rolling a die and an object to represent an actual die.

function rollDie(die) {
    return Math.ceil(Math.random() * die.sides);
}

var d6 = { sides: 6 };
var d12 = { sides: 12 };

Here we have one function that will roll a die and return the result. Then we have two dice, each of which is a simple object with the property sides. Inside the rollDie function we use the sides property of something called die, which we can see is mentioned in the parentheses in the function definition. This together means that whatever is given to rollDie is assumed to be a model of a die, and to have a property sides that represents the number of sides it has.

rollDie(d6);
rollDie(d12);

If we provide a die model as a parameter to the rolling function, the rolling function can inspect the property of the model, extract the data, and use the data in the original algorithm. The algorithm has not, fundamentally, changed. It is simply the case that now it is parameterised; which is to say that instead of duplicating the function for every possible invocation, we can create data models that represent the thing we are dealing with, and provide the data to the function. We have abstracted the pattern (1dn returns a number between 1 and n) by making the variable, n, well—variable!

Verbs and nouns

The world is made of verbs and nouns. Systems verb nouns. People roll dice. People buy products. Computers authenticate passwords. Ecommerce systems suggest related products. Search engines search documents. URLs refer to resources.

Our data models therefore comprise verbs and nouns. Our d6 model was a verb⁴, but the noun was hard-coded. Hard-coding is the failure to parameterise. Instead of accepting a parameter, the noun - d6 - was assumed by the verb, because the verb was the whole of "roll a d6".

Our later model had a verb, rollDie, which could roll any noun that looked like a die. It had two dice, d6 and d12, which represented 6- and 12-sided dice, respectively. But the rollDie verb did not rely on those dice. The verb was abstracted from the nouns because with the new verb, anyone can create a die of any size and roll it:

var d27 = { sides: 27 };
rollDie(d27);

... so long as they have access to the verb part - the functionality - of our model.
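One detail worth knowing about the verb: Math.random() returns a number from 0 (inclusive) up to but not including 1, so it can, very rarely, return exactly 0, and Math.ceil(0) is 0, a face no die has. A common alternative is Math.floor(...) + 1. The sketch below is a variant along those lines, not a replacement for the model above; the second parameter, random, is an addition of mine so that the die can be rolled with a predictable source.

```javascript
// Variant of rollDie. `random` is an extra, hypothetical parameter that
// defaults to Math.random and must return a number in [0, 1).
function rollDie(die, random) {
    const source = random || Math.random;
    // floor + 1 maps [0, 1) onto 1..sides, even when the source returns 0.
    return Math.floor(source() * die.sides) + 1;
}

const d6 = { sides: 6 };
const d27 = { sides: 27 };

rollDie(d6);                                  // some integer from 1 to 6
rollDie(d6, function () { return 0; });       // 1, the lowest face
rollDie(d27, function () { return 0.999; });  // 27, the highest face
```

The noun is untouched: a die is still just an object with a sides property. Only the verb changed.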

By parameterisation we can turn a verb into a verb and a noun - "roll a d6" turns into "roll" and "a d6". By doing the opposite, we can turn a separate verb and noun into a single verb. Good modelling comes from learning when it is right to include the noun in the verb, and when the noun is a parameter. In some cases, the noun is fetched from somewhere else - a different verb (to fetch) and a different part of the model, with its own nouns.

In the real world, computer modelling is much more involved than this. Data are often linked to other data, such that if one changes another must reflect it. A shopping basket, for example: if you add an item to the basket, the total must increase. If you change the quantity of an item, the subtotal for that item must change, and so must the basket total.

In that example, we already introduced nouns and verbs that we can model. Basket; item; total; subtotal; quantity. Some of these are things, and some of them are properties. Some are both! Items are real things, but the list of items is a property of the basket. The total is a property of the basket, and the subtotal is a property of the item when in context of a basket and having a quantity!

Sometimes we replace nouns with verbs: instead of storing the total, we may choose to calculate the total on demand based on the items.
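A minimal sketch of that trade, assuming a basket is simply an object with a list of items, each carrying a unit price and a quantity. The shape and the names are invented for this example; prices are in pence so the arithmetic stays in whole numbers.

```javascript
// The subtotal is a property of an item in the context of a quantity.
function subtotal(item) {
    return item.price * item.quantity;
}

// The total is never stored; it is worked out from the items on demand.
function total(basket) {
    return basket.items.reduce(function (sum, item) {
        return sum + subtotal(item);
    }, 0);
}

const basket = {
    items: [
        { name: 'd6', price: 50, quantity: 4 },
        { name: 'd20', price: 120, quantity: 1 }
    ]
};

total(basket);                 // 320
basket.items[0].quantity = 2;  // change a quantity...
total(basket);                 // ...and the total follows: 220
```

Because the total is a verb here, there is no stored number to fall out of step with the items.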

Sometimes we replace verbs with nouns: when you roll a die, its value remains the same until you roll it again, but you should be able to ask it what value it shows. Our model could not do this. Alas! Our simple and sufficient model is no longer sufficient.
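One way to patch that hole, sketched under my own assumptions: give the die a value property and have the verb store its result there. makeDie, roll and value are names made up for this example.

```javascript
// A die that keeps the face it last landed on. `value` starts as null
// because, in this sketch, an unrolled die shows nothing yet.
function makeDie(sides) {
    return { sides: sides, value: null };
}

// Rolling now stores the result on the die as well as returning it.
// floor + 1 is used so that a 0 from Math.random() still gives face 1.
function roll(die) {
    die.value = Math.floor(Math.random() * die.sides) + 1;
    return die.value;
}

const d6 = makeDie(6);
const rolled = roll(d6);
// d6.value === rolled until roll(d6) is called again
```

The result of the verb has become a noun: something you can ask the die about later.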

Sometimes we separate a verb into a verb and a noun: we turn rolling a d6 into rolling, and create a d6 to roll. This allows us to either roll a different die, or do something different to the die.

Sometimes we combine a verb and noun into a single verb: when we get the total of a basket, we don't separate it into "get" and "total"; if you change the noun here, the verb makes no sense!

Even a simple example like a die can escalate, and it is easy to get overwhelmed by the interactions—imagine the complexity of a "simple but sufficient" model of an entire shop!—but ultimately we are modelling nouns and verbs; all we have to do is parameterise correctly and find the correct abstractions.

Modelling systems

Hopefully you will have, by means of a concrete example and a lot of nebulous ideas, some concept of what it is to model things in computer systems. Ultimately, you will need some way of defining functions - a programming language - and some way of storing data - maybe a database.

Modelling a system therefore involves a good eye for what is a verb and what is a noun. That is to say, if you want to "roll a d6", does this suffice as a verb? Or is "d6" a noun? What if you want to "calculate the total"?

There is no cheat sheet here. Experience is your best recourse. But perhaps we can jot down some things to consider when modelling a system.

How big is the system? The d6 system was small, but the shop system was large. Can it be broken into smaller systems?
How big are the nouns? A d6 has 6 faces, but the number 6 is enough to model that. Meanwhile, a basket has many items, but more information is needed; items are separate things, but faces are not.
Can you de-noun your verb? Does the verb make sense on other things? Does it actually? You can roll anything with sides; but can you get something other than a total from a basket? Can you get a total from something other than a basket?
Can you combine a verb and noun? Have you gone too far parameterising? If your shop has only one basket, the basket is not a parameter: the verbs can assume it.
Can your verb fetch a parameter, instead of accepting or assuming it? When you roll a die, perhaps you can establish elsewhere which die you are rolling. Perhaps the items on a basket know they are items; and there is only one basket, so you can get the items when you need them.

That's all for now on models. In future posts we will take a look at how data get around inside these systems, how we store them, and the transient nature of data while the system is actually running.

¹ var a = [], i = 0; for (i = 0; i < 50; i++) { a.push(Math.ceil(Math.random() * 6)); } a;

² Random number generator

³ Replacing all the ds with m may be a tempting thing to do here, but we shouldn't. That's because d has been constant across all of our examples; it simply serves to refer to the thing we are modelling in the first place. n is the new variable, because the thing it has replaced varies. d, being constant, is the thing our model is taking away entirely! It serves no purpose to know that we are rolling dice, any more; the d is therefore simply our reminder about what we are aiming for.

⁴ Commonly one would not copy-paste an algorithm into a console and run it. Instead, the algorithm would be packaged in a function and the user would be told to run the function. We did this later, when we parameterised, but to simplify and save on explanations, we avoided using a function in the first examples.

Declaring your intent

2014-01-23T20:30:00.000+00:00

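In Perl it is necessary to declare a variable with my (or our) before using it. This behaviour is enabled with the strict pragma; and recently it has become possible to get it implicitly, because asking for a recent enough Perl with use v5.12 (or later) turns strict on for you.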

Why?

Today's theme explores the idea that, when writing code, there is meaning in every statement. A good portion of code will comprise statements that actually implement the logic that causes the program to do what it does; but often overlooked are the statements such as these my and our declarations, which explain your intention for the variable before it's ever even used.

We'll look at some of the simpler reasons behind it, and later on we shall look at the less apparent ones.

Requesting

In these cases the intention you are declaring is simple: "I want to use this symbol."

The humble typo is the most obvious reason espoused for requesting new variables: it stops you using something else. But in Perl this actually covers at least three separate types of typo, all of which are solved by declaring things before you use them.

Misspelling it later

Misspelling the variable later on is the most common failure.

use strict;
my $hard_to_spell_name;
$hard_tp_spell_name = 'cats';

Global symbol "$hard_tp_spell_name" requires explicit package name at script.pl line 3.
Execution of script.pl aborted due to compilation errors.

Saying you want to use symbol A and then using symbol B is an error that is trivial to pick up on.

Misspelling it now

This is less common, because you usually spell the variable name right when you create it: you've just spent ages trying to come up with the name in the first place. It's the same error as before, except this time the typo is in the declaration and the later use is what you actually meant.

use strict;
my $hard_tp_spell_name;
$hard_to_spell_name = 'cats';

Global symbol "$hard_to_spell_name" requires explicit package name at script.pl line 3.
Execution of script.pl aborted due to compilation errors.

Forgetting

This requires a module, but declaring your intent also lets Perl warn you when you didn't use a variable you asked for.

Install warnings::unused from CPAN in the usual way.

use warnings::unused;
use strict;
use warnings;
use feature 'say';
my $foo;
my $bar = 'cats';

say $bar;

Unused variable my $foo at script.pl line 5.

Typing

By this I mean the type of the variable, not the typing you're doing when you make a typo.

In this case, you've declared an array and then accidentally used a scalar, or forgotten it's not an arrayref, or something along those lines. This is also the sort of protection you get from languages with a more C-style typing system, where you have to declare a variable by defining its symbol name and its type (int i;). Basically even though you spelled the symbol name right, you're using it wrongly.

use strict;
my @array_of_cats;
push @$array_of_cats, 'cat';

Global symbol "$array_of_cats" requires explicit package name at script.pl line 3.
Execution of script.pl aborted due to compilation errors.

"You're using it wrongly" is a perfectly reasonable statement here. That's because you declared what "right" is: "wrongly" is directly determined by your own my statement.
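For contrast, here is a small sketch of the two consistent ways to write that. The name $cats_ref is made up for this example. Either you declare an array and use it as an array, or you declare a scalar holding an array reference and dereference it.

```perl
use strict;
use warnings;

# Declared as an array, used as an array.
my @array_of_cats;
push @array_of_cats, 'cat';

# Declared as a scalar holding an array reference, dereferenced to push.
my $cats_ref = [];
push @$cats_ref, 'cat';

print scalar(@array_of_cats), " and ", scalar(@$cats_ref), "\n";
```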

Overwriting

Reuse

If you are required to declare your variables the first time you use them then you will always do so. This means that the keyword my is not only used to declare that a variable is supposed to be available, but also to declare that the variable is supposed to be new.

Hence, if you try to introduce a variable that already exists, it tells you off, and thus you avoid clobbering an existing variable.

This behaviour is actually only a warning, so comes from use warnings; rather than use strict;. However, it is still a result of declaring your intent.

use strict;
use warnings;
my $cats = 'cat';
my $cats = 'horse';

"my" variable $cats masks earlier declaration in same scope at script.pl line 4.
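The warning only concerns a redeclaration in the same scope. As a minimal sketch, declaring the same name in an inner block is a deliberate statement that you want a new variable there, so Perl stays quiet and the outer variable is left alone.

```perl
use strict;
use warnings;

my $cats = 'cat';
{
    # A new variable in an inner block: no warning, and the outer $cats is untouched.
    my $cats = 'horse';
    print "inner: $cats\n";
}
print "outer: $cats\n";
```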

Clobbering

It is easy to forget that both my and our are lexically scoped. The names they declare are only visible within the block in which they are declared (treating a file as a block for this definition).

With my you simply cannot clobber this variable from anywhere else. It is either a compiler error, or a different variable.

# This sub is useless and does nothing
sub one {
  my @cats;
  push @cats, @_;
  return @cats;
}

# This sub can't see @cats from the other sub!
sub two {
  push @cats, @_; # line 10
  return @cats;
}

Global symbol "@cats" requires explicit package name at script.pl line 10.
Execution of script.pl aborted due to compilation errors.

Or:

# This compiles, but is a new, separate array of cats.
# It is fractionally more useful than sub one.
sub two {
  my @cats = ('default_cat');
  push @cats, @_; # line 11
  return @cats;
}

A bonus of my is that when the block has executed, the variable is tidied up. That is, it falls out of scope. This also works in loop bodies, allowing you to trash and recreate data in every iteration by putting a my line inside the loop.
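A small sketch of the loop case, with made-up names: because @cats is declared inside the loop body, every iteration starts with a fresh, empty array, so it never holds more than one name.

```perl
use strict;
use warnings;

my @sizes;
for my $name (qw(tom felix tiddles)) {
    my @cats;                     # a fresh, empty array on every iteration
    push @cats, $name;
    push @sizes, scalar @cats;    # record how many cats this iteration saw
}
print "@sizes\n";
```

Compare that with a variable declared once at the top of a package, where nothing is tidied up between calls: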

package Cat {

  my @cats;

  # Both of these use the same @cats - the one above!
  sub one {
    push @cats, @_;
    return @cats;
  }

  sub two {
    @cats = ('default_cat'); # whups, overwrote the whole set!
    push @cats, @_;
    return @cats;
  }
}

@Cat::cats = ('cat_one', 'cat_two');

Here, @cats is available to be clobbered anywhere in the Cat package¹. However, because it is lexical, it is only available within that block². Line 18 appears to be altering the same variable (@cats within the package Cat), but in fact this is creating a new package variable in Cat³.

The intent of using my to declare @cats therefore is to have a variable available throughout the package, but not available outside the package.

There is a subtler declaration of intent. The position of this my statement declares that this variable is intended to be used throughout the entire package; therefore it should be applicable to the majority of the behaviour in the package. Were this not the intention, the my statement could be put in a block that encapsulates the variable and any places it is supposed to be used.
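As a sketch of that narrower intent, the following wraps the variable and the only subs that should touch it in an inner block. The sub names add_cats, list_cats and describe are invented for this example.

```perl
use strict;
use warnings;

package Cat {
    {
        # Only the subs inside this inner block can see @cats.
        my @cats;

        sub add_cats  { push @cats, @_; return scalar @cats }
        sub list_cats { return @cats }
    }

    # Referring to @cats here would be a compile error under strict,
    # so this sub has to go through list_cats instead.
    sub describe { return 'Cats: ' . join ', ', list_cats() }
}

Cat::add_cats('tom', 'felix');
print Cat::describe(), "\n";
```

The position of the my line now says that @cats belongs to those two subs, not to the package as a whole.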

our is a similar beast, but it adds the ability for outsiders to also alter the variable, so long as they do so explicitly. The following code differs only in the use of our:

package Cat {

  our @cats;

  sub one {
    push @cats, @_;
    return @cats;
  }

  sub two {
    @cats = ('default_cat');
    push @cats, @_;
    return @cats;
  }
}

@Cat::cats = ('cat_one', 'cat_two');

Now, the variable @cats inside the package's block can also be accessed as @Cat::cats from outside of it. This is the intent you declare when using our.
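The two cases can be put side by side in one runnable sketch. The package names LexicalCat and SharedCat are made up for the comparison; the only difference between them is my versus our.

```perl
use strict;
use warnings;
no warnings 'once';    # the outside assignments below each name a variable only once

package LexicalCat {
    my @cats;
    sub add { push @cats, @_; return @cats }
}

package SharedCat {
    our @cats;
    sub add { push @cats, @_; return @cats }
}

@LexicalCat::cats = ('outsider');   # a separate package variable; the lexical is untouched
@SharedCat::cats  = ('outsider');   # the very array the package's subs use

my @lexical = LexicalCat::add('tom');
my @shared  = SharedCat::add('tom');
print "lexical: @lexical\n";
print "shared: @shared\n";
```

Only the our version sees the value assigned from outside.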

¹ Normally, the package would be defined in its own file, but this format is common for single-use packages, especially in tests.

² When the package is defined in its own file, the file itself is the scope for such variables.

³ The reader should be aware that this is the reasoning behind the message Global symbol "$foo" requires explicit package name when strict tells you off for an undeclared variable. Any variable name can be used, so long as it explicitly declares a package name like in this example. The difference between a lexical variable and a package variable is outside the scope of this blog post.

Fixing PHP

2013-10-18T16:05:00.000+01:00

PHP is not a bad language.

Come back. Let me rephrase that.

PHP is a terrible implementation of what under the surface is a perfectly adequate, dynamic scripting language. Unfortunately it is implemented as a poorly-thought-out, logically bereft templating language, peppered with pitfalls and irritating inconsistencies.

But it can be fixed. It can be fixed with some simple, non-backwardly-compatible, sensible, welcome-to-the-real-world, feasible alterations. Let us begin.

1. Get rid of <?php ?>

The fact that PHP used to be a templating language is archaeologically apparent in this vestigial remnant from a bygone era. These tags are still all over the place because PHP is trying to be two things at once: both a templating language and a scripting language.

Once you grow up (or metastasise) and become a real language, you have to put away childish things.

These break-in-break-out tags were fine when PHP was designed to be parsed by a Perl script and run as a simple if-this, for-each-that dynamic HTML page generator. They remain fine, if you want to use PHP as the templating language it is. But if PHP wants to be taken seriously, the first thing it needs to do is stop hanging on to that I-can-do-templates-me attitude, and hand over to one of the many modern alternatives that have come along since the Internet was still finding its feet.

In fact there's no real reason PHP should not remain a templating language. After all, Mason (and indeed Template Toolkit) allow you to inject actual Perl into your web templates for those times when you simply can't be arsed to abstract your logic to where it's supposed to go. However, if PHP is going to behave like this, it needs to understand there is a difference between a PHP template and a PHP script.

Therefore I propose

1a. Create `.php` and `.phpt` file types

Or suchlike. .php files would naturally be PHP scripts and do away with that ridiculous <?php header that persists throughout PHP projects like a blight. .phpt or suchlike would be recognised as text files containing PHP segments, and they can use the old break-in-break-out paradigm to inject program logic into the template.

Of course it is not recommended in Mason or TT2 that you use actual Perl in your actual templates, because then the temptation is just to merge your views with your controller logic, and then you get into a Right Mess. Better would be simply to have a PHP port of TT2 or Mason, or use Twig or Smarty, and allow those to have their own this-bit-is-PHP-and-I'm-sorry directives.

1b. Make it a decent templating language too

It's a bit of an issue that PHP is stupid, as well. Modern templating languages offer myriad text processing options as part of the language itself. An example is the way Template::Toolkit allows you to filter output text through, e.g., the HTML filter, sanitising the data just before it's output.

PHP's best answer to this so far is user-written PHP classes that render PHP templates (two entirely different things written in the same language) by sanitising the data assigned to them at some time or other just before the template file itself is actually rendered.

That's just one example. PHP is not really a templating language any more either, because templating languages have evolved past the very basic output-string behaviour that PHP was originally tasked with. PHPT would need to catch up as well, and separate itself from PHP proper.

2. Stop pretending everything is an HTTP request

That PHP never left its template roots shows when you try to write command-line interfaces into your business software. You realise that you've been assuming throughout the code that the $_SERVER variable actually contains a URI of some description; that there's a protocol; that you're outputting HTML.

As soon as the first file that started with <?php and didn't contain a ?> was created, PHP was broken. As soon as you create a file that contains utility functions, or classes, you have a file that you can run without a webserver. As soon as you have that, you have a scripting language. That was the point at which people should have stood back, taken a look, and dived in to PHP 4 or whatever with the attitude that this time we're going to do it right.

No one did.

PHP still outputs HTML whenever it feels like it - see var_dump. It still has global, HTTP-centred variables. It doesn't do exit codes properly. The fact that exit and die are the same damn thing just shows that someone somewhere has completely misunderstood the point of these things. Heck I don't even know whether error messages actually go on stderr.

At about the time PHP was swapping its soft teething toy for its first big-boy spoon, the rest of the world was discovering that if you interface your HTTP server with your scripting language via stdout, you can maintain a separation of concerns wherein your entire business logic is a collection of useful modules or classes or whatever, which when used in a web environment can be wrapped in an HTML layer and called a website - the layer being swappable for a CLI one that outputs the same information in a salient format. Or a JSON one, for public APIs, or even private, socket-based APIs that don't touch either HTTP or even TCP!
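As a minimal sketch of that separation, written in Perl rather than PHP: one function holds the "business logic" and returns plain data, and three interchangeable layers render it. All of the names here (cat_report, as_text, as_html, as_json) are invented for illustration, and the HTML layer assumes the names are already safe to output.

```perl
use strict;
use warnings;
use JSON::PP ();

# Business logic: returns plain data and knows nothing about HTTP.
sub cat_report {
    return { count => 2, names => [ 'tom', 'felix' ] };
}

# Interchangeable output layers wrapped around the same data.
sub as_text {
    my ($report) = @_;
    return sprintf "%d cats: %s", $report->{count}, join ', ', @{ $report->{names} };
}

sub as_html {
    my ($report) = @_;
    # Assumes the names need no escaping; a real layer would escape them.
    return '<ul>' . join('', map { "<li>$_</li>" } @{ $report->{names} }) . '</ul>';
}

sub as_json {
    my ($report) = @_;
    return JSON::PP->new->canonical->encode($report);
}

print as_text(cat_report()), "\n";
print as_html(cat_report()), "\n";
print as_json(cat_report()), "\n";
```

Nothing in cat_report has to change when a new kind of output is wanted, which is the point being made above.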

Nope. In PHP's land of unicorns and rainbows the whole world is an HTTP request. The world springs into existence when the request begins and disappears when the response is sent, and if anything happens to be left around since the last universe's brief lifespan came and went then that's just something we have to deal with as part of our new one. Trying to leverage command-line support, or non-HTTP support, into this assembly of spit and chewing gum is baby's first steak knife to PHP.

3. Use your own exception mechanism

Nothing is as irritating while working with PHP as when it throws its toys out of the pram. Now, I'm quite happy to accept that a parsing error is completely unrecoverable, but that is it, and absolutely it. Anything and everything that happens at runtime should be tryable, and anything that ever goes wrong should be catchable.

This expected feature of the language should not be taken as a comment on the sense in doing so. Trying to call $app->run() and catching it when it fails is going to be a bit less useful than letting it fail and tell you what was wrong.

But being able to catch it - now that's a tool we need. Since the original error mechanism was put in place a new, superior nonlocal return, the exception, has become available, and it puts control in the hands of the user (without horrible set_error_handler hacks). Might as well use it.

4. Tidy up the root namespace

We get it. You like functions. Well, take stock and look around you. Not only have you implemented exceptions and then completely failed to use them, you've also implemented classes, interfaces, namespaces, closures and traits and failed to use those as well!

Right. For a start, having all those functions is confusing because there's no consistency in them. I'm not going to rewrite the entirety of A Fractal Of Bad Design, but I'm going to borrow from it here. Some of the functions have underscores, some don't (strpos / str_rot13). Some take arguments one way, some the other (array_filter($input, $callback) / array_map($callback, $input)). Every time we use a built-in function we have to look up how it's spelled and what order the arguments are in and there are so. Damn. Many.

Secondly, PHP presumably has to look up every called symbol in both the user's own symbol tables and the language's. That sort of thing is surely expensive, especially if this language is aimed at beginner programmers who are only ever going to use 10% of the functions 90% of the time.

Thirdly, every single built-in function or class is just another name that the user can neither use for their own functions nor override to replace. Sure, PHP has modules that you can jump through hoops to install at the C level, but who needs that?

All of this might be forgivable if this overabundance of global functions covered literally every possible operation a user could conceivably want; but it doesn't! Worse still, a majority of them can trivially be abstracted into one generic function that takes a callable. All the array_* functions, for example: the sort functions are all just user sort with different sort procedures passed in. The filter functions are all the same with different identity functions passed in - and, for a specific example, recently I needed a version of array_search that took a custom identity function! How dare I want the key of a value that has a sub-value that matches my input! PHP says I may not do that and therefore I may not do that.
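For comparison, here is roughly what that missing operation looks like when the search takes a callable, sketched in Perl with the core List::Util module. The helper name search_key and the sample data are assumptions made up for this example.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Return the first key (in sorted order) whose value satisfies the callback.
sub search_key {
    my ($hash, $matches) = @_;
    return first { $matches->( $hash->{$_} ) } sort keys %$hash;
}

my %cats = (
    a => { name => 'tom' },
    b => { name => 'felix' },
);

# Find the key of the value that has a sub-value matching the input.
my $key = search_key(\%cats, sub { $_[0]{name} eq 'felix' });
print "$key\n";
```

The generic function is a few lines long, and the caller decides what "matches" means.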

Ridiculous. The fact the PHP team haven't abstracted this stuff sensibly does not speak in favour of their ability to write the code behind PHP in the first place, does it? It doesn't take a genius to tidy all this up, and yet no one has - nor has anyone written the tidied version alongside. That attitude of constant implacability hurts the language and the community and the reputation of the people behind it, and damages confidence.

Hypothetical inefficiency aside it's just poor maintenance. The language has a mechanism by which to automatically find class files when a non-existent class is requested. So, put all the less-common functions in autoloaded classes and put those classes somewhere discoverable. Everyone else is modular these days. Is it stubbornness or incompetence that's leaving PHP behind?

Also, quit adding useless prefixes or suffixes to your functions. I know you're going to push onto an array because you push onto arrays. So call it push, not array_push.

Also also, don't fob us off with mb_ crap. Fix your Unicode. There's no excuse whatsoever for a language prevalent in the 21st century to be coded by people who can't cope with Unicode, or its various representations. I know, it's hard. Writing a language is hard. If you can't, don't.

5. Expressions, for the love of god

PHP's compiler is apparently written by chimps. Do we still really believe that there is a difference between a statement and an expression? Do we really still have to have "language constructs" (PHP's term) that are parsed and treated differently from any other expression?

No. Maybe back in the stone age we did things that way but here in the age of enlightenment we have come to realise that the only real difference between a statement and an expression is that a statement actually has a persistent effect.

In PHP, for example, the x or y construct has become possible. Except when y is not an expression - which is 90% of the bloody language. return is not an expression. continue is not an expression. die is not an expression, but it is special-cased to work with or, and has been since before we even had the x or y construct in the first place. Because Perl did it. exit is not an expression and does not have the same special-casing in the language that die does, even though it is the exact same thing.

Another example. Normally, () is used to group things, i.e. to override precedence. I'm quite OK with the way it's required for function calls, conditions etc. In PHP, however, these seem to form a magical, ref-breaking construct that is parsed under its own rules. That is to say, in PHP, $a is not guaranteed to be the same as ($a). That's because PHP is a language whose every feature is a special case in the parser. If $a is a ref, ($a) is not any more.

So what's the point of all these examples? Well hopefully they all bring up the obvious question: why? Why are these things different? For a given X, why does the way you use X have to be allowed by the compiler?

A language built out of expressions is obvious - expressions are what make the operands to operators. And an operator is itself another, larger expression. Suddenly the parsing should seem trivial; you look at a line of code, decide which operators and expressions it contains and run them in a well-defined order. You can see it in the language that when you use an expression it behaves exactly like you'd expect any other expression to behave. At least, it compiles like that - runtime behaviour may be bizarre.

It's trivial to draw up a simple table of PHP's main features in terms of expressions; in all of this the reader is invited to consider in what situations these do not work in PHP's current implementation, and what it means about the compiler for that to be the case. In the table, X and Y mean any expression, i.e. literally anything that compiles.

Construct	Meaning	Examples	Notes
`${X}`	The value referred to by X	`${$foo} # $$foo` `${f()}` `$a = &$b; ${$a}`	When X returns a string, look up that variable. Otherwise, treat it as a reference. When X is another variable, the `{}` can be omitted.
X[Y]	Return the element Y from the array X	`$array['foo']` `f()['foo']` `x()[y()]` `['a', 'b', 'c'][0]`	This implements the "feature" that is "special" in PHP 5.5 of array literal dereferencing (example 4)
X()	Run the closure X	`f()()` `$x()` `['a' => function() {}, ...][$x]($y)`	Actual functions like `f()` are separate, since `f` is not a valid expression.
X or Y	If X is false, run Y	`$type = $config['type'] or continue;`
X and Y	If X is true, run Y	`$val = $config['x'] and return $val;`

The reader should take away from this at least the awareness that all of the examples in this table would already work if PHP used a proper expression-based grammar; but instead we have been sold these things piecemeal over the past few versions as new features important enough to go on the front page of the release notes.
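For comparison, this is how the X or Y row already behaves in Perl, where next (Perl's spelling of continue) and return are ordinary operands. This is a sketch of Perl's behaviour, not of anything PHP does today, and the sub name first_type is made up.

```perl
use strict;
use warnings;

# Return the first configured type, skipping entries that have none.
sub first_type {
    my (@configs) = @_;
    for my $config (@configs) {
        my $type = $config->{type} or next;   # "next" is just the right-hand operand
        return $type;
    }
    return;
}

my $type = first_type({}, { type => '' }, { type => 'cat' });
print "$type\n";
```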

6. Complete the complement of magic methods

__toString is a pretty good method. It uses an established consistent convention that double-underscore means special-to-PHP. It uses dynamic dispatch so that if it exists it's used, and if it doesn't there's no "default" behaviour - it just complains.

There are also __isset, __set, __get etc. These do what you'd expect: test for setness, default setter, default getter...

Where's __toInt? __toFloat? __toArray? Why is __toString represented and not the others? Furthermore, if you can use a string as an integer and only complain after this conversion, why don't you use __toString first and then try to turn the result into an integer?

Consistency is paramount in a structured, logical world such as programming. Expectations being formed and then violated is the worst of things. It's the Principle of Least Astonishment. Use it.

7. Stop pretending you have types. Or: Have proper types.

What in god's name is this? (int) $val

"Casting," I hear you cry. "It is casting the type of $val to int!"

"Rollocks," I reply in a PG way. For casting is the act of converting a type through known mechanisms to another type. But we don't have __toInt to convert every possible $val to int, and we don't have mechanisms to convert all possible types in place of int in the first place.

Nope, it is another special case in the PHP compiler, where someone saw another language doing something and implemented the same syntax but completely failed to understand what it was doing, and implement the theory rather than the practice.

What about this? function foo(array $arg)

"Type hinting!" comes the call from the thousands-strong crowd. But if I ask them to explain this mechanism they roll out the usual approximately-right answers they read in the documentation but cannot explain the concept.

PHP is a dynamic language; that's one of its strengths. Dynamic means that PHP exhibits certain runtime features that static languages require at compile time. For the purposes of this section the dynamic features we are interested in are:

Runtime method lookup. If an object can perform a method, the method will be performed. If not, a runtime exception is thrown. Inheritance introduces methods from other classes into the object's symbol table, assisting DRY, but otherwise there is no reason every method could not simply be dynamically dispatched to a function somewhere using magic.
Automatic type conversion. If an operation requires a string and an integer is provided, or an integer and a string is provided, or a string and an object is provided, PHP will transparently perform the conversion at runtime and only complain if it didn't work.

Now apply your theories about type hinting to this. What can it do but cripple PHP's dynamicity? Duck typing is the principle by which, if you have dynamic method lookup, an object only has to be able to perform a task in order to be considered suitable for the task. That is, until runtime, until you actually try to run the method on the object, there is no way to know that the object cannot do it. If there were you would have sacrificed dynamic method lookup for static compilation already. Type hinting for classes is completely non-semantic if you have the option of duck typing, because there is literally nothing special about your particular class that makes it important that an object is of this type.
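Duck typing is easy to show in a few lines. The sketch below is Perl rather than PHP, with invented class names; the only question make_it_speak asks is whether the object can perform the method, never which class it belongs to.

```perl
use strict;
use warnings;

package Duck  { sub new { return bless {}, shift } sub speak { return 'Quack' } }
package Robot { sub new { return bless {}, shift } sub speak { return 'Beep' } }
package Rock  { sub new { return bless {}, shift } }

# No class check: anything that can speak is acceptable.
sub make_it_speak {
    my ($thing) = @_;
    return $thing->can('speak') ? $thing->speak : 'silence';
}

print make_it_speak($_->new), "\n" for qw(Duck Robot Rock);
```

Duck and Robot share no ancestry, yet both are suitable for the task; a class-based type hint on the parameter would have rejected one of them for no semantic reason.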

How about non-object type hinting? Well you can't actually do that, because int and string aren't types to hint about - probably because any scalar can be used as a string! And any string can be used as an integer! So why enforce the check? Or, from the other perspective, why aren't they types? I can cast to them; why can't I require them?

And why can I require classes but not cast to them?

If we look at the whole type system of PHP as a looser concept than PHP makes it, it makes a lot more sense.

Classes are not some promissory aspect of a piece of data that ensure the datum can perform tasks, but an organisational structure allowing you to introduce functionality from other classes into new ones by inheritance or merging traits. From this perspective, duck typing makes sense - you don't need a specific class to ensure an object can perform tasks; any class can theoretically do it, especially if it consumes a trait that provides it. Type hinting for classes, from this perspective, is logically inconsistent with traits - which are considerably more useful - because you can't test for what a class can do, which is the only thing that's important.

Similarly, basic types are not remotely based on reality either: even if you could ask for a string or an integer, assuming we get the rest of the family of magic methods, any object could have __toString or __toInt. And even if we don't get __toInt, a string can be an int. So if you ask for an int, you could give a string, and you won't know the data the string contains are bad until you try to use it as an int. And you should be able to give an object to a parameter that wants an int simply by casting it to a string and then an int - something PHP should be doing for us already.

Hopefully the reader has spotted the inconsistency between type hinting and a dynamic language: the language cares about what the datum can be, but the type hinting cares about what the datum is. There is absolutely no logical association between what the datum is and what it can do, because Dynamic Point 1 allows for any object - independently of class, thanks to traits and __call - to be able to perform any task; and Dynamic Point 2 allows any type - thanks to __toString and the proposed __toInt and __toArray - to be any other type.

If you're going to have type hinting, therefore, you have to have statically compiled types: you have to enforce the relationship between type and behaviour; otherwise, your type hints are just extra bytes in a file that are going to appear in a commit log at some point in the future deleted by some frustrated developer trying to implement a trait and use it in a method that doesn't expect it.

That's all

I'm sure I could find many more examples of things PHP can fix at a basic level and stop being so irritating about simple things. You'll note I didn't complain about the tiresome conflation of array and dictionary, despite it being the biggest misunderstanding in programming history.

But surely this is a start? We can keep most of the PHP grammar; the syntax doesn't change (much); and so many of the pitfalls and gotchas that a programmer falls into will be resolved in one fell swoop!

As with many things PHP has reached sufficient mass that nothing important will ever change, because the politics of the mailing lists drag everything down, with half-right people expressing their ill-informed opinions on stuff that really, actually matters.

And there's the rub; the alternative is to start again. Start a new, similar language, on the right foot. A language that doesn't have those tags; a language that interfaces with the standard streams properly; a language detached from the web server, that doesn't assume a web environment; a standalone, dynamic, modular language, easy to learn, easy to stick together, easy to run on any decent OS and the not-decent one.

But why? We already have Perl and Ruby and Python. The number of changes required to PHP means that literally the only reason to improve it at all is that it's associated with the name PHP. Installing it, upgrading it; these things would take the same amount of effort as simply using an alternative. It wouldn't be sufficiently backwardly compatible that existing PHP code would run, because all the crap you have to do in existing PHP code wouldn't be possible or necessary.

It can still be done, though. But it won't.