2017-02-10

Bootstrapping Perl

This blog post shows a simple, hands-off, automated way to get yourself a Perl environment in user land. If you already know enough about all of this to do it the hard way, and you prefer that, then this post is not aimed at you.

Here's what we are going to achieve:

  • Set up a Perl 5.24 installation
  • Set up your environment so you can install modules
  • Set up your project so you can install its dependencies

These are the things people seem to struggle with a lot, and the instructions are piecemeal all over the internet. Here they are all standing in a row.

Perlbrew your Perl 5.24

As this blog post ages, that version number will increase, so make sure to substitute the current one if you're copying this from the future.

Do this as root:

Debian

apt-get install perlbrew

FreeBSD

fetch -o- https://install.perlbrew.pl | sh

Whatever else

curl -L https://install.perlbrew.pl | bash

Windows

Haha, yeah, right.

Once you've installed perlbrew, log out of root and init it as your user. Then install a perl. This will take a while.

perlbrew init
perlbrew install perl-5.24.0

There, you now have a Perl 5.24.0 installation in your home directory. However, which perl will still report /usr/bin/perl, so switch to the new one:

perlbrew switch perl-5.24.0

It will have already told you that you need to alter your .bashrc or suchlike, with something like this:

source $HOME/perl5/perlbrew/etc/bashrc

You should do that.

Perlbrew does other stuff - see https://perlbrew.pl for details.

cpanm

You want to be able to install modules against your new perl.

You will have to reinstall modules under every perl you have if you want to use the same modules under different versions. This is because of reasons.1

perlbrew install-cpanm

Now you can use cpanm to install modules. If you install a new Perl with perlbrew, you will have to

perlbrew switch $your_new_perl
perlbrew install-cpanm

All over again. If you're dealing with multiple Perl versions for a reason, you've probably already read the docs enough that you know which commands to use.

cpanfile

A cpanfile is a file in your project that lists the dependencies it requires. It exists for when you are developing a project, and thus haven't actually installed it. It looks like this:

requires "Moose";
requires "DBIx::Class" => "0.082840";
requires "perl" => "5.24";

test_requires "Test::More";

You use it like this:

cpanm --installdeps .

The . refers to the current directory, of course, so you run this from a place that has a cpanfile in it.

The full syntax is on CPAN.
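The cpanfile DSL also lets you group dependencies by phase. As a sketch - the module names and version numbers here are just examples - a slightly larger cpanfile might look like this:

```perl
# Example cpanfile; module names and versions are illustrative
requires "Moose";
requires "DBIx::Class" => ">= 0.082840";

# Only needed to run the test suite
on 'test' => sub {
    requires "Test::More" => "0.96";
};

# Only needed by people hacking on the distribution itself
on 'develop' => sub {
    requires "Dist::Zilla";
};
```

cpanm --installdeps . installs the runtime and test phases by default; the develop phase is opt-in.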

Purpose of cpanfile

A "project" here refers to basically anything you might put on CPAN - a distribution. It might be a module, or just some scripts, or a whole suite of both of those things.

The point is it's a unit, and it has dependencies, and you can't run the code without satisfying those dependencies.

If you install this distribution with cpanm then it will automatically install the dependencies because the author set up the makefile correctly so that cpanm knew what the dependencies were. cpanm also puts the modules in $PERL5LIB and the scripts in $PATH so that you can use them.

If you have the source code, either you are the author, or at least you're a contributor; you don't want to run the makefile just to install the dependencies, because this would install the development version of the module too. Nor do you want to require your contributors to install the whole of dzil (Dist::Zilla) just to contribute to your module. So, you provide a cpanfile that lists the dependencies they require to run or develop your module or scripts.

1 The primary reason is that every Perl version has a slightly different set of development headers, so any modules written in C will be incompatible. It's too much effort to separate them and disk space is cheap; so we just keep separate libraries and avoid the problem.

2015-07-29

Extending Catalyst Controllers

Our API is versioned. Any change made to the API requires a new version at some level or another.

/api/v1/customers
/api/v1.1/customers
/api/v1.1.1/customers

Additionally, some of the URLs may want to be aliased

/api/v1.0.0/customers

When I got to the code, we had Catalyst controllers based on Catalyst::Controller::REST, which looked somewhat like this:

package OurApp::Controller::API::v1::Customer;
use Moose;
BEGIN { extends 'Catalyst::Controller::REST'; };

sub index
    : Path('/api/v1/customer') 
    : Args(1)
    : ActionClass('REST')
{
    # ... fetch and stash customer
}

sub index_GET
    : Action
{
}

1;

In order to extend this API, well, I faffed around a bit. I needed to add a new v1.1 controller that had all the methods available to this v1 controller, plus a new one. It needed to be done quickly, and nothing really stood out as obvious to me at the time.

So I used detach.

package OurApp::Controller::API::v1_1::Customer;
use Moose;
BEGIN { extends 'Catalyst::Controller::REST'; };

sub index
    : Path('/api/v1.1/customer') 
    : Args(1)
    : ActionClass('REST')
{ }

sub index_GET
    : Action
{
    my ($self, $c) = @_;
    $c->detach('/api/v1/customer/index');
}

1;

This had the effect of creating new paths under /api/v1.1/ that simply detached to their counterparts.

The problem with this particular controller is that in v1.0 it only had GET defined. That meant it only had index defined, and so the customer object itself was fetched in the index method, ready for index_GET. I needed a second method that also used the customer object: this meant I had to refactor the index methods to use a chained loader, which the new method could also use.

sub get_customer
    : Chained('/')
    : PathPart('api/v1.1/customer') 
    : CaptureArgs(1)
{
    # ... fetch and stash the customer
}

sub index
    : PathPart('')
    : Chained('get_customer')
    : Args(0)
    : ActionClass('REST')
{ }

sub index_GET
    : Action
{
    my ($self, $c) = @_;
    $c->detach('/api/v1.1/customer/index');
}

sub address
    : PathPart('address')
    : Chained('get_customer')
    : Args(0)
    : ActionClass('REST')
{}

sub address_GET
    : Action
{
    # ... get address from stashed customer
}

The argument that used to terminate the URL is now in the middle of the URL for the address: /api/v1.1/customer/$ID/address. So it's gone from : Args(1) on the index action to : CaptureArgs(1) on the get_customer action.

The problem now is that I can't use detach in v1.1.1, because we'd be detaching mid-chain.

I had1 to use goto.

package OurApp::Controller::API::v1_1_1::Customer;
use Moose;
BEGIN { extends 'Catalyst::Controller::REST'; };

sub get_customer
    : Chained('/')
    : PathPart('api/v1.1.1/customer') 
    : CaptureArgs(1)
{
    goto &OurApp::Controller::API::v1_1::Customer::get_customer;
}

#...
1;

This was fine, except I also introduced a validation method that was not an action; it was simply a method on the controller that validated customers for POST and PUT.

sub index_POST
    : Action
{
    my ($self, $c) = @_;
    my $data = $c->req->data;

    $self->_validate($c, $data);
}

sub _validate {
    # ...
}

In version 1.1.1, the only change was to augment validation; phone numbers were now constrained, where previously they were not.

It seemed like a ridiculous quantity of effort to clone the entire directory of controllers, change all the numbers to 1.1.1, and hack in a goto, just because I couldn't use Moose's after method modifier on _validate.

Why couldn't I? Because I couldn't use OurApp::Controller::API::v1_1::Customer as the base class for OurApp::Controller::API::v1_1_1::Customer.

Why? Because the paths were hard-coded in the Paths and PathParts!

This was the moment of clarity. That is not the correct way of specifying the paths.

To Every Controller, A Path

There is actually already a controller at every level of our API.

OurApp::Controller::API
OurApp::Controller::API::v1
OurApp::Controller::API::v1_1
OurApp::Controller::API::v1_1_1
OurApp::Controller::API::v1_1_1::Customer

This means we can add path information at every level. It's important to remember that the controller namespace has nothing to do with Chained actions - the : Chained(path) and : PathPart(path) attributes can contain basically anything, allowing any path to be constructed from any controller.

In practice, this is a bad idea, because the first thing you want to know when you look at a path is how it's defined; and you don't want to have to pick apart the debug output when you could simply make assumptions based on a consistent association between controllers and paths.

But there is a way of associating the controller with the chained path, and that's by use of the path config setting and the : PathPrefix and : ChainedParent attributes. Both of these react to the current controller, meaning that if you subclass the controller, the result changes.

First I made the v1 controller have just the v1 path.

package OurApp::Controller::API::v1;
use Moose;
BEGIN { extends 'Catalyst::Controller'; };

__PACKAGE__->config
(
    path => 'v1',
);

sub api
    : ChainedParent
    : PathPrefix
    : CaptureArgs(0)
{}

1;

Then I gave the API controller the api path.

package OurApp::Controller::API;
use Moose;
BEGIN { extends 'Catalyst::Controller'; };

__PACKAGE__->config
(
    path => '/api',
);

sub api
    : Chained
    : PathPrefix
    : CaptureArgs(0)
{}

1;

This Tomato Is Not A Fruit

You may be wondering, why isn't ::v1 an extension of ::API itself? It's 100% to do with the number of path parts we need. The ::API controller defines path => '/api', while the ::API::v1 controller defines path => 'v1'. If the latter extended the former, it would inherit the methods rather than chaining them, i.e. v1 would override rather than extend /api.

So we have one controller per layer, but things in the same layer can inherit.

package OurApp::Controller::API::v1::Customer;
use Moose;
BEGIN { extends 'Catalyst::Controller::REST'; };

__PACKAGE__->config
(
    path => 'customer',
);

sub index
    : Chained('../api')
    : PathPrefix
    : Args(1)
    : ActionClass('REST')
{}

sub index_GET {}

1;

package OurApp::Controller::API::v1_1;

use Moose;
BEGIN { extends 'OurApp::Controller::API::v1'; };

__PACKAGE__->config
(
    path => 'v1.1',
);

1;

The reason we can inherit is that everything we've done is relative.

  • ChainedParent
    This causes ::API::v1::api to be chained from ::API::api, but when inherited, causes ::API::v1_1::api to be chained from ::API::api.

  • Chained('../api')
    This causes ::API::v1::Customer::index to be chained from ::API::v1::api, but when we inherit it, the new ::API::v1_1::Customer::index will be chained from ::API::v1_1::api.

  • PathPrefix
    This causes these methods to have the PathPart of their controller's path_prefix. The most important example of this is in ::API::v1. Here, we see the api method configured with it:

    sub api
        : ChainedParent
        : PathPrefix
        : CaptureArgs(0)
    {}

This last is the central part of the whole deal. This means that the configuration path => 'v1' causes this chain to have the PathPart v1. When we inherit from this class, we simply redefine path, as we did in the v1.1 controller above:

__PACKAGE__->config( path => 'v1.1' );

The code above wasn't abbreviated. That was the entirety of the controller.

We can also create the relevant Customer controller in the same way:

package OurApp::Controller::API::v1_1::Customer;
use Moose;
BEGIN { extends 'OurApp::Controller::API::v1::Customer'; };
1;

This is even shorter because we don't even have to change the path! All we need to do is establish that there is a controller called ::API::v1_1::Customer and the standard path machinery will take care of the rest.

Equally, you can alias the same version with the same trick:

package OurApp::Controller::API::v1_0;
use Moose;
BEGIN { extends 'OurApp::Controller::API::v1'; };
__PACKAGE__->config( path => 'v1.0' );
1;

And of course the whole point of this is that now you can extend your API.

package OurApp::Controller::API::v1_1::Customer;
use Moose;
BEGIN { extends 'OurApp::Controller::API::v1::Customer'; };

sub index_PUT { }

sub _validate {}

1;

This is where I came in. Now I can extend v1.1 into v1.1.1 and use Moose's around or after to change the way _validate works only for v1.1.1, and thus I have extended my API in code as well as in principle.

CatalystX::AppBuilder

We're actually using CatalystX::AppBuilder. This makes subclassing the entire API tree even easier, because you can inject v1 controllers as v1.1 controllers.

after 'setup_components' => sub {
    my $class = shift;

    $class->add_paths(__PACKAGE__);

    CatalystX::InjectComponent->inject(
        into      => $class,
        component => 'OurApp::Controller::API',
        as        => 'Controller::API'
    );
    CatalystX::InjectComponent->inject(
        into      => $class,
        component => 'OurApp::Controller::API::v1',
        as        => 'Controller::API::v1'
    );
    CatalystX::InjectComponent->inject(
        into      => $class,
        component => 'OurApp::Controller::API::v1_1',
        as        => 'Controller::API::v1_1'
    );

    for my $version (qw/v1 v1_1/) {
        CatalystX::InjectComponent->inject(
            into      => $class,
            component => 'OurApp::Controller::API::' . $version . '::Customer',
            as        => 'Controller::API::' . $version . '::Customer'
        );

        for my $controller (qw/Addresses Products/) {
            CatalystX::InjectComponent->inject(
                into      => $class,
                component => 'OurApp::Controller::API::v1::' .  $controller, # sic!
                as        => 'Controller::API::' . $version . '::' .  $controller
            );
        }
    }
};

Now we've injected all controllers that weren't changed simply by using the v1 controller as both the v1 and the v1.1 controllers; and the Customer controller, which was subclassed, has had the v1.1 version added explicitly.

The only things we can't get away with injecting under different names are the subclassed controllers themselves. Obviously that includes the v1.1 Customer controller, because that's the one with new functionality; but don't forget it is also necessary to have a v1_1 controller in the first place, in order to override the path config of its parent.

We would also have to create subclasses if we wanted to alias v1 into v1.0 and v1.0.0. That is the limitation of this approach, and it costs a few lines of boilerplate; but it's considerably better than an entire suite of copy-pasted controllers using goto.

I expect there's a good way to perform this particular form of injection without CatalystX::AppBuilder, but I don't know it. Comments welcome.

1 Chose.

2015-05-27

CPAN installation order

At work we use Catalyst. Catalyst apps can be (should be?) built up from multiple modules, in the distribution sense. This allows them to be modular, which is kind of why they're called modules.

That means each project is a directory full of directories, most of which represent Perl modules, and most of which depend on each other. In order to deploy we throw this list at cpanm (http://cpanmin.us) and let cpanm install them all.

This works by accident, because they're all installed already, and so module X depending on module Y is normally OK because Y will be updated during the process.

For a fresh installation, cpanm will fail to install many of them because their prerequisites are in the installation list:

$ cpanm X Y
--> Working on X
...
-> FAIL Installing the dependencies failed: 'Y' is not installed
--> Working on Y
...
-> OK
Successfully installed Y

Now Y is installed, but not X.

I wrote a script to reorder them. https://gist.github.com/Altreus/26c33421c36cc1eee68c

$ installation-order X Y
Y X

$ cpanm $(installation-order X Y)
--> Working on Y
...
-> OK
Successfully installed Y
--> Working on X
...
-> OK
Successfully installed X

This uses the same information that cpanm used to complain that Y was not installed in the first place; which is to say, if a dependency is missing from the list entirely, the original cpanm invocation would not have failed on it anyway.
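What the script is doing is essentially a topological sort of the dependency graph. As an illustration of the idea (this is not the gist itself), the standard tsort utility performs the same job when fed "prerequisite dependent" pairs:

```shell
# Each input line is "prerequisite dependent": X depends on Y,
# so Y must be installed first.
printf 'Y X\n' | tsort
# prints Y, then X
```

Given one pair per line, each naming a prerequisite before the thing that needs it, tsort emits an order in which everything comes after its dependencies.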

Update

It is worth noting that cpanm can install from directories; and it will always try this if the module name starts with ./.

Therefore, X and Y above can be the result of a glob, so long as you include the ./ in the glob:

$ echo ./*
./Module1 ./Module2
$ installation-order ./*
./Module2 ./Module1

This also works with absolute paths.

2015-04-14

Catalyst Models

A Catalyst model is simply a package in the MyApp::Model namespace.

$c->model('DBIC')

simply returns the component

MyApp::Model::DBIC

I recently spent some time at work trying to work out quite how Catalyst models work in relation to, well, everything else.

Our app structure is based on CatalystX::AppBuilder, and I needed to add a model to one of the components, in order to provide a caching layer in the right place.

The mistake I'd been making was that the Schema subclass is not the same thing as the model. Rather, the model is an interface into the Schema class. Essentially, I had one class too few.

You can determine that by creating a new Catalyst application and then running the helper script that creates a model from an existing schema. You get a class like this:

package MyApp::Model::FilmDB;

use strict;
use base 'Catalyst::Model::DBIC::Schema';

__PACKAGE__->config(
    schema_class => 'MyApp::Schema::FilmDB',

    connect_info => {
        dsn => 'dbi:mysql:filmdb',
        user => 'dbusername',
        password => 'dbpass',
    }
);

A Model class is created and it points to the Schema class, being your actual DBIC schema.

Once I'd realised the above rule it was easy enough to create MyApp::Extension::Model::DBIC to go alongside MyApp::Extension::Schema.

Further confusion arose with the configuration. There appeared to be no existing configuration that matched any of the extant classes in the application or its components. However, it was clear which was the DBIC model configuration because of the DSN.

I wanted to follow suit with the new module, which meant that somehow I had to map the real name to the config name.

<Model::DBIC>
</Model::DBIC>

This makes sense; if I do $c->model('DBIC') I'll get "MyApp::Model::DBIC", and that'll be configured with the Model::DBIC part of the config.

What I'd missed here was that we were mixing CatalystX::AppBuilder with CatalystX::InjectComponent:

package MyApp::Extension;
use CatalystX::InjectComponent;

after 'setup_components' => sub {
    my $class = shift;

    ...

    CatalystX::InjectComponent->inject(
        into      => $class,
        component => __PACKAGE__ . '::Model::DBIC',
        as        => 'Model::DBIC',
    );
};

This was the missing part - the stuff inside the CatalystX::AppBuilder component was itself built up out of other components, aliasing their namespace-specific models so that $c->model would return the appropriate class.

Now, Model::DBIC refers to MyApp::Extension::Model::DBIC, which is an interface into MyApp::Extension::Schema.

2015-01-05

User groups in Odoo 8

Odoo has a user group concept that, if you Google for errors, crops up all the time. Odd that when you first run Odoo, you can't assign users to groups.

The answer is you have to give the Administrator user the "Technical Features" feature in Usability. Navigate to Settings > Users, click Administrator, click Edit, check the relevant box, click Save, and finally refresh.

If you Google for it, there's hardly any information on the subject. However, Odoo is quite happy to occasionally tell you what groups you need to be a part of in order to access something.

User groups are access control, so it's common that you'd want to create levels of access and assign the user to them. I first discovered an issue with this when trying the Project Management module - trying which was the entire point of me running Odoo 8 in the first place. (I can't reproduce the problem now that it's a new year. Maybe Odoo's NYR is to be less whiny.)

You can run a Docker container with Odoo 8 in it from the tinyerp/odoo-docker github repo; either the Debian or the Ubuntu version should work fine.1

1 I recommend the Debian version, since Ubuntu is just Debian with extra, irrelevant stuff bundled in, making it not entirely useful to have an Ubuntu version in the first place. Licensing is probably involved.

2014-12-22

Day 22 - The nth Day Of Christmas

How many presents were given, in total, on the 12th day of Christmas, by the so-called "true love"? How many for the nth day?

For each day, we know that all the previous days' gifts are given again, so we have a shape like this:

1
12
123
1234
...

Each column is as tall as the number of rows it appears in, and there are 12 rows in total.

This means the 1 column is 12 tall, the 2 column 11, and so on.

This is 12 * 1 + 11 * 2 + 10 * 3 ...

That's boring. That's not what computers or maths are for. Let's generalise.

We can see that each section of the summed sequence follows a pattern of x * y, where x + y = 13.

It is easy to forget, when analysing sequences, that the position in the sequence can itself be used as a variable. If we call the row number i, then each section is (13 - i) * i, and the total is the sum over i = 1 to 12.

 12
  Σ (13 - i) * i
i=1

13 is suspiciously close to 12. What happens if we do this?

 12
  Σ (12 + 1 - i) * i
i=1

And then replace the 12 with our n to answer "What about the nth day?"

  n
  Σ (n + 1 - i) * i
i=1

Does it work? Let's Perl it up. Each value of (n + 1 - i) * i can be produced by a map over the range 1..$n, using $_ in place of i, since that's exactly what it is in this case.

sum0 map { $_ * ($n + 1 - $_) } 1 .. $n

sum0 comes from List::Util, and does a standard sum, except that it returns zero when the list is empty - this just avoids pesky warnings about undef.

Try it. Using $ARGV[0] for $n we can give our n on the command line:

perl -MList::Util=sum0 -E'say sum0 map { $_ * ($ARGV[0] + 1 - $_) } 1 .. $ARGV[0]' 12

Vary the 12 to solve for different values of n.

The answer, incidentally, is 364.
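Incidentally, the sum also has a closed form: Σ (n + 1 - i) * i = (n + 1) * n(n + 1)/2 - n(n + 1)(2n + 1)/6, which simplifies to n(n + 1)(n + 2)/6. A quick shell check for n = 12:

```shell
# Closed form n(n+1)(n+2)/6; for n = 12 this is 12 * 13 * 14 / 6
n=12
echo $(( n * (n + 1) * (n + 2) / 6 ))
# prints 364
```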

2014-12-18

Day 18: The URI

I've talked a lot about this resource-first way of dealing with the web, and really the internet in general, but it isn't a tool that fits all things. For instance, today I was looking at the point-of-sale module in Odoo, which is essentially an HTML representation of the index resource of the products in the system, but is actually more complicated than that, because it includes that resource, a numeric input box, the bill of items so far, a search box, and a few other twiddly bits to improve the cashier's use of the system. Plus, it is designed with tablets in mind.

This is quite different from the list of products you get when you look for the list of products in Odoo itself.

However, we must construct a URI that refers to this view of the data if we're to be able to access that view of the data in the first place. That means that we somehow have to shoehorn this not-a-resource idea into the everything-is-a-resource idea.

Today I'm going to deconstruct the URI and explain how each part can be used, in order to avoid too much in the way of special behaviour. Ideally we'd like every resource to be represented by a single URI, but that's clearly not going to work.

Allow me to state up front that I consider Odoo's URI scheme to be utterly shocking. But it appears to be a legacy from back in the old days when more people made web things than really understood what URIs were for.

The URI

The URI is made up of several parts. Here is what I consider to be the simplest URL that contains all common parts1:

http://www.example.com:8080/resource/id?query=data#part-of-document

|_____|___|_______|___|____|________|__|__________|_______________|
   1    2     3     4   5      6     7      8             9
  • 1. Scheme
  • 2. Subdomain
  • 3. SLD2
  • 4. TLD
  • 5. Port
  • 6. Resource (type) name
  • 7. Resource (instance) identifier
  • 8. Query string
  • 9. URI fragment

Together, 2, 3 and 4 comprise the hostname; 6 and 7 are the path.
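As a rough sketch of how a program might pull those parts back out of a URL - plain shell parameter expansion here, purely for illustration; a real application should reach for a proper URI parser - consider:

```shell
# Assumes every part is present, as in the example URL above
url='http://www.example.com:8080/resource/id?query=data#part-of-document'

scheme=${url%%://*}           # http
rest=${url#*://}
fragment=${rest#*#}           # part-of-document
rest=${rest%%#*}
query=${rest#*\?}             # query=data
rest=${rest%%\?*}
hostport=${rest%%/*}          # www.example.com:8080
path=/${rest#*/}              # /resource/id
host=${hostport%%:*}          # www.example.com
port=${hostport#*:}           # 8080

echo "$scheme $host $port $path $query $fragment"
# prints: http www.example.com 8080 /resource/id query=data part-of-document
```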

Breaking down the URI

Scheme

The scheme is the first place where you restrict yourself. Often referred to as the protocol, the scheme usually determines how the URI should be used. In this example, http is the assumed protocol by which web requests are made: the http scheme tells the client to use the HTTP protocol to make the request.

This is very useful because it means we can immediately assume a large quantity of knowledge about the system that we wouldn't have without the scheme. Particularly useful is that we know what sort of programs can be used to actually access this URL3. This is, if you think about it, what the word protocol means: it is those things that are assumed to be the case, given a certain situation. When we all follow protocol, we don't need to explain why we're doing what we're doing.

Mostly we come across URLs specifying the HTTP scheme; in fact, it's assumed, in many cases, that a URI with no scheme is an HTTP URL, because if you click on it, it opens up in your browser. However, some places have started using their own schemes, such as the spotify: scheme, which opens URLs in the Spotify client, or the steam: scheme, which opens things with Steam.

It's worth noting that the entire hostname can also be omitted from a URI, but this usually means you get three slashes, not two. This is commonly seen with the file scheme, such as file:///home/user/documents/example.html, where the third / is actually part of the path. For this reason it can be observed that the steam: scheme does not quite follow the normal URI standards, since the part immediately following the scheme is an action - arguably a resource - and not a hostname.

By inventing our own schemes like this we can create entire applications with a new way of communicating, but we're focusing on the web here, which means we're going to use HTTP(S), like it or lump it.

Subdomain

The term "subdomain" is a bit of a colloquialism. Each section of the hostname is a subdomain of the part to its right. The hostname is a hierarchy with, in this case, com at the top. We usually call this part the "subdomain" because it's the first subdivision that is really relevant to a human.

When we have a subdivided subdomain we sort of stop talking about them and start mumbling and saying "that bit" and pointing.

The subdomain is a tool we can use to do many things. Traditionally the web is in the www subdomain, but the http protocol is usually sufficient to imply web, these days. However, that's starting to change, as we start to send non-web things over HTTP - APIs and CDNs, for example.

Really consider using an api subdomain for your API. You'll find that if you have an api and a www, then your website can have, in the majority, the exact same URI structure as the API. This is more often the case than it appears to be, because people don't tend to think of their web pages as representing a resource in HTML format.

Domain

The SLD is the part of the domain that really, to a human, represents where the site is. This is usually your company or organisation name, or some other thing whose entire purpose is to say what this whole web site is about.

You can install a system under multiple domains and thus they would all have the exact same URI scheme, except that, because they're in different places, the records that you get would be different.

Because yoursite.com/user/1 is not the same person as mysite.com/user/1, except by coincidence.

I've lumped the TLD in here too, because the TLD is, to most people, part of your domain name - which is why we call the subdomain the subdomain regardless of where it appears on the actual hostname.

Port

When designing URI schemes it's helpful to drink a lot of port, for inspiration.

Commonly there are alternative services associated with your website, meaning they're on the same domain, and you can't use the subdomain because these other services need api and www subdomains of their own.

One trick is to mount these services under a part of the path, and consider them a big resource with sub-resources; but easier is to install them on a different port.

For example, your Elasticsearch instance - which communicates entirely via HTTP - can be running on the same hostname as your website, but a different port. Elasticsearch's default port is 9200, going up to 9300 as you add instances on the same machine.

Resource name

The first part of the path of the URL I'm calling the resource name. That's because this is where the actual resource you're requesting starts. Everything before the path is defining whose resource you are asking for, but once the path starts you're starting to get a handle on the actual information.

The resource name, when requested, can have multiple behaviours, depending on the purpose of the resource, but common is simply to be an index of all the items of that type. Since that can be cumbersome, it is perfectly legitimate to both paginate this list and summarise the entries. That sort of stuff is well out of scope of this article, though.

Other uses of the first part of the path are organisational, and may be handled better as a subdomain. For example, having an api part of the path here is not as useful as it would be to have an API subdomain, because if the paths to the resources can be consistent then we don't have to ask questions about what they should be.

https://www.example.com/resource
https://api.example.com/resource

Other times, you may want to use a different port. For example, if the web stuff is on port 80 then the administration part could be on port 8080. This also allows you to control access to the different parts of the site at the kernel level, using routing rather than soft authentication.

https://www.example.com/admin
https://www.example.com:8080

Doing this also means that it's harder to guess the correct path to the admin area, since you can use an obscure port. Denying access based on IP rules means unauthorised users would never even learn when they had guessed right.

But really, there's no exact reason why you would or would not add parts of the path to the URL in order to divide it up into separate logical zones. This can certainly help with human comprehension of the purpose of your URL. Sometimes you may even want to provide dummy paths - paths that refer to the same resource as other paths, but assist with conceptual compartmentalisation by having different subpaths.

https://www.example.com/shop/product/1
https://www.example.com/blog/post/1

In these examples, the first part of the path could be omitted, provided that post is always the blog post and product is always a shop product. Consider also that you could still use subdomains for these.

https://api.shop.example.com/product/1

The important part would be to ensure that your uses are consistent. Always have each part of the URL refer to the same logical division of your resource structure.

Item ID

Once you've decided at which point of the path to put the resource type, you should probably put the next part as an optional ID field.

The combination of a resource name and an item ID should be entirely sufficient to retrieve all the information about that specific instance of that type.

This is a reasonably central principle to the resource-first model of your system - all your things have a type and an ID and that's all you need to provide to retrieve it, or at least a representation of it. Everything else is your organisational whimsy and the system really shouldn't have to know.

More formally than dismissing it as whimsy, I should point out that even the type names and shapes can change, and that's difficult enough to deal with. Every level of organisation you add on top of this is another changeable shape of the system that at some point you're going to have to adapt. The fewer of those you have, the better.

The actual format of your identifier is up to you, but there's really nothing else you can put after the resource name that is relevant at this point.
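A toy sketch of that principle: retrieval keyed purely on (type, ID), with nothing else required. The store and its contents are invented for illustration.

```python
# A toy in-memory store keyed purely on (resource type, id). The point is
# that nothing beyond those two values is needed to retrieve an item; the
# data here are purely illustrative.
STORE = {
    ("post", 1): {"title": "First post"},
    ("product", 1): {"name": "Widget", "price": 999},
}

def fetch(resource_type, item_id):
    """Retrieve a resource instance by type and ID alone."""
    return STORE.get((resource_type, item_id))

print(fetch("post", 1))     # {'title': 'First post'}
print(fetch("product", 2))  # None
```

Any extra organisational layers you bolt on top of this lookup are things you will one day have to change.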

Query string

If I catch you using a query string to tell a dynamic resource to load a specific other resource I will murder you in your sleep.

https://example.com/index.php?type=resource&id=1234

Seriously, this sort of crap is all over the internet. Yes, it's usually PHP.

You are using a URI - at least put the resource identifier in the resource identifier.

It is important to note that the query string is not the same thing as the "GET parameters". A query string does not have to be in the format key=value&key=value - the web server passes the query string straight to the app, and it is the application that decodes it in its own way. It is common to use the key=value&key=value structure but not required.
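Python's standard library makes the distinction visible: the server hands the application the raw query string, and decoding it as key=value pairs is a separate, optional step. The URLs below are made up for the example.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://example.com/search?colour=red&size=10"
raw = urlsplit(url).query  # the raw query string, exactly as received
print(raw)                 # 'colour=red&size=10'
print(parse_qs(raw))       # {'colour': ['red'], 'size': ['10']}

# But a query string need not be key=value pairs at all; this is still
# a perfectly legal URL, and the app may interpret '3+4' however it likes:
print(urlsplit("https://example.com/calc?3+4").query)  # '3+4'
```

`parse_qs` is one common decoding, not the definition of the query string.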

The query string's most obvious purpose is to pass a query to a resource that expects one, or that at least accepts one. Often the index resources will allow for some sort of search or filter functionality, and if that's not the case then special resources designed to search and filter - and possibly concatenate - other resources will accept search parameters.

Further specialisation of resources would not even use the KVP format of "GET parameters", and simply take the query string as instruction. These types of resource are drifting away from the "object" type of resource and moving towards "function" resources, which are a separate discussion.

The thing about the query string is that it is usually only relevant to GET requests, which is why it is sometimes called the GET string. But GET is an HTTP verb and the query string is part of the URL; and URLs don't have to be http://, so the query string can really be used against any scheme.

It is often said that the query string should not be used to send data to the server, but I'm really not sure that's the case. The server should not store data as the result of a read request (HTTP's GET), but it is welcome to store data as the result of a write request (HTTP's POST or PUT) - in which case the mechanism by which the data are provided is entirely up to the server.

This is why you should call it the query string, not the GET string.

Fragment

The part of the URL after the # is called the fragment. This is not actually part of the resource identifier, but is provided for the client's benefit.

If you click on any of the footnote marks in this document4, most browsers I give a toss about will jump to the footnote, and back again when you click on the number of that footnote.

No new page request is made. The browser is not being instructed to access a different resource. In the example earlier, the fragment is #part-of-document. The fragment is usually used to refer to a part of the document; in HTML and XML, that part is identified by the id or name attribute of an element.
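You can see that the fragment is client-side only by splitting a URL: the fragment parses out as a separate component, and what actually goes on the wire is the URL without it. The URL here is an invented example.

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://example.com/blog/post/1#part-of-document"
parts = urlsplit(url)
print(parts.fragment)  # 'part-of-document' -- kept by the client

# What is actually requested from the server omits the fragment entirely:
request_url = urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
print(request_url)  # 'https://example.com/blog/post/1'
```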

In this document, the a tags that jump around the page have name attributes, which the browser uses to scroll to them when the URL fragment changes. In these blog-post resources, the parts-of-the-document that I refer to with URL fragments are the footnotes and the places the footnotes refer to.

Using the document fragment to refer to specific resources is a crime committed by many "JavaScript apps" today. The reason this is a crime is that it is not identifying the resource; it's identifying the resource proxy, which means the correct client must be used to actually access the resource itself. It's like having a proprietary browser that only understands a completely different URI format.

It's a crime because browsers are more than capable of intercepting URI requests inside an application and getting the application to update as necessary, and servers are more than capable of returning a javascript-app-with-resource-in-it as the HTML representation of the resource.

There is no reason besides lack of imagination to trample all over that URI system just to avoid reloading the page every so often.

TODO

Not mentioned is the idea of a "related resource". This can be a third part of the URI path whereby you request an index of a separate resource based on the current one:

https://www.example.com/blog/post/1/comments

This is, conceptually, the same as

https://www.example.com/blog/comments?post=1

but you may wish to return the results differently, e.g. with more expanded objects rather than just URLs to the results.
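The conceptual equivalence can be sketched as two URL shapes resolving to the same underlying filter. Both route shapes here are assumptions for the example, not a prescribed convention.

```python
import re
from urllib.parse import urlsplit, parse_qs

def comments_for_post(url):
    """Resolve either URL style to the same query: comments filtered by
    post id. The two route shapes are illustrative only."""
    parts = urlsplit(url)
    # Related-resource style: /blog/post/<id>/comments
    nested = re.match(r"^/blog/post/(\d+)/comments$", parts.path)
    if nested:
        return {"post": int(nested.group(1))}
    # Filtered-index style: /blog/comments?post=<id>
    if parts.path == "/blog/comments":
        return {"post": int(parse_qs(parts.query)["post"][0])}
    return None

print(comments_for_post("https://www.example.com/blog/post/1/comments"))  # {'post': 1}
print(comments_for_post("https://www.example.com/blog/comments?post=1"))  # {'post': 1}
```

Which representation each form returns - expanded objects versus bare URLs - is then a presentation choice layered on the same query.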

In upcoming posts we'll probably have a look at those "functional" resources I mentioned in passing. This post has been entirely about "object" resources, i.e. those resources that simply represent some representation of a real-world object, or a fake-world object, but ultimately something that can be represented as a JSON object with fields and values. I will also try to discuss the resource-first view of website building using the aforementioned point-of-sale in Odoo as an example.

We also haven't discussed how you would relate resources to one another in knowable ways. This ties in with the hyperlink concept and is the thinking behind Web::HyperMachine - HTML pages are already linked together with <a href="related-link">, but there are myriad other ways HTML uses hyperlinks to refer to other resources, and even more ways in HTTP itself.


1 I've omitted from this the user:pass@ part that can be used before the hostname, because it's not very common.

2 The "second-level domain" is colloquially the "company" part of the name, i.e. the first part that actually identifies at a human-readable level what it is the URI refers to. In some cases, such as .co.uk, what looks like the TLD is actually an SLD (co) under a TLD (uk), and it is the third-level domain that is the company part. Colloquially, we can refer to .co.uk as a TLD, so that the company part remains the SLD.

3 A URL is basically a URI that you can actually use. That is, there exist URIs that refer to resources but that cannot actually be used to access those resources; for example, the ISBN URI scheme cannot be used to get an actual book.

4 Like this one.