snapsvg

2015-05-27

CPAN installation order

At work we use Catalyst. Catalyst apps can be (should be?) built up from multiple modules, in the sense of distribution. This allows them to be modular, which is kind of why they're called modules.

That means each project is a directory full of directories, most of which represent Perl modules, and most of which depend on each other. In order to deploy we throw this list at cpanm (http://cpanmin.us) and let cpanm install them all.

This works by accident, because they're all installed already, and so module X depending on module Y is normally OK because Y will be updated during the process.

For a fresh installation, cpanm will fail to install many of them because their prerequisites are in the installation list:

$ cpanm X Y
--> Working on X
...
-> FAIL Installing the dependencies failed: 'Y' is not installed
--> Working on Y
...
-> OK
Successfully installed Y

Now Y is installed, but not X.

I wrote a script to reorder them. https://gist.github.com/Altreus/26c33421c36cc1eee68c

$ installation-order X Y
Y X

$ cpanm $(installation-order X Y)
--> Working on Y
...
-> OK
Successfully installed Y
--> Working on X
...
-> OK
Successfully installed X

This uses the same dependency information that cpanm itself used when it complained that Y was not installed. Which is to say, if a dependency is missing from that information, the script can't order around it, but then the original cpanm invocation would not have failed on it anyway.
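The reordering itself is a topological sort. What follows is a minimal sketch of that idea and not the script in the gist: the %requires hash is hypothetical stand-in data, where a real script would read each distribution's declared prerequisites from its metadata.

```perl
use strict;
use warnings;

# Hypothetical stand-in for each distribution's declared prerequisites.
my %requires = (
    X => ['Y'],
    Y => [],
    Z => ['X', 'Y'],
);

# Depth-first topological sort: emit a module only after everything it
# requires has been emitted.
sub installation_order {
    my ($requires, @wanted) = @_;
    my (%state, @order);

    my $visit;
    $visit = sub {
        my ($module) = @_;
        return if ($state{$module} // '') eq 'done';
        die "Circular dependency involving $module\n"
            if ($state{$module} // '') eq 'visiting';

        $state{$module} = 'visiting';
        # Prerequisites we know nothing about are left for cpanm to fetch.
        $visit->($_)
            for grep { exists $requires->{$_} } @{ $requires->{$module} || [] };
        $state{$module} = 'done';
        push @order, $module;
    };

    $visit->($_) for @wanted;
    return @order;
}

print join(' ', installation_order(\%requires, qw(X Y))), "\n";   # Y X
```

The 'visiting' state is only there to turn a dependency loop into an error instead of infinite recursion.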

2015-04-14

Catalyst Models

A Catalyst model is simply a package in the MyApp::Model namespace.

$c->model('DBIC')

simply returns

"MyApp::Model::DBIC"

I recently spent some time at work trying to work out quite how Catalyst models work with relation to, well, everything else.

Our app structure is based on CatalystX::AppBuilder, and I needed to add a model to one of the components, in order to provide a caching layer in the right place.

The mistake I'd been making was that the Schema subclass is not the same thing as the model. Rather, the model is an interface into the Schema class. Essentially, I had one class too few.

You can determine that by creating a new Catalyst application and then running the helper script that creates a model from an existing schema. You get a class like this:

package MyApp::Model::FilmDB;

use strict;
use base 'Catalyst::Model::DBIC::Schema';

__PACKAGE__->config(
    schema_class => 'MyApp::Schema::FilmDB',

    connect_info => {
        dsn => 'dbi:mysql:filmdb',
        user => 'dbusername',
        password => 'dbpass',
    }
);

A Model class is created and it points to the Schema class, being your actual DBIC schema.

Once I'd realised the above rule it was easy enough to create MyApp::Extension::Model::DBIC to go alongside MyApp::Extension::Schema.

Further confusion arose with the configuration. There appeared to be no existing configuration that matched any of the extant classes in the application or its components. However, it was clear which was the DBIC model configuration because of the DSN.

I wanted to follow suit with the new module, which meant that somehow I had to map the real name to the config name.

<Model::DBIC>
</Model::DBIC>

This makes sense; if I do $c->model('DBIC') I'll get "MyApp::Model::DBIC", and that'll be configured with the Model::DBIC part of the config.

What I'd missed here was that we were mixing CatalystX::AppBuilder with CatalystX::InjectComponent:

package MyApp::Extension;
use CatalystX::InjectComponent;

after 'setup_components' => sub {
    my $class = shift;

    ...

    CatalystX::InjectComponent->inject(
        into      => $class,
        component => __PACKAGE__ . '::Model::DBIC',
        as        => 'Model::DBIC',
    );
};

This was the missing part - the stuff inside the CatalystX::AppBuilder component was itself built up out of other components, aliasing their namespace-specific models so that $c->model would return the appropriate class.

Now, Model::DBIC refers to MyApp::Extension::Model::DBIC, which is an interface into MyApp::Extension::Schema.
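As a way of picturing the aliasing, here is a toy model in plain Perl. It is not how Catalyst stores components internally: the %components hash and the inject and model functions are hypothetical stand-ins, and only illustrate that the name given to $c->model is looked up under the injected alias rather than derived from the package name.

```perl
use strict;
use warnings;

# Toy component registry: alias => implementing class.
my %components;

# Stand-in for CatalystX::InjectComponent->inject( component => ..., as => ... )
sub inject {
    my (%args) = @_;
    $components{ $args{as} } = $args{component};
}

# Stand-in for $c->model($name): look up "Model::$name" in the registry.
sub model {
    my ($name) = @_;
    return $components{"Model::$name"};
}

inject(
    component => 'MyApp::Extension::Model::DBIC',
    as        => 'Model::DBIC',
);

print model('DBIC'), "\n";   # MyApp::Extension::Model::DBIC
```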

2015-01-05

User groups in Odoo 8

Odoo has a user group concept that, if you Google for errors, crops up all the time. Odd that when you first run Odoo, you can't assign users to groups.

The answer is you have to give the Administrator user the "Technical Features" feature in Usability. Navigate to Settings > Users, click Administrator, click Edit, check the relevant box, click Save, and finally refresh.

If you Google for it, there's hardly any information on the subject. However, Odoo is quite happy to occasionally tell you what groups you need to be a part of in order to access something.

User groups are access control, so it's common that you'd want to create levels of access and assign the user to them. I first discovered an issue with this when trying the Project Management module - trying which was the entire point of me running Odoo 8 in the first place. (I can't reproduce the problem now that it's a new year. Maybe Odoo's NYR is to be less whiny.)

You can run a Docker container with Odoo 8 in it from the tinyerp/odoo-docker GitHub repo; either the Debian or the Ubuntu version should work fine. [1]

[1] I recommend the Debian version, since Ubuntu is just Debian with extra, irrelevant stuff bundled in, making it not entirely useful to have an Ubuntu version in the first place. Licensing is probably involved.

2014-12-22

Day 22 - The nth Day Of Christmas

How many presents had been given, in total, by the end of the 12th day of Christmas, by the so-called "true love"? How many by the end of the nth day?

For each day we know that each other day was done again, so we have a shape like this:

1
12
123
1234
...

The first column is as tall as the number of rows, which is 12, and each column after it is one shorter.

This means the 1 column is 12 tall, the 2 column 11, and so on.

This is 12 * 1 + 11 * 2 + 10 * 3 ...

That's boring. That's not what computers or maths are for. Let's generalise.

We can see that each section of the summed sequence follows a pattern of x * y, where x + y = 13.

It is common, when analysing sequences, to forget that the order matters, and the row number can be used as a variable. If we call that variable i then each section is (13 - i) * i, and the total is the sum over i from 1 to 12.

 12
  Σ (13 - i) * i
i=1

13 is suspiciously close to 12. What happens if we do this?

 12
  Σ (12 + 1 - i) * i
i=1

And then replace the 12 with our n to answer "What about the nth day?"

  n
  Σ (n + 1 - i) * i
i=1

Does it work? Let's Perl it up. Each value of (n + 1 - i) * i can be produced by a map over the range 1..$n, using $_ in place of i, since that's exactly what it is in this case.

sum0 map { $_ * ($n + 1 - $_) } 1 .. $n

sum0 comes from List::Util, and does a standard sum, except that it returns 0 rather than undef when the list is empty - this just avoids pesky warnings.

Try it. Using $ARGV[0] for $n we can give our n on the command line:

perl -MList::Util=sum0 -E'say sum0 map { $_ * ($ARGV[0] + 1 - $_) } 1 .. $ARGV[0]' 12

Vary the 12 to solve for different values of n.

The answer, incidentally, is 364.
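The sum also has a closed form, n(n + 1)(n + 2) / 6, which falls out of the standard formulas for the sum of i and the sum of i squared. The sketch below checks it against the map-and-sum version for the first few values of n; the function names are mine.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Sum the series term by term, as in the one-liner.
sub presents_by_sum {
    my ($n) = @_;
    return sum0 map { $_ * ($n + 1 - $_) } 1 .. $n;
}

# Closed form: n(n + 1)(n + 2) / 6.
sub presents_closed_form {
    my ($n) = @_;
    return $n * ($n + 1) * ($n + 2) / 6;
}

for my $n (0 .. 12) {
    die "Mismatch at n = $n\n"
        unless presents_by_sum($n) == presents_closed_form($n);
}

print presents_closed_form(12), "\n";   # 364
```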

Day 18: The URI

I've talked a lot about this resource-first way of dealing with the web, and really the internet in general, but it isn't a tool that fits all things. For instance, today I was looking at the point-of-sale module in Odoo, which is essentially an HTML representation of the index resource of the products in the system, but is actually more complicated than that, because it includes that resource, a numeric input box, the bill of items so far, a search box, and a few other twiddly bits to improve the cashier's use of the system. Plus, it is designed with tablets in mind.

This is quite different from the list of products you get when you look for the list of products in Odoo itself.

However, we must construct a URI that refers to this view of the data if we're to be able to access that view of the data in the first place. That means that we somehow have to shoehorn this not-a-resource idea into the everything-is-a-resource idea.

Today I'm going to deconstruct the URI and explain how each part can be used, in order to avoid too much in the way of special behaviour. Ideally we'd like every resource to be represented by a single URI, but that's clearly not going to work.

Allow me to state up front that I consider Odoo's URI scheme to be utterly shocking. But it appears to be a legacy from back in the old days when more people made web things than really understood what URIs were for.

The URI

The URI is made up of several parts. Here is what I consider to be the simplest URL that contains all common parts [1]:

http://www.example.com:8080/resource/id?query=data#part-of-document
|_____|___|_______|___|____|________|__|__________|_______________|
   1    2     3     4   5      6     7      8             9

  1. Schema
  2. Subdomain
  3. SLD [2]
  4. TLD
  5. Port
  6. Resource (type) name
  7. Resource (instance) identifier
  8. Query string
  9. URI fragment

Together, 2, 3 and 4 comprise the hostname; 6 and 7 are the path.
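To see the same breakdown mechanically, here is a small sketch that splits the example URL with the regular expression given in RFC 3986, Appendix B. The split_uri function and the shape of its return value are my own invention for illustration, and the host/port handling deliberately ignores user:pass@ and IPv6 literals.

```perl
use strict;
use warnings;

# Split a URI into its generic components using the regular expression
# from RFC 3986, Appendix B (written here with non-capturing groups).
sub split_uri {
    my ($uri) = @_;
    my ($scheme, $authority, $path, $query, $fragment) = $uri =~
        m{^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$}s;

    # Separate an optional :port from the host.
    my ($host, $port) = ($authority // '') =~ /^(.*?)(?::(\d+))?$/;

    return {
        scheme   => $scheme,
        host     => $host,
        labels   => [ split /\./, $host // '' ],   # subdomain, SLD, TLD
        port     => $port,
        path     => $path,
        query    => $query,
        fragment => $fragment,
    };
}

my $parts = split_uri(
    'http://www.example.com:8080/resource/id?query=data#part-of-document'
);

printf "%-8s %s\n", $_, $parts->{$_} // '(none)'
    for qw(scheme host port path query fragment);
print "labels   @{ $parts->{labels} }\n";
```

Note that the regex only knows about the path as a whole; dividing it into a resource name and an identifier is a convention of the application, not of the URI syntax.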

Breaking down the URI

Schema

The schema (formally called the "scheme" in RFC 3986) is the first place where you restrict yourself. Often referred to as protocol, the schema usually determines how the URI should be used. In this example http is the assumed protocol by which web requests are made. The http schema tells the client to use the HTTP protocol to make the request.

This is very useful because it means we can immediately assume a large quantity of knowledge about the system that we wouldn't have without the schema. Particularly useful is that we know what sort of programs can be used to actually access this URL [3]. This is, if you think about it, what the word protocol means: it is those things that are assumed to be the case, given a certain situation. When we all follow protocol, we don't need to explain why we're doing what we're doing.

Mostly we come across URLs specifying the HTTP schema; in fact, it's assumed, in many cases, that a URI with no schema is an HTTP URL, because if you click on it, it opens up in your browser. However, some places have started using their own schemata, such as the spotify: schema, which opens URLs in the Spotify client, or the steam: schema, which opens things with Steam.

It's worth noting that the entire hostname can also be omitted from a URI, but this usually means you get three slashes, not two. This is commonly seen with the file protocol, such as file:///home/user/documents/example.html; where the third / is actually part of the path. For this reason it can be observed that the steam: schema does not quite follow the normal URI standards, since the part immediately following the schema is an action - arguably a resource - and not a hostname.

By inventing our own schemata like this we can create entire applications with a new way of communicating, but we're focusing on the web here, which means we're going to use HTTP(S), like it or lump it.

Subdomain

The term "subdomain" is a bit of a colloquialism. Each section of the hostname is a subdomain for the part to the right. The host name is a hierarchy with, in this case, com at the top. We usually call this part the "subdomain" because it's the first subdivision that is really relevant to a human.

When we have a subdivided subdomain we sort of stop talking about them and start mumbling and saying "that bit" and pointing.

The subdomain is a tool we can use to do many things. Traditionally the web is in the www subdomain, but the http protocol is usually sufficient to assume web, these days. However, that's starting to change, as we start to send non-web things over HTTP. These non-web things are, e.g., the API, or the CDN.

Really consider using an api subdomain for your API. You'll find that if you have an api and a www, then your website can have, in the majority, the exact same URI structure as the API. This is more often the case than it appears to be, because people don't tend to think of their web pages as representing a resource in HTML format.

Domain

The SLD is the part of the domain that really, to a human, represents where the site is. This is usually your company or organisation name, or some other thing whose entire purpose is to say what this whole web site is about.

You can install a system under multiple domains and thus they would all have the exact same URI scheme, except that, because they're in different places, the records that you get would be different.

Because yoursite.com/user/1 is not the same person as mysite.com/user/1, except by coincidence.

I've lumped the TLD in here too, because the TLD is, to most people, part of your domain name - which is why we call the subdomain the subdomain regardless of where it appears on the actual hostname.

Port

When designing URI schemes it's helpful to drink a lot of port, for inspiration.

Commonly there are alternative services associated with your website, meaning they're on the same domain, and you can't use the subdomain because these other services need api and www subdomains of their own.

One trick is to mount these services under a part of the path, and consider them a big resource with sub-resources; but easier is to install them on a different port.

For example, your Elasticsearch instance - which communicates entirely via HTTP - can be running on the same hostname as your website, but a different port. Elasticsearch's default port is 9200, going up to 9300 as you add instances on the same machine.

Resource name

The first part of the path of the URL I'm calling the resource name. That's because this is where the actual resource you're requesting starts. Everything before the path is defining whose resource you are asking for, but once the path starts you're starting to get a handle on the actual information.

The resource name, when requested, can have multiple behaviours, depending on the purpose of the resource, but common is simply to be an index of all the items of that type. Since that can be cumbersome, it is perfectly legitimate to both paginate this list and summarise the entries. That sort of stuff is well out of scope of this article, though.

Other uses of the first part of the path are organisational, and may be handled better as a subdomain. For example, having an api part of the path here is not as useful as it would be to have an API subdomain, because if the paths to the resources can be consistent then we don't have to ask questions about what they should be.

https://www.example.com/resource
https://api.example.com/resource

Other times, you may want to use a different port. For example, if the web stuff is on port 80 then the administration part could be on port 8080. This also allows you to control access to the different parts of the site at the kernel level, using routing rather than soft authentication.

https://www.example.com/admin
https://www.example.com:8080

Doing this also means that it's harder to guess the correct path to the admin area, since you can use an obscure port. Denying access based on IP rules means you'd never report to unauthorised users when they guessed right in the first place.

But really, there's no exact reason why you would or would not add parts of the path to the URL in order to divide it up into separate logical zones. This can certainly help with human comprehension of the purpose of your URL. Sometimes you may even want to provide dummy paths - paths that refer to the same resource as other paths, but assist with conceptual compartmentalisation by having different subpaths.

https://www.example.com/shop/product/1
https://www.example.com/blog/post/1

In these examples, the first part of the path could be omitted, provided that post is always the blog post and product is always a shop product. Consider also that you could still use subdomains for these.

https://api.shop.example.com/product/1

The important part would be to ensure that your uses are consistent. Always have each part of the URL refer to the same logical division of your resource structure.

Item ID

Once you've decided at which point of the path to put the resource type, you should probably put the next part as an optional ID field.

The combination of a resource name and an item ID should be entirely sufficient to retrieve all the information about that specific instance of that type.

This is a reasonably central principle to the resource-first model of your system - all your things have a type and an ID and that's all you need to provide to retrieve it, or at least a representation of it. Everything else is your organisational whimsy and the system really shouldn't have to know.

More formally than dismissing it as whimsy, I should point out that even the type names and shapes can change, and that's difficult enough to deal with. Every level of organisation you add on top of this is another changeable shape of the system that at some point you're going to have to adapt. The fewer of those you have, the better.

The actual format of your identifier is up to you, but there's really nothing else you can put after the resource name that is relevant at this point.
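As a rough illustration of "a type and an ID is all you need", here is a sketch of a lookup driven by nothing but the path. The %store hash and the fetch_resource function are hypothetical; a real system would sit on a database rather than a hash.

```perl
use strict;
use warnings;

# Hypothetical in-memory store: resource type => { id => record }.
my %store = (
    product => { 1 => { name  => 'Widget' } },
    post    => { 1 => { title => 'Hello' } },
);

# Take a path of the form /type or /type/id and look the thing up.
sub fetch_resource {
    my ($path) = @_;
    my (undef, $type, $id) = split m{/}, $path;

    my $collection = $store{ $type // '' } or return;       # unknown type
    return [ sort keys %$collection ] unless defined $id;   # index of IDs
    return $collection->{$id};                              # one instance, or undef
}

print fetch_resource('/product/1')->{name}, "\n";       # Widget
print join(',', @{ fetch_resource('/post') }), "\n";    # 1
```

With no ID the function falls back to the index behaviour described under the resource name; nothing else in the path is consulted.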

Query string

If I catch you using a query string to tell a dynamic resource to load a specific other resource I will murder you in your sleep.

https://example.com/index.php?type=resource&id=1234

Seriously, this sort of crap is all over the internet. Yes, it's usually PHP.

You are using a URI - at least put the resource identifier in the resource identifier.

It is important to note that the query string is not the same thing as the "GET parameters". A query string does not have to be in the format key=value&key=value - the web server passes the query string straight to the app, and it is the application that decodes it in its own way. It is common to use the key=value&key=value structure but not required.
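To make the point that the decoding is the application's business, here is a sketch of the conventional key=value&key=value decoding done by hand. The decode_pairs function is my own name for it; it returns raw bytes and makes no attempt at character decoding.

```perl
use strict;
use warnings;

# Decode a query string the conventional way: pairs separated by "&",
# key and value separated by the first "=", with "+" and %XX unescaped.
# Nothing about the URI requires this format; the application chooses it.
sub decode_pairs {
    my ($query) = @_;
    my @pairs;
    for my $pair (split /&/, $query) {
        my ($key, $value) = map {
            my $s = $_;
            $s =~ tr/+/ /;
            $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
            $s;
        } split /=/, $pair, 2;
        push @pairs, [ $key, $value ];   # value is undef for a bare key
    }
    return @pairs;
}

printf "%s => %s\n", $_->[0], $_->[1] // '(no value)'
    for decode_pairs('type=resource&q=a%20b+c&flag');
```

An application that expects, say, a bare date after the ? would skip all of this and use the string as it arrives.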

The query string's most obvious purpose is to pass a query to a resource that expects one, or that at least accepts one. Often the index resources will allow for some sort of search or filter functionality, and if that's not the case then special resources designed to search and filter - and possibly concatenate - other resources will accept search parameters.

Further specialisation of resources would not even use the KVP format of "GET parameters", and simply take the query string as instruction. These types of resource are drifting away from the "object" type of resource and moving towards "function" resources, which are a separate discussion.

The thing about the query string is that it is usually only relevant to GET requests, which is why it is sometimes called the GET string. But GET is an HTTP verb and the query string is part of the URL; and URLs don't have to be http://, so the query string can really be used against any scheme.

It is often said the query string should not be used to send data to the server, but I'm really not sure that's the case. The server should not store data as the result of a read request (HTTP's GET), but it is welcome to store data as the result of a write request (HTTP's POST or PUT). In which case it is entirely up to the server the mechanism by which the data are provided to it.

These are why you should call it the query string, not the GET string.

Fragment

The part of the URL after the # is called the fragment. This is not actually part of the resource identifier, but is provided for the client's benefit.

If you click on any of the footnote marks in this document [4], most browsers I give a toss about will jump to the footnote, and back again when you click on the number of that footnote.

No new page request is made. The browser is not being instructed to access a different resource. In the example earlier, the fragment is #part-of-document. The fragment is usually used to refer to a part of the document. In HTML and XML, this is either by the id or name attributes of the elements.

In this document, the a tags that jump around the page have name attributes that the browser uses to scroll to them when the URL fragment changes, i.e. in these blog-post resources, the parts-of-the-document that I refer to with URL fragments are the footnotes and the places the footnotes refer to.

Using the document fragment to refer to specific resources is a crime committed by many "JavaScript apps" today. The reason this is a crime is that it is not identifying the resource; it's identifying the resource proxy, which means the correct client must be used to actually access the resource itself. It's like having a proprietary browser that only understands a completely different URI format.

It's a crime because browsers are more than capable of intercepting URI requests inside an application and getting the application to update as necessary, and servers are more than capable of returning a javascript-app-with-resource-in-it as the HTML representation of the resource.

There is no reason besides lack of imagination to trample all over that URI system just to avoid reloading the page every so often.

TODO

Not mentioned is the idea of a "related resource". This can be a third part of the URI path whereby you request an index of a separate resource based on the current one:

https://www.example.com/blog/post/1/comments

This is, conceptually, the same as

https://www.example.com/blog/comments?post=1

but you may wish to return the results differently, e.g. with more expanded objects rather than just URLs to the results.

In upcoming posts we'll probably have a look at those "functional" resources I mentioned in passing. This post has been entirely about "object" resources, i.e. those resources that simply represent some representation of a real-world object, or a fake-world object, but ultimately something that can be represented as a JSON object with fields and values. I will also try to discuss the resource-first view of website building using the aforementioned point-of-sale in Odoo as an example.

We also haven't discussed how it is that you would relate resources to one another in knowable ways. This ties in with the hyperlink concept and is the thinking behind Web::HyperMachine - HTML pages are already linked together with <a href="related-link">, but there are myriad other ways even those use hyperlinks to refer to other resources, and even more ways in HTTP itself.


[1] I've omitted from this the user:pass@ part that can be used before the hostname, because it's not very common.

[2] The "second-level domain" is colloquially the "company" part of the name, i.e. the first part that actually identifies at a human-readable level what it is the URI refers to. In some cases, such as .co.uk, the TLD is actually the SLD (co) and TLD (uk), and it is the third-level domain that is the company part. Colloquially, we can refer to .co.uk as a TLD, so that this remains the SLD.

[3] A URL is basically a URI that you can actually use. That is, there exist URIs that refer to resources but that cannot actually be used to access that resource; for example the ISBN URI schema cannot be used to get an actual book.

[4] Like this one.

2014-12-17

Day 17: A complex and detailed investigation into the various merits and faults of the assorted combinations of codepage, character set and byte encoding of human-readable text.

There are 128 characters in ASCII and tens of thousands of characters in the real world. It is probably an interesting debate, trying to come up with the most efficient way of encoding non-ASCII characters without screwing everything up.

Don't waste your time. Use UTF-8 and Unicode.

"But what about UTF-16?" No.

"But what about--" NO.

ASCII is included in UTF-8 Unicode. So is everything else. Everyone understands it, everything's assuming it, and all the other encodings and charsets are more obscure and therefore harder to deal with.
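If you want to see that for yourself, here is a small sketch using the core Encode module. The utf8_bytes helper is just a name I've made up; it shows that an ASCII character encodes to the same single byte it always was, while anything else takes two or more.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode a string of characters as UTF-8 and list the bytes as hex.
sub utf8_bytes {
    my ($chars) = @_;
    return map { sprintf '%02X', ord($_) } split //, encode('UTF-8', $chars);
}

print join(' ', utf8_bytes('A')), "\n";           # 41       (same as ASCII)
print join(' ', utf8_bytes("\x{E9}")), "\n";      # C3 A9    (e with acute)
print join(' ', utf8_bytes("\x{20AC}")), "\n";    # E2 82 AC (euro sign)
```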

Everyone (except PHP) has UTF-8 Unicode built in to whatever programming language they're using.

Unless you're writing for devices with memory measured in bytes and a network connection measured in baud then you have time and space to use the bloating of UTF-8 Unicode. So suck it up, be inefficient, and accept the VHS of UTF-8 over the Betamax of whatever you're looking all cow-eyed at today.

And, in case you were wondering, ASCII is never the right answer.

2014-12-16

Day 16: Web::Machine

Web::Machine is pretty cool because it reorganises the way you think about your website's structure, focusing on the perspective you should really be starting with in the first place.

Web::Machine encourages you to construct several objects, each of which handles a URI by representing the resource to which that URI points.

Remember that URI is a Uniform Resource Identifier. We've had this discussion. The parts of the internet that use URIs are based on the assumption that they are sharing information about resources, and hence the focus is on the resource.

Web::Machine starts with the resource. You construct an object and mount it as a Plack application to handle the URI to that resource. These objects are actually the machines. You construct a Web::Machine with a subclass of Web::Machine::Resource, and if that's all you want to do, you call ->to_app on it and plack it up.

Each Web::Machine so constructed is a Plack::Component. That means you can bring in a Plack::Builder and mount machines in it.

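use Plack::Builder;
use Web::Machine;

my $builder = Plack::Builder->new;
$builder->mount(
    '/resource' => Web::Machine->new(
        resource => 'MyApp::Resource'
    )
);

# A .psgi file must end by returning the application
$builder->to_app;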

Alternatively, you might prefer to use something like Path::Router, providing subs that build Web::Machines based on arguments.

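use Path::Router;
use Web::Machine;

my $router = Path::Router->new;
# add_route takes options after the path; the sub goes in as the target.
# Assumes a dispatcher such as Plack::App::Path::Router, which calls the
# target with the request and the matched values.
$router->add_route('/resource/:id' => (
    target => sub {
        my ($req, $id) = @_;
        Web::Machine->new(
            resource => 'MyApp::Resource',
            resource_args => [
                id => $id,
            ],
        )
        ->call($req->env);
    },
));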

Two things are notable about this particular invocation. First, it is necessary to run call on the resulting machine manually. The second is that, now that we have actual args coming in, we're seeing how Web::Machine takes an array ref for these, not a hashref; i.e. it's an argument list and not required to be hash-shaped.

MyApp::Resource is what handles the actual magic: Web::Machine expects certain subroutines to be overridden from the base class Web::Machine::Resource that define what this resource can do.

The sensible ones to provide are content_types_provided and the to_* filters that define how to represent this resource as the various content types it supports.

The documentation lists all of the functions that can be overridden to provide behaviour specific to this class.

RFPR: Web::HyperMachine

I've started taking this a step further. Resources are only part of what makes the interwebs work. The other part is the fact that the resources are related to each other: hypermedia.

Up on the githubs is a start to the module Web::HyperMachine, which tries to wrap Web::Machine in an understanding of how the resources relate to one another. By adding a couple of DSL-like functions to the Resource class it is possible to automatically construct the URI schema for the system, using the declared names of resources and relationships within the resource classes themselves.

The user simply mounts those resources and the machine does the rest:

#!/usr/bin/perl
use strict;
use warnings;
use Web::HyperMachine;

my $app = Web::HyperMachine->new;
$app->with('MyApp::Resource');      
$app->to_app;

And the resource would be e.g.:

package MyApp::Resource;
use strict;
use warnings;

use parent 'Web::HyperMachine::Resource';

__PACKAGE__->uri('resource');

our @data = qw( hello hi hey howdy );

sub content_types_provided { [{ 'text/html' => 'to_html' }] }

sub fetch {
    my ($self, $id) = @_;
    return $data[$id];
}

sub to_html {
    my $self = shift;
    my $resource = $self->{resource};

    q{<h1>} . $resource . q{ world</h1>}
}

1;

If you plackup that script, you'll find that /resource/0 [1] will return an HTML page with "hello world" in it; and other values will correspondingly index into the array.

Feedback on this concept is encouraged; it's not been worked on for some time, like most things I do, because I got bored of it, because I didn't have an actual use for it.

[1] If 0 doesn't appear to work, you may have an outdated version of Path::Router. The issue tracker says it is fixed on CPAN now.