snapsvg

2013-10-18

Fixing PHP

PHP is not a bad language.

Come back. Let me rephrase that.

PHP is a terrible implementation of what under the surface is a perfectly adequate, dynamic scripting language. Unfortunately it is implemented as a poorly-thought-out, logically bereft templating language, peppered with pitfalls and irritating inconsistencies.

But it can be fixed. It can be fixed with some simple, non-backwardly-compatible, sensible, welcome-to-the-real-world, feasible alterations. Let us begin.

1. Get rid of <?php ?>

The fact that PHP used to be a templating language is archaeologically apparent in this vestigial remnant from a bygone era. These tags are still all over the place because PHP is trying to be two things at once: both a templating language and a scripting language.

Once you grow up (or metastasise) and become a real language, you have to put away childish things.

These break-in-break-out tags were fine when PHP was designed to be parsed by a Perl script and run as a simple if-this, for-each-that dynamic HTML page generator. They remain fine, if you want to use PHP as the templating language it is. But if PHP wants to be taken seriously, the first thing it needs to do is stop hanging on to that I-can-do-templates-me attitude, and hand over to one of the many modern alternatives that have come along since the Internet was still finding its feet.

In fact there's no real reason PHP should not remain a templating language. After all, Mason (and indeed Template Toolkit) allow you to inject actual Perl into your web templates for those times when you simply can't be arsed to abstract your logic to where it's supposed to go. However, if PHP is going to behave like this, it needs to understand there is a difference between a PHP template and a PHP script.

Therefore I propose

1a. Create .php and .phpt file types

Or suchlike. .php files would naturally be PHP scripts and do away with that ridiculous <?php header that persists throughout PHP projects like a blight. .phpt or suchlike would be recognised as text files containing PHP segments, and they can use the old break-in-break-out paradigm to inject program logic into the template.

Of course it is not recommended in Mason or TT2 that you use actual Perl in your actual templates, because then the temptation is just to merge your views with your controller logic, and then you get into a Right Mess. Better would be simply to have a PHP port of TT2 or Mason, or use Twig or Smarty, and allow those to have their own this-bit-is-PHP-and-I'm-sorry directives.

1b. Make it a decent templating language too

It's a bit of an issue that PHP is stupid, as well. Modern templating languages offer myriad text processing options as part of the language itself. An example is the way Template::Toolkit allows you to filter output text through, e.g., the HTML filter, sanitising the data just before it's output.

PHP's best answer to this so far is user-written PHP classes that render PHP templates (two entirely different things written in the same language) by sanitising the data assigned to them at some time or other just before the template file itself is actually rendered.
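For the avoidance of doubt, the user-written approach looks roughly like the sketch below. The names escape_all and render are hypothetical, and a closure stands in for the template file; this is an illustration of the pattern, not any particular framework's code.

```php
<?php
// Hypothetical helper: escape every scalar value before a template sees it.
function escape_all(array $vars)
{
    $safe = array();
    foreach ($vars as $name => $value) {
        $safe[$name] = is_scalar($value)
            ? htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8')
            : $value;
    }
    return $safe;
}

// Hypothetical renderer: a closure stands in for the template file.
function render(callable $template, array $vars)
{
    ob_start();
    $template(escape_all($vars));
    return ob_get_clean();
}

$html = render(function (array $v) {
    echo '<p>Hello, ', $v['name'], '</p>';
}, array('name' => '<b>Bob</b>'));
// $html is now '<p>Hello, &lt;b&gt;Bob&lt;/b&gt;</p>'
```

Note that the escaping happens to everything, up front, whether or not the template wanted it - which is exactly the difference from a per-output filter.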

That's just one example. PHP is not really a templating language any more either, because templating languages have evolved past the very basic output-string behaviour that PHP was originally tasked with. PHPT would need to catch up as well, and separate itself from PHP proper.

2. Stop pretending everything is an HTTP request

That PHP never left its template roots shows when you try to write command-line interfaces into your business software. You realise that you've been assuming throughout the code that the $_SERVER variable actually contains a URI of some description; that there's a protocol; that you're outputting HTML.

As soon as the first file that started with <?php and didn't contain a ?> was created, PHP was broken. As soon as you create a file that contains utility functions, or classes, you have a file that you can run without a webserver. As soon as you have that, you have a scripting language. That was the point at which people should have stood back, taken a look, and dived in to PHP 4 or whatever with the attitude that this time we're going to do it right.

No one did.

PHP still outputs HTML whenever it feels like it - see var_dump. It still has global, HTTP-centred variables. It doesn't do exit codes properly. The fact that exit and die are the same damn thing just shows that someone somewhere has completely misunderstood the point of these things. Heck, I don't even know whether error messages actually go on stderr.
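For what it's worth, you can behave like a proper command-line citizen if you do it all by hand. A minimal sketch, assuming a hypothetical report_error helper and a made-up --fail flag; the stream is passed in so the function can be exercised without a terminal.

```php
<?php
// Hypothetical helper: send diagnostics to a stream the caller chooses.
function report_error($message, $stream)
{
    fwrite($stream, 'error: ' . $message . PHP_EOL);
    return 1; // a non-zero status to hand to exit()
}

// The STDERR constant is only predefined by the CLI SAPI, so open the
// stream explicitly rather than relying on it.
$stderr = fopen('php://stderr', 'w');

// --fail is an invented flag, purely for illustration.
$args = isset($argv) ? $argv : array();
if (PHP_SAPI === 'cli' && in_array('--fail', $args, true)) {
    // exit() with an integer sets the process status; with a string it
    // prints the string instead, which is part of the confusion.
    exit(report_error('something went wrong', $stderr));
}
```

The point stands that none of this is the default; you have to know to ask for it.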

At about the time PHP was swapping its soft teething toy for its first big-boy spoon, the rest of the world was discovering that if you interface your HTTP server with your scripting language via stdout, you can maintain a separation of interests wherein your entire business logic is a collection of useful modules or classes or whatever, which when used in a web environment can be wrapped in an HTML layer and called a website - the layer being swappable for a CLI one that outputs the same information in a salient format. Or a JSON one, for public APIs, or even private, socket-based APIs that don't touch either HTTP or even TCP!

Nope. In PHP's land of unicorns and rainbows the whole world is an HTTP request. The world springs into existence when the request begins and disappears when the response is sent, and if anything happens to be left around since the last universe's brief lifespan came and went then that's just something we have to deal with as part of our new one. Trying to leverage command-line support, or non-HTTP support, into this assembly of spit and chewing gum is baby's first steak knife to PHP.

3. Use your own exception mechanism

Nothing is as irritating while working with PHP as when it throws its toys out of the pram. Now, I'm quite happy to accept that a parsing error is completely unrecoverable, but that is it, and absolutely it. Anything and everything that happens at runtime should be tryable, and anything that ever goes wrong should be catchable.

This expected feature of the language should not be taken as a comment on the sense in doing so. Trying to call $app->run() and catching it when it fails is going to be a bit less useful than letting it fail and tell you what was wrong.

But being able to catch it - now that's a tool we need. Since the original error mechanism was put in place a new, superior nonlocal return is available, and one which puts control in the hands of the user (without horrible set_error_handler hacks). Might as well use it.
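The set_error_handler hack in question usually looks something like this sketch: every reported runtime error is promoted to an ErrorException so that try/catch can see it. The function risky is hypothetical, and E_USER_WARNING stands in for any warning the engine might raise.

```php
<?php
// Promote every reported runtime error to an exception.
set_error_handler(function ($severity, $message, $file, $line) {
    throw new ErrorException($message, 0, $severity, $file, $line);
});

function risky()
{
    // Stand-in for any runtime warning raised by the engine.
    trigger_error('disk is nearly full', E_USER_WARNING);
    return 'finished';
}

try {
    $result = risky();
} catch (ErrorException $e) {
    $result = 'caught: ' . $e->getMessage();
}

restore_error_handler();
// $result is now 'caught: disk is nearly full'
```

It works for warnings and notices; as far as I know, fatal errors in PHP 5 never reach the handler at all, which is rather the complaint.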

4. Tidy up the root namespace

We get it. You like functions. Well, take stock and look around you. Not only have you implemented exceptions and then completely failed to use them, you've also implemented classes, interfaces, namespaces, closures and traits and failed to use those as well!

Right. For a start, having all those functions is confusing because there's no consistency in them. I'm not going to rewrite the entirety of A Fractal Of Bad Design, but I'm going to borrow from it here. Some of the functions have underscores, some don't (strpos / str_rot13). Some take arguments one way, some the other (array_filter($input, $callback) / array_map($callback, $input)). Every time we use a built-in function we have to look up how it's spelled and what order the arguments are in and there are so. Damn. Many.
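Side by side, the argument-order problem looks like this. The variable names are mine; the functions and their signatures are PHP's own.

```php
<?php
$numbers = array(1, 2, 3, 4);

$is_even = function ($n) { return $n % 2 === 0; };
$double  = function ($n) { return $n * 2; };

// array_filter: array first, callback second (original keys are kept,
// hence the array_values to renumber them).
$evens = array_values(array_filter($numbers, $is_even));

// array_map: callback first, array second.
$doubled = array_map($double, $numbers);

// The same flip-flop exists for needle and haystack.
$has_three = in_array(3, $numbers);      // needle, haystack
$position  = strpos('haystack', 'st');   // haystack, needle
```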

Secondly, certainly PHP has to look up every called symbol in both the user's own symbol tables as well as the language's. That sort of thing is surely expensive, especially if this language is aimed at beginner programmers who are only ever going to use 10% of the functions 90% of the time.

Thirdly, every single built-in function or class is just another name that the user can neither use for their own functions nor override to replace. Sure, PHP has modules that you can jump through hoops to install at the C level, but who needs that?

All of this might be forgivable if this overabundance of global functions covered literally every possible operation a user could conceivably want; but it doesn't! Worse still, a majority of them can trivially be abstracted into one generic function that takes a callable. All the array_* functions, for example: the sort functions are all just user sort with different comparison procedures passed in. The filter functions are all the same with different predicates passed in - and, for a specific example, recently I needed a version of array_search that took a custom matching function! How dare I want the key of a value that has a sub-value that matches my input! PHP says I may not do that and therefore I may not do that.
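The function I wanted is not exactly rocket science. A sketch, with the caveat that array_search_by is a hypothetical name and not part of PHP; it returns false on failure only to mirror array_search.

```php
<?php
// Hypothetical function, not part of PHP: return the key of the first
// element for which $matches returns true, or false if none does.
function array_search_by(array $haystack, callable $matches)
{
    foreach ($haystack as $key => $value) {
        if ($matches($value)) {
            return $key;
        }
    }
    return false;
}

$users = array(
    'u1' => array('name' => 'Ann', 'role' => 'admin'),
    'u2' => array('name' => 'Bob', 'role' => 'editor'),
);

// The key of the value whose sub-value matches the input.
$key = array_search_by($users, function ($user) {
    return $user['role'] === 'editor';
});
// $key is now 'u2'
```

With this in hand, plain array_search is just the special case where the callable is an equality test.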

Ridiculous. The fact the PHP team haven't abstracted this stuff sensibly does not speak in favour of their ability to write the code behind PHP in the first place, does it? It doesn't take a genius to tidy all this up, and yet no one has - nor has anyone written the tidied version alongside. That attitude of constant implacability hurts the language and the community and the reputation of the people behind it, and damages confidence.

Hypothetical inefficiency aside it's just poor maintenance. The language has a mechanism by which to automatically find class files when a non-existent class is requested. So, put all the less-common functions in autoloaded classes and put those classes somewhere discoverable. Everyone else is modular these days. Is it stubbornness or incompetence that's leaving PHP behind?
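The mechanism in question is spl_autoload_register. A minimal sketch, assuming a made-up layout in which a class Util\Text lives in lib/Util/Text.php; class_to_path is a hypothetical helper.

```php
<?php
// Hypothetical layout: class Util\Text lives in <base>/Util/Text.php.
function class_to_path($class, $base_dir)
{
    $relative = str_replace('\\', '/', ltrim($class, '\\'));
    return $base_dir . '/' . $relative . '.php';
}

// Called by PHP the first time an unknown class name is used.
spl_autoload_register(function ($class) {
    $file = class_to_path($class, __DIR__ . '/lib');
    if (is_file($file)) {
        require $file;
    }
});
```

As far as I know the autoloader only fires for classes, interfaces and traits, not for plain functions, so the less-common functions would have to become static methods on such classes to benefit.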

Also, quit adding useless prefixes or suffixes to your functions. I know you're going to push onto an array because you push onto arrays. So call it push, not array_push.

Also also, don't fob us off with mb_ crap. Fix your Unicode. There's no excuse whatsoever for a language prevalent in the 21st century to be coded by people who can't cope with Unicode, or its various representations. I know, it's hard. Writing a language is hard. If you can't, don't.
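In case anyone thinks I'm exaggerating, here is the byte-versus-character split in miniature. The string is written with explicit UTF-8 bytes so the encoding of the file doesn't matter; the mb_ calls assume the mbstring extension is installed.

```php
<?php
// "héllo" with the é spelled out as its two UTF-8 bytes.
$word = "h\xC3\xA9llo";

$bytes = strlen($word);               // 6: counts bytes
$chars = mb_strlen($word, 'UTF-8');   // 5: counts characters

// substr cuts the é in half; mb_substr keeps it whole.
$broken = substr($word, 0, 2);              // "h\xC3"
$whole  = mb_substr($word, 0, 2, 'UTF-8');  // "h\xC3\xA9"
```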

5. Expressions, for the love of god

PHP's compiler is apparently written by chimps. Do we still really believe that there is a difference between a statement and an expression? Do we really still have to have "language constructs" (PHP's term) that are parsed and treated differently from any other expression?

No. Maybe back in the stone age we did things that way but here in the age of enlightenment we have come to realise that the only real difference between a statement and an expression is that a statement actually has a persistent effect.

In PHP, for example, the x or y construct has become possible. Except when y is not an expression - which is 90% of the bloody language. return is not an expression. continue is not an expression. die is a language construct too, but it is special-cased to work with or, and has been since before we even had the x or y construct in the first place. Because Perl did it. exit gets the same special-casing, but only because it is the exact same thing.
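A sketch of what does and doesn't work, using a hypothetical first_type function. The or-continue form is left in a comment because the parser in the PHP versions I know of rejects it outright.

```php
<?php
function first_type(array $configs)
{
    foreach ($configs as $config) {
        // "=" binds tighter than "or", so this reads as
        // ($type = $config['type']) or ($type = null).
        $type = $config['type'] or $type = null;

        // What one would like to write is
        //     $type = $config['type'] or continue;
        // but continue is a statement, so the skip needs its own if.
        if ($type === null) {
            continue;
        }
        return $type;
    }
    return false;
}

$found = first_type(array(
    array('type' => ''),
    array('type' => 'mysql'),
));
// $found is now 'mysql'
```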

Another example. Normally, () is used to group things, i.e. to override precedence. I'm quite OK with the way it's required for function calls, conditions etc. In PHP, however, these seem to form a magical, ref-breaking construct that is parsed under its own rules. That is to say, in PHP, $a is not guaranteed to be the same as ($a). That's because PHP is a language whose every feature is a special case in the parser. If $a is a ref, ($a) is not any more.

So what's the point of all these examples? Well hopefully they all bring up the obvious question: why? Why are these things different? For a given X, why does the way you use X have to be allowed by the compiler?

A language built out of expressions is obvious - expressions are what make the operands to operators. And an operator is itself another, larger expression. Suddenly the parsing should seem trivial; you look at a line of code, decide which operators and expressions it contains and run them in a well-defined order. You can see it in the language that when you use an expression it behaves exactly like you'd expect any other expression to behave. At least, it compiles like that - runtime behaviour may be bizarre.

It's trivial to draw up a simple table of PHP's main features in terms of expressions; in all of this the reader is invited to consider in what situations these do not work in PHP's current implementation, and what it means about the compiler for that to be the case. In the table, X and Y mean any expression, i.e. literally anything that compiles.

| Construct | Meaning | Examples | Notes |
| --- | --- | --- | --- |
| ${X} | The value referred to by X | ${$foo} # $$foo | When X returns a string, look up that variable. Otherwise, treat it as a reference. When X is another variable, the {} can be omitted. |
| | | ${f()} | |
| | | $a = &$b; ${$a} | |
| X[Y] | Return the element Y from the array X | $array['foo'] | This implements the "feature" that is "special" in PHP 5.5 of array literal dereferencing (example 4) |
| | | f()['foo'] | |
| | | x()[y()] | |
| | | ['a', 'b', 'c'][0] | |
| X() | Run the closure X | f()() | Actual functions like f() are separate, since f is not a valid expression. |
| | | $x() | |
| | | ['a' => function() {}, ...][$x]($y) | |
| X or Y | If X is false, run Y | $type = $config['type'] or continue; | |
| X and Y | If X is true, run Y | $val = $config['x'] and return $val; | |

The reader should take away from this at least the awareness that all of the examples in this table would already work if PHP used a proper expression-based grammar; but instead we have been sold these things piecemeal over the past few versions as new features important enough to go on the front page of the release notes.
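For reference, here are some of those forms as they behave in later releases. This is a sketch with made-up names; the version numbers in the comments are my understanding of when each form started to parse, not gospel.

```php
<?php
function make_adder($n)
{
    return function ($m) use ($n) { return $n + $m; };
}

$foo = 'bar';
$bar = 'value of bar';

$via_name = ${$foo};              // variable variable, same as $$foo
$literal  = ['a', 'b', 'c'][0];   // array literal dereferencing (5.5+)
$chained  = make_adder(2)(3);     // call the returned closure (7.0+)

$table  = ['inc' => function ($x) { return $x + 1; }];
$picked = $table['inc'](41);      // call a closure stored in an array
```

Each of these is just "an expression used where an expression is expected", which is rather the point.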

6. Complete the complement of magic methods

__toString is a pretty good method. It uses an established consistent convention that double-underscore means special-to-PHP. It uses dynamic dispatch so that if it exists it's used, and if it doesn't there's no "default" behaviour - it just complains.

There are also __isset, __set, __get etc. These do what you'd expect: test for setness, default setter, default getter...

Where's __toInt? __toFloat? __toArray? Why is __toString represented and not the others? Furthermore, if you can use a string as an integer and only complain after this conversion, why don't you use __toString first and then try to turn the result into an integer?
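The asymmetry in a nutshell, using a hypothetical Temperature class. The two-step cast at the end is the conversion I'm suggesting PHP ought to attempt by itself.

```php
<?php
class Temperature
{
    private $degrees;

    public function __construct($degrees)
    {
        $this->degrees = $degrees;
    }

    public function __toString()
    {
        return (string) $this->degrees;
    }
}

$t = new Temperature(21);

// __toString is called implicitly wherever a string is needed.
$as_string = 'It is ' . $t . ' degrees';

// There is no __toInt, so go via the string by hand.
$as_int = (int) (string) $t;
```

Casting the object straight to int, as far as I know, just produces a complaint that the object could not be converted, plus a meaningless value.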

Consistency is paramount in a structured, logical world such as programming. Expectations being formed and then violated is the worst of things. It's the Principle of Least Astonishment. Use it.

7. Stop pretending you have types. Or: Have proper types.

What in god's name is this? (int) $val

"Casting," I hear you cry. "It is casting the type of $val to int!"

"Rollocks," I reply in a PG way. For casting is the act of converting a type through known mechanisms to another type. But we don't have __toInt to convert all possible $vals to int, and we don't have mechanisms to convert all possible types in place of int in the first place.

Nope, it is another special case in the PHP compiler, where someone saw another language doing something and implemented the same syntax but completely failed to understand what it was doing, and so implemented the practice rather than the theory.
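For the record, this is what the special case actually does to a handful of values. The results in the comments are built-in behaviour, with no user-definable mechanism anywhere in sight.

```php
<?php
$results = array(
    (int) '42',      // 42
    (int) '12abc',   // 12: leading digits are used, the rest is dropped
    (int) 'abc',     // 0: no leading digits at all
    (int) 3.9,       // 3: truncated, not rounded
    (int) true,      // 1
    (int) null,      // 0
);
```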

What about this? function foo(array $arg)

"Type hinting!" comes the call from the thousands-strong crowd. But if I ask them to explain this mechanism they roll out the usual approximately-right answers they read in the documentation but cannot explain the concept.

PHP is a dynamic language; that's one of its strengths. Dynamic means that PHP exhibits certain runtime features that static languages require at compile time. For the purposes of this section the dynamic features we are interested in are:

  • Runtime method lookup. If an object can perform a method, the method will be performed. If not, a runtime exception is thrown. Inheritance introduces methods from other classes into the object's symbol table, assisting DRY, but otherwise there is no reason every method could not simply be dynamically dispatched to a function somewhere using magic.
  • Automatic type conversion. If an operation requires a string and an integer is provided, or an integer and a string is provided, or a string and an object is provided, PHP will transparently perform the conversion at runtime and only complain if it didn't work.

Now apply your theories about type hinting to this. What can it do but cripple PHP's dynamicity? Duck typing is the principle by which, if you have dynamic method lookup, an object only has to be able to perform a task in order to be considered suitable for the task. That is, until runtime, until you actually try to run the method on the object, there is no way to know that the object cannot do it. If there were you would have sacrificed dynamic method lookup for static compilation already. Type hinting for classes is completely non-semantic if you have the option of duck typing, because there is literally nothing special about your particular class that makes it important that an object is of this type.
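To make that concrete, here is a sketch with two hypothetical classes that can both speak(). The hinted function cares what the object is; the duck-typed one only cares what it can do at the moment of the call.

```php
<?php
class Duck
{
    public function speak() { return 'Quack'; }
}

class Robot
{
    public function speak() { return 'Beep'; }
}

// Class hint: only Duck (or a subclass) gets in, whatever else can speak().
// Passing a Robot fails before the body ever runs.
function hinted(Duck $d)
{
    return $d->speak();
}

// Duck typing: anything that can speak() right now is accepted.
function ducked($thing)
{
    if (!is_callable(array($thing, 'speak'))) {
        throw new InvalidArgumentException('argument cannot speak()');
    }
    return $thing->speak();
}

$sounds = array(
    ducked(new Duck()),
    ducked(new Robot()),
    hinted(new Duck()),
);
// $sounds is now array('Quack', 'Beep', 'Quack')
```

The Robot is perfectly capable of doing the job that hinted wants done; it is turned away purely on the basis of its name.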

How about non-object type hinting? Well you can't actually do that, because int and string aren't types to hint about - probably because any scalar can be used as a string! And any string can be used as an integer! So why enforce the check? Or, from the other perspective, why aren't they types? I can cast to them; why can't I require them?

And why can I require classes but not cast to them?

If we look at the whole type system of PHP as a looser concept than PHP makes it, it makes a lot more sense.

Classes are not some promissory aspect of a piece of data that ensure the datum can perform tasks, but an organisational structure allowing you to introduce functionality from other classes into new ones by inheritance or merging traits. From this perspective, duck typing makes sense - you don't need a specific class to ensure an object can perform tasks; any class can theoretically do it, especially if it consumes a trait that provides it. Type hinting for classes, from this perspective, is logically inconsistent with traits - which are considerably more useful - because you can't test for what a class can do, which is the only thing that's important.
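A sketch of the mismatch, using a hypothetical trait Greets. The type-based test knows nothing about the trait, so you are left interrogating the object by hand.

```php
<?php
trait Greets
{
    public function greet() { return 'hello'; }
}

class Person
{
    use Greets;
}

$p = new Person();

// A trait is not a type, so the type test says no...
$is_instance = $p instanceof Greets;

// ...even though the trait is demonstrably in use...
$uses_trait = in_array('Greets', class_uses($p), true);

// ...and the object can demonstrably do the thing.
$can_greet = method_exists($p, 'greet');
```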

Similarly, basic types are not remotely based on reality either: even if you could ask for a string or an integer, assuming we get the rest of the family of magic methods, any object could have __toString or __toInt . And even if we don't get __toInt , a string can be an int . So if you ask for an int, you could give a string, and you won't know the data the string contains are bad until you try to use it as an int. And you should be able to give an object to a parameter that wants an int simply by casting it to a string and then an int - something PHP should be doing for us already.

Hopefully the reader has spotted the inconsistency between type hinting and a dynamic language: the language cares about what the datum can be, but the type hinting cares about what the datum is. There is absolutely no logical association between what the datum is and what it can do, because Dynamic Point 1 allows for any object - independently of class, thanks to traits and __call - to be able to perform any task; and Dynamic Point 2 allows any type - thanks to __toString and the proposed __toInt and __toArray - to be any other type.
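The __call half of that claim is easy to demonstrate. In this sketch the hypothetical class Anything declares no fly method at all, yet answers to it anyway.

```php
<?php
class Anything
{
    // Catches every call to a method that is not declared.
    public function __call($name, $arguments)
    {
        return $name . '(' . implode(', ', $arguments) . ')';
    }
}

$a = new Anything();

$declared = method_exists($a, 'fly');        // false: nothing is declared
$callable = is_callable(array($a, 'fly'));   // true: __call will take it
$answer   = $a->fly('north', 'fast');        // 'fly(north, fast)'
```

Nothing about the class name told you any of that; only asking the object did.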

If you're going to have type hinting, therefore, you have to have statically compiled types: you have to enforce the relationship between type and behaviour; otherwise, your type hints are just extra bytes in a file that are going to appear in a commit log at some point in the future deleted by some frustrated developer trying to implement a trait and use it in a method that doesn't expect it.

That's all

I'm sure I could find many more examples of things PHP can fix at a basic level and stop being so irritating about simple things. You'll note I didn't complain about the tiresome conflation of array and dictionary, despite it being the biggest misunderstanding in programming history.

But surely this is a start? We can keep most of the PHP grammar; the syntax doesn't change (much); and so many of the pitfalls and gotchas that a programmer falls into will be resolved in one fell swoop!

As with many things PHP has reached sufficient mass that nothing important will ever change, because the politics of the mailing lists drag everything down, with half-right people expressing their ill-informed opinions on stuff that really, actually matters.

And there's the rub; the alternative is to start again. Start a new, similar language, on the right foot. A language that doesn't have those tags; a language that interfaces with the standard streams properly; a language detached from the web server, that doesn't assume a web environment; a standalone, dynamic, modular language, easy to learn, easy to stick together, easy to run on any decent OS and the not-decent one.

But why? We already have Perl and Ruby and Python. The number of changes required to PHP means that literally the only reason to improve it at all is that it's associated with the name PHP. Installing it, upgrading it; these things would take the same amount of effort as simply using an alternative. It wouldn't be sufficiently backwardly compatible that existing PHP code would run, because all the crap you have to do in existing PHP code wouldn't be possible or necessary.

It can still be done, though. But it won't.