Facebook released a new PHP extension today that supports inlining XML.
This is a feature known as XML Literals in Visual Basic. Go read their
description here:
http://www.facebook.com/notes/facebook-engineering/xhp-a-new-way-to-write-php/294003943919
It adds an extra parsing step which maps inlined XML elements to PHP
classes. These classes are defined in core.php and html.php, which
together cover all the main HTML elements. The syntax of those class
definitions is a bit odd. That oddness is explained in the How It
Works document.
Essentially, it lets you turn:
<?php
if ($_POST['name']) {
echo "<span>Hello, {$_POST['name']}.</span>";
} else {
?>
<form method="post">
What is your name?<br>
<input type="text" name="name">
<input type="submit">
</form>
<?php
}
into:
<?php
require './core.php';
require './html.php';
if ($_POST['name']) {
echo <span>Hello, {$_POST['name']}.</span>;
} else {
echo
<form method="post">
What is your name?<br />
<input type="text" name="name" />
<input type="submit" />
</form>;
}
The main interest, at least to me, is that because PHP now understands the XML it is outputting, filtering can be done in a context-sensitive manner. The input filtering built into PHP cannot know which context a string is going to be used in. If you use a string inside an on-handler or a style attribute, for example, you need radically different filtering than when it is used as regular XML PCDATA in the HTML body. Some will say this form is more readable as well, but that isn't something that concerns me very much.
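To make the context point concrete, here is a minimal sketch in plain PHP (no XHP involved). The helper names escape_html() and escape_js_in_attr() and the greet() JavaScript function are made up for illustration. The same user string gets one treatment as PCDATA and a different one inside an on-handler, because the browser decodes HTML entities in the attribute before the JavaScript runs, so HTML-escaping alone does not protect a JavaScript string literal there.

```php
<?php
// Hypothetical helpers for illustration: one escaping rule per output context.

// Text between tags (PCDATA) or inside a quoted attribute value.
function escape_html(string $s): string {
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// A value used as a JavaScript string literal inside an on-handler attribute.
// Two layers: JSON-encode for the JavaScript layer (hex-escaping quotes, tags
// and ampersands), then HTML-escape for the attribute layer.
function escape_js_in_attr(string $s): string {
    $js = json_encode($s, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
    return htmlspecialchars($js, ENT_QUOTES, 'UTF-8');
}

// A string that tries to break out of a single-quoted JavaScript literal.
$name = "'); alert(1); //";

// PCDATA context: HTML escaping is enough.
echo '<span>Hello, ' . escape_html($name) . ".</span>\n";

// on-handler context: greet() is a made-up JavaScript function.
echo '<a href="#" onclick="greet(' . escape_js_in_attr($name) . ')">greet</a>' . "\n";
```

A single default input filter has to pick one of these rules up front, which is the limitation described above.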
The real question here is what this runtime XML validation is going to cost you. I have given talks in the past where I have used "class br extends html { ... }" as a classic example of something you should never do. A br tag is just a br tag. When you need one, stick a <br> in your page, don't instantiate a class and call a render() method. So, when I looked at html.php and saw:
class :br extends :xhp:html-singleton {
category %flow, %phrase;
protected $tagName = 'br';
}
I got a bit skeptical. Another thing I have been known to tell people is, "Friends don't let friends use Singletons." That isn't something I came up with. Someone, a friend, I guess, told me that years ago. Ok ok, as Marcel points out in the comments, this isn't a real singleton, it is one in name only.
The "singleton" looks like this:
abstract class :xhp:html-singleton extends :xhp:html-element {
children empty;
protected function stringify() {
return $this->renderBaseAttrs() . ' />';
}
}
This extends html-element, which in turn extends primitive. You can go read all the code for those yourself.
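For readers without the extension installed, here is a rough plain-PHP analogy of that chain. It is not XHP's actual code: the class names Element, VoidElement, Br and Span are invented for this sketch, and only renderBaseAttrs(), stringify() and $tagName mirror names that appear in the snippets above. It shows the kind of work a tag-to-class mapping implies for every single element on the page.

```php
<?php
// Hypothetical plain-PHP analogy of the tag-to-class idea. Not XHP's code.

abstract class Element {
    protected $tagName;
    protected $attrs;
    protected $children;

    public function __construct(array $attrs = [], array $children = []) {
        $this->attrs = $attrs;
        $this->children = $children;
    }

    // Opening tag plus escaped attribute values, without the closing bracket.
    // Attribute names are trusted here to keep the sketch short.
    protected function renderBaseAttrs(): string {
        $out = '<' . $this->tagName;
        foreach ($this->attrs as $key => $value) {
            $out .= ' ' . $key . '="' . htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8') . '"';
        }
        return $out;
    }

    // Render the element; plain strings among the children are escaped as text.
    public function stringify(): string {
        $out = $this->renderBaseAttrs() . '>';
        foreach ($this->children as $child) {
            $out .= $child instanceof Element
                ? $child->stringify()
                : htmlspecialchars((string) $child, ENT_QUOTES, 'UTF-8');
        }
        return $out . '</' . $this->tagName . '>';
    }
}

// Elements with no children, such as br, close themselves.
abstract class VoidElement extends Element {
    public function stringify(): string {
        return $this->renderBaseAttrs() . ' />';
    }
}

final class Br extends VoidElement { protected $tagName = 'br'; }
final class Span extends Element { protected $tagName = 'span'; }

// One object per tag, one method call chain per tag.
echo (new Span([], ['Hello, ', '<World>', new Br()]))->stringify(), "\n";
```

Even in this stripped-down form, a lone br costs an object instantiation and two method calls instead of six bytes of output.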
Note that to build XHP you will need flex 2.5.35, which most distros won't have installed by default. Grab the flex tarball, then ./configure && make install it. Then you are ready to go.
I pointed Siege at my rather underpowered AS1410 SU2300 with the above trivial form examples, both the plain PHP one and the XHP version. I ran each one 5 times, benchmarking for 30s each time. The plain PHP one averaged around 1300 requests/sec. Here is a representative sample:
acer:~> siege -c 3 -b -t30s http://xhp.localhost/1.php
** SIEGE 2.68
** Preparing 3 concurrent users for battle.
The server is now under siege...
Lifting the server siege... done.
Transactions: 38239 hits
Availability: 100.00 %
Elapsed time: 29.60 secs
Data transferred: 3.97 MB
Response time: 0.00 secs
Transaction rate: 1291.86 trans/sec
Throughput: 0.13 MB/sec
Concurrency: 2.93
Successful transactions: 38239
Failed transactions: 0
Longest transaction: 0.05
Shortest transaction: 0.00
And the XHP version:
Transactions: 868 hits
Availability: 100.00 %
Elapsed time: 29.28 secs
Data transferred: 0.08 MB
Response time: 0.10 secs
Transaction rate: 29.64 trans/sec
Throughput: 0.00 MB/sec
Concurrency: 2.99
Successful transactions: 868
Failed transactions: 0
Longest transaction: 0.21
Shortest transaction: 0.05
So, a drop from 1300 to around 30 requests per second, and latency up from less than 10ms to 100ms. Running XHP on plain PHP is definitely out of the question. But knowing that Facebook uses APC heavily, and looking through the code (see the MINIT function in ext.cpp), we can see that it should play nicely with APC. Re-running the plain PHP version of the form with APC enabled, it goes from 1300 to around 1460 requests per second, with no measurable latency:
Transactions: 43773 hits
Availability: 100.00 %
Elapsed time: 29.88 secs
Data transferred: 4.55 MB
Response time: 0.00 secs
Transaction rate: 1464.96 trans/sec
Throughput: 0.15 MB/sec
Concurrency: 2.93
Successful transactions: 43773
Failed transactions: 0
Longest transaction: 0.07
Shortest transaction: 0.00
The XHP version of the form now with APC enabled:
Transactions: 9707 hits
Availability: 100.00 %
Elapsed time: 29.45 secs
Data transferred: 0.94 MB
Response time: 0.01 secs
Transaction rate: 329.61 trans/sec
Throughput: 0.03 MB/sec
Concurrency: 2.97
Successful transactions: 9707
Failed transactions: 0
Longest transaction: 0.21
Shortest transaction: 0.00
Much better. But it is still around a 75% performance drop, from 1460 to 330 requests per second, and a ~10ms latency penalty. And yes, I did have a default filter enabled for these tests, so there was basic XSS filtering in place for the naked $_POST['name'] variable in the plain PHP version. Of course, the default filtering would likely fail if the user data were used in a different context. And this 75% is obviously going to depend on what else is going on during the request. If you are spending most of your time calculating a fractal or waiting on MySQL, you may not notice XHP very much at all.
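The arithmetic behind those percentages is simple enough to script. This is just a sketch using the transaction rates copied from the Siege samples above; drop_percent() is a name made up here.

```php
<?php
// Transaction rates (trans/sec) copied from the Siege runs above.
$rates = [
    'plain'     => 1291.86,
    'xhp'       => 29.64,
    'plain+apc' => 1464.96,
    'xhp+apc'   => 329.61,
];

// Hypothetical helper: relative throughput drop, in percent.
function drop_percent(float $before, float $after): float {
    return ($before - $after) / $before * 100.0;
}

printf("Without APC: %.1f%% drop\n", drop_percent($rates['plain'], $rates['xhp']));
printf("With APC:    %.1f%% drop\n", drop_percent($rates['plain+apc'], $rates['xhp+apc']));
```

With these samples the APC case comes out at about 77.5%, which is the "around 75%" quoted here, and the non-APC case at about 97.7%.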
The bulk of the time is spent in all the tag-to-class interaction. If
the core.php and html.php code were all baked into the XHP extension, it
would be a lot quicker, of course. So, when you combine XHP with HipHop
PHP you can start to imagine that the performance penalty would be a lot
less than 75% and it becomes a viable approach. Of course, this also
means that if you are unable to run HipHop you probably want to think a
bit and run some tests before adopting this. If you are already doing
some sort of external templating, XHP could very well be a faster
approach.
Update: Here are the callgraphs.
The first is the plain PHP+APC version without XHP. And the second is
the PHP+APC+XHP version. In the first you see all sorts of bits and
pieces all across the stack getting CPU time.
In the second graph we see the effect of needing to copy and instantiate
those core and html classes on every request. They are cached in APC, of
course, but because of PHP's perfect sandbox, they cannot persist.
So we went from spending around 1% of our time in the executor to over
80%.
This isn't an entirely fair comparison, of course, since the plain
version has close to no PHP to execute while the XHP version has 93
userspace classes to deal with. I would guess that XHP could get quite a
boost if at least the primitives in core.php could be baked into the
extension. Ideally all 93 basic html classes would be in C++ in the
extension itself, but that would be a bit of a tedious undertaking.