Running Composer with HHVM, not so fast!

HHVM is an open-source virtual machine developed by Facebook and designed for executing programs written in Hack and PHP. It offers increased performance for PHP, most of the time. You can already use HHVM in ServerGrove servers by using our packages.

We have read numerous blog posts and articles that suggest using HHVM to run Composer. If you are not familiar with Composer, check out our Composer 101 article.

Since Composer needs to perform some heavy computations to resolve the dependencies of a project, it makes sense to use HHVM. However, the heavy computations are mainly done when running composer update, or when the composer.lock file has not yet been generated, so this is where you will see most of your gains in execution time.

Here are some tests while working with a Symfony2 Standard Edition project.

Running composer update with HHVM takes ~4 seconds:

$ time hhvm composer.phar update --prefer-dist --no-scripts
Loading composer repositories with package information
Updating dependencies (including require-dev)
Nothing to install or update
Writing lock file
Generating autoload files

real 0m4.362s
user 0m3.540s
sys 0m0.168s

Running the same command with PHP takes almost 11 seconds:

$ time php composer.phar update --prefer-dist --no-scripts
Loading composer repositories with package information
Updating dependencies (including require-dev)
Nothing to install or update
Writing lock file
Generating autoload files

real 0m10.768s
user 0m9.692s
sys 0m0.180s

Running composer install

So running composer update with HHVM is clearly a benefit. It will save you a lot of time, and the larger the number of dependencies, the more time you will save.

However, we found that running composer install with HHVM takes longer than running it with PHP. This matters most when Composer is part of a deployment workflow, where you usually run install rather than update and already have a composer.lock file.

The reasons why HHVM is slower than PHP when running composer install are still unclear. Some time is lost during JIT warmup: HHVM really shines when it can run optimized, compiled bytecode, which is not the case with short-lived command-line scripts. We believe there is more to it than JIT warmup, though, as our tests show a difference of more than two seconds. We gained some performance by disabling the JIT (hhvm -v Eval.Jit=0), but HHVM is still slower than PHP.

Composer install with PHP:

real 0m2.547s
user 0m1.190s
sys 0m0.919s

Composer install with HHVM:

real 0m4.905s
user 0m3.010s
sys 0m1.337s

Composer install with HHVM and JIT disabled:

real 0m3.062s
user 0m1.604s
sys 0m1.158s

You can run the same tests by using the shell script in this gist. Please share your findings with us.
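
The gist itself is not reproduced here. As a rough stand-in, the sketch below times an arbitrary shell command from PHP. The helper name timeCommand() is hypothetical, and it only measures wall-clock time (the "real" value reported by time), not the user or sys values.

```php
<?php

/**
 * Runs a shell command and returns the wall-clock seconds it took together
 * with its exit code. Hypothetical helper, not part of Composer or HHVM.
 */
function timeCommand(string $command): array
{
    $start = microtime(true);
    exec($command, $output, $exitCode);

    return [
        'seconds'  => microtime(true) - $start,
        'exitCode' => $exitCode,
    ];
}

// Harmless demo command. Replace it with the commands you want to compare,
// for example 'php composer.phar install' and 'hhvm composer.phar install'.
$demo = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg('usleep(50000);');
$result = timeCommand($demo);

printf("%.3fs (exit code %d)\n", $result['seconds'], $result['exitCode']);
```

Run each command several times and compare the averages, since a single run can be skewed by disk and network caches.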

Conclusion

We have seen many posts and articles suggesting that you create an alias in your shell configuration so that you always run Composer with HHVM. Given that HHVM is not always faster than PHP, we don’t think this is a good idea yet. We expect the HHVM team to improve performance, and at some point this article will become obsolete. Until then, consider this before you go all the way with HHVM.

Also, we would like to know what others think, and we encourage you to try this test with your own project. Share with us what you find.

April 17 / 2014
Author Pablo
Category PHP

New TLDs

NewTLDs

Upcoming Conferences

CraftingTour

CraftingTour

PHP New Zealand









Symfony2 components overview: Process

This post covers the Symfony2 Process component and is the 11th post in our series on Symfony2 components.

The Symfony2 Process component allows us to execute commands in sub-processes.

The Process component does all the hard low-level work for you
when dealing with sub-processes

Installation

The recommended way of installing the component is through Composer:

{
    "require": {
        "symfony/process": "2.4.*"
    }
}

If you have never used Composer before check out our Composer 101 post.

The component

The Process component provides an object-oriented abstraction on top of proc_* functions to execute independent processes from PHP.

For example, to list the files and directories in the directory where the current PHP script is located:


use Symfony\Component\Process\Process;

$process = new Process('ls -lh ' . escapeshellarg(__DIR__));
$process->run();

// executes after the command finishes
if (!$process->isSuccessful()) {
    throw new \RuntimeException($process->getErrorOutput());
}

echo $process->getOutput();

The example is quite self-explanatory, but let’s take a look and go step-by-step to see what happened exactly. First, we create a Process instance, which is the main class of the component. Besides passing the command we want to execute, it is also possible to pass the working directory, environment variables or a timeout.

function __construct($commandline, $cwd = null, array $env = null, $stdin = null, $timeout = 60, array $options = array())

As soon as we call the run() method, the command is executed and the PHP interpreter waits until it finishes. If the execution was not successful (exit code other than 0), our script throws an exception. If it was successful, the script simply prints the command output.

The output of the script would be similar to this:

total 64
-rw-r--r--  1 raul  raul   374B Apr 14 09:14 composer.json
-rw-r--r--  1 raul  raul    22K Apr 14 09:15 composer.lock
-rw-r--r--  1 raul  raul   321B Apr 14 11:32 ex1.php
drwxr-xr-x  8 raul  raul   272B Apr 14 09:15 vendor

Exit codes

The Process class provides a few methods to deal with exit codes. We can even get the exit code as text, as the class maintains a map of common exit codes:

$processA = new Process('ls -lh');
$processA->run();

// int(0) string(2) "OK"
var_dump($processA->getExitCode(), $processA->getExitCodeText());

$processB = new Process('foo');
$processB->run();

// int(127) string(17) "Command not found"
var_dump($processB->getExitCode(), $processB->getExitCodeText());

The exit code is also used to determine if a command was executed successfully, by the isSuccessful() method, which returns true if the exit code is 0 and false otherwise.

Long Running Processes

When we have to deal with long-running processes, things tend to get a little trickier, as we have to take into account things like timeouts, incremental output, responsiveness, and signals. The Process class provides ways to make these problems manageable.

Timeouts

There are two available timeouts: process timeout (max runtime) and process idle timeout (max. time since last output). In the following code, as the ping command in Unix systems runs infinitely (unless we specify the “-c” option), a ProcessTimedOutException exception will be thrown after 10 seconds:

$process = new Process('ping example.com');
$process->setTimeout(10);
$process->run();

This would not be true with the idle timeout, as most of the time the ping command outputs new information in less than 10 seconds. In this case, the process will probably run until it exceeds the memory_limit setting:

$process = new Process('ping example.com');
$process->setTimeout(null); // disable "normal" timeout
$process->setIdleTimeout(10);
$process->run();
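
To make the difference between the two timeouts concrete, here is a minimal sketch in plain PHP. The function exceededTimeout() and its parameters are hypothetical and only mirror the two rules described above; this is not how the component itself is written.

```php
<?php

/**
 * Decides which timeout, if any, a running process has exceeded.
 * All times are in seconds; passing null disables that timeout.
 * Hypothetical helper that mirrors the two rules described above.
 */
function exceededTimeout(float $startedAt, float $lastOutputAt, float $now, ?float $timeout, ?float $idleTimeout): ?string
{
    if ($timeout !== null && $now - $startedAt > $timeout) {
        return 'timeout';      // total runtime is too long
    }

    if ($idleTimeout !== null && $now - $lastOutputAt > $idleTimeout) {
        return 'idle timeout'; // too long since the last output
    }

    return null;
}

// A process started at t=0 that printed its last line at t=59, checked at t=60:
var_dump(exceededTimeout(0.0, 59.0, 60.0, 10.0, null)); // string(7) "timeout"
var_dump(exceededTimeout(0.0, 59.0, 60.0, null, 10.0)); // NULL
```

With only the idle timeout set, a command that keeps printing (like ping) is never stopped, which matches the behaviour described above.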

Outputs

In long running processes we need some sort of “real time” output so the user perceives that the process is still running and is not dead. There are two ways to do this: outputting the command output as soon as it gets available, or printing some “loading” or “in progress” message. Let’s see an example of both approaches using again the ping command:

In the following example, we pass a PHP callable that runs whenever output is available on STDOUT (standard output) or STDERR (standard error). Each time there is new output on STDOUT, we print a dot so the user knows the command did not hang and is still running.

$process = new Process('ping -c 5 example.com');
echo 'Executing';
$process->run(function ($type) {
    if (Process::OUT === $type) {
        echo '.';
    }
});

It is also possible to print the command output as soon as it becomes available by declaring a second parameter in the callable, which receives the new chunk of output:

$process = new Process('ping -c 5 example.com');
$process->run(function ($type, $buffer) {
    // ping writes its regular output to STDOUT
    if (Process::OUT === $type) {
        echo $buffer;
    }
});

This can also be done with the getIncrementalOutput() and getIncrementalErrorOutput() methods, which return only the new output since the last call.

$process = new Process('ping -c 5 example.com');
$process->start();

while ($process->isRunning()) {
    echo $process->getIncrementalOutput();
}

It is recommended to add a small delay in the while loop if we can afford it to reduce the number of calls to getIncrementalOutput().
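
As an illustration of such a polling loop with a delay, here is a sketch that uses only PHP's proc_* functions instead of the component. The function pollCommand() and its parameters are hypothetical, and the non-blocking read assumes a Unix-like system.

```php
<?php

/**
 * Starts $command, hands each new chunk of its standard output to $onOutput
 * and sleeps between polls. Returns the exit code of the command.
 * Hypothetical stand-in for a start()/getIncrementalOutput() loop.
 */
function pollCommand(string $command, callable $onOutput, int $delayMicroseconds = 100000): int
{
    // Only stdout is piped; stdin and stderr are inherited from this script.
    $process = proc_open($command, [1 => ['pipe', 'w']], $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Unable to start: ' . $command);
    }

    stream_set_blocking($pipes[1], false);

    do {
        usleep($delayMicroseconds);          // the "small delay"
        $status = proc_get_status($process); // check the state first...
        $chunk = stream_get_contents($pipes[1]); // ...then read what is there
        if (is_string($chunk) && $chunk !== '') {
            $onOutput($chunk);
        }
    } while ($status['running']);

    fclose($pipes[1]);
    proc_close($process);

    return $status['exitcode'];
}

$command = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg('echo "a"; usleep(200000); echo "b";');
$exitCode = pollCommand($command, function ($chunk) {
    echo $chunk;
});
```

Checking the process state before reading means the final read happens after the command has exited, so no trailing output is lost.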

Signals

Signals are asynchronous notifications sent to a process to notify of an event that occurred. Using signals we can for instance stop asynchronous processes using the signal() method:

$process = new Process('ping -c 50 example.com');
$process->start();
sleep(3);
$process->signal(SIGKILL);

The ping command is “killed” after 3 seconds. The SIGKILL constant is defined in PCNTL.

PIDs

A PID is a number that uniquely identifies a process while it is running. We can get the PID of a running process with the getPid() method:

$process = new Process('ls -lh');
$process->start();

var_dump($process->getPid()); // int(78316)

Executing PHP Code

The component provides the class PhpProcess (which extends from Process), to execute PHP code in isolation. That means that it is run in a different process so no variables or open resources are shared between them.

use Symfony\Component\Process\PhpProcess;

$process = new PhpProcess('<?php echo "Hello world!";');
$process->run();

echo $process->getOutput();

It will print out “Hello world!”.

Under the hood

If you are following the posts of this series you know that we always like to dive a little bit deeper and find out how the component is made internally. Usually, the official documentation for Symfony components is excellent so we try to give back to the community by explaining them in a different way and trying to share roughly how they work internally.

Internally, the Process class makes use of the proc_open() function to execute a command and open file pointers for input/output. The proc_open() function is not straightforward to use, as it needs a descriptor specification.

The descriptor specification is an indexed array to tell the function how we want to handle stdin, stdout and stderr. By default, in the component, pipes are used, but it can be configured to use a file for stdout instead. These are the parameters that proc_open() receives when executing “ls -lh”:

string(6) "ls -lh"
array(3) {
  [0] =>
  array(2) {
    [0] => string(4) "pipe"
    [1] => string(1) "r"
  }
  [1] =>
  array(2) {
    [0] => string(4) "pipe"
    [1] => string(1) "w"
  }
  [2] =>
  array(2) {
    [0] => string(4) "pipe"
    [1] => string(1) "w"
  }
}
array(0) {
}
string(31) "/Users/raulfraile/servergrove/process"
NULL
array(2) {
  'suppress_errors' => bool(true)
  'binary_pipes' => bool(true)
}

The “suppress_errors” option only applies to Windows systems and suppresses errors generated by the proc_open() function, while “binary_pipes” forces pipes to be opened in binary mode instead of using the usual stream_encoding.
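
To show what working with a descriptor specification looks like without the component, here is a minimal sketch built directly on proc_open(). The helper runWithPipes() is hypothetical and leaves out almost everything the Process class adds (timeouts, incremental output, signals); the extra options from the dump above are not passed.

```php
<?php

/**
 * Runs $command with a three-pipe descriptor specification like the one
 * shown above and returns its output, error output and exit code.
 * Hypothetical helper; the real component does considerably more.
 */
function runWithPipes(string $command, string $stdin = ''): array
{
    $descriptors = [
        0 => ['pipe', 'r'], // the child reads its stdin from this pipe
        1 => ['pipe', 'w'], // the child writes its stdout to this pipe
        2 => ['pipe', 'w'], // the child writes its stderr to this pipe
    ];

    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Unable to start: ' . $command);
    }

    fwrite($pipes[0], $stdin);
    fclose($pipes[0]); // signals end of input to the child

    // Reading one pipe to the end before the other is fine for small
    // outputs, but can deadlock if the child fills the stderr pipe first.
    $output = stream_get_contents($pipes[1]);
    $errorOutput = stream_get_contents($pipes[2]);
    fclose($pipes[1]);
    fclose($pipes[2]);

    return [
        'output'      => $output,
        'errorOutput' => $errorOutput,
        'exitCode'    => proc_close($process),
    ];
}

$child = 'echo strtoupper(stream_get_contents(STDIN));';
$result = runWithPipes(escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg($child), 'hello');

var_dump($result['output']);   // string(5) "HELLO"
var_dump($result['exitCode']); // int(0)
```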

The Process component is one of the oldest in the Symfony framework. It is quite interesting to see the transformation it went through over 4 years: take a look at how simple it was (254 LoC) and how much more complete it became (1446 LoC). This is a big reason why it is great to reuse well-developed and well-tested libraries.

Finally, as a curiosity, the PhpProcess class, which is used to execute PHP code, already supports HHVM. The PhpExecutableFinder class, used to find the PHP binary, checks whether HHVM is being used by reading the HHVM_VERSION constant, only available when HHVM is the current engine.

Who’s using it?

More info

Photo: Work in progress, by Stefano Mortellaro

April 16 / 2014

How GZIP Compression Works

Even in today’s world, with fast networks and almost unlimited storage, data compression is still relevant, especially for mobile devices and countries with poor Internet connections. This post covers the de-facto lossless compression method for compressing text data in websites: GZIP.

Morse code is one of the first lossless compression standards. More frequent letters get shorter codes

GZIP compression

GZIP provides lossless compression, that is, we can recover the original data exactly when decompressing it. It is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding.

The LZ77 algorithm replaces repeated occurrences of data with references. Each reference has two values: the jump and the length. Let’s see an example:

Original text: "ServerGrove, the PHP hosting company, provides hosting solutions for PHP projects" (81 bytes)
LZ77: "ServerGrove, the PHP hosting company, p<3,32>ides<9,26>solutions for<5,52><3,35>jects" (73 bytes, assuming that each reference is 3 bytes)

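As you can see, the strings “ hosting ” and “ PHP ” are repeated, so the second time each substring is found, it is replaced by a reference. The shorter repeats “rov” and “pro” are replaced as well, although a 3-byte reference saves nothing there. There are other repetitions too, like “er”, but since a reference would be longer than the text it replaces, the original text is left.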
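
Reading the references back is a good way to check the example. The sketch below decodes the <length,distance> notation used above; lz77Decode() is a hypothetical helper written for this notation only and assumes the text contains no literal “<”. Real DEFLATE streams encode references in binary, not as text.

```php
<?php

/**
 * Expands "<length,distance>" references: copy $length bytes starting
 * $distance bytes back from the end of the output produced so far.
 * Hypothetical helper for the notation used in this post.
 */
function lz77Decode(string $encoded): string
{
    $out = '';
    $offset = 0;

    while (preg_match('/<(\d+),(\d+)>/', $encoded, $m, PREG_OFFSET_CAPTURE, $offset)) {
        // Copy the literal text that precedes the reference.
        $out .= substr($encoded, $offset, $m[0][1] - $offset);

        $length = (int) $m[1][0];
        $start = strlen($out) - (int) $m[2][0];
        for ($i = 0; $i < $length; $i++) {
            $out .= $out[$start + $i]; // byte by byte, so copies may overlap
        }

        $offset = $m[0][1] + strlen($m[0][0]);
    }

    return $out . substr($encoded, $offset);
}

$encoded = 'ServerGrove, the PHP hosting company, p<3,32>ides<9,26>solutions for<5,52><3,35>jects';
$decoded = lz77Decode($encoded);

var_dump($decoded);         // the original 81-byte sentence
var_dump(strlen($decoded)); // int(81)
```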

Huffman coding is a variable-length coding method that assigns shorter codes to more frequent “characters”. The problem with variable-length codes is usually that we need a way to know when a code ends and the new one starts to decode it. Huffman coding solves this by creating a prefix code, where no codeword is a prefix of another one. It can be understood more easily by an example:

Original text: "ServerGrove"
ASCII codification: "01010011 01100101 01110010 01110110 01100101 01110010 01000111 01110010 01101111 01110110 01100101" (88 bits)

ASCII is a fixed-length character-encoding scheme, so the letter “e”, which appears three times and is also the most frequent letter in the English language, has the same size as the letter “G”, which only appears once. Using this statistical information, Huffman coding can create a more efficient scheme:

Huffman: "1110 00 01 10 00 01 1111 01 110 10 00" (27 bits)

The Huffman method allows us to get shorter codes for “e”, “r” and “v”, while “S” and “G” get the longer ones. Explaining how to build a Huffman code is out of the scope of this post, but if you are interested, I recommend this great video from Computerphile.
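
For readers who prefer code to video, here is a compact sketch of the classic construction: repeatedly merge the two least frequent nodes, then read each code off the path from the root. The function names are made up for this example, and this is the textbook algorithm rather than DEFLATE's exact encoder. Ties between equal frequencies can be broken either way, so individual codes may differ from the ones above while the total length stays the same.

```php
<?php

/** Builds a Huffman code table (symbol => bit string) for $text. */
function huffmanCodes(string $text): array
{
    // Negated weights turn the max-heap into a min-heap.
    $queue = new SplPriorityQueue();
    foreach (count_chars($text, 1) as $byte => $count) {
        $queue->insert(['weight' => $count, 'symbol' => chr($byte)], -$count);
    }

    // Repeatedly merge the two lightest nodes into a parent node.
    while ($queue->count() > 1) {
        $left = $queue->extract();
        $right = $queue->extract();
        $weight = $left['weight'] + $right['weight'];
        $queue->insert(['weight' => $weight, 'left' => $left, 'right' => $right], -$weight);
    }

    $codes = [];
    if (!$queue->isEmpty()) {
        assignCodes($queue->extract(), '', $codes);
    }

    return $codes;
}

/** Walks the tree: "0" for a left branch, "1" for a right branch. */
function assignCodes(array $node, string $prefix, array &$codes): void
{
    if (isset($node['symbol'])) {
        $codes[$node['symbol']] = $prefix === '' ? '0' : $prefix;
        return;
    }

    assignCodes($node['left'], $prefix . '0', $codes);
    assignCodes($node['right'], $prefix . '1', $codes);
}

/** Encodes $text as a string of "0" and "1" characters. */
function huffmanEncode(string $text): string
{
    if ($text === '') {
        return '';
    }

    $codes = huffmanCodes($text);
    $bits = '';
    foreach (str_split($text) as $char) {
        $bits .= $codes[$char];
    }

    return $bits;
}

var_dump(strlen(huffmanEncode('ServerGrove'))); // int(27)
```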

DEFLATE, which is the algorithm used for GZIP compression, is a combination of both these algorithms.

Is GZIP the best compression method?

The answer is NO. There are other compression methods that get higher compression ratios, but there are a few good reasons to use it.

First, even though GZIP is not the best compression method, it provides a good tradeoff between speed and ratio. Compressing and decompressing data with GZIP is fast, and the ratio is quite decent.

Second, it is not easy to add a new global compression method that everyone can use. Browsers would need to be updated, which today is much simpler thanks to self-update mechanisms. However, browsers are not the only problem. Chromium tried to add support for BZIP2, a better compression method based on the Burrows–Wheeler transform, but had to cancel it because some old intermediate proxies corrupted the data: they were not able to understand the bzip2 header and tried to gzip the contents. You can see the bug report here.

GZIP + HTTP

The process between the client (browser) and the server to get the content gzipped is simple. If the browser supports GZIP/DEFLATE, it lets the server know through the “Accept-Encoding” request header. The server can then choose whether to send the contents gzipped or raw, and marks a gzipped response with the “Content-Encoding: gzip” response header.
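
On the server side, the decision boils down to inspecting that request header. The sketch below is a simplified check, assuming only q-values and the “*” wildcard matter; acceptsGzip() is a hypothetical helper, and web servers such as Apache or nginx normally do this for you.

```php
<?php

/**
 * Returns true when an Accept-Encoding header value allows gzip.
 * Simplified sketch: honours q-values and the "*" wildcard only.
 */
function acceptsGzip(string $acceptEncoding): bool
{
    $wildcard = false;

    foreach (explode(',', $acceptEncoding) as $item) {
        $parts = array_map('trim', explode(';', $item));
        $coding = strtolower($parts[0]);

        // A missing q parameter means full preference (q=1).
        $quality = 1.0;
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $quality = (float) $m[1];
            }
        }

        if ($coding === 'gzip') {
            return $quality > 0; // an explicit entry always wins
        }
        if ($coding === '*') {
            $wildcard = $quality > 0;
        }
    }

    return $wildcard;
}

var_dump(acceptsGzip('gzip, deflate')); // bool(true)
var_dump(acceptsGzip('deflate'));       // bool(false)
var_dump(acceptsGzip('gzip;q=0'));      // bool(false)
```

In a PHP script the header value is available as $_SERVER['HTTP_ACCEPT_ENCODING'] when the client sent it.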

Implementations

The DEFLATE specification provides some freedom to developers to implement the algorithm using different approaches, as long as the resulting stream is compatible with the specification.

GNU GZIP

The GNU implementation is the most common and was designed to be a replacement for the compress utility, free from patented algorithms. To compress a file using the GNU GZIP utility:

$ gzip -c file.txt > file.txt.gz

There are 9 levels of compression, with “1” being the fastest with the lowest compression ratio and “9” the slowest with the best compression ratio. By default, “6” is used. If we want maximum compression at the cost of using more memory and time in the process, the -9 flag (or --best) can be used:

$ gzip -9 -c file.txt > file.txt.gz

7-zip

7-zip implements the DEFLATE algorithm differently and usually achieves higher compression ratios. To compress a file with the maximum compression:

$ 7z a -mx9 file.txt.gz file.txt

7-zip is also available in Windows and provides implementations for other compression methods such as 7z, xz, bzip2, zip and others.

Zopfli

Zopfli is ideal for one-time compression, for example in build processes where a file is compressed once and served many times. It is ~100x slower, but compresses around 5% better than other compressors.

Enabling GZIP compression

Apache

The mod_deflate module provides support for GZIP compression, so the response is compressed on the fly before being sent to the client over the network.

To enable it for text files, add this in your .htaccess file:

AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript

There are some known bugs with older versions of some browsers, so it is also recommended to add these lines in case you support them:

BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
Header append Vary User-Agent

It is also possible to serve pre-gzipped files instead of compressing on the fly every time. This is especially useful for files that don’t change between requests, such as JavaScript or CSS files, which can be compressed once using a slow algorithm and then served directly. In your .htaccess, include this:

RewriteEngine On
AddEncoding gzip .gz
RewriteCond %{HTTP:Accept-encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*)$ $1.gz [QSA,L]

What we are doing here is telling Apache that files with .gz extensions should be served with the gzip encoding-type (line 2), checking that the browser accepts gzip (line 3) and if the gzipped file exists (line 4), we append .gz to the requested filename.

nginx

The ngx_http_gzip_module module allows compressing files with GZIP on the fly, while the ngx_http_gzip_static_module one allows sending precompressed files with the “.gz” filename extension instead of regular files.

An example configuration looks like this:

gzip            on;
gzip_min_length 1000;
gzip_types      text/plain application/xml;

GZIP + PHP

While it is not usually recommended to compress data using PHP as it will be slower, it is possible to do it from PHP using the zlib module.

For example, to compress the jQuery minified library using the maximum compression:

<?php

$originalFile = __DIR__ . '/jquery-1.11.0.min.js';
$gzipFile = __DIR__ . '/jquery-1.11.0.min.js.gz';

$originalData = file_get_contents($originalFile);

$gzipData = gzencode($originalData, 9);
file_put_contents($gzipFile, $gzipData);

var_dump(filesize($originalFile)); // int(96380)
var_dump(filesize($gzipFile)); // int(33305)

Data can be decompressed with gzdecode(). The zlib module also defines a few stream wrappers to compress data.
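
As a quick sketch of both options, the following round trip uses gzdecode() in memory and the compress.zlib:// stream wrapper with a temporary file. The sample text and the temporary file are assumptions made for the example.

```php
<?php

$original = str_repeat('ServerGrove, the PHP hosting company. ', 100);

// In-memory round trip: gzencode() produces a complete gzip stream.
$compressed = gzencode($original, 9);
var_dump(gzdecode($compressed) === $original);     // bool(true)
var_dump(strlen($compressed) < strlen($original)); // bool(true)

// The same through the compress.zlib:// stream wrapper and a temporary file.
$path = tempnam(sys_get_temp_dir(), 'gz');
file_put_contents('compress.zlib://' . $path, $original);
$restored = file_get_contents('compress.zlib://' . $path);
var_dump($restored === $original);                 // bool(true)

unlink($path);
```

The wrapper is convenient when existing code already works with file functions, as no explicit gzencode() or gzdecode() call is needed.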

More information

Photo: Morse Code Straight Key J-38, by Anthony Catalano

April 14 / 2014
Author Raul Fraile
Category PHP, Shared Hosting, VPS

Important OpenSSL security update

A serious security vulnerability has been found in certain versions of the OpenSSL library, which is used by servers and other software to secure and encrypt network connections.

Affected versions of OpenSSL are 1.0.1 prior to 1.0.1g and 1.0.2-beta. You can read more about the report at the MITRE site.

Please note that some distributions, like CentOS, apply security patches while maintaining version numbers, so the latest openssl.x86_64 0:1.0.1e-16.el6_5.7 will include the fix.

If you want to check your SSL installation, we recommend that you use SSL Labs’s server test tool.

OpenSSL on VPS

If you are running a VPS with CentOS 6.x, Ubuntu 12.04, or Debian 7, we highly recommend that you upgrade OpenSSL immediately. If you have a previous OS version you are not vulnerable, unless you upgraded OpenSSL manually. To be sure, check your OpenSSL version. Here’s how to do it:

$ openssl version -a
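
If you need to check many machines, the comparison against the affected range can be scripted. The sketch below is a hypothetical helper that only looks at the upstream version number, so it cannot see distribution backports: a patched CentOS package that still reports 1.0.1e is flagged as well and needs a look at the package release instead.

```php
<?php

/**
 * Tells whether an upstream OpenSSL version number falls in the affected
 * range given above (1.0.1 before 1.0.1g, and the 1.0.2 betas).
 * Hypothetical helper; accepts a bare version or "openssl version" output.
 */
function isAffectedVersion(string $versionOutput): bool
{
    if (!preg_match('/\b(\d+\.\d+\.\d+)([a-z]*)(-beta\d*)?/', $versionOutput, $m)) {
        return false; // no version number found
    }

    $number = $m[1];
    $letter = $m[2];
    $beta = $m[3] ?? '';

    if ($number === '1.0.1') {
        // No letter is the initial 1.0.1 release; "g" is the first fixed one.
        return $letter === '' || strcmp($letter, 'g') < 0;
    }

    return $number === '1.0.2' && $beta !== '';
}

var_dump(isAffectedVersion('1.0.1e'));      // bool(true)
var_dump(isAffectedVersion('1.0.1g'));      // bool(false)
var_dump(isAffectedVersion('1.0.2-beta1')); // bool(true)
var_dump(isAffectedVersion('0.9.8y'));      // bool(false)
```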

Updating OpenSSL can be done with the following command:

# CentOS
$ yum upgrade

# Ubuntu / Debian
$ apt-get update && apt-get upgrade

Please note that in order to use the updated library, you will need to restart Apache or other servers that use SSL.

UPDATE: Due to the severity of the issue we have upgraded all VPS with OpenSSL installations that we found to be vulnerable.

OpenSSL on Shared Hosting

On our shared hosting servers we run a previous version of OpenSSL which is not vulnerable so no update is required.

What steps should you take to secure your applications?

If your system was affected, it is advisable to take steps to secure your application. Even though there is no way to know if your system was compromised, the safest option is to act as if it were.

1) If you are using SSL certificates for your sites, there is a risk that your certificates have been compromised. We recommend that you ask your certificate provider to re-issue your certificates and then replace them with the new ones.
2) Change any passwords or other credentials that were encrypted by your old SSL certificates.
3) If your application has user accounts, we recommend that you change the passwords on all user accounts.
4) If you’re using phpMyAdmin or phpPgAdmin on our servers, you should change these passwords.
5) You may want to invalidate all current sessions after requesting that your users change their passwords, to rule out any potential session hijacking.

You can find more information about the Heartbleed bug at http://heartbleed.com

Instructions to update your SSL certificate can be found here.

If you have questions, please open a support ticket.

April 08 / 2014
Author Pablo
Category ServerGrove, VPS

Symfony2 components overview: ExpressionLanguage

This is the 10th post in our series on Symfony2 components and we will cover the latest component added to Symfony: the ExpressionLanguage component. This component was added in version 2.4 and provides a way to have dynamic aspects in static configurations. For example, it can be used to evaluate expressions in configuration files, create a DSL, or build a business rules engine.

The ExpressionLanguage component adds a bit of “color” to static data

Installation

Like the other components, you can install using Composer:

{
    "require": {
        "symfony/expression-language": "2.4.*"
    }
}

First time using Composer? Check out our Composer 101 post.

Simple example

Imagine we want to create a blog system where users can create their own blogs. We would also like to give users some flexibility by letting them define whether a given article is featured or not based on almost anything. It could be based on the number of visits that the article has received, the category, or even something as odd as the current time. The expression that determines at run time whether a given article is featured would be saved in the database too.

Doing this in a classic way would be cumbersome: we would need to define fixed rules and force users to choose one of them… unless we use eval().

// get an article from the blog
$article = $blog->getArticle(1);

// check the values
var_dump($article->getVisits()); // int(15)
var_dump($article->getFeatured()); // bool(false)
var_dump($blog->getFeaturedExpression()); // string(26) "$article->getVisits() > 10"

// calculate whether it is a featured article
$article->setFeatured(eval('return ' . $blog->getFeaturedExpression() . ';'));

// featured changed to true
var_dump($article->getFeatured()); // bool(true)

// render the article
...

The number of visits of the post is 15 and the expression to make it featured is “$article->getVisits() > 10”, so when evaluated it returns true. The problem with this approach is that we are using eval(), and we all know that eval is evil, as it allows execution of arbitrary PHP code. In this example, eval() works fine, and by adding a return statement we get the result of the comparison “15 > 10”, but that will not always be the case. Since we are letting users define their own expressions that will be executed by the PHP engine, a malicious user could configure their blog with something like “exec('rm -fr *')”.

To quote Rasmus Lerdorf, “if eval() is the answer, you’re almost certainly asking the wrong question”.

The ExpressionLanguage component elegantly solves this issue. Since it has its own engine, no raw PHP is ever executed. The only operations that will work are those defined and whitelisted. This is the same example, but now using the ExpressionLanguage component:


use Symfony\Component\ExpressionLanguage\ExpressionLanguage;

// get an article from the blog
$article = $blog->getArticle(1);

// object values
var_dump($article->getVisits()); // int(15)
var_dump($article->getFeatured()); // bool(false)
var_dump($blog->getFeaturedExpression()); // string(24) "article.getVisits() > 10"

// calculate whether it is a featured article
$language = new ExpressionLanguage();
$article->setFeatured($language->evaluate($blog->getFeaturedExpression(), array(
    'article' => $article
)));

// featured changed to true
var_dump($article->getFeatured()); // bool(true)

// render the article
...

We created an instance of the ExpressionLanguage class to safely evaluate the expression “article.getVisits() > 10”. The evaluate() method evaluates the expression and optionally accepts an array of input parameters. The engine only has access to the passed parameters, which avoids one of the problems of eval(): it has access to the whole scope in which it is executed. It also solves the potential security problem with code execution, as the component does not execute PHP code but a limited, sandboxed pseudo-language.

Evaluate != compile

The ExpressionLanguage class provides two methods to deal with expressions: evaluate and compile.

The evaluate method evaluates the expression and returns its value. The return value can be a PHP variable of any type, even objects:

var_dump($language->evaluate('value**2', array('value' => 5)));
var_dump($language->evaluate('article.getVisits() > 10', array('article' => $article)));
var_dump(get_class($language->evaluate('article', array('article' => $article))));

The output would be:

int(25)
bool(true)
string(7) "Article"

In contrast, the compile method converts an expression into PHP code, so it can be cached and evaluated later.

var_dump($language->compile('value**2', array('value')));
var_dump($language->compile('article.getVisits() > 10', array('article')));

The output would be:

string(14) "pow($value, 2)"
string(28) "($article->getVisits() > 10)"

Syntax

The syntax is available in the official documentation. There you can find all the literals, operators and accessors available. Just a quick summary:

  • Literals: strings (e.g. ‘hello’ or “hello”), numbers (e.g. 10), arrays (e.g. [1, 2, 3]), hashes (e.g. { name: ‘Raul’ }), booleans (true/false) and null.
  • Operators: arithmetic (+, -, *, /, %, **), bitwise (&, |, ^), comparison (==, ===, !=, !==, <, >, <=, >=, matches – regex -, ?, ?: – ternary -), logical (not, !, and, &&, or, ||), string (~ – concatenation -), array (in, not in) and numeric (.. – ranges -).
  • Accessors: object public properties/methods (e.g. article.title, article.getTitle()) and arrays (e.g. articles[0]).

Functions

By default there is only one function available for using in our expressions: constant. This function wraps the PHP’s constant function, which returns the value of the given constant:

var_dump($language->evaluate('constant("PHP_INT_MAX")')); // int(9223372036854775807)

We can add our own functions to the engine quite easily:

$language = new ExpressionLanguage();

$language->register('sum_digits', function ($str) {
    // compiler: returns a PHP expression, so it can be embedded in larger ones
    return sprintf('(is_string(%1$s) ? array_sum(str_split(%1$s)) : %1$s)', $str);
}, function ($arguments, $str) {
    // evaluator: the first argument holds the variables passed to evaluate()
    if (!is_string($str)) {
        return $str;
    }

    return array_sum(str_split($str));
});

It may look confusing that we are kind of “repeating” the function body. The register() method takes three arguments: the name of the function and two closures, one for compiling the function (converting it into PHP code) and another for evaluating it. We defined the sum_digits function, which calculates the sum of the digits of a string, and works as expected:

// int(15)
var_dump($language->evaluate('sum_digits("12345")'));

// string(62) "(is_string($values) ? array_sum(str_split($values)) : $values)"
var_dump($language->compile('sum_digits(values)', array('values')));

Using this idea, and since we are a hosting company, we could provide a way to configure actions based on the servers status:

server.memory_usage > 70 ? send_mail_warning("raul@servergrove.com")
server.memory_usage > 90 ? send_mail_critical("pablo@servergrove.com")
server.disk_usage > 85 ? upgrade(server)
server.php_version < repository.php_version ? upgrade_php(server)

Caching

Parsing expressions can be slow, so the component adds a cache layer to save parsed expressions (ParsedExpression). This way the same expressions are not parsed twice in the same request. This is achieved by the parser cache: ArrayParserCache, which caches parsed expressions in an array.

These parsed expressions can also be persisted to be used between requests. We can implement our own cache layer by implementing the ParserCacheInterface, which has the methods save() and fetch(). For example, to create a simple file cache:

namespace RaulFraile\ExpressionLanguage;

use Symfony\Component\ExpressionLanguage\ParsedExpression;
use Symfony\Component\ExpressionLanguage\ParserCache\ParserCacheInterface;

class FileParserCache implements ParserCacheInterface
{

    protected function getPath($key)
    {
        return sys_get_temp_dir() . '/' . sha1($key);
    }

    public function save($key, ParsedExpression $expression)
    {
        file_put_contents($this->getPath($key), serialize($expression));
    }

    public function fetch($key)
    {
        $path = $this->getPath($key);

        return is_readable($path) ? unserialize(file_get_contents($path)) : null;
    }

}

The save() method saves the serialized parsed expression in a file, while fetch() checks if the file exists and then reads its contents. As the key may not be suitable for file names, we use sha1() to create a hash, that will act as a filename.

To use this parser cache instead of the default one, we inject it when creating the engine object. Both evaluate() and compile() accept strings (what we are passing so far) or ParsedExpression instances:

use Symfony\Component\ExpressionLanguage\ExpressionLanguage;
use RaulFraile\ExpressionLanguage\FileParserCache;

$cache = new FileParserCache();
$language = new ExpressionLanguage($cache);

// parse checks if the expression is cached
$parsedExpression = $language->parse('article.getVisits() > 10', array('article'));

$value = $language->evaluate($parsedExpression, array(
    'article' => $article
));

Internals

Internally, the component is not too different from a usual compiler or interpreter. Let’s dive into it a bit…

Lexer/Tokenizer

A Lexer instance tokenizes an expression, converting a string into a TokenStream, which is basically an array of tokens. Each token has a type and a value.


use Symfony\Component\ExpressionLanguage\ExpressionLanguage;
use Symfony\Component\ExpressionLanguage\Lexer;

require_once __DIR__.'/vendor/autoload.php';

$lexer = new Lexer();
$tokenStream = $lexer->tokenize('1 + 2');

while (!$tokenStream->isEOF()) {
    var_dump($tokenStream->current);

    $tokenStream->next();
}

We get as output the three tokens (instances of Token), two for the numbers and one for the operator:

class Symfony\Component\ExpressionLanguage\Token#3 (3) {
  public $value => int(1)
  public $type => string(6) "number"
  public $cursor => int(1)
}
class Symfony\Component\ExpressionLanguage\Token#4 (3) {
  public $value => string(1) "+"
  public $type => string(8) "operator"
  public $cursor => int(3)
}
class Symfony\Component\ExpressionLanguage\Token#5 (3) {
  public $value => int(2)
  public $type => string(6) "number"
  public $cursor => int(5)
}

Parser

Once the engine has the list of tokens, they must be parsed to build a node tree. The component ships with an operator precedence parser (a bottom-up parser that interprets an operator-precedence grammar). The method used is “precedence climbing”.

The basic idea behind the parser is that it converts a sequence of tokens into a node tree, understanding how operators work and relate to each other (unary/binary, associativity and precedence). For example, the following expressions are equivalent:

  • “1 + 2 * 3” == “1 + (2 * 3)” (*precedence = 60, +precedence = 30)
  • “a or b and c” == “a or (b and c)” (orprecedence = 10, andprecedence = 15)
  • “a**3 + 1” == “(a * a * a) + 1” (**precedence = 200, **associativity = right, +precedence = 30)
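
To see precedence climbing at work outside the component, here is a small standalone sketch for numbers, parentheses and five binary operators. It is not the component's parser: the function names are made up, the numeric precedences are illustrative (only their relative order matters), unary operators are left out, and for brevity it computes the value while parsing instead of building a node tree.

```php
<?php

// Binary operators with their precedence and associativity.
const OPERATORS = [
    '+'  => ['precedence' => 30,  'right' => false],
    '-'  => ['precedence' => 30,  'right' => false],
    '*'  => ['precedence' => 60,  'right' => false],
    '/'  => ['precedence' => 60,  'right' => false],
    '**' => ['precedence' => 200, 'right' => true],
];

/** Splits an expression into number, operator and parenthesis tokens. */
function tokenizeExpression(string $expression): array
{
    preg_match_all('/\d+(?:\.\d+)?|\*\*|[-+*\/()]/', $expression, $matches);

    return $matches[0];
}

/** Handles operators that bind at least as tightly as $minPrecedence. */
function parseExpression(array &$tokens, int $minPrecedence = 0)
{
    $left = parsePrimary($tokens);

    while ($tokens && array_key_exists($tokens[0], OPERATORS)
        && OPERATORS[$tokens[0]]['precedence'] >= $minPrecedence) {
        $operator = array_shift($tokens);
        $info = OPERATORS[$operator];
        // Left-associative operators need a strictly tighter right-hand side.
        $next = $info['right'] ? $info['precedence'] : $info['precedence'] + 1;
        $left = applyOperator($operator, $left, parseExpression($tokens, $next));
    }

    return $left;
}

/** Parses a number or a parenthesized sub-expression. */
function parsePrimary(array &$tokens)
{
    $token = array_shift($tokens);
    if ($token === '(') {
        $value = parseExpression($tokens);
        array_shift($tokens); // drop the closing ")"
        return $value;
    }
    if (!is_numeric($token)) {
        throw new InvalidArgumentException('Unexpected token: ' . var_export($token, true));
    }

    return $token + 0; // numeric string to int or float
}

function applyOperator(string $operator, $left, $right)
{
    switch ($operator) {
        case '+': return $left + $right;
        case '-': return $left - $right;
        case '*': return $left * $right;
        case '/': return $left / $right;
        default:  return $left ** $right; // "**"
    }
}

function evaluateExpression(string $expression)
{
    $tokens = tokenizeExpression($expression);

    return parseExpression($tokens);
}

var_dump(evaluateExpression('1 + 2 * 3'));   // int(7)
var_dump(evaluateExpression('2 ** 3 ** 2')); // int(512)
var_dump(evaluateExpression('10 - 4 - 3'));  // int(3)
```

The only difference between left and right associativity is the minimum precedence passed to the recursive call, which is why “2 ** 3 ** 2” groups as “2 ** (3 ** 2)” while “10 - 4 - 3” groups as “(10 - 4) - 3”.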

Finally, this tree is used to evaluate the concrete expression:

use Symfony\Component\ExpressionLanguage\ExpressionLanguage;
use Symfony\Component\ExpressionLanguage\Lexer;
use Symfony\Component\ExpressionLanguage\Parser;

require_once __DIR__.'/vendor/autoload.php';

$expression = '1 + 2';

$lexer = new Lexer();
$tokenStream = $lexer->tokenize((string) $expression);

$parser = new Parser(array());
$nodes = $parser->parse($tokenStream);

var_dump($nodes->evaluate(array(), array())); // int(3)

compile() is faster than evaluate()

It may not be obvious, but actually, compile() is faster than evaluate(). Both methods need to tokenize and parse the expression, but compile() just returns the string containing the PHP code while evaluate() loops through the tree nodes to evaluate them on the fly.

Who’s using it?

The Symfony2 full-stack framework, in the version 2.4, uses expressions extensively in service definitions, access control rules, caching, routing and validation. But as the component is quite new, there are not many projects using it already. Here are a few:

More info

Photo: Camera Shy, by Hair-Flick

April 07 / 2014
