How GZIP Compression Works

Even in today’s world, with fast networks and almost unlimited storage, data compression is still relevant, especially for mobile devices and countries with poor Internet connections. This post covers the de facto lossless compression method for text data on websites: GZIP.

Morse code is one of the first lossless compression standards. More frequent letters get shorter codes

GZIP compression

GZIP provides lossless compression, that is, we can recover the original data when decompressing it. It is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding.

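The LZ77 algorithm replaces repeated occurrences of data with references to an earlier occurrence. Each reference has two values: the length of the repeated data and the jump (distance) back to where it appeared before. In the example below, references are written as <length,distance>: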

Original text: "ServerGrove, the PHP hosting company, provides hosting solutions for PHP projects" (81 bytes)
LZ77: "ServerGrove, the PHP hosting company, p<3,32>ides<9,26>solutions for<5,52><3,35>jects" (73 bytes, assuming that each reference is 3 bytes)

As you can see, the strings “ hosting ” and “ PHP ” are repeated, so the second time each of them appears it is replaced by a reference (the short references <3,32> and <3,35> stand for “rov” and “pro”). There are other repetitions too, like “er”, but replacing them with a 3-byte reference would not save anything, so the original text is kept.
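The greedy matching used in this example can be sketched in a few lines of PHP. This is only an illustration, not how real DEFLATE implementations search for matches (they use hash chains over a 32 KB window): at each position the sketch scans all the preceding text for the longest earlier match and emits a <length,distance> reference when the match is at least 3 bytes long, otherwise the literal byte. The function name lz77Encode() and the textual reference notation are assumptions made for this example.

```php
<?php
// Greedy LZ77 sketch (illustrative only): references are written as text,
// "<length,distance>", like in the example above.
function lz77Encode(string $text, int $minLength = 3): string
{
    $n = strlen($text);
    $out = '';
    $i = 0;

    while ($i < $n) {
        $bestLength = 0;
        $bestDistance = 0;

        // look for the longest match starting at any earlier position $j
        // (the match may run past $i: overlapping copies are allowed in LZ77)
        for ($j = 0; $j < $i; $j++) {
            $length = 0;
            while ($i + $length < $n && $text[$j + $length] === $text[$i + $length]) {
                $length++;
            }
            if ($length > $bestLength) {
                $bestLength = $length;
                $bestDistance = $i - $j;
            }
        }

        if ($bestLength >= $minLength) {
            // long enough: replace the repeated bytes with a reference
            $out .= sprintf('<%d,%d>', $bestLength, $bestDistance);
            $i += $bestLength;
        } else {
            // too short to be worth a reference: copy the literal byte
            $out .= $text[$i];
            $i++;
        }
    }

    return $out;
}

echo lz77Encode('ServerGrove, the PHP hosting company, provides hosting solutions for PHP projects'), "\n";
// ServerGrove, the PHP hosting company, p<3,32>ides<9,26>solutions for<5,52><3,35>jects
```

With a minimum match length of 3, this greedy search happens to reproduce the hand-made example exactly.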

Huffman coding is a variable-length coding method that assigns shorter codes to more frequent “characters”. The usual problem with variable-length codes is that, to decode them, we need a way to know where one code ends and the next one starts. Huffman coding solves this by creating a prefix code, where no codeword is a prefix of another one. It is easier to understand with an example:

Original text: "ServerGrove"
ASCII codification: "01010011 01100101 01110010 01110110 01100101 01110010 01000111 01110010 01101111 01110110 01100101" (88 bits)

ASCII is a fixed-length character-encoding scheme, so the letter “e”, which appears three times and is also the most frequent letter in the English language, takes as many bits as the letter “G”, which only appears once. Using this statistical information, Huffman coding can build a much more compact code:

Huffman: "1110 00 01 10 00 01 1111 01 110 10 00" (27 bits)

The Huffman method gives the shortest codes to “e”, “r” and “v”, while “S” and “G” get the longest ones. Explaining how to build a Huffman code is out of the scope of this post, but if you are interested, I recommend this great video from Computerphile.
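For readers who want to experiment anyway, here is a compact PHP sketch that builds a Huffman code by repeatedly merging the two least frequent subtrees. The function name huffmanCodes() is made up for this example. When several subtrees have the same weight the choice between them is arbitrary, so the individual codes may differ from the ones above, but every Huffman code for “ServerGrove” has the same optimal total length of 27 bits.

```php
<?php
// Builds a Huffman code for the bytes of $text and returns [character => bit string].
function huffmanCodes(string $text): array
{
    // one leaf per distinct byte: [weight, tree], where a tree is either
    // a byte value (leaf) or a [left, right] pair (internal node)
    $nodes = [];
    foreach (count_chars($text, 1) as $byte => $count) {
        $nodes[] = [$count, $byte];
    }
    if ($nodes === []) {
        return [];
    }
    if (count($nodes) === 1) {
        // degenerate case: a single distinct symbol still needs one bit
        return [chr($nodes[0][1]) => '0'];
    }

    // repeatedly merge the two lightest subtrees until only the root is left
    while (count($nodes) > 1) {
        usort($nodes, fn($a, $b) => $a[0] <=> $b[0]);
        $left = array_shift($nodes);
        $right = array_shift($nodes);
        $nodes[] = [$left[0] + $right[0], [$left[1], $right[1]]];
    }

    // walk the tree: going left appends "0", going right appends "1"
    $codes = [];
    $walk = function ($tree, string $prefix) use (&$walk, &$codes): void {
        if (is_array($tree)) {
            $walk($tree[0], $prefix . '0');
            $walk($tree[1], $prefix . '1');
        } else {
            $codes[chr($tree)] = $prefix;
        }
    };
    $walk($nodes[0][1], '');

    return $codes;
}

$text = 'ServerGrove';
$codes = huffmanCodes($text);

$bits = '';
foreach (str_split($text) as $char) {
    $bits .= $codes[$char];
}

print_r($codes);
echo strlen($bits), " bits\n"; // 27 bits
```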

DEFLATE, which is the algorithm used for GZIP compression, is a combination of both these algorithms.
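In practice, a .gz file is a thin wrapper around a DEFLATE stream (RFC 1952): a header (10 bytes in the output PHP produces) starting with the magic bytes 1f 8b, the compressed data, and an 8-byte trailer holding the CRC-32 and the length of the original data. The following sketch compares PHP's gzencode() with gzdeflate(), which produces a raw DEFLATE stream; it assumes the zlib extension and a 64-bit PHP build (so crc32() returns a non-negative integer).

```php
<?php
$data = str_repeat('ServerGrove, the PHP hosting company. ', 20);

$gzip = gzencode($data, 9);      // gzip container around a DEFLATE stream
$deflate = gzdeflate($data, 9);  // raw DEFLATE stream: no header, no trailer

// every gzip stream starts with the magic bytes 1f 8b
echo bin2hex(substr($gzip, 0, 2)), "\n"; // 1f8b

// removing the 10-byte header and the 8-byte trailer leaves the DEFLATE data
var_dump(substr($gzip, 10, -8) === $deflate);

// the trailer holds the CRC-32 of the original data and its length, little-endian
$trailer = unpack('Vcrc/Vsize', substr($gzip, -8));
var_dump($trailer['crc'] === crc32($data));
var_dump($trailer['size'] === strlen($data));
```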

Is GZIP the best compression method?

The answer is NO. There are other compression methods that get higher compression ratios, but there are a few good reasons to use it.

First, even though GZIP is not the best compression method, it provides a good tradeoff between speed and ratio. Compressing and decompressing data with GZIP is fast, and the ratio is quite decent.

Second, it is not easy to add a new global compression method that everyone can use. Browsers would need to be updated, which today is much simpler thanks to self-update mechanisms. However, browsers are not the only problem. Chromium tried to add support for BZIP2, a better compression method based on the Burrows–Wheeler transform, but had to cancel it because some old intermediate proxies corrupted the data: they did not understand the bzip2 header and tried to gzip the contents. You can see the bug report here.

GZIP + HTTP

The negotiation between the client (browser) and the server to get gzipped content is simple. If the browser supports GZIP/DEFLATE, it lets the server know through the “Accept-Encoding” request header. The server can then choose whether to send the contents gzipped or raw; when it gzips them, it says so with the “Content-Encoding: gzip” response header.
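The server side of this negotiation can be sketched in PHP. negotiateGzip() is a hypothetical helper, and its Accept-Encoding parsing is deliberately simplified: it only looks for an explicit gzip entry with a non-zero q value and ignores wildcards. In practice this is normally left to the web server.

```php
<?php
// Returns [list of response headers, body] for the given Accept-Encoding value.
function negotiateGzip(string $acceptEncoding, string $body): array
{
    foreach (explode(',', $acceptEncoding) as $entry) {
        // each entry looks like "gzip" or "gzip;q=0.8"
        $params = array_map('trim', explode(';', $entry));
        $coding = strtolower($params[0]);

        $q = 1.0;
        foreach (array_slice($params, 1) as $param) {
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }

        if ($coding === 'gzip' && $q > 0) {
            return [['Content-Encoding: gzip', 'Vary: Accept-Encoding'], gzencode($body)];
        }
    }

    // gzip was not offered (or was refused with q=0): send the raw body
    return [['Vary: Accept-Encoding'], $body];
}

[$headers, $payload] = negotiateGzip('gzip, deflate', str_repeat('Hello, GZIP! ', 100));
print_r($headers);
```

The Vary header tells intermediate caches that the response depends on the Accept-Encoding request header, so they don't hand gzipped content to clients that did not ask for it.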

Implementations

The DEFLATE specification provides some freedom to developers to implement the algorithm using different approaches, as long as the resulting stream is compatible with the specification.

GNU GZIP

The GNU implementation is the most common and was designed to be a replacement for the compress utility, free from patented algorithms. To compress a file using the GNU GZIP utility:

$ gzip -c file.txt > file.txt.gz

There are 9 compression levels, from “1”, the fastest with the lowest compression ratio, to “9”, the slowest with the best compression ratio. By default, “6” is used. If we want maximum compression at the cost of more processing time, the -9 flag (or --best) can be used:

$ gzip -9 -c file.txt > file.txt.gz
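PHP's gzencode() exposes the same levels through its second argument, so the speed/ratio trade-off can also be measured from a script. A small sketch, assuming the zlib extension is enabled; the sample input is arbitrary, and the printed sizes and timings depend on the data and the zlib version, so no expected output is shown.

```php
<?php
// arbitrary sample input: repetitive text followed by a long list of numbers
$data = str_repeat('The quick brown fox jumps over the lazy dog. ', 2000)
    . implode(',', range(1, 5000));

$sizes = [];
foreach (range(1, 9) as $level) {
    $start = microtime(true);
    $compressed = gzencode($data, $level);
    $elapsedMs = (microtime(true) - $start) * 1000;

    $sizes[$level] = strlen($compressed);
    printf("level %d: %d bytes, %.2f ms\n", $level, $sizes[$level], $elapsedMs);
}
```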

7-zip

7-zip implements the DEFLATE algorithm differently and usually achieves higher compression ratios. To compress a file with the maximum compression:

$ 7z a -mx9 file.txt.gz file.txt

7-Zip is also available on Windows and provides implementations of other compression formats such as 7z, xz, bzip2 and zip.

Zopfli

Zopfli is ideal for one-time compression, for example in build processes where a file is compressed once and served many times. It is about 100x slower, but compresses around 5% better than other compressors.

Enabling GZIP compression

Apache

The mod_deflate module provides support for GZIP compression, so the response is compressed on the fly before being sent to the client over the network.

To enable it for text files, add this in your .htaccess file:

AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript

There are some known bugs with older versions of some browsers, so it is also recommended to add these lines in case you support them:

BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
Header append Vary User-Agent

It is also possible to serve pre-gzipped files instead of compressing them on the fly for every request. This is especially useful for files that don’t change between requests, such as JavaScript or CSS files, which can be compressed once with a slow algorithm and then served directly. In your .htaccess, include this:

RewriteEngine On
AddEncoding gzip .gz
RewriteCond %{HTTP:Accept-encoding} gzip
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*)$ $1.gz [QSA,L]

What we are doing here is telling Apache that files with the .gz extension should be served with gzip as their content encoding (line 2), checking that the browser accepts gzip (line 3) and that the gzipped file exists (line 4), and, if so, appending .gz to the requested filename (line 5).
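The .gz files themselves have to be created beforehand, for example as part of a build step. Here is a minimal PHP sketch, assuming your assets live under a single directory: precompressAssets() (a made-up name) walks the directory and writes a maximum-compression .gz copy next to every .js and .css file. Remember to regenerate the copies whenever the originals change; otherwise the rewrite rules will keep serving stale content.

```php
<?php
// Hypothetical build-step helper: writes a maximum-compression "<file>.gz"
// copy next to every file with one of the given extensions under $dir.
// Returns the sorted list of .gz files written.
function precompressAssets(string $dir, array $extensions = ['js', 'css']): array
{
    $written = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        /** @var SplFileInfo $file */
        if (!$file->isFile() || !in_array(strtolower($file->getExtension()), $extensions, true)) {
            continue;
        }

        $path = $file->getPathname();
        // level 9: slow, but it only runs once per build
        $gzipped = gzencode(file_get_contents($path), 9);
        file_put_contents($path . '.gz', $gzipped);
        $written[] = $path . '.gz';
    }

    sort($written);
    return $written;
}
```

Files compressed with a slower tool such as Zopfli can be dropped in the same way, since its output is an ordinary gzip file.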

nginx

The ngx_http_gzip_module module allows compressing files with GZIP on the fly, while the ngx_http_gzip_static_module one allows sending precompressed files with the “.gz” filename extension instead of regular files.

An example configuration looks like this:

gzip            on;
gzip_min_length 1000;
gzip_types      text/plain application/xml;

GZIP + PHP

While compressing data with PHP is not usually recommended, as it is slower than letting the web server do it, it is possible using the zlib module.

For example, to compress the jQuery minified library using the maximum compression:

<?php

$originalFile = __DIR__ . '/jquery-1.11.0.min.js';
$gzipFile = __DIR__ . '/jquery-1.11.0.min.js.gz';

$originalData = file_get_contents($originalFile);

$gzipData = gzencode($originalData, 9);
file_put_contents($gzipFile, $gzipData);

var_dump(filesize($originalFile)); // int(96380)
var_dump(filesize($gzipFile)); // int(33305)

Data can be decompressed with gzdecode(). The zlib module also defines a few stream wrappers to compress data.
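A short sketch of both approaches, assuming the zlib extension and a writable temporary directory (the file name example.txt.gz is arbitrary): the compress.zlib:// stream wrapper transparently gzips data written through the regular file functions and decompresses it again when reading.

```php
<?php
$path = sys_get_temp_dir() . '/example.txt.gz';
$text = str_repeat("GZIP + PHP\n", 100);

// writing through the wrapper stores a gzip-compressed file on disk
file_put_contents('compress.zlib://' . $path, $text);

// the file on disk is a regular gzip file, so gzdecode() can read it...
var_dump(gzdecode(file_get_contents($path)) === $text); // bool(true)

// ...and reading through the wrapper decompresses it transparently
var_dump(file_get_contents('compress.zlib://' . $path) === $text); // bool(true)
```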

More information

Photo: Morse Code Straight Key J-38, by Anthony Catalano

April 14 / 2014
Author Raul Fraile
Category PHP, Shared Hosting, VPS

Upcoming Conferences

WeCamp

PHP New Zealand

PHP Summer Camp Croatia

PHPNE

MadisonPHP

brnoPHP

SymfonyLiveLondon

ZgPHP

PHPSouthAfrica

PHPNWUK

SymfonyLiveNYC

PHPForumParis

PHPARG

PHPWorld

TechMeetupUY

SymfonyConMadrid

Important OpenSSL security update

A serious security vulnerability has been found in certain versions of the OpenSSL library, which servers and other software use to secure and encrypt network connections.

Affected versions of OpenSSL are 1.0.1 prior to 1.0.1g and 1.0.2-beta. You can read more about the report at the MITRE site.

Please note that some distributions, like CentOS, apply security patches while maintaining version numbers, so the latest openssl.x86_64 0:1.0.1e-16.el6_5.7 will include the fix.
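If you want to script a first check of the version string reported by “openssl version”, here is a hedged sketch. isHeartbleedAffectedVersion() is a hypothetical helper that only encodes the ranges above (1.0.1 through 1.0.1f, and the 1.0.2 beta). As the previous paragraph explains, distribution packages may ship the fix without changing the reported version, so a true result means “check your package”, not “vulnerable”.

```php
<?php
// Hypothetical helper: flags "openssl version" output whose upstream version
// falls in the affected range. Backported fixes are NOT detected.
function isHeartbleedAffectedVersion(string $versionLine): bool
{
    // e.g. "OpenSSL 1.0.1e-fips" => base "1.0.1", letter "e", suffix "-fips"
    if (!preg_match('/OpenSSL (\d+\.\d+\.\d+)([a-z]?)(\S*)/', $versionLine, $m)) {
        return false;
    }
    [, $base, $letter, $suffix] = $m;

    if ($base === '1.0.1') {
        // 1.0.1 through 1.0.1f are affected; 1.0.1g contains the fix
        return $letter === '' || $letter <= 'f';
    }

    // the 1.0.2 beta mentioned in the advisory
    return $base === '1.0.2' && strpos($suffix, '-beta') === 0;
}

var_dump(isHeartbleedAffectedVersion('OpenSSL 1.0.1f')); // bool(true)
var_dump(isHeartbleedAffectedVersion('OpenSSL 1.0.1g')); // bool(false)
```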

If you want to check your SSL installation, we recommend using SSL Labs’ server test tool.

OpenSSL on VPS

If you are running a VPS with CentOS 6.x, Ubuntu 12.04, or Debian 7, we highly recommend that you upgrade OpenSSL immediately. If you have an older OS version, you are not vulnerable unless you upgraded OpenSSL manually. To be sure, check your OpenSSL version:

$ openssl version -a

OpenSSL can be updated with the following commands:

# CentOS
$ yum upgrade

# Ubuntu / Debian
$ apt-get update && apt-get upgrade

Please note that in order to use the updated library, you will need to restart Apache or other servers that use SSL.

UPDATE: Due to the severity of the issue we have upgraded all VPS with OpenSSL installations that we found to be vulnerable.

OpenSSL on Shared Hosting

On our shared hosting servers we run a previous version of OpenSSL which is not vulnerable so no update is required.

What steps should you take to secure your applications?

If your system was affected, it is advisable to take steps to secure your application. Even though there is no way to know if your system was compromised, the safest option is to act as if it were.

1) If you are using SSL certificates for your sites there is a risk that your certificates have been compromised. So we recommend that you ask your certificate provider to re-issue your certificates and then replace your certificates with the new ones.
2) Change any passwords or other credentials that were encrypted by your old SSL certificates.
3) If your application has user accounts, we recommend that you change the passwords on all user accounts.
4) If you’re using phpMyAdmin or phpPgAdmin on our servers you should change these passwords.
5) You may want to invalidate all current sessions after requesting your users change their passwords to rule out any potential session hijacking.

You can find more information about the Heartbleed bug at http://heartbleed.com

Instructions to update your SSL certificate can be found here.

If you have questions, please open a support ticket.

April 08 / 2014
Author Pablo
Category ServerGrove, VPS

Symfony2 components overview: ExpressionLanguage

This is the 10th post in our series on Symfony2 components, and we will cover the latest component added to Symfony: the ExpressionLanguage component. This component was added in version 2.4 and provides a way to have dynamic aspects in static configurations. For example, it can be used to evaluate expressions in configuration files, create a DSL, or build a business rules engine.

The ExpressionLanguage component adds a bit of “color” to static data

Installation

Like the other components, it can be installed using Composer:

{
    "require": {
        "symfony/expression-language": "2.4.*"
    }
}

First time using Composer? Check out our Composer 101 post.

Simple example

Imagine we want to create a blog system where users can create their own blogs. We would also like to give users some flexibility by letting them define whether a given article is featured based on almost anything: the number of visits the article has received, its category, or even something as unusual as the current time. The expression that determines at run time whether a given article is featured would be saved in the database too.

Doing this the classic way would be cumbersome: we would need to define fixed rules and force users to choose one of them… unless we use eval().

// get an article from the blog
$article = $blog->getArticle(1);

// check the values
var_dump($article->getVisits()); // int(15)
var_dump($article->getFeatured()); // bool(false)
var_dump($blog->getFeaturedExpression()); // string(26) "$article->getVisits() > 10"

// calculate whether it is a featured article
$article->setFeatured(eval('return ' . $blog->getFeaturedExpression() . ';'));

// featured changed to true
var_dump($article->getFeatured()); // bool(true)

// render the article
...

The post has 15 visits and the expression that makes it featured is “$article->getVisits() > 10”, so it evaluates to true. The problem with this approach is that we are using eval(), and we all know that eval is evil, as it allows the execution of arbitrary PHP code. In this example eval() works fine, and by prepending a return statement we get the result of the comparison “15 > 10”, but that will not always be the case. Since we are letting users define their own expressions, which will be executed by the PHP engine, a malicious user could configure their blog with something like “exec(‘rm -fr *’)”.

To quote Rasmus Lerdorf, “if eval() is the answer, you’re almost certainly asking the wrong question”.

The ExpressionLanguage component solves this issue elegantly. Since it has its own engine, no raw PHP is ever executed. The only operations that work are those that are defined and whitelisted. Here is the same example using the ExpressionLanguage component:


use Symfony\Component\ExpressionLanguage\ExpressionLanguage;

// get an article from the blog
$article = $blog->getArticle(1);

// object values
var_dump($article->getVisits()); // int(15)
var_dump($article->getFeatured()); // bool(false)
var_dump($blog->getFeaturedExpression()); // string(24) "article.getVisits() > 10"

// calculate whether it is a featured article
$language = new ExpressionLanguage();
$article->setFeatured($language->evaluate($blog->getFeaturedExpression(), array(
    'article' => $article
)));

// featured changed to true
var_dump($article->getFeatured()); // bool(true)

// render the article
...

We created an instance of the ExpressionLanguage class to safely evaluate the expression “article.getVisits() > 10”. The evaluate() method evaluates the expression and optionally accepts an array of input parameters. The engine only has access to the passed parameters, avoiding one of the problems of eval(), which has access to the scope in which it is executed. It also solves the potential security problem of code execution, because the component does not execute PHP code but a limited, sandboxed pseudo-language.

Evaluate != compile

The ExpressionLanguage class provides two methods to deal with expressions: evaluate and compile.

The evaluate method evaluates the expression and returns its value. The return value can be a PHP variable of any type, even objects:

var_dump($language->evaluate('value**2', array('value' => 5)));
var_dump($language->evaluate('article.getVisits() > 10', array('article' => $article)));
var_dump(get_class($language->evaluate('article', array('article' => $article))));

The output would be:

int(25)
bool(true)
string(7) "Article"

Moreover, the compile method converts an expression into PHP code, so it can be cached and evaluated later.

var_dump($language->compile('value**2', array('value')));
var_dump($language->compile('article.getVisits() > 10', array('article')));

The output would be:

string(14) "pow($value, 2)"
string(28) "($article->getVisits() > 10)"

Syntax

The syntax is available in the official documentation. There you can find all the literals, operators and accessors available. Just a quick summary:

  • Literals: strings (e.g. ‘hello’ or “hello”), numbers (e.g. 10), arrays (e.g. [1, 2, 3]), hashes (e.g. { name: ‘Raul’ }), booleans (true/false) and null.
  • Operators: arithmetic (+, -, *, /, %, **), bitwise (&, |, ^), comparison (==, ===, !=, !==, <, >, <=, >=, and matches for regular expressions), ternary (a ? b : c), logical (not, !, and, &&, or, ||), string concatenation (~), array membership (in, not in) and ranges (..).
  • Accessors: object public properties/methods (e.g. article.title, article.getTitle()) and arrays (e.g. articles[0]).

Functions

By default, only one function is available in expressions: constant(). It wraps PHP’s constant() function, which returns the value of the given constant:

var_dump($language->evaluate('constant("PHP_INT_MAX")')); // int(9223372036854775807)

We can add our own functions to the engine quite easily:

$language = new ExpressionLanguage();

$language->register('sum_digits', function ($str) {
    // compiler: must return a PHP expression built from the compiled argument
    return sprintf('(is_string(%1$s) ? array_sum(str_split(%1$s)) : %1$s)', $str);
}, function ($arguments, $str) {
    // evaluator: receives the expression variables, then the evaluated argument
    if (!is_string($str)) {
        return $str;
    }

    return array_sum(str_split($str));
});

It may look confusing that we are kind of “repeating” the function body, but the register() method takes three arguments: the name of the function and two closures, one for compiling the function (converting it into a PHP expression, so it cannot contain statements such as return) and another one for evaluating it. We defined the sum_digits function, which calculates the sum of the digits of a string, and it works as expected:

// int(15)
var_dump($language->evaluate('sum_digits("12345")'));

// string(62) "(is_string($values) ? array_sum(str_split($values)) : $values)"
var_dump($language->compile('sum_digits(values)', array('values')));

Using this idea, and since we are a hosting company, we could let customers configure actions based on the status of their servers:

server.memory_usage > 70 ? send_mail_warning("raul@servergrove.com")
server.memory_usage > 90 ? send_mail_critical("pablo@servergrove.com")
server.disk_usage > 85 ? upgrade(server)
server.php_version < repository.php_version ? upgrade_php(server)

Caching

Parsing expressions can be slow, so the component adds a cache layer to save parsed expressions (ParsedExpression). This way the same expressions are not parsed twice in the same request. This is achieved by the parser cache: ArrayParserCache, which caches parsed expressions in an array.

These parsed expressions can also be persisted to be used between requests. We can implement our own cache layer by implementing the ParserCacheInterface, which has the methods save() and fetch(). For example, to create a simple file cache:

namespace RaulFraile\ExpressionLanguage;

use Symfony\Component\ExpressionLanguage\ParsedExpression;
use Symfony\Component\ExpressionLanguage\ParserCache\ParserCacheInterface;

class FileParserCache implements ParserCacheInterface
{

    protected function getPath($key)
    {
        return sys_get_temp_dir() . '/' . sha1($key);
    }

    public function save($key, ParsedExpression $expression)
    {
        file_put_contents($this->getPath($key), serialize($expression));
    }

    public function fetch($key)
    {
        $path = $this->getPath($key);

        return is_readable($path) ? unserialize(file_get_contents($path)) : null;
    }

}

The save() method saves the serialized parsed expression in a file, while fetch() checks whether the file is readable and then unserializes its contents. As the key may not be suitable as a file name, we use sha1() to create a hash that acts as the file name.

To use this parser cache instead of the default one, we inject it when creating the engine object. Both evaluate() and compile() accept either strings (which is what we have been passing so far) or ParsedExpression instances:

use Symfony\Component\ExpressionLanguage\ExpressionLanguage;
use RaulFraile\ExpressionLanguage\FileParserCache;

$cache = new FileParserCache();
$language = new ExpressionLanguage($cache);

// parse checks if the expression is cached
$parsedExpression = $language->parse('article.getVisits() > 10', array('article'));

$value = $language->evaluate($parsedExpression, array(
    'article' => $article
));

Internals

Internally, the component is not too different from a usual compiler or interpreter. Let’s dive into it a bit…

Lexer/Tokenizer

A Lexer instance tokenizes an expression, converting a string into a TokenStream, which is basically an array of tokens. Each token has a type and a value.


use Symfony\Component\ExpressionLanguage\ExpressionLanguage;
use Symfony\Component\ExpressionLanguage\Lexer;

require_once __DIR__.'/vendor/autoload.php';

$lexer = new Lexer();
$tokenStream = $lexer->tokenize('1 + 2');

while (!$tokenStream->isEOF()) {
    var_dump($tokenStream->current);

    $tokenStream->next();
}

We get as output the three tokens (instances of Token), two for the numbers and one for the operator:

class Symfony\Component\ExpressionLanguage\Token#3 (3) {
  public $value => int(1)
  public $type => string(6) "number"
  public $cursor => int(1)
}
class Symfony\Component\ExpressionLanguage\Token#4 (3) {
  public $value => string(1) "+"
  public $type => string(8) "operator"
  public $cursor => int(3)
}
class Symfony\Component\ExpressionLanguage\Token#5 (3) {
  public $value => int(2)
  public $type => string(6) "number"
  public $cursor => int(5)
}

Parser

Once the engine has the list of tokens, they must be parsed to build a node tree. The component ships with an operator-precedence parser that uses the “precedence climbing” method.

The basic idea behind the parser is that it converts a sequence of tokens into a node tree, taking into account how operators work and how they associate with each other (unary/binary, associativity and precedence). For example, the following expressions are equivalent:

  • “1 + 2 * 3” == “1 + (2 * 3)” (* precedence = 60, binary + precedence = 30)
  • “a or b and c” == “a or (b and c)” (or precedence = 10, and precedence = 15)
  • “a**3 + 1” == “(a * a * a) + 1” (** precedence = 200, ** associativity = right, binary + precedence = 30)
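To make the precedence-climbing idea concrete, here is a minimal standalone sketch in plain PHP. It is not the component’s Parser class: it only handles integers, parentheses and the binary operators + and - (precedence 30), * and / (precedence 60) and the right-associative ** (precedence 200), and it computes values while parsing instead of building a node tree. All function names are made up for this example. Because only whitelisted tokens are recognised, anything else is rejected with an exception instead of being executed, which is the same property that makes an expression engine safer than eval().

```php
<?php
// Toy precedence-climbing evaluator (NOT the component's Parser).
const TOY_BINARY_OPERATORS = [
    '+'  => ['precedence' => 30,  'rightAssociative' => false],
    '-'  => ['precedence' => 30,  'rightAssociative' => false],
    '*'  => ['precedence' => 60,  'rightAssociative' => false],
    '/'  => ['precedence' => 60,  'rightAssociative' => false],
    '**' => ['precedence' => 200, 'rightAssociative' => true],
];

// Lexer: split the expression into tokens, rejecting any other character.
function toyTokenize(string $expression): array
{
    preg_match_all('/\d+|\*\*|[-+*\/()]/', $expression, $matches);
    $tokens = $matches[0];
    if (implode('', $tokens) !== preg_replace('/\s+/', '', $expression)) {
        throw new InvalidArgumentException("Unsupported syntax in: $expression");
    }
    return $tokens;
}

// Operand: a number or a parenthesised sub-expression.
function toyParsePrimary(array &$tokens)
{
    $token = array_shift($tokens);
    if ($token === '(') {
        $value = toyParseExpression($tokens, 0);
        if (array_shift($tokens) !== ')') {
            throw new InvalidArgumentException('Missing closing parenthesis');
        }
        return $value;
    }
    if ($token !== null && preg_match('/^\d+$/', $token)) {
        return (int) $token;
    }
    throw new InvalidArgumentException('Unexpected token: ' . var_export($token, true));
}

// Precedence climbing: consume operators whose precedence is >= $minPrecedence.
function toyParseExpression(array &$tokens, int $minPrecedence = 0)
{
    $left = toyParsePrimary($tokens);

    while ($tokens !== []
        && array_key_exists($tokens[0], TOY_BINARY_OPERATORS)
        && TOY_BINARY_OPERATORS[$tokens[0]]['precedence'] >= $minPrecedence
    ) {
        $operator = array_shift($tokens);
        $info = TOY_BINARY_OPERATORS[$operator];

        // a left-associative operator only lets strictly tighter operators
        // into its right operand; a right-associative one also lets itself in
        $nextMin = $info['rightAssociative'] ? $info['precedence'] : $info['precedence'] + 1;
        $right = toyParseExpression($tokens, $nextMin);

        switch ($operator) {
            case '+':  $left = $left + $right; break;
            case '-':  $left = $left - $right; break;
            case '*':  $left = $left * $right; break;
            case '/':  $left = $left / $right; break;
            case '**': $left = $left ** $right; break;
        }
    }

    return $left;
}

function toyEvaluate(string $expression)
{
    $tokens = toyTokenize($expression);
    $value = toyParseExpression($tokens);
    if ($tokens !== []) {
        throw new InvalidArgumentException('Unexpected token: ' . $tokens[0]);
    }
    return $value;
}

var_dump(toyEvaluate('1 + 2 * 3'));   // int(7)
var_dump(toyEvaluate('2 ** 3 ** 2')); // int(512)
var_dump(toyEvaluate('10 - 3 - 2'));  // int(5)
```

The $nextMin calculation is where associativity is decided: “10 - 3 - 2” groups to the left, while “2 ** 3 ** 2” groups to the right as 2 ** (3 ** 2).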

Finally, this tree is used to evaluate the concrete expression:

use Symfony\Component\ExpressionLanguage\ExpressionLanguage;
use Symfony\Component\ExpressionLanguage\Lexer;
use Symfony\Component\ExpressionLanguage\Parser;

require_once __DIR__.'/vendor/autoload.php';

$expression = '1 + 2';

$lexer = new Lexer();
$tokenStream = $lexer->tokenize((string) $expression);

$parser = new Parser(array());
$nodes = $parser->parse($tokenStream);

var_dump($nodes->evaluate(array(), array())); // int(3)

compile() is faster than evaluate()

It may not be obvious, but actually, compile() is faster than evaluate(). Both methods need to tokenize and parse the expression, but compile() just returns the string containing the PHP code while evaluate() loops through the tree nodes to evaluate them on the fly.

Who’s using it?

The Symfony2 full-stack framework, as of version 2.4, uses expressions extensively in service definitions, access control rules, caching, routing and validation. But as the component is quite new, not many projects use it yet. Here are a few:

More info

Photo: Camera Shy, by Hair-Flick

April 07 / 2014

HHVM & Hack available at ServerGrove

It is well known that Facebook is currently the largest site built using PHP, but not everyone knows that they have created their own language called Hack. It is based on PHP and adds several new features and improvements. You can start learning about Hack with their online tutorial.

To run Hack they created a new runtime platform, HHVM. HHVM can provide big performance gains, which allowed Facebook to run their site on fewer servers, saving lots of money. Both HHVM and Hack have been released to the public, so everyone outside of Facebook can use them.

HHVM is not fully compatible with PHP, yet. The HHVM development team is working hard to increase compatibility and they already support most of the major PHP frameworks. Having said this, there is a lot of custom PHP code that may not run in HHVM, so before you go ahead and install HHVM in your production servers, test your applications and sites really well.

As we said, Hack is very similar to PHP. A simple script looks like this:

<?hh
echo "Hi, I'm a Hack script!";

There are other bigger differences, and lots of new features that are worth checking.

To run this script, you will use the hhvm command:

$ hhvm script.php

You can even run Composer with HHVM, which can provide a big boost in performance:

$ hhvm composer.phar update

HHVM also includes a web server and a FastCGI server, giving you the option to run a standalone HHVM server or configure your Apache or NGINX using FastCGI.

Installing HHVM & Hack on ServerGrove

We have created customized packages for our customers so they can install HHVM on their VPS by following a few simple steps. Currently HHVM is available for CentOS 6.x and Ubuntu 12.04 VPSes. We are preparing packages for additional distributions.

For CentOS 6.x:

$ yum install hhvm

For Ubuntu 12.04:

$ apt-get update && apt-get install hhvm

Once installed, you can find HHVM configuration files in /etc/hhvm. The log files will be located in /var/log/hhvm. All the HHVM commands are located in /opt/hhvm/bin which is added to your PATH configuration.

Configuring HHVM

The /etc/hhvm/server.hdf file will contain most configuration options pertaining to the web and fastcgi servers.

Once configured, you can start the HHVM server by running:

$ /etc/init.d/hhvm start

If you need more control, you can run:

# spawn daemon in the background
$ /opt/hhvm/bin/hhvm --config /etc/hhvm/server.hdf --user www-data --mode daemon
# keep server in foreground
$ /opt/hhvm/bin/hhvm --config /etc/hhvm/server.hdf --user www-data --mode server

HHVM & FastCGI

If you want to use the FastCGI mode, you will need to set this configuration in server.hdf:

Server {
  Type = fastcgi
  FileSocket = /var/run/hhvm/sock
  Port = 9000
}

This will make the server available on port 9000 or through the /var/run/hhvm/sock socket. Then you need to configure Apache with mod_fastcgi support: install and enable mod_fastcgi and add this to the Apache configuration:

FastCgiExternalServer /var/www/html -host 127.0.0.1:9000

Our installation also includes hh_single_type_check, hh_client and hh_server, which are used to run the type checker.

Using our packages outside of ServerGrove

If you want to use our packages in your servers or VMs outside of ServerGrove, you are more than welcome. Follow the instructions on setting up our repository in your server and then install hhvm as described above:
- CentOS 6.x
- Ubuntu 12.04

Conclusion

HHVM and Hack bring a new perspective into PHP and provide big performance improvements. We will keep updating our repository with new versions as they come out. At the time of publishing this post, you will install HHVM 3.0.1. And don’t forget to share with us your experience with HHVM & Hack!

April 03 / 2014
Author Pablo
Category PHP, VPS

Deployment of Symfony2 applications with Ansible

Ansible is a powerful automation engine that simplifies deploying systems and apps. Its popularity has been rising rapidly as developers and system administrators look for simpler ways to manage servers and deploy applications.

The selling points of Ansible are:

  • simplicity: the configuration is done through INI and YAML files
  • agentless: there is no agent to install, making it dead easy to use on virtual servers and shared hosting
  • extensible: thanks to roles and modules, it is very easy to extend its functionality and reuse configuration blocks

If you follow our blog you will know that here at ServerGrove we have been promoting the use of Capifony, an extension for Capistrano that is targeted to the deployment of Symfony2 applications. However Capifony & Capistrano are based on Ruby, and every time we run into an issue, we need to dig to find out where the problem lies. In addition, with every major upgrade of Capistrano, something breaks, especially in our case where we have a fairly complex deployment process. So, we decided to give Ansible a try.

Installing and setting up Ansible

Installing Ansible is simple. The only requirement is to have Python installed on the computer from which you will launch Ansible. You can follow the installation instructions in the manual, which describe several ways to get Ansible running.

Once it is installed, Ansible will connect over SSH to the remote servers that it controls. The servers that you want to control need to be listed in a simple text file following the INI format, this is called the inventory:

192.168.1.50
aserver.example.org
bserver.example.org

You can group servers in sets so you can define a list of servers for testing, qa, production, etc.

[testing]
test1.example.com

[prod]
www1.example.com
www2.example.com
www[10:50].example.com

There are more complex groupings you can do, so it covers pretty much any need you can think of.

Now, you can execute simple commands with Ansible:

# send a ping command to all servers
$ ansible all -m ping
# print hostname and logged in users (&& requires the shell module)
$ ansible all -m shell -a "hostname -f && w"

You can also copy files:

$ ansible prod -m copy -a "src=/etc/hosts dest=/tmp/hosts"

You can manage users, OS packages (rpm, deb, etc.), services (e.g. stop/start/restart Apache) and much more thanks to modules.

The real power comes when you start defining these commands as tasks:

---
- hosts: webservers
  vars:
    http_port: 80
    max_clients: 200
  remote_user: root
  tasks:
  - name: ensure apache is at the latest version
    yum: pkg=httpd state=latest
  - name: write the apache config file
    template: src=/srv/httpd.j2 dest=/etc/httpd.conf
    notify:
    - restart apache
  - name: ensure apache is running
    service: name=httpd state=started
  handlers:
    - name: restart apache
      service: name=httpd state=restarted

The above code tells Ansible that these tasks apply to all hosts in the group “webservers”, then we define the variables, and then create a list of tasks that will be run in sequence. If any of these tasks fail, the process will stop.

All of these can be stored in files called playbooks, and these playbooks can include other playbooks, external files, etc., making them really extensible and easy to use.

Furthermore, you can combine these playbooks into “roles” (similar to Symfony2’s bundles), so you can reuse them across projects. These roles can be shared by uploading them to Ansible Galaxy, a site where people share roles for common needs.

There is a lot more that Ansible can do, but in this article we want to focus on Symfony2. For more information, we recommend reading the excellent Ansible documentation.

Deploying Symfony2

As you may know, successfully deploying a Symfony2 application involves several steps. The most common ones are:

  • define a directory structure on the server that lets you set up the Symfony2 application before making it live and keep multiple releases, so you can roll back to a previous release if something fails.
  • clone your git repository
  • run composer install to get all your dependencies
  • warm up the cache / dump Assetic assets
  • update the web server document root to point the site to the new release

If you have used Capifony before, these are more or less the steps it performs every time you run cap deploy.
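The directory layout from the first step can be sketched in plain shell. This is a minimal illustration assuming a Capifony-style tree (releases/<id>, shared/ and a current symlink); the functions deploy_release and rollback and the REPO variable are hypothetical, not part of Capifony or of any Ansible role, and the build steps are left as comments because they need a real repository.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching a Capifony-style release layout:
#   ROOT/releases/<id>  one directory per deploy
#   ROOT/shared         files kept across releases (unused in this sketch)
#   ROOT/current        symlink to the live release

# deploy_release ROOT ID: prepare releases/ID, then make it live.
deploy_release() {
  local root="$1" id="$2"
  local dir="$root/releases/$id"
  mkdir -p "$dir" "$root/shared"
  # A real deploy would build the release here, for example
  # (REPO is a hypothetical variable holding your repository URL):
  #   git clone "$REPO" "$dir"
  #   (cd "$dir" && composer install --no-dev --optimize-autoloader)
  #   (cd "$dir" && php app/console assetic:dump --env=prod)
  # Point current at the new release; -n replaces the existing link
  # instead of creating a new link inside the old release directory.
  ln -sfn "releases/$id" "$root/current"
}

# rollback ROOT: point current at the release just before the live one.
# Assumes release ids sort chronologically (e.g. timestamps).
rollback() {
  local root="$1" live prev
  live="$(basename "$(readlink "$root/current")")"
  # Print the line before the live id (and the live id itself); keep the first.
  prev="$(ls -1 "$root/releases" | sort | grep -x -B1 -- "$live" | head -n 1)"
  if [ -z "$prev" ] || [ "$prev" = "$live" ]; then
    echo "rollback: no release older than $live" >&2
    return 1
  fi
  ln -sfn "releases/$prev" "$root/current"
}
```

Because old release directories stay on disk, rolling back only moves the link; a real setup would also prune old releases from time to time.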

In Ansible, you can do all of this by defining a set of tasks in a playbook and running it. You can also put those tasks in a role, so you can reuse it in all your Symfony2 projects.

Introducing the Ansible Symfony2 Deployment Role

We have created a role that you can start using now to easily deploy your Symfony2 apps. It is available on Ansible Galaxy and on GitHub, so anyone can contribute enhancements and fixes.

To use the role in your project, once you have installed Ansible, run the following command to make the role available on your computer:

$ ansible-galaxy install servergrove.symfony2

The next step is to define the inventory (the list of servers where you will deploy your application). Create a file app/config/hosts.ini:

[prod]
server.example.com

Next, create a YAML file that will contain the variables and basic instructions for Ansible, and name it app/config/deploy.yml:

---
- hosts: prod
  vars:
    symfony2_project_name: your_project
    symfony2_project_root: /var/www/vhosts/example.com
    symfony2_project_repo: git@github.com:username/your_repo.git
    ansible_ssh_user: your_ssh_username

  roles:
    - servergrove.symfony2

  tasks:
    - local_action: osx_say msg="Deployment complete." voice=Zarvox

We defined a configuration for the prod group of servers, with the variables the role needs to deploy our project. Ansible runs the role and, when it finishes, runs the osx_say task locally to tell us that the process has completed (this only works on Mac OS X, but you can replace it with an action that sends an email or some other form of notification).

Once you have these files, you are ready to deploy your application with Ansible. Run the following command:

$ ansible-playbook -l prod -i app/config/hosts.ini -e "symfony2_project_release=1" app/config/deploy.yml -vvv

  • the -l option limits the run to the given group of servers
  • the -i option sets the inventory file that contains the list of servers
  • the -e option defines variables that will be used in the playbook. In this case we tell Ansible to deploy a release named “1”. You can replace this with the output of `date +%Y%m%d%H%M%S` to name your releases after the current date and time.
  • finally, we pass the playbook to run, and the -vvv option makes Ansible show in detail what it is doing.
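A timestamp works well as a release name because, with a fixed-width format like %Y%m%d%H%M%S, sorting the names as text also sorts them by time, which is what a rollback needs to find the previous release. The sketch below wraps this in two hypothetical helpers, release_id and build_deploy_cmd. It uses UTC (date -u), which is our own choice rather than the article's, so that a daylight-saving change cannot make a newer release sort before an older one.

```shell
#!/usr/bin/env bash
# release_id: 14-digit UTC timestamp such as 20140401093000.
release_id() {
  date -u +%Y%m%d%H%M%S
}

# build_deploy_cmd ID: store the deploy command from the article in the
# bash array DEPLOY_CMD, so the release name is passed as one argument.
build_deploy_cmd() {
  DEPLOY_CMD=(ansible-playbook -l prod -i app/config/hosts.ini
              -e "symfony2_project_release=$1"
              app/config/deploy.yml -vvv)
}
```

For example, run build_deploy_cmd "$(release_id)" and then "${DEPLOY_CMD[@]}"; printing "${DEPLOY_CMD[*]}" first lets you check the exact command before it runs.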

The servergrove.symfony2 role will perform the following tasks:

  • creates the directory structure to host your app, following a structure similar to Capifony's
  • clones the git repository
  • if composer is not installed or is outdated, downloads the phar into the project directory
  • runs composer install --no-dev --optimize-autoloader
  • runs app/console assetic:dump
  • creates/updates the current symlink to point to the new release
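The Composer check in the list above can be approximated in shell. This is only a sketch of the decision, not the role's actual code: the helper composer_cmd is hypothetical. It prefers a composer.phar already in the project directory, falls back to a global composer on the PATH, and otherwise fails so the caller knows a download is needed. It does not implement the "outdated" check.

```shell
#!/usr/bin/env bash
# composer_cmd DIR: print the command to run Composer for the project in DIR.
# Returns 1 when no Composer is available and one has to be downloaded.
composer_cmd() {
  local dir="$1"
  if [ -f "$dir/composer.phar" ]; then
    echo "php $dir/composer.phar"
  elif command -v composer >/dev/null 2>&1; then
    echo "composer"
  else
    # A deploy script would fetch it here, e.g. with Composer's installer:
    #   (cd "$dir" && curl -sS https://getcomposer.org/installer | php)
    return 1
  fi
}
```

Assuming the project path contains no spaces, $(composer_cmd "$dir") install --no-dev --optimize-autoloader then runs the install step; an outdated composer.phar can be refreshed with Composer's own self-update command.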

This is the first version of this role, and we hope to improve it with new features. If you have any suggestions, please open an issue or send a pull request.

As we said before, Ansible does not need anything special on the server; it only needs to be able to connect to it over SSH. This means that it can be used with any of our VPS plans and with all Developer and Business shared hosting plans, making the deployment of Symfony2 apps easy and predictable.

April 01 / 2014
Author Pablo
Category PHP, Symfony, Tutorials
