This is the 9th post in this series about the Symfony2 components. Today’s post covers one of the most popular PHP packages: the Finder component. According to Packagist, it is the 14th most used PHP package and the 3rd most used Symfony2 component, behind only EventDispatcher and Console.

The Finder component gives you an excellent lens to find files and directories.

Installation

Installation is really easy using Composer:

{
    "require": {
        "symfony/finder": "2.4.*"
    }
}

If you have never used Composer before, check out our Composer 101 post.

The component

The Finder component helps us to find files and directories using a fluent interface. It provides useful methods to find what we are looking for, based on name, location, size, modification date as well as many other filters.

<?php

use Symfony\Component\Finder\Finder;

require_once __DIR__ . '/vendor/autoload.php';

$finder = new Finder();
$finder->in(__DIR__)->files();

foreach ($finder as $file) {
    echo $file->getRealPath() . PHP_EOL;
}

In the above example, we create a Finder instance and, by calling files(), configure it to return only files (not directories) found in the current directory and its subdirectories, since no depth limit is set. Then, as the Finder instance is traversable (it implements IteratorAggregate), we can use it in a foreach loop. For each file found we print its real path.

Instead of iterating over the Finder with the foreach loop, we can also use some of the functions that PHP provides to deal with iterators, such as iterator_to_array and iterator_count:

$filesNumber = iterator_count($finder);
$files = iterator_to_array($finder);

var_dump($filesNumber, $files);

Assuming we only have a PHP file in the current directory, we get as output:

int(1)
array(1) {
  '/Users/raulfraile/Sites/sgposts/finder/index.php' =>
  class Symfony\Component\Finder\SplFileInfo#13 (4) {
    private $relativePath =>
    string(0) ""
    private $relativePathname =>
    string(9) "index.php"
    private $pathName =>
    string(48) "/Users/raulfraile/Sites/sgposts/finder/index.php"
    private $fileName =>
    string(9) "index.php"
  }
}

Criteria

The official documentation for this component has a huge list of available options to find files and directories and it does not make sense to repeat all of them in this article. Here is an overview of a few important ones:

  • in($dirs): The location is the only mandatory criterion. Accepts one or more directories.
  • files() / directories(): Restricts the matching to files or directories only.
  • depth(): Restricts the traversal depth. Accepts expressions such as '== 0' or '< 3'.
  • exclude($dirs): Excludes directories.
  • ignoreVCS($ignoreVCS): Ignores version control directories such as “.git”.
  • name($pattern): Finds files whose names match the pattern. Patterns can be globs, strings, or regular expressions.
  • contains($pattern): Finds files whose content matches the pattern. Patterns can be strings or regular expressions.
  • filter($closure) / sort($closure): These methods accept a closure as input parameter, so we can customize filtering and sorting.
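
To see how these criteria combine, here is a minimal sketch that chains several of them. It assumes the component is installed through Composer as shown earlier; the temporary directory and all file names are made up for the example, so the result is predictable.

```php
<?php

use Symfony\Component\Finder\Finder;

require_once __DIR__ . '/vendor/autoload.php';

// Throwaway directory tree; every name below is made up for the example.
$root = sys_get_temp_dir() . '/finder_criteria_' . uniqid();
mkdir($root . '/src', 0777, true);
mkdir($root . '/vendor', 0777, true);
file_put_contents($root . '/src/Greeter.php', '<?php echo "hello";');
file_put_contents($root . '/src/notes.txt', 'hello');
file_put_contents($root . '/vendor/Lib.php', '<?php echo "hello";');

$finder = new Finder();
$finder
    ->in($root)              // mandatory: where to search
    ->files()                // skip directories
    ->depth('< 3')           // only depths 0, 1 and 2
    ->exclude('vendor')      // leaves out vendor/Lib.php
    ->name('*.php')          // leaves out src/notes.txt
    ->contains('hello')      // keep files whose content matches
    ->filter(function (\SplFileInfo $file) {
        return $file->getSize() > 0;   // returning false drops the file
    })
    ->sort(function (\SplFileInfo $a, \SplFileInfo $b) {
        return strcmp($a->getFilename(), $b->getFilename());
    });

$found = array();
foreach ($finder as $file) {
    $found[] = $file->getRelativePathname();
}

var_dump($found);
```

With this tree, only the PHP file under src/ should survive all the criteria. Since every criterion returns the Finder itself, the order of the calls in the chain does not matter.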

Under the hood

Adapters

The component relies on the Adapter pattern to provide different ways to find files/directories depending on the user’s machine. It ships with 3 adapters: PhpAdapter, GnuFindAdapter and BsdFindAdapter.

The PhpAdapter is always available and performs the search using PHP iterators, whilst GnuFindAdapter and BsdFindAdapter are only available on Unix/BSD operating systems. Their goal is to improve performance by using shell commands like “find”, “sort”, “cut” or “grep”. For example, these are the generated commands for some common criteria using the BsdFindAdapter (e.g. on MacOS):

// find  -E '/dir' -noleaf -mindepth 1 -not \( -regex '.*(^|/)\..+(/|$).*' \)
$finder->in(__DIR__);

// find  -E '/dir' -noleaf -mindepth 1 -maxdepth 1 -type f -not \( -regex '.*(^|/)\..+(/|$).*' \)
$finder->in(__DIR__)->files()->depth(0);

// find  -E '/dir' -noleaf -mindepth 1
$finder->in(__DIR__)->ignoreDotFiles(false);

// find  -E '/dir' -noleaf -mindepth 1 -type f \( -name '*.php' \) -not \( -regex '.*(^|/)\..+(/|$).*' \)
$finder->in(__DIR__)->files()->name('*.php');

// find  -E '/dir' -noleaf -mindepth 1 -type f -not \( -regex '.*(^|/)\..+(/|$).*' \) | grep -v '^$' | xargs -I{} grep -I -l -Ee 'hello' {}
$finder->in(__DIR__)->files()->contains('hello');

All the adapters must implement the AdapterInterface interface, which defines the methods to set the criteria. Two of its methods matter most when the Finder instance chooses an adapter: isSupported() and getName(). Adapters that extend the generic AbstractAdapter must implement canBeUsed() instead of isSupported(), because the generic implementation of isSupported() adds a cache layer on top of it.
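
The relationship between isSupported() and canBeUsed() can be sketched in plain PHP. The classes below are hypothetical, simplified stand-ins and not the component's own code; in particular, caching the result per adapter name is an assumption about how such a cache layer could work.

```php
<?php

// Hypothetical, simplified stand-ins for the real adapter classes.
interface SketchAdapterInterface
{
    public function isSupported();
    public function getName();
}

abstract class SketchAbstractAdapter implements SketchAdapterInterface
{
    // Assumption: one cached answer per adapter name.
    private static $supported = array();

    public function isSupported()
    {
        $name = $this->getName();

        if (!array_key_exists($name, self::$supported)) {
            // The (possibly expensive) check runs only on the first call.
            self::$supported[$name] = $this->canBeUsed();
        }

        return self::$supported[$name];
    }

    abstract protected function canBeUsed();
}

class SketchPhpAdapter extends SketchAbstractAdapter
{
    public $checks = 0;

    public function getName()
    {
        return 'php';
    }

    protected function canBeUsed()
    {
        $this->checks++;   // count how often the real check runs

        return true;
    }
}

$adapter = new SketchPhpAdapter();
$adapter->isSupported();
$adapter->isSupported();

var_dump($adapter->checks); // int(1)
```

The point of the split is that a check such as probing for a shell command is worth running once and remembering, rather than on every search.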

The IteratorAggregate interface

As we saw before, the Finder instance implements two PHP interfaces: IteratorAggregate and Countable.

The Countable interface is straightforward since it only defines the count() method. This method is called when we use the count() built-in function, but not with iterator_count().

The IteratorAggregate interface defines one method as well, getIterator(), which must return an object that implements the Traversable interface. Whenever we use the Finder instance in a foreach loop or pass it to the iterator_count() or iterator_to_array() functions, this method is called and a new iterator is created.
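
To check which method PHP calls in each case, here is a small sketch using only built-in interfaces. CallRecorder is a hypothetical stand-in for the Finder that simply logs the calls it receives; the return type declarations assume PHP 7 or later.

```php
<?php

// Hypothetical stand-in for the Finder: it only records which method PHP calls.
class CallRecorder implements IteratorAggregate, Countable
{
    public $calls = array();
    private $items;

    public function __construct(array $items)
    {
        $this->items = $items;
    }

    public function getIterator(): Traversable
    {
        $this->calls[] = 'getIterator';

        return new ArrayIterator($this->items);
    }

    public function count(): int
    {
        $this->calls[] = 'count';

        return count($this->items);
    }
}

$recorder = new CallRecorder(array('a.php', 'b.php'));

count($recorder);                  // goes through Countable::count()
iterator_count($recorder);         // goes through getIterator()
foreach ($recorder as $item) { }   // goes through getIterator() again

var_dump($recorder->calls);
```

The recorded calls are 'count', 'getIterator' and 'getIterator', in that order: each traversal asks for a fresh iterator.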

It is important to note that when searching through multiple locations (passed through the in() method), an instance of AppendIterator is created to hold one iterator per directory, and these are iterated one after the other.
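
The behaviour of AppendIterator is easy to see in isolation. In this sketch, two ArrayIterator instances with made-up paths stand in for the per-directory iterators that the Finder would build.

```php
<?php

// Made-up results standing in for the iterators of two searched directories.
$first = new ArrayIterator(array('/dir1/a.php', '/dir1/b.php'));
$second = new ArrayIterator(array('/dir2/c.php'));

$append = new AppendIterator();
$append->append($first);
$append->append($second);

// Pass false so the numeric keys of both iterators do not overwrite each other.
$paths = iterator_to_array($append, false);

var_dump($paths);
```

The resulting array contains the two entries from the first iterator followed by the entry from the second one.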

SplFileInfo

The component does not use the standard SplFileInfo class for the results, but a class extending it. This class adds support for relative paths and for getting the contents of the files directly with methods such as getRelativePath(), getRelativePathname() and getContents().
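
Here is a short sketch of these three methods, again assuming the component is installed through Composer. The temporary directory and the file in it are made up for the example.

```php
<?php

use Symfony\Component\Finder\Finder;

require_once __DIR__ . '/vendor/autoload.php';

// Throwaway directory with a single file; the names are made up.
$root = sys_get_temp_dir() . '/finder_fileinfo_' . uniqid();
mkdir($root . '/docs', 0777, true);
file_put_contents($root . '/docs/readme.txt', 'Hello Finder');

$finder = new Finder();
$finder->in($root)->files()->name('readme.txt');

$seen = array();
foreach ($finder as $file) {
    // Directory part, relative to the directory passed to in().
    echo $file->getRelativePath() . PHP_EOL;
    // Directory and file name, relative to the directory passed to in().
    echo $file->getRelativePathname() . PHP_EOL;
    // The contents of the file as a string.
    echo $file->getContents() . PHP_EOL;

    $seen[] = array($file->getRelativePath(), $file->getContents());
}
```

Relative paths are handy when the results need to be copied or mirrored elsewhere, since they do not depend on where the search started.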

Globs

Globs are used in Unix-like environments to perform pattern matching based on wildcard characters, for example “*.php” to find all the files with the “php” extension. The support for globs in PHP is limited (e.g. it does not work with remote files), but the component works around these limitations by converting globs into regular expressions (a tradeoff between performance and usability). So, when using the PhpAdapter with globs, we are not actually using globs internally, but regular expressions.
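
To give an idea of what such a conversion involves, here is a deliberately simplified sketch. The globToRegexSketch() function is hypothetical and only handles the “*” and “?” wildcards; it is not the component's own implementation, which covers more cases.

```php
<?php

// Hypothetical helper: a simplified glob-to-regex conversion.
function globToRegexSketch($glob)
{
    // Lookahead that refuses a leading dot, so hidden files are skipped.
    $regex = '(?=[^\.])';

    foreach (str_split($glob) as $char) {
        if ('*' === $char) {
            $regex .= '[^/]*';   // any run of characters inside one path segment
        } elseif ('?' === $char) {
            $regex .= '[^/]';    // exactly one character inside one path segment
        } else {
            $regex .= preg_quote($char, '#');   // everything else is literal
        }
    }

    return '#^' . $regex . '$#';
}

$regex = globToRegexSketch('*.php');

var_dump($regex);
var_dump(preg_match($regex, 'index.php'));     // int(1)
var_dump(preg_match($regex, '.hidden.php'));   // int(0)
var_dump(preg_match($regex, 'src/index.php')); // int(0)
```

Because “*” becomes “[^/]*”, a wildcard never crosses a directory separator, which mirrors how globs behave in a shell.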

For example, the following criteria:

$finder->in(__DIR__)->files()->name('*.php');

Generates this regular expression:

"^(?=[^\.])[^/]*\.php$"

Who’s using it

As I said before, it’s one of the most used PHP libraries according to Packagist, so it is not easy at all to list all the projects using it. Here is a short list of them:

More info

Photo: Investigating Nana & Grampas House, by Jae Malone