MongoDB (v2.2+) provides a mechanism to expire data from collections by setting a TTL (time to live) on an index. This is a great feature if you have data that only needs to persist in the database for a specific period of time. With TTL, the MongoDB daemon expires the data for you, so you don't have to write cron jobs to delete it.

How it works

The MongoDB daemon (mongod) regularly checks collections that have a TTL index and removes the expired documents. To set the TTL, we create an index with the expireAfterSeconds option on a field that holds a BSON date. This means that if your collection doesn't already have a date field, you'll need to add one, because that field is what mongod uses to determine whether a document has expired. Here is how to create the index:

To set a TTL of 5 minutes (300 seconds) on the “logs” collection in the “servergrove” database, which has a date field called “created”:

Using the mongo shell

use servergrove
db.logs.ensureIndex( { "created": 1 }, { expireAfterSeconds: 300 } ) // expire 300 s after "created"

PHP with the mongo driver

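// $this->mongo is a MongoClient instance from the legacy "mongo" PHP extension
$collection = $this->mongo->selectCollection('servergrove', 'logs');
// TTL index on "created": documents expire 300 seconds after that date
$collection->ensureIndex(array('created' => 1), array('expireAfterSeconds' => 300));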

Doctrine ODM

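use Doctrine\ODM\MongoDB\Mapping\Annotations as ODM;

/** @ODM\Document(db="servergrove", collection="logs") */
class Logs
{
    /** @ODM\Id */
    protected $id;

    /**
     * @ODM\Date
     * @ODM\Index(name="logs_ttl", expireAfterSeconds=300)
     */
    protected $created; // must be stored as a date for the TTL to apply
}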
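The expiry rule mongod applies can be modeled in a few lines of plain JavaScript. This is a minimal sketch, not mongod's actual code: `expiryBase`, `isExpired` and `ttlPass` are hypothetical helpers, documents are plain objects, and the current time is passed in explicitly. It reflects two documented rules: a document whose indexed field is missing or not a date never expires, and when the field holds an array of dates, the earliest one is used. Treating the exact boundary as "not yet expired" is an assumption of this sketch.

```javascript
// Minimal model of the TTL expiry rule (illustrative, not mongod's code).

// Returns the date used for expiry, or null if the document can never expire.
function expiryBase(doc, field) {
  const value = doc[field];
  if (value instanceof Date) return value;
  if (Array.isArray(value)) {
    // With an array of dates, the earliest date is used.
    const dates = value.filter((v) => v instanceof Date);
    if (dates.length === 0) return null;
    return new Date(Math.min(...dates.map((d) => d.getTime())));
  }
  // Missing field or non-date value: the document never expires.
  return null;
}

// Expired once the date is more than expireAfterSeconds in the past.
function isExpired(doc, field, expireAfterSeconds, now) {
  const base = expiryBase(doc, field);
  if (base === null) return false;
  return base.getTime() + expireAfterSeconds * 1000 < now.getTime();
}

// One pass of the background task: keep only documents that are still live.
function ttlPass(docs, field, expireAfterSeconds, now) {
  return docs.filter((doc) => !isExpired(doc, field, expireAfterSeconds, now));
}

// Usage: a TTL of 300 seconds on "created", as in the examples above.
const now = new Date("2013-01-01T12:10:00Z");
const logs = [
  { _id: 1, created: new Date("2013-01-01T12:00:00Z") }, // 10 minutes old
  { _id: 2, created: new Date("2013-01-01T12:08:00Z") }, // 2 minutes old
  { _id: 3, created: "2013-01-01T12:00:00Z" },           // a string, not a date
  { _id: 4 },                                            // no "created" field
];
const remaining = ttlPass(logs, "created", 300, now);
console.log(remaining.map((d) => d._id)); // [ 2, 3, 4 ]
```

Documents 3 and 4 survive no matter how old they are, which is why storing the field as a real date (rather than, say, a formatted string) matters.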

Limitations and constraints

The official documentation lists several constraints. In my opinion, the most important ones are:

  • Precision
    The background task that mongod uses to remove expired documents runs every 60 seconds. That means a query can still return a document that has already expired, simply because the task has not run since it expired. The smaller the TTL, the larger this delay is relative to the document's lifetime, so the more likely we are to read expired documents.
  • Size
    TTL collections typically receive a steady stream of insertions, followed by deletions as documents expire. This pattern can fragment storage, so to minimize it, MongoDB sets the usePowerOf2Sizes flag on TTL collections. With this flag, MongoDB allocates record space in sizes that are powers of 2, which lets the space freed by deleted documents be reused much more effectively. The downside is that TTL collections will probably use more disk space.
  • Capped collections
    We cannot combine TTL with capped collections. Capped collections work like circular buffers: once a collection fills its allocated space, new documents overwrite the oldest ones. We could be tempted to combine both features to remove old data based on both date and disk space, but the capped collection behavior makes this impossible. Capped collections guarantee that insertion order is preserved and matches the order on disk, so individual documents cannot be removed from them, and removing individual documents is exactly what the TTL background task does.
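To put a number on the precision limit, here is a small, purely illustrative simulation in JavaScript. It assumes the background task wakes up exactly every 60 seconds starting at a given time, that each pass completes instantly, and that a pass only removes documents whose expiry time is strictly in the past. In practice a pass can take longer under load, so real delays can be larger. `removalTime` and `lingerSeconds` are hypothetical helpers.

```javascript
// Simplified model of the TTL task's timing (an assumption, not mongod's code):
// passes run at firstPassSec, firstPassSec + intervalSec, ... and finish instantly.
// All times are in seconds on a shared clock.

// Time of the first pass that finds the document expired and removes it.
function removalTime(createdSec, ttlSec, firstPassSec = 0, intervalSec = 60) {
  const expiresAt = createdSec + ttlSec;
  // A pass removes the document only if it runs strictly after expiresAt.
  if (firstPassSec > expiresAt) return firstPassSec;
  const passes = Math.floor((expiresAt - firstPassSec) / intervalSec) + 1;
  return firstPassSec + passes * intervalSec;
}

// How long an expired document stays readable before a pass removes it.
function lingerSeconds(createdSec, ttlSec, firstPassSec = 0, intervalSec = 60) {
  return removalTime(createdSec, ttlSec, firstPassSec, intervalSec) - (createdSec + ttlSec);
}

// Worst case: the expiry lands exactly on a pass, so the document waits a full interval.
for (const ttl of [60, 300, 3600]) {
  const worst = lingerSeconds(0, ttl);
  const pct = ((100 * worst) / ttl).toFixed(1);
  console.log(`TTL ${ttl}s: up to ${worst}s late (${pct}% of the TTL)`);
}
```

The absolute delay stays around one interval regardless of the TTL, so relative to a 60-second TTL it doubles the document's visible lifetime, while for a one-hour TTL it is under 2%.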

Photo by Julian Lim: “Stopwatch”