Tuesday, May 13, 2014

Doctrine2 optimization with APC cache and Symfony2

Some parts of a website are common to every pages and needs to request data from database. To put this data in a cache system is often a first step for optimization. Here is my return on experience about caching data with Doctrine2 in a Symfony2 project.


Before caching data, deal with metadata

Doctrine metadata store informations about entities structure, relations and so on. It's really safe to put in cache memory because it's nearly never changed. Have a look to the config in app/config/config_prod.yml :

# Doctrine Configuration
doctrine:
    dbal:
        ...
    orm:
        metadata_cache_driver: apc

(Still) Before caching data, deal with queries

Doctrine2 uses DQL, its own meta language to generate SQL queries. When you use a QueryBuilder or directly writes your queries in DQL, SQL query is generated each time data are requested. Next config allows Doctrine2 to put generated SQL in cache too. Have a look to the config in app/config/config_prod.yml :

# Doctrine Configuration
doctrine:
    dbal:
        ...
    orm:
        metadata_cache_driver: apc
        query_cache_driver: apc

Finally caching data !

Here it is, you will now put data from DB in cache. Compared to previous configurations, enabling result_cache_driver will not automatically put data in cache (Fortunately !). You then need to explicitely tell to Doctrine to put data in cache. You need to add a congig in app/config/config_prod.yml and then to call a method on a query builder :

# Doctrine Configuration
doctrine:
    dbal:
        ...
    orm:
        metadata_cache_driver: apc
        query_cache_driver: apc
        result_cache_driver: apc
namespace Vendor\MyBundle\Entity\Repository;

use Doctrine\ORM\EntityRepository;

/**
* MyEntityRepository
*
* This class was generated by the Doctrine ORM. Add your own custom
* repository methods below.
*/
class MyEntityRepository extends EntityRepository
{

    /**
     *    With a query builder
     */
    public function getSomeData()
    {
        $qb = $this->createQueryBuilder('alias');

        $qb
            ->where('alias.status = 1')
            ->orderBy('id', 'ASC')
        ;

        $query = $qb->getQuery();

        $query->useResultCache(true, 60, 'my_data_unique_cache_id');

        return $query->getResult();
    }

    /**
     *    With DQL
     */
    public function getAnyData()
    {
        $dql = 'SELECT alias FROM VendorMyBundle:MyEntity alias WHERE 1';

        $query = $this->getEntityManager()->createQuery($dql);

        $query->useResultCache(true, 60, 'my_data_unique_cache_id_2');

        return $query->getResult();
    }

}

The method AbstractQuery::useResultCache($bool, $lifetime = null, $resultCacheId = null) set whether or not to cache the results of this query and if so, for how long and which ID to use for the cache entry. I personnaly recommend to define a resultCacheId, it allows to identify cache entry in an APC manager (an apc.php file is often provided with php-apc package).

Why not caching raw SQL data

I just explained how tu put DB result in cache ... BUT you should understand why it's not really efficient. If you request data with *ToMany relations, the SQL JOIN will return many lines with duplicated data. Let's see the following DQL statement :

SELECT p, c FROM VendorBlogBundle:Post p INNER JOIN p.comments c WHERE p.id = 1

This query will return a Post object (or Array result) with a collection of comments. But raw SQL result will return as many lines as there are comments for Post #1, each line having Post #1 data and the comment data :

p.id   | p.title    | p.content    | p.created_at   | c.author   | c.content                | c.created_at
1      | Post title | Post content | 2014-01-13     | pece       | Thanks for feedback      | 2014-01-21
1      | Post title | Post content | 2014-01-13     | atrandafir | You saved me! Thanks! :) | 2014-01-20
1      | Post title | Post content | 2014-01-13     | Gilde      | Note that this will  ... | 2014-01-19
...

So raw data returned from SQL statement is OFTEN HEAVIER than hydrated result (object or array). That's why you should prefer to manually cache hydrated result rather than native Doctrine2 result cache. Let's see how to :

namespace Vendor\BlogBundle\Entity\Repository;

use Doctrine\ORM\EntityRepository;
use Doctrine\Common\Cache\ApcCache;

/**
*PostRepository
*
* This class was generated by the Doctrine ORM. Add your own custom
* repository methods below.
*/
class PostRepository extends EntityRepository
{

    /**
     *    With a query builder
     */
    public function getPosts()
    {
        $cacheDriver = new ApcCache();

        if ($cacheDriver->contains('_posts'))
        {
            return $cacheDriver->fetch('_posts');
        }

        $qb = $this->createQueryBuilder('p');
        $qb
            ->addSelect('c')
            ->innerJoin('p.comments', 'c')
            ->orderBy('p.createdAt', 'DESC')
        ;

        $posts = $qb->getQuery()->getResult();

        $cacheDriver->save('_posts', $posts, 60));

        return $posts;
    }

    /**
     *    With DQL
     */
    public function getAnyData()
    {
        $cacheDriver = new ApcCache();

        if ($cacheDriver->contains('_posts_2'))
        {
            return $cacheDriver->fetch('_posts_2');
        }

        $dql = 'SELECT p, c FROM VendorBlogBundle:Post p INNER JOIN p.comments c WHERE 1 ORDER BY p.createdAt DESC';

        $query = $this->getEntityManager()->createQuery($dql);

        $posts = $query->getResult();

        $cacheDriver->save('_posts2', $posts, 60));

        return $posts;
    }

}

Get some info about manual APC caching with Symfony2 in one of my previous posts

Full Symfony2 caching configuration for Doctrine2 is available at Official Symfony2 reference documentation - Caching drivers

3 comments :

Antoine Chauvin said...

How much is it faster after that ?

pece said...

It really depends on data used by your pages. Heavy queries are not processed, and if you cache hydrated data, hydration process is not done as long as your cache is alive. On an e-commerce site, with a dynamic menu depending on products available for sale, and displaying a list of products with applied discounts, I gained more than 80% of server-side load.

Diogo Barreiros said...

Thank, this helped a lot

Post a Comment

Comments are moderated before being published.