Zend Framework – Zend_Cache


In this article I will introduce you to the Zend_Cache component of the Zend Framework. This component is used for improving the performance of your web application by saving generated data for later reuse.

Caching is typically used when the cost of generating the data is high, and the data doesn’t change often enough to justify generating it from scratch upon every request.

While the idea behind caching is straightforward, there are certain considerations and complexities to be aware of. You need to consider the type of data you want to cache and the situations in which to use it.

In this article we will look at caching the following types of data:

  • Arbitrary data structures (such as data returned from a remote web feed or web service)
  • The entire output of an HTML page

Additionally, we will look at managing your cached data using tags. Tags allow you to clear relevant records without having to clear your entire cache.

Note: A typical application that makes use of caching will likely have more than one cache. Don’t necessarily restrict your thinking to using only a single cache.

This article requires Zend Framework, downloadable from http://framework.zend.com. At time of writing, the current version of Zend Framework is 1.10.4.


There are three fundamental concepts when implementing a caching solution:

  1. Adding data to the cache
  2. Serving data from the cache
  3. Clearing data from the cache

When you request data from a cache and it is found, this is called a “cache hit”. If the data is not found this is a “cache miss”. When using a cache there are two basic objectives:

  1. Minimize cache misses (the data you want should nearly always be in cache already if possible)
  2. Never serve stale data (if the caller gets incorrect or outdated data then so what if it came back quickly?)

Adding Data To The Cache

Typically adding data to the cache and serving from the cache are combined into a single step. The algorithm for this is as follows:

  1. Caller requests some data from the cache
  2. Data in cache? Return it to the caller
  3. Not found in cache? Generate the data, save it to the cache, return it to the caller.
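
The three steps above can be sketched in plain PHP. This is only an illustration under stated assumptions: the array-backed cache and the getOrGenerate() helper are hypothetical names invented for this sketch and are not part of Zend_Cache.

```php
<?php
// Hypothetical in-memory cache, used only to illustrate the algorithm.
$cache = array();

function getOrGenerate(array &$cache, $id, $generator)
{
    // Step 2: data in cache? Return it to the caller.
    if (array_key_exists($id, $cache)) {
        return $cache[$id];
    }

    // Step 3: not found, so generate the data, save it, then return it.
    $data = call_user_func($generator);
    $cache[$id] = $data;

    return $data;
}

// Count how often the "expensive" generator actually runs.
$calls = 0;
$generator = function () use (&$calls) {
    $calls++;
    return array('Home', 'About', 'Contact');
};

$first  = getOrGenerate($cache, 'navigation', $generator); // cache miss
$second = getOrGenerate($cache, 'navigation', $generator); // cache hit
```

The generator only runs on the first call; the second call is served from the array.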

While this is typically straightforward to implement, you must be aware that the first caller will suffer the penalty of generating the data. To counter this, you can “prime” the cache ahead of time. This is typically implemented by one or more scripts that are automatically run (such as in a cron job). These scripts are stubs that generate all of the data and save it into the cache so it will be available upon request.

Later in this article we’ll also look at tagging cache records. You can specify any number of tags when saving a record to the cache, which will help you identify the record later.

Also worth mentioning is that cache records have a lifetime associated with them. This is the amount of time that the data can be trusted before we assume it’s no longer valid. You can specify a default lifetime when creating a cache, and you can also override this value on a per-record basis. Zend_Cache will automatically manage the lifetime and clearing of expired cache records.

Serving Data From The Cache

In order to serve data from the cache, you must determine which data is being requested. When using Zend_Cache, every unique resource has its own cache identifier.

Once the identifier is known, you request the corresponding data from the cache. To create an identifier, you use the input that determines which data is generated.

The following listing demonstrates this at the most basic level. It assumes you have one set of navigation for logged-in users and a different navigation for users that aren’t logged in.
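
A minimal sketch of this idea, assuming a hypothetical $isLoggedIn flag that your own authentication code would supply. The function name and the two ID strings are made up for illustration.

```php
<?php
// Hypothetical helper: picks one of two fixed cache IDs, depending on
// whether the current user is logged in.
function navigationCacheId($isLoggedIn)
{
    return $isLoggedIn ? 'navigation_logged_in' : 'navigation_guest';
}

$memberId = navigationCacheId(true);
$guestId  = navigationCacheId(false);
```

Each ID then refers to its own cache record, so the two versions of the navigation never overwrite each other.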

Typically the cache ID will be dynamically generated using other data, such as GET/POST data or function arguments. The following listing shows an example of how you might generate a cache ID based on arguments passed to a function.
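
One way this might look is sketched below. The function name, its arguments and the article_list_ prefix are assumptions made for the example. The arguments are serialized and hashed with md5(), so the resulting ID contains only letters, digits and underscores.

```php
<?php
// Hypothetical helper: builds a cache ID for a list of articles from the
// arguments that determine which articles are returned.
function articleListCacheId($categoryId, $page, array $filters)
{
    // Sort the filters so the same filters in a different order share one ID.
    ksort($filters);

    // md5() returns 32 hexadecimal characters, all of which are valid in an ID.
    return 'article_list_' . md5(serialize(array($categoryId, $page, $filters)));
}

$idA = articleListCacheId(5, 1, array('year' => 2010, 'author' => 'jane'));
$idB = articleListCacheId(5, 1, array('author' => 'jane', 'year' => 2010));
$idC = articleListCacheId(5, 2, array('year' => 2010, 'author' => 'jane'));
```

Here $idA and $idB are identical because only the order of the filters differs, while $idC differs because the page number changed.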

Tip: You can only use a-z, A-Z, 0-9 and underscores in your cache ID. If you’re using complex data to generate the cache ID, simply use md5() to generate a compatible ID.

Clearing Data From The Cache

One of the primary goals listed above was that the user should never be served stale data (unless it is acceptable to do so). This goal is achieved by clearing the cache when dependent data is updated.

Let’s use my company’s Content Management System (Recite CMS) as an example. One of the most expensive operations when displaying a page in Recite CMS is to generate the web site navigation. It is also an extremely important aspect that we want to maximise performance for. Because of this, we cache as much of this data as we can.

If a content editor creates a new page on their web site, they expect that new page to appear in the web site navigation. Therefore, when they create the new page, we must clear the cached navigation data in the background.

There are two ways to approach clearing data from the cache:

  1. Clear the entire cache
  2. Clear the related cache records

Ideally you will only clear the related cache records (this is where tagging the cache records becomes useful!), but on occasion it can be difficult to isolate the records so the “brute force” approach of clearing the entire cache is required.

Later in this article we will look at how to remove cache records when using Zend_Cache.

Now that we’ve covered the theory of caching, let’s move on to actually implementing a Zend_Cache-based solution.


Setting up your web application to use Zend_Cache involves the configuration of a front-end and a back-end driver.

The front-end is used to determine exactly which data is to be cached, while the back-end deals with the actual storage of the cached data. You can develop your own front-ends and back-ends, but typically you will not need to.

If your cache is to be used in various locations within your web application, a good strategy will be to create the cache when bootstrapping your application.

Front-Ends

The following front-ends are available:

  • Zend_Cache_Core – This is the “main” front-end, which all other front-ends extend. This is what you typically use for storing arbitrary data or variables.
  • Zend_Cache_Frontend_Page – This front-end captures all output from your PHP script. The next time somebody accesses the page, it returns the saved data rather than processing the script.
  • Zend_Cache_Frontend_Output – This is similar to the page front-end, except it will save arbitrary output in your PHP script (as opposed to all of the output).
  • Zend_Cache_Frontend_File – This front-end uses a “master file” as its indicator when the cache needs to be refreshed.
  • Zend_Cache_Frontend_Function – This front-end can be used as a proxy to most PHP functions. It will call the function you specify and save its response for future use.
  • Zend_Cache_Frontend_Class – This front-end is similar to the function front-end, but allows caching of object and static method calls.

In my opinion, the function and class caches should not typically be used – if you develop a function or class that is expensive to run, then that code should be sufficiently optimized to leverage caching itself internally.

Tip: Many of the Zend Framework components (such as Zend_Locale) allow you to specify a cache that they can use internally. You can follow this practice yourself with your own expensive classes.

Back-Ends

There are a number of different back-ends available. We’re going to use the Zend_Cache_Backend_File back-end in this article, which is used for writing cache data to the filesystem.

Some of the other available back-ends include:

  • Zend_Cache_Backend_Sqlite – Stores cache records in an SQLite database.
  • Zend_Cache_Backend_Memcached – Stores cache records in memory using a memcached server.
  • Zend_Cache_Backend_Apc – Lets the APC extension for PHP deal with storing data.

For a comprehensive list of the available back-ends, refer to http://framework.zend.com/manual/en/zend.cache.backends.html.

Creating a Cache

In order to create a cache, you call the Zend_Cache::factory() method. When calling this method, you must specify which front-end and back-end drivers to use, as well as options for both of them.

As mentioned previously, we will use the Zend_Cache_Backend_File back-end. You can specify this by using File as the name of the driver when calling Zend_Cache::factory(). The only parameter this driver needs is the filesystem path where cache records will be saved. This path must be writable by the web server.

To get started we will use the Zend_Cache_Core front-end driver. This is specified using Core when calling Zend_Cache::factory().

There are many parameters you can specify for the front-end (see http://framework.zend.com/manual/en/zend.cache.frontends.html for a list), such as the caching lifetime (the default is one hour). For now we’ll just enable the automatic_serialization option. Doing so allows us to cache a PHP variable (such as an array) and Zend_Cache will make this data storable automatically.

The following listing shows how to create a cache. The returned object is an instance of Zend_Cache_Core which you can then use throughout your web application.
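
A minimal sketch of creating such a cache, assuming Zend Framework 1.x is on your include_path. The cache directory under the system temp folder is a hypothetical choice made so the script can run anywhere; in a real application you would point cache_dir at a directory of your own.

```php
<?php
// Assumes Zend Framework 1.x is available on the include_path.
require_once 'Zend/Cache.php';

// Hypothetical cache directory; it must exist and be writable.
$cacheDir = sys_get_temp_dir() . '/zend_cache_demo';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0700, true);
}

// Front-end options: allow arrays and other PHP values to be stored.
$frontendOptions = array(
    'automatic_serialization' => true,
);

// Back-end options: where the File back-end writes its records.
$backendOptions = array(
    'cache_dir' => $cacheDir,
);

$cache = Zend_Cache::factory('Core', 'File', $frontendOptions, $backendOptions);
```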

Even though this script doesn’t yet do anything useful, you will be able to see if your cache is working when you run the script. Specifically, if the filesystem path is not writable, a PHP warning will appear.


Now that you have a cache set up, you can read and write cache records.

To read data from the cache, use the load() method. This method accepts a cache ID as its first argument. This tells the cache which record is required.

If the record is found, the original data is returned. If you’re using automatic serialization and the data you saved was something like a PHP array, then it will be automatically deserialized and you will receive it back in its original form.

If the record is not found then false is returned. You can easily test for this, and then generate the data if required. The data can then be saved to the cache using the save() method. This will make it available for subsequent calls to load().

The save() method accepts the data as its first argument. You can specify the cache ID as the second argument if you want to, but if you don’t then the cache ID used in the load() call will be used.

Caution: If you’re calling save() from a cache-priming script then you likely wouldn’t have called load() ahead of time. In this case you should specify the cache ID as the second argument to save().
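
A cache-priming script of this kind might look like the sketch below. It assumes Zend Framework 1.x is on the include_path; the cache directory, the buildNavigation() function and the navigation ID are hypothetical stand-ins for your own code.

```php
<?php
// Assumes Zend Framework 1.x is available on the include_path.
require_once 'Zend/Cache.php';

// Hypothetical cache directory; it must exist and be writable.
$cacheDir = sys_get_temp_dir() . '/zend_cache_demo';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0700, true);
}

$cache = Zend_Cache::factory(
    'Core',
    'File',
    array('automatic_serialization' => true),
    array('cache_dir' => $cacheDir)
);

// Hypothetical stand-in for the expensive work being primed.
function buildNavigation()
{
    return array('Home', 'Products', 'Contact');
}

// load() was never called here, so the cache ID is passed explicitly.
$cache->save(buildNavigation(), 'navigation');
```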

The following listing demonstrates making use of the load() and save() methods. To demonstrate the feel of slow code that caching is used to improve, I’ve added a call to PHP’s sleep() function. This will make the first load slow (when it writes to cache) and subsequent loads (reading from cache) much faster.
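
A sketch of that pattern, assuming Zend Framework 1.x is on the include_path. The cache directory, the slow_report ID and the generated array are hypothetical values chosen for the example.

```php
<?php
// Assumes Zend Framework 1.x is available on the include_path.
require_once 'Zend/Cache.php';

// Hypothetical cache directory; it must exist and be writable.
$cacheDir = sys_get_temp_dir() . '/zend_cache_demo';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0700, true);
}

$cache = Zend_Cache::factory(
    'Core',
    'File',
    array('automatic_serialization' => true),
    array('cache_dir' => $cacheDir)
);

$id   = 'slow_report';
$data = $cache->load($id);

if ($data === false) {
    // Cache miss: simulate slow code, then build the data.
    sleep(2);
    $data = array('generated_at' => time(), 'items' => array(1, 2, 3));

    // No ID is given, so the ID from the load() call above is reused.
    $cache->save($data);
}

print_r($data);
```

The first run pauses for two seconds; runs within the cache lifetime print the same generated_at value without the pause.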

This demonstrates the most basic of cache usage. Next up we will look at some more advanced cache usage, including caching entire page output and using tags.


As discussed earlier, one of the front-ends available for Zend_Cache is the Zend_Cache_Frontend_Page class. This front-end will automatically save all output from a PHP script to its cache then serve it in future when the page is re-requested.

Using this front-end is extremely simple. All you need to do is call the start() method on the cache object at the top of the page. If the current page is found in cache it will be returned and the script will exit, otherwise it will continue to process the script normally.

In fact, you don’t even need to tell the cache to save. It detects when the entire script finishes and handles all of the cleanup and saving to cache automatically. You do not need to specify the cache ID when you call start(). You can if you like, but otherwise it will generate the cache ID automatically based on the front-end settings.

When configuring this front-end, you need to specify which pages it is allowed to cache (based upon the page URI). This is achieved using a series of regular expressions, stored in the regexps parameter. Each regular expression serves as an array key, while its value is an array of options. You must enable the cache for the given regular expression (or you can disable it if it’s a subset of another regular expression).

You must also indicate how cache IDs should be generated. For each of get, post, session, files and cookie you must indicate whether the page can be cached when values are present, and whether those values are used when generating the ID.

For example, if a page can be cached when there are values in $_COOKIE, specify cache_with_cookie_variables. If the actual values in $_COOKIE are to be used when generating the ID, specify make_id_with_cookie_variables.

Note: Getting these settings right can take some tweaking initially, but once you figure it out they shouldn’t need to be changed.

The following code demonstrates basic usage of Zend_Cache_Frontend_Page. Typically your pages will do a lot more than this one, but this should serve to demonstrate how it works. In this code we specify just one regular expression, which will include every page in the site.
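
A sketch of such a page, assuming Zend Framework 1.x is on the include_path and the script is requested through a web server. The cache directory and the page content are hypothetical.

```php
<?php
// Assumes Zend Framework 1.x is available on the include_path.
require_once 'Zend/Cache.php';

// Hypothetical cache directory; it must exist and be writable.
$cacheDir = sys_get_temp_dir() . '/zend_cache_demo';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0700, true);
}

$frontendOptions = array(
    'lifetime'     => 3600,
    'debug_header' => true, // show a short message when a cached page is served
    'regexps'      => array(
        // One regular expression that matches every URI on the site.
        '^/' => array(
            'cache'                       => true,
            // Still allow caching when $_COOKIE contains values.
            'cache_with_cookie_variables' => true,
        ),
    ),
);

$cache = Zend_Cache::factory(
    'Page',
    'File',
    $frontendOptions,
    array('cache_dir' => $cacheDir)
);

// On a cache hit the saved page is sent and the script ends here.
$cache->start();

echo 'Page generated at ' . date('H:i:s');
```

Reloading the page within the lifetime should show the same time again, preceded by the debug message.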

When writing this code, I had cookies set on the domain I was testing. To allow caching with these set, I enabled cache_with_cookie_variables.

One other aspect of this code to be aware of is the debug_header parameter. Setting this to true means a short message is displayed (but not saved in cache) at the top of a page when a cached version is served.

This is useful for debugging when making changes to your web site. My web applications typically have some kind of flag to indicate if the site is in development mode. This allows me to do something like 'debug_header' => IN_DEVELOPMENT.

Saving Headers

There’s no rule that states pages must be in HTML format. You can use this front-end to serve dynamic JavaScript or CSS (or even images) if you really wanted to. Note however that doing so may require custom HTTP headers.

If the page being cached uses any custom headers (such as Content-Encoding for gzipped pages, or Content-Type if you’re not serving HTML), you need to tell the cache to store these headers.

This is achieved by specifying the memorize_headers parameter. If you do not list the headers to remember, cached pages will be served without them, which may cause the page to render incorrectly.

The following listing demonstrates specifying headers to be remembered.
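
A sketch of a cached stylesheet, assuming Zend Framework 1.x is on the include_path and the script is requested through a web server. The cache directory and the CSS output are hypothetical.

```php
<?php
// Assumes Zend Framework 1.x is available on the include_path.
require_once 'Zend/Cache.php';

// Hypothetical cache directory; it must exist and be writable.
$cacheDir = sys_get_temp_dir() . '/zend_cache_demo';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0700, true);
}

$frontendOptions = array(
    // Headers to store alongside the cached output and replay on a hit.
    'memorize_headers' => array('Content-Type', 'Content-Encoding'),
    'regexps'          => array(
        '^/' => array('cache' => true),
    ),
);

$cache = Zend_Cache::factory(
    'Page',
    'File',
    $frontendOptions,
    array('cache_dir' => $cacheDir)
);

$cache->start();

// This header is saved with the page because it is listed above.
header('Content-Type: text/css');
echo 'body { margin: 0; }';
```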

Canceling the Cache

In some circumstances you might get half way through processing a script then realize that it shouldn’t be cached. This might occur if you have a blanket page cache on every single script (and every URI) but some scripts mustn’t be cached.

To prevent the page from being saved to cache you can call the cancel() method on the cache object.
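
A sketch of canceling part way through a script, assuming Zend Framework 1.x is on the include_path. The cache directory and the pageMustNotBeCached() check are hypothetical; substitute whatever condition applies in your application.

```php
<?php
// Assumes Zend Framework 1.x is available on the include_path.
require_once 'Zend/Cache.php';

// Hypothetical check: decides, while the page is being built, that this
// response must not be stored (here, when a made-up preview header is sent).
function pageMustNotBeCached()
{
    return isset($_SERVER['HTTP_X_PREVIEW']);
}

// Hypothetical cache directory; it must exist and be writable.
$cacheDir = sys_get_temp_dir() . '/zend_cache_demo';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0700, true);
}

$cache = Zend_Cache::factory(
    'Page',
    'File',
    array('regexps' => array('^/' => array('cache' => true))),
    array('cache_dir' => $cacheDir)
);

$cache->start();

echo 'Some page content';

if (pageMustNotBeCached()) {
    // The output is still sent to the browser, but it is not saved.
    $cache->cancel();
}
```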


Earlier in this article I stated that the cache should never serve stale content. The way to avoid this is to clean cache records when they become stale (even if their expiry date hasn’t yet been reached).

One way of doing this is to clear the entire cache (by calling $cache->clean() with no arguments), however this will mean all data in the cache is removed. This potentially means a lot of unnecessary processing power will be devoted to re-populating the cache.

The preferred solution for this is to clear only the related records. To do this we first need to be able to identify the related records. This is achieved by tagging records when they are saved to the cache. A tag is simply a string used to identify the record. You can save any number of tags with a cache record.

As we saw earlier in this article, a record is saved to the cache using the save() method. The first argument is the data to save, while the second argument is the cache ID (which can be left blank if called after load()). The third argument to save() is an array of tags to save with the data.

Note: You can specify the tags without having to specify the cache ID – just pass null as the second argument to save().

Saving a Cache Record With Tags

The following code demonstrates saving the cache record with two tags. We will make use of these tags shortly.
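
A sketch of saving a tagged record, assuming Zend Framework 1.x is on the include_path. The cache directory, the product_list ID and the two tag names are hypothetical.

```php
<?php
// Assumes Zend Framework 1.x is available on the include_path.
require_once 'Zend/Cache.php';

// Hypothetical cache directory; it must exist and be writable.
$cacheDir = sys_get_temp_dir() . '/zend_cache_demo';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0700, true);
}

$cache = Zend_Cache::factory(
    'Core',
    'File',
    array('automatic_serialization' => true),
    array('cache_dir' => $cacheDir)
);

$data = array('Widget', 'Gadget');

// Third argument: the tags stored with this record.
$cache->save($data, 'product_list', array('products', 'category_5'));
```

Like cache IDs, tag names are limited to letters, digits and underscores.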

Clearing Records Based on Tag

When calling the clean() method on the cache, you can specify that only records with certain tags are deleted. This allows you to target your cache deletes, rather than clearing the entire cache.

When clearing by tag you must specify one or more tags to clear by. You must also decide if you want records with any of the tags to be removed, or if records must have all of the tags to be removed.

If records must have every single tag specified (in the second argument), pass Zend_Cache::CLEANING_MODE_MATCHING_TAG as the first argument to clean().

If a record need only have one of the tags specified in the second argument, pass Zend_Cache::CLEANING_MODE_MATCHING_ANY_TAG as the first argument.

The following listing would remove the record created in the previous listing.
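
A sketch of clearing by tag, assuming Zend Framework 1.x is on the include_path. So that it runs on its own, it first saves a record with two hypothetical tags (products and category_5) and then removes it again.

```php
<?php
// Assumes Zend Framework 1.x is available on the include_path.
require_once 'Zend/Cache.php';

// Hypothetical cache directory; it must exist and be writable.
$cacheDir = sys_get_temp_dir() . '/zend_cache_demo';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0700, true);
}

$cache = Zend_Cache::factory(
    'Core',
    'File',
    array('automatic_serialization' => true),
    array('cache_dir' => $cacheDir)
);

// Set-up: a record carrying both tags.
$cache->save(array('Widget', 'Gadget'), 'product_list', array('products', 'category_5'));

// Remove only the records that carry every one of the listed tags.
$cache->clean(
    Zend_Cache::CLEANING_MODE_MATCHING_TAG,
    array('products', 'category_5')
);

// The record is gone, so this is a cache miss.
$result = $cache->load('product_list');
```

Passing Zend_Cache::CLEANING_MODE_MATCHING_ANY_TAG with just array('products') would remove the same record, along with any other record tagged products.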

Real-World Example of Tagging

To demonstrate how to use tagging in practical terms, let’s look at how Recite CMS uses Zend_Cache for displaying news articles.

There are two ways of displaying news articles: a list of news articles, and the details of a single news article.

When the list of articles is requested, the page is cached with a tag of news_articles.

When the article details page is requested, the page is cached with a tag of news_article_{$id} (that is, it uses the database ID of the article). For instance, if the article being viewed has an id of 123, the tag is news_article_123.

When article 123 is updated (or deleted) using the Control Panel, the following code is executed:
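
The actual Recite CMS code is not shown here. The sketch below is an approximation that assumes Zend Framework 1.x is on the include_path and uses the Core front-end for brevity; the cache directory, record IDs and stored strings are hypothetical.

```php
<?php
// Assumes Zend Framework 1.x is available on the include_path.
require_once 'Zend/Cache.php';

// Hypothetical cache directory; it must exist and be writable.
$cacheDir = sys_get_temp_dir() . '/zend_cache_demo';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0700, true);
}

$cache = Zend_Cache::factory(
    'Core',
    'File',
    array('automatic_serialization' => true),
    array('cache_dir' => $cacheDir)
);

// Set-up: the article list plus two article detail records.
$cache->save('list html', 'news_list', array('news_articles'));
$cache->save('article 123 html', 'news_detail_123', array('news_article_123'));
$cache->save('article 456 html', 'news_detail_456', array('news_article_456'));

// Article 123 was updated: clear the list and that one details page.
$cache->clean(
    Zend_Cache::CLEANING_MODE_MATCHING_ANY_TAG,
    array('news_articles', 'news_article_123')
);

$list       = $cache->load('news_list');       // false: cleared
$article123 = $cache->load('news_detail_123'); // false: cleared
$article456 = $cache->load('news_detail_456'); // still cached
```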

Likewise, when an article is created, the cache is cleared by tag news_articles.

While this may appear obvious at first, consider exactly what is happening:

  • When any article is created, edited or deleted, the list of news articles will no longer be cached.
  • When an article is edited or deleted, only that article’s details page is removed from the cache (a newly created article has no cached details page yet).

Here you have the best of both worlds – when an article is updated, only that article’s details page is cleared from cache. Every other article’s details page remains in the cache!


At this stage you should have a reasonable understanding of how Zend_Cache works. It is simple but can be very powerful.

One of the most important things I can recommend is to clearly document all of your caching points. That is, know exactly where your cache is being read from, being written to, or being cleared. Know exactly how tags are structured and what impact clearing on them will have.

It’s very easy to let a complex caching system get out of control and to have no idea exactly how it is operating. I recommend including logging on hits and misses and tracking the frequency of cache modifications.

To summarize, in this article we looked at how to implement a caching solution in your PHP web site using the Zend_Cache component of the Zend Framework.

I first introduced you to how caching works and how it can be leveraged to improve your web site’s performance. Next I introduced you to Zend_Cache and its front-end and back-end drivers.

Next I showed you how to cache the output of entire PHP scripts using the page front-end. I then showed you how to tag cache records, then clean the cache using tags. Finally, I described a real-world implementation of Zend_Cache and its tagging mechanism.
