If you haven't read that, please check it out and come back. It's OK, I'll wait here. I've got a snack.
Improving on cachedFile()
Back? OK, well I've made some improvements to cachedFile() and thought I'd share them. Here are the new capabilities:
1) The function now extracts the file type from the extension
2) It handles images, and specifies their dimensions for faster and smoother rendering by the browser
3) It caches all the information it calculates about a file for faster performance on subsequent requests
#1 and #2 are pretty straightforward: you can use the function like this:
cachedFile('foo.png');
cachedFile('subdirectory/bar.png', 'class="buz"');
...and it outputs something like this:
<img src="/images/foo.png?1241452378" width="16" height="15" />
<img src="/images/subdirectory/bar.png?1241452378" width="20" height="17" class="buz" />
I put a cache in your cache so you can cache while you cache
But what about #3? What's this caching business? How can we add caching to a caching function?
Let's back up a bit.
First off, cachedFile() was a bit of a misnomer. This function is really for BUSTING a cache.
1) First, we configured our web server to tell the browser "you can cache these types of files for a whole year - don't ask for them again."
2) Second, we made sure that the browser saw each filename as the combination of the ACTUAL filename, like 'foo.png', and the file's time stamp, resulting in 'foo.png?1241452378' (or something like that). Those numbers represent the last time the file was changed; they're the same time stamp you see on any file on your computer.
3) Third, since the time stamp is pulled automatically from the file, editing the file updates the time stamp. That tricks the browser into thinking it has never seen the file before, so it requests it again.
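Steps 2 and 3 boil down to "append the file's modification time to its URL." Here is a minimal sketch of just that part, separate from cachedFile(). The helper name versionedUrl() and the demo file are hypothetical, and the server-side "cache for a year" configuration from step 1 is not shown.

```php
<?php
// Hypothetical helper (not from the original post): builds a cache-busting URL
// by appending the file's last-modified Unix time stamp as a query string.
function versionedUrl(string $root, string $webPath): string
{
    $mtime = filemtime($root . $webPath);
    if ($mtime === false) {
        // File missing: fall back to the bare path so the page still renders
        return $webPath;
    }
    return $webPath . '?' . $mtime;
}

// Demo: create a throwaway file and give it a known time stamp
$root = sys_get_temp_dir();
$file = '/versioned_url_demo.png';
touch($root . $file, 1241452378);
clearstatcache(); // PHP caches stat() results, so clear them after touching
echo versionedUrl($root, $file), "\n"; // prints /versioned_url_demo.png?1241452378

// "Editing" the file changes its time stamp, and therefore its URL
touch($root . $file, 1241452999);
clearstatcache();
echo versionedUrl($root, $file), "\n"; // prints /versioned_url_demo.png?1241452999
```

Because the query string changes whenever the file does, the browser treats the updated file as a brand-new URL.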
The end result: the browser asks for a file once, then never again (at least for a year) - until the moment you change the file. As soon as it's updated, the browser asks for a new copy; until then, it uses the one it cached.
So, instead of cachedFile(), we could have called the function browserCacheBuster(). (But we won't, because I think that sounds cheesy.)
Now, this is all great, but the server is doing a bit of work for each file. Like before, each time you ask our function for a file, it has to go and determine the time stamp. In addition, my new features mean that for image files, it has to compute the width and height of the image.
This is all very fast in human terms, but how will it scale? What if you're using cachedFile() to spit out the same image tag several hundred times on the same page?
In that case, it might be nice to remember what you calculated last time. "Foo.png? Oh yeah, I remember him. I wrote down his dimensions and time stamp right here. No need to calculate them again."[1]
Memoization
To make this happen, we're going to use a design pattern called memoization. It works like this:
1) Before you calculate a result or pull it from the database, see if you've already got that result stored in a cache
2) If not, figure out your result and store it in your cache. If so, skip this step.
3) Now you've verified that you've got it in your cache, so return it from there.
For a given input, the first time the function runs, it will check the cache, find nothing, calculate a result from the input, store the result in cache, and return. Every time after that, it will just check the cache, find a result for that input, and return.
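Stripped of everything file-related, the three steps look roughly like this. It's a minimal sketch: slowSquare() is a hypothetical stand-in for any expensive calculation, and it uses a function-level static variable instead of the global variable the post uses later. Both approaches keep the cache alive for the rest of the script.

```php
<?php
// Counts how many times the "expensive" work really runs
$slowSquareCalls = 0;

// Hypothetical stand-in for an expensive calculation
function slowSquare(int $n): int
{
    global $slowSquareCalls;
    $slowSquareCalls++;
    return $n * $n;
}

function memoSquare(int $n): int
{
    // static keeps $cache alive between calls, for the life of the script
    static $cache = array();
    // 1) Check the cache; 2) on a miss, compute the result and store it
    if (!array_key_exists($n, $cache)) {
        $cache[$n] = slowSquare($n);
    }
    // 3) Either way, the answer is now cached, so return it from there
    return $cache[$n];
}

for ($i = 0; $i < 300; $i++) {
    memoSquare(7);
}
echo memoSquare(7), ' computed ', $slowSquareCalls, " time(s)\n"; // prints 49 computed 1 time(s)
```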
Does it matter?
But is there any point in doing this? Are we prematurely optimizing? Maybe. Let's see how much performance gain this really gets us.
I did a little not-very-scientific testing: added some caching to cachedFile(), called it from a loop a few hundred times, and timed the results using PHP's microtime(). I tried this with js, css, and image files, and did five or ten iterations of each.
Not a great sample size, but here's what I found: for .js files, having a cache made the function 2.72 times faster. For .css files, it made it 3.18 times faster. But for image files, having a cache made the function 119.63 times faster!
Clearly, computing those image dimensions is a bit expensive for the server, and we don't want to do it more than necessary.[2] Caching cuts the workload considerably.
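For anyone who wants to try a similar comparison, here is a rough harness using microtime(). It is not the exact test described above. To stay self-contained, it times a bare filemtime() lookup with and without a cache rather than cachedFile() itself, which needs a document root full of real files. timeCalls() is a hypothetical helper, and your numbers will vary from run to run.

```php
<?php
// Hypothetical helper: runs $fn $iterations times and returns elapsed seconds
function timeCalls(callable $fn, int $iterations): float
{
    $start = microtime(true); // current Unix time in seconds, as a float
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// A throwaway file to look up; stands in for an image, stylesheet, or script
$target = tempnam(sys_get_temp_dir(), 'timing');

// Uncached: clear PHP's stat cache so every call really asks the filesystem
$uncached = function () use ($target) {
    clearstatcache();
    return filemtime($target);
};

// Cached: compute once, then answer from a static variable on later calls
$cached = function () use ($target) {
    static $memo = null;
    if ($memo === null) {
        $memo = filemtime($target);
    }
    return $memo;
};

$slow = timeCalls($uncached, 500);
$fast = timeCalls($cached, 500);
printf("uncached: %.6fs, cached: %.6fs\n", $slow, $fast);
unlink($target);
```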
Enough talk - code time
OK, let's see how our function looks with these changes. (The cache is stored in a global variable so it will persist between function calls. To offset this minor sin, I have labeled it clearly and awkwardly to prevent accidental meddling from elsewhere.)
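$GLOBAL_cachedFile_cache = array();
function cachedFile($name, $attr = null){
    global $GLOBAL_cachedFile_cache;
    // Key on the name AND the attributes, so a file requested with different
    // attributes doesn't get back a tag cached for someone else
    $key = $name . '|' . $attr;
    if (!isset($GLOBAL_cachedFile_cache[$key])){
        $root = $_SERVER['DOCUMENT_ROOT'];
        // Lower-case the extension so 'FOO.PNG' is handled like 'foo.png'
        $filetype = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        /* Configuration options */
        $imgpath = '/images/';
        $csspath = '/stylesheets/';
        $jspath = '/scripts/';
        switch ($filetype){
            case 'css':
                $output = '<link rel="stylesheet" type="text/css" href="' . $csspath;
                $output .= $name;
                $output .= '?' . filemtime($root . $csspath . $name) . '" ';
                if ($attr){
                    $output .= $attr . ' ';
                }
                $output .= '/>' . "\n";
                break;
            case 'js':
                $output = '<script type="text/javascript" src="' . $jspath;
                $output .= $name;
                $output .= '?' . filemtime($root . $jspath . $name) . '"';
                if ($attr){
                    $output .= ' ' . $attr;
                }
                // Close the opening tag, then add the closing tag
                $output .= '></script>' . "\n";
                break;
            case 'jpg':
            case 'gif':
            case 'png':
                // This code will get run in any of the three cases above
                $output = '<img src="' . $imgpath . $name;
                $output .= '?' . filemtime($root . $imgpath . $name) . '"';
                // Index 3 of getimagesize() is the ready-made 'width="x" height="y"' string
                $imgsize = getimagesize($root . $imgpath . $name);
                if ($imgsize !== false){
                    $output .= ' ' . $imgsize[3];
                }
                if ($attr){
                    $output .= ' ' . $attr;
                }
                $output .= ' />';
                break;
            default:
                // Unknown extension: output nothing instead of an undefined variable
                $output = '';
                break;
        }
        // Remember the result so later calls for the same file skip all of the above
        $GLOBAL_cachedFile_cache[$key] = $output;
    }
    echo $GLOBAL_cachedFile_cache[$key];
}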
Magnanimousness
What's that? Want to use this code somewhere? Well, sure. No, you don't have to thank me, or license it, or anything. Just name your kid after me or send me a solid gold pickle.
Humility
And of course, perhaps I did something very stupid here. Well, that's what comments are for.
[1] You might worry that this will create problems. After all, if we cache the time stamp, won't we miss the fact that the file has been updated and defeat our purpose? No worries: the cache only lasts as long as the page script is running. So if you update a file while a user is loading the page, they won't see it. But on the next reload, they will.
[2] In fact, it would be reasonable not to do it at all. Many factors affect how fast a site is and how fast it feels, and how quickly the page renders is certainly one of them. Specifying image dimensions is meant to help with rendering, but it costs processor time. You'll have to decide what works best for your site.
Your close script tag has an extra < in it.
You have /includes hard-coded into your output string for js and css; don't you want csspath and jspath, like how img does it?
I notice you call filemtime in three places. Maybe it would be better to call it once, like
$modtime = filemtime($root . $path . $name);
before the switch statement. But then you'd need a case statement to determine what $path was, right? So I'd suggest a lookup table instead of three separate $*path variables:
$paths = array("css" => "/includes/",
               "js"  => "/includes/",
               "jpg" => "/imgs/",
               "png" => "/imgs/",
               "gif" => "/imgs/");
$path = $paths[$filetype];
But then you're duplicating "/imgs/" three times (unless you do some case statement again, which gets even messier), and you're more likely to change image directories than your call to filemtime, so maybe this suggestion is actually worse!
Finally, you cache $output at the end of each case statement; you could do that at the end of your if block instead.
Overall, great freakin work, this is cool :) I didn't see anything relevant for "php memoize" on the web, so you may be breaking ground. If it turns out that calculating image width and height is the slow part of the img work (which I assume it must be), you might find it worthwhile to cache a map from imagename+datetime => image dimensions, somewhere that persists longer than a single page render, so that after the first time a user loads a file, every other user gets its dimensions for free.
@Michael - Thanks for the feedback!
1) Fixed the syntax problem. Posting code in Blogger requires horrible contortions, and this was a side effect. I mean, I'm using a boatload of non-breaking space HTML characters for the indentation. Just wretched. Maybe I'll find a better solution.
2) I changed the directory names in the example, but I'd expect anyone who adopts this code to set them according to their own project's organization.
3) I hear ya about duplicating the filemtime() references, but I think (as you seem to have concluded) that this is the most straightforward way, so that there's only one piece of logic that has to determine file type.
4) I like your suggestion to cache $output at the end of the if statement, and have modified the code accordingly.
As for caching a map of image sizes on a more persistent basis - that's an interesting idea. Right now, I think it would be overkill for me, but I will keep it in mind for future projects.
Thanks again!
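For anyone curious what Michael's longer-lived image-size map might look like, here is a minimal sketch. It assumes a JSON file on disk is an acceptable store. The function name imageDimensions(), the key format, and the cache-file location are all hypothetical choices, and stale entries are never pruned. Keying on name plus modification time means an edited image automatically gets a fresh entry.

```php
<?php
// Hypothetical helper: returns 'width="x" height="y"' for an image, remembering
// the answer across requests in a JSON file keyed by path + modification time.
function imageDimensions(string $path, string $cacheFile): ?string
{
    $mtime = filemtime($path);
    if ($mtime === false) {
        return null;
    }
    $key = $path . '@' . $mtime; // a changed file gets a new key automatically

    // Load the existing map, if any
    $map = array();
    if (is_file($cacheFile)) {
        $decoded = json_decode((string) file_get_contents($cacheFile), true);
        if (is_array($decoded)) {
            $map = $decoded;
        }
    }

    if (!isset($map[$key])) {
        $size = getimagesize($path); // the expensive part
        if ($size === false) {
            return null;
        }
        $map[$key] = $size[3];
        // LOCK_EX avoids interleaved writes from concurrent requests
        file_put_contents($cacheFile, json_encode($map), LOCK_EX);
    }
    return $map[$key];
}

// Demo: write a 1x1 PNG, then look up its dimensions twice
$img = sys_get_temp_dir() . '/dims_demo.png';
file_put_contents($img, base64_decode('iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII='));
$cacheFile = sys_get_temp_dir() . '/dims_demo_cache.json';
@unlink($cacheFile); // start the demo with an empty cache
echo imageDimensions($img, $cacheFile), "\n"; // computed, then saved
echo imageDimensions($img, $cacheFile), "\n"; // read back from the JSON file
```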