As of the current AWS API, setting the Expires and Cache-Control headers for all objects in an AWS S3 bucket requires a script. It is possible to do it one file at a time through the AWS control panel, but that is tedious, if not impossible, for buckets with lots of files.
I was hoping the AWS CloudFront ‘edit cache behavior’ screen would be a way of getting around this. There is a CloudFront distribution option for overriding an origin’s cache settings, but for whatever reason it did not work for me. I tried creating a totally new distribution, editing an existing one, and waiting overnight; nothing made a difference. Maybe it is fixed by now and worth another try?
With the ‘edit cache behavior’ screen a dead end, I whipped up a script to explicitly set the headers at the S3 source on a per-file basis.
Requirements:
- A PHP binary available on the command line. I’ve used PHP 5.3.6 successfully with this script.
- A download of the AWS PHP SDK.
- AWS credentials, found in the AWS control panel under ‘Security Credentials’.
It turns out there is a trick to getting this done with the AWS PHP SDK: copy each object onto itself using the copy_object method with the metadataDirective option set to REPLACE. Not as clear an API as it could be, but it got the job done. Hopefully this saves a few readers some frustration.
```php
<?php
// updates all the files in the S3 bucket to expire in 3 years

// configuration
define("AWS_KEY", "YOUR AWS KEY");
define("AWS_SECRET_KEY", "YOUR AWS SECRET KEY");
$bucket = "YOUR BUCKET NAME";

// path to the PHP SDK you downloaded
require_once("aws_php_sdk/sdk.class.php");

// AWS S3 paginates the list of items in a bucket;
// in this case, we will go X at a time:
$MAX_KEYS = 200;

$s3 = new AmazonS3();
$marker = null;
$n = 0;

// prime the list
$list = $s3->list_objects($bucket, array(
    'marker'   => $marker,
    'max-keys' => $MAX_KEYS
));

// loop through the paginated list of files in the bucket
while (count($list->body->Contents) > 0) {
    foreach ($list->body->Contents as $file) {
        // cast the SimpleXML node to a plain string
        $filename = (string) $file->Key;
        $marker = $filename;
        $n++;
        echo $n . " PROCESSING: " . $filename . "\n";

        // replace the existing object with a copy of itself,
        // plus set the new expires header
        $response = $s3->copy_object(
            array('bucket' => $bucket, 'filename' => $filename),
            array('bucket' => $bucket, 'filename' => $filename),
            array(
                'acl'               => AmazonS3::ACL_PUBLIC,
                'metadataDirective' => 'REPLACE',
                'headers'           => array(
                    "Cache-Control" => "max-age=94608000",
                    "Expires"       => gmdate("D, d M Y H:i:s T", strtotime("+3 years"))
                )
            )
        );

        if (!$response->isOK()) {
            echo $n . " PROBLEM: ";
            var_dump($response);
        }
    }

    // get more for the next loop
    $list = $s3->list_objects($bucket, array(
        'marker'   => $marker,
        'max-keys' => $MAX_KEYS
    ));
}

echo "DONE!\n";
```
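Once the script finishes, it’s worth spot-checking that the new headers actually stuck. Here is a minimal sketch using the same SDK’s get_object_headers method (a HEAD request under the hood); the bucket name and the some/file.jpg key are placeholders, and I’m assuming the SDK’s lowercased response-header keys:

```php
<?php
// spot-check one object's headers after running the update script
define("AWS_KEY", "YOUR AWS KEY");
define("AWS_SECRET_KEY", "YOUR AWS SECRET KEY");
require_once("aws_php_sdk/sdk.class.php");

$s3 = new AmazonS3();

// HEAD request; "some/file.jpg" is a placeholder object key
$response = $s3->get_object_headers("YOUR BUCKET NAME", "some/file.jpg");

// the headers set by copy_object should now come back on the object
echo "Cache-Control: " . $response->header['cache-control'] . "\n";
echo "Expires: " . $response->header['expires'] . "\n";
```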
More Information about Cache-Control and Expires headers:
Setting the Expires and Cache-Control headers is a good idea if you want client browsers to cache your content. It saves on bandwidth and makes your site run faster (especially in concert with CloudFront). Three years out seemed like a good number to me.
There are two headers to set: Expires comes from the HTTP 1.0 standard, while Cache-Control was introduced in HTTP 1.1. Depending on what kind of client is connecting, a different header may be honored (HTTP 1.1 clients give Cache-Control precedence when both are present). I’d think HTTP 1.1 would be the more popular choice, given it came out in 1999!
Keep in mind, you will need to run this script every so often, since in three years the Expires header runs out. The Cache-Control header works like a TTL, counted from when the client fetches the file; the Expires header, however, is a fixed date. The way the Expires header is designed is annoying.
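To make the difference concrete, here is a minimal sketch (plain PHP, no SDK required) of the two values the script sets: max-age is a relative number of seconds, while Expires is an absolute date frozen at the moment the script runs.

```php
<?php
// Cache-Control: a relative TTL in seconds, counted from whenever
// the client fetches the file (3 years = 3 * 365 * 24 * 60 * 60)
$maxAge = 3 * 365 * 24 * 60 * 60; // 94608000, as in the script above

// Expires: an absolute date, fixed at the moment this runs --
// three years from now, after which the header is simply stale
$expires = gmdate("D, d M Y H:i:s T", strtotime("+3 years"));

echo "Cache-Control: max-age=" . $maxAge . "\n";
echo "Expires: " . $expires . "\n";
```

Once that absolute date passes, clients honoring Expires treat the content as stale even though a fresh max-age would still be valid, which is why the script needs re-running.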