BackupBuddy and Amazon S3

BackupBuddy, the popular WordPress plugin to back up your WP site, has an option to back up to Amazon S3 storage.  S3 storage is a great solution because it’s cheap, easy to use, and can be secure if you set it up right.  Unfortunately I couldn’t find any decent instructions on how to set up S3 for BackupBuddy.  The few references I found only showed how to set up wide-open S3 access, with a disclaimer that it was not very secure.

We do a fair amount of work with S3 storage here, so I worked out the details.  These are my recommendations for setting up your S3 access credentials on the Amazon AWS side.  Note that this is based on WP version 4.2.2 and BackupBuddy.  These recommendations may not apply to other versions.

Your AWS account and IAM

First you have to have an AWS account, of course.  You can sign up on the AWS website.  In case you didn’t notice it among all of the agreements and terms of service: someone gaining access to your root AWS account could not only mess with all your AWS resources, but could also spend a lot of your money.  (That’s why Amazon recommends securing the login and actually deleting the access keys for your root account!)  What you need is to set up an “IAM” user, which is sort of a proxy for your root account, except that you can limit what it can do.  You can read about IAM in the AWS documentation.

Bottom line is that in BackupBuddy’s “Remote Destinations” setup, you should never use your AWS root account access keys.  You really should create an IAM account with limited privileges and use the IAM account’s access keys to do your backups. 

Setting up minimum privileges for your IAM user

Amazon S3 objects (for example, your backups) are saved into “buckets.”  Access to each bucket may be controlled individually, and that’s what I recommend:  each site you’re backing up should have its own S3 bucket, and its IAM user should have privileges ONLY for that bucket.  I also recommend you create your S3 buckets manually using the AWS console so you don’t have to give your IAM user the ability to create buckets.  Then the minimum S3 privileges your IAM user needs are:

  • s3:ListBucket (for your bucket)
  • s3:PutObject (for items in your bucket)
  • s3:GetBucketLocation (for your bucket)

I don’t think you really need s3:GetBucketLocation to run BackupBuddy backups, but if you don’t include it, you’ll get an error when you click the “test” button on BackupBuddy’s Remote Destinations page, and the test button is your friend since it can save you tons of time troubleshooting.  (Note: the test button works by trying to create a small test file in your S3 bucket, but it won’t be able to delete it because of the tight rules I’m recommending.  Don’t worry – that’s not a problem.)

Setting up a limited-access policy can be tricky.   In the IAM Management Console, create your IAM user, then click “Create User Policy” and enter this JSON policy code (using your bucket name, of course.)
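
A policy along these lines can be generated with a short Python script.  This is a sketch built from the three privileges listed above, not a verified copy of the original policy; the bucket name example-site-backups and the function name backup_policy are placeholders I made up for illustration.

```python
import json

# Hypothetical bucket name: replace it with the bucket you created by hand.
BUCKET = "example-site-backups"


def backup_policy(bucket: str) -> dict:
    """Build a write-only IAM policy document for a single S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level action applies to items inside the bucket,
                # hence the trailing "/*".
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM policy editor.
    print(json.dumps(backup_policy(BUCKET), indent=2))
```

Running the script prints the JSON document; paste that into the policy editor after substituting your own bucket name.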

Notice that the s3:ListBucket and s3:GetBucketLocation actions are allowed for the bucket, but the s3:PutObject action must be allowed for items inside the bucket, indicated by the trailing “/*”.  For security reasons, this policy prevents reading or deleting items once they are created in the bucket.

Deleting old backups automatically

Since the policy prevents deleting existing backups, BackupBuddy won’t be able to limit the number of backups by deleting old ones.  It’s best to avoid any settings that will make it try.

For deleting (or archiving) old backups, the Amazon S3 “Lifecycle Management” rules work well and are easy to set up.  For example you can set up a rule to delete backups older than 90 days, and AWS will take care of it for you.  Setting up a rule to archive some or all of your backups to “Glacier” storage can provide additional security.
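
I set these rules up in the AWS console, but the same rule can also be written down as a lifecycle configuration document.  The sketch below builds one in Python, assuming the JSON shape used by the S3 lifecycle API; the rule ID, the function name, and the 30-day Glacier transition are illustrative choices, not settings from my own buckets.

```python
import json


def lifecycle_rules(expire_days=90, glacier_days=None):
    """Build an S3 lifecycle configuration that expires old backups.

    If glacier_days is given, objects are also moved to Glacier storage
    after that many days.
    """
    rule = {
        "ID": "expire-old-backups",   # hypothetical rule name
        "Filter": {"Prefix": ""},     # empty prefix: apply to the whole bucket
        "Status": "Enabled",
        "Expiration": {"Days": expire_days},
    }
    if glacier_days is not None:
        rule["Transitions"] = [
            {"Days": glacier_days, "StorageClass": "GLACIER"}
        ]
    return {"Rules": [rule]}


if __name__ == "__main__":
    # Delete after 90 days, archive to Glacier after 30 (example values).
    print(json.dumps(lifecycle_rules(90, glacier_days=30), indent=2))
```

The rule has to be applied with your root or an admin account, not with the limited backup user, since that user deliberately has no rights to change bucket settings.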

Deploying MediaWiki via Chef

After deploying a few custom servers to support both Ruby and PHP applications, setting up a server to run MediaWiki seemed like a breeze.  And it is, sort of, so long as you know where to draw the line.

It seemed plausible to create a Chef cookbook to deploy a ready-to-run MediaWiki installation.  And I wasted a lot of time trying.  At first I was delighted that there was so much setup documentation, but as I dug into it, I realized that it was virtually all written for a barely-technical audience.  Important details were missing, and once you got outside the “recommended” web-based install directions, there were even factual errors.

So after trying a number of things, documented and undocumented, I finally conceded that the only really supported way to do an install is the web install method that MediaWiki provides.  It’s a shame really, since all it does is set up the database and create the LocalSettings.php file.  But the docs are simply not sufficient to support automating anything further.

Here’s what works

You can use Chef to set up:

  • the basic server niceties, like a firewall, email support for crontab jobs, and the like
  • the basic LAMP stack and PHP using the supported apache2, php and mysql cookbooks
  • the MySQL users and the database.  While the MediaWiki installer would create the production MySQL user and database for you, there’s no harm in letting Chef do it first, and the value is that Chef will maintain the resources.
  • the MediaWiki image and initial files.  I automated pulling the MediaWiki tar file and expanding it into the public www directory, with a guard to skip the step if the directory already exists.
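
The guard in the last item is what keeps repeated Chef runs from overwriting an existing install.  The following is not Chef code; it is a plain Python sketch of the same “only if it isn’t there yet” check, and the function name and paths are made up for illustration.

```python
import os
import tarfile


def unpack_once(tarball_path, target_dir):
    """Extract tarball_path into target_dir unless target_dir already exists.

    Returns True if the archive was extracted, False if the guard skipped it.
    """
    if os.path.exists(target_dir):
        # Guard: an earlier run already unpacked the files, so leave them
        # (and any LocalSettings.php added later) untouched.
        return False
    os.makedirs(target_dir)
    with tarfile.open(tarball_path) as archive:
        archive.extractall(target_dir)
    return True
```

Calling unpack_once() a second time with the same target directory does nothing, which is the behavior you want from a step that runs on every converge.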

But that’s all!  Efforts to build out the database using the MediaWiki scripts failed, and the LocalSettings.php file just seems to require manual editing whatever you do.  So after the Chef build you still have to run the web-based setup, and copy (and probably edit) the generated LocalSettings.php back to the document root directory of the site.

That’s not how we like to roll here… it’s not terrible but it’s a step backwards toward the manual setup + “servers as pets” thing, and for now we’re stuck with it.

Update: it gets worse

After backing off from trying to automate any actual MediaWiki installation, I thought I was out of the woods.  Since I was planning to add Semantic MediaWiki and Semantic Forms, I thought it would be OK to install these packages using Composer as part of the server build.  (Nope.)  That results in an immediate error when trying to run the MediaWiki web-based installer.

So now my “automated” install has to work like this:

  • Use Chef to converge a new server
  • Run the manual web-based MediaWiki installer
  • Copy the LocalSettings.php file back up to the new server
  • Manually add the extensions I want to use via PHP Composer (plus any other required install steps, which frequently include editing the LocalSettings.php file)

It’s really not looking much like an automated solution anymore, but that seems like the best I can do.  After using tools like Chef and Capistrano for deployment, I am finding the process for installing MediaWiki to be rather unsatisfying.


What’s Wrong with Chef and Other Rants

My apologies, this is a placeholder for a number of things I have meant to write about using Chef to deploy our cloud servers.  It turns out the task took longer and was a lot more convoluted than imagined, even after considering Hofstadter’s Law, so the writing has all been pushed back.  But here’s the proposed list of topics worth writing (& reading) about:

Last first, since it is current.

New theme :-(

Today I upgraded a raft of things on this WordPress site, one of them my base theme.  And the site looked like crap-o-la.

Apparently, creating a child theme to avoid losing custom styling when upgrading the base theme… is virtually no help at all.

So it goes.  Spending the afternoon fiddling with CSS seemed like a terrible idea, so I just picked a new theme.

It seems like most theme authors assume you really don’t have much to say, because most of the page is filled with header graphics, etc.  Well, maybe this one (WP-Forge) errs on the side of boring simplicity.  So it goes.  It works and I like it.

Announcing encoding-inspector

First there was the encoding_sampler Gem

Last November, I created the encoding_sampler Gem (RubyGems, GitHub) to help find the desired encoding for a particular text file.  From the Gem description:

EncodingSampler helps solve the problem of what to do when the character encoding is unknown, for example when a user is uploading a file but has no idea of its encoding (or typically, even what “character encoding” means.) EncodingSampler extracts a concise set of samples from the selected file for display so the user can choose wisely.

If you deal with client data files, you know the problems… the clients’ eyes glaze over when you ask about character encoding.  And the encoding problems may be sparse: if there’s one line with an “®” character 2MB down in the file, you’re just not going to find it by inspection.  Encoding_sampler does the heavy lifting.
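
To make the idea concrete, here is a rough Python sketch of the sampling approach: decode every line under each candidate encoding and keep only the lines where the results differ.  This is not the encoding_sampler Gem’s API (the Gem is Ruby); the function name and return shape are my own assumptions for illustration.

```python
def sample_encodings(data: bytes, encodings):
    """Return {line_number: {encoding: text_or_None}} for lines that differ.

    Lines that decode identically under every candidate encoding are
    skipped, so a single odd character deep in a large file still shows up.
    None marks a line that is invalid in that encoding.
    """
    samples = {}
    for number, raw in enumerate(data.splitlines(), start=1):
        decoded = {}
        for encoding in encodings:
            try:
                decoded[encoding] = raw.decode(encoding)
            except UnicodeDecodeError:
                decoded[encoding] = None
        # Keep the line only if the candidates disagree about it.
        if len(set(decoded.values())) > 1:
            samples[number] = decoded
    return samples


if __name__ == "__main__":
    # Two lines of UTF-8 bytes; only the second contains a non-ASCII character.
    data = b"plain ascii line\nAcme\xc2\xae Widgets\n"
    for number, options in sample_encodings(data, ["utf-8", "iso-8859-1"]).items():
        for encoding, text in options.items():
            print(number, encoding, repr(text))
```

Showing the user just those differing lines, one rendering per candidate encoding, lets them pick the one that looks right without knowing what an encoding is.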

But… while getting the encoding right is critical, we have found that building an encoding detection tool into our general data import applications is overkill. It complicates the user interface and confuses our clients.

So… the encoding-inspector web service

My solution was to create a separate web application to determine a text file’s encoding:

encoding-inspector, at

The inspiration comes from common tools like the W3C Markup Validation Service, and this online JSON parser.  Encoding-inspector is essentially a front-end to the encoding_sampler Gem.  Here’s a screen shot of a typical result:

Typical encoding-inspector output

Results provide simple visual samples so non-programmer/non-tech users can understand the options and choose the right encoding.  In this example, you can see that UTF-8 is the correct encoding.

encoding-inspector is built and maintained by Roll No Rocks and it’s hosted by TRI Resources International. It’s our hope that people find this resource useful, and that we can continue to provide it as a free service.