This is the story of how Arq came to use Amazon Web Services (AWS) for backing up files.
Back in 2008 Apple announced their Time Capsule. I was excited because I really wanted a backup solution that didn’t require me to remember anything (like periodically plugging in an external drive or making sure a NAS box was available). I also wanted a solution that I could control. Time Capsule seemed perfect. I could set up Time Machine, it would back up to my Time Capsule whenever I was at home, and I’d never have to think about it again.
The reality of Time Machine and Time Capsule wasn’t so wonderful in my case. Time Machine struggled to finish backing up most of the time, and had a lot of trouble with my habit of frequently closing my laptop lid and putting my Mac to sleep. I also couldn’t figure out how to make it “just work” for both my wife’s Mac and mine on the same Time Capsule. There were a number of other, smaller issues as well, like its inability to back up just the changed parts of my enormous mail file. I learned this was due to its design based on Unix hard links. The Time Capsule solution also didn’t provide any off-site protection in case all my computer equipment was stolen.
Online Backup in 2008
I looked at the commercial online backup solutions available at the time, and none of them really “felt” like backup because I didn’t have any control over the backup data. I wanted to be able to verify that the backup data were really there and safe. And if the backups are on someone else’s hardware, they should be encrypted with a key that is controlled by me, not the hardware owner. None of the available options had client-side encryption.
To me, backups need to feel “solid” and trustworthy. I couldn’t find a solution that felt solid and trustworthy enough for me.
So, I set about building Arq.
I chose Amazon S3 because it had a really nice API, was purported to be very stable and reliable, and was delivered by Amazon, a stable company that seemed to be in cloud computing for the long haul, so I could be reasonably sure my data would be around in the future.
Back in 2009 when I had just started working on the first version of Arq, I would tell people about it at meetups around town, including the Amazon S3 costs. People usually reacted with something like, “So, that sounds like Mozy but much more expensive. Doesn’t sound like a great idea.” But it was something I really wanted, so I kept at it. (I had no idea it would become so popular. Apparently a lot of other people want a high-quality backup app with reliable storage options.)
Anyway, backup to Amazon S3 turned out to be a great solution, I think. Whenever you have an internet connection you get backed up automatically. You never have to worry about running out of backup space because S3 is like an infinitely large disk drive in the sky. And, unlike EC2, S3 has had almost zero downtime since I started using it in 2009 (the only downtime incident I could find reference to on the interwebs was 6 hours of downtime in 2008).
S3 can get expensive compared to the “unlimited” offerings like Carbonite, but what you get is a very simple, stable storage service with a simple API; Arq provides the backup function on top of it. It’s like a power company — they provide 120 volts of AC, all the time, and your appliances provide functionality on top of it. To me this model feels more solid, and backup needs to be solid.
A year ago Amazon announced a new storage option called Glacier. It’s 1/10th the storage cost of S3, but it incurs fees if you retrieve your data, especially if you retrieve it rapidly. If you retrieve less than 5% of your stored data per month (pro-rated daily) there’s no fee; but if you retrieve lots of data all at once, the retrieval fees can add up. Arq supports backing up to Glacier, and when you restore using Arq it first asks at what rate you’d like to download and shows you the estimated fee for that rate. It’s especially suited to second-tier backup; if you’ve already got a local backup then the Glacier backup is just in case both your computer and your local backup fail. I back up my photos and music to Glacier (because it’s a lot of data) and everything else to S3 (because restore is faster/cheaper).
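To make the 5%-per-month rule concrete, here is a rough Python sketch of the free-retrieval arithmetic described above. It only estimates how much you could pull out per day without a fee; it does not try to compute the fee itself, and it is not how Arq calculates its estimate. The function names and the 30-day month are my own assumptions for illustration.

```python
# Rough sketch of the free-retrieval allowance described above.
# Assumptions (not from Amazon's docs or Arq's code): a 30-day month,
# and that the 5% monthly allowance is simply divided evenly per day.

def free_daily_allowance_gb(stored_gb: float,
                            monthly_free_fraction: float = 0.05,
                            days_in_month: int = 30) -> float:
    """Estimate how many GB can be retrieved per day without a fee."""
    return stored_gb * monthly_free_fraction / days_in_month


def exceeds_free_allowance(stored_gb: float, retrieved_today_gb: float) -> bool:
    """True if today's retrieval would go past the estimated free allowance."""
    return retrieved_today_gb > free_daily_allowance_gb(stored_gb)


if __name__ == "__main__":
    stored = 600.0  # GB of photos and music, a made-up figure
    print(f"Free per day: about {free_daily_allowance_gb(stored):.2f} GB")
    print("Restoring 50 GB today goes over:", exceeds_free_allowance(stored, 50.0))
```

With 600 GB stored, that works out to roughly 1 GB per day, which is why restoring everything at once is where the fees come from, and why spreading a restore over more days brings the cost down.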
File Sync on S3
The other thing I’ve wanted for a while is a file sync solution that I control. Dropbox is an excellent solution, but I wanted control the same way I have with Arq. I wanted client-side encryption; I wanted the cloud data to “feel” solid and trustworthy; and I wanted total flexibility — no limits on file sizes, number of files, or total storage space. I basically wanted my own Dropbox system, running in my own AWS account. So I built that. It’s called Filosync, and I really love it. Give it a try if you’re looking for that sort of thing.