Archive for the ‘backup’ Category

Using Arq with IAM


August 23rd, 2012

This post is for system administrators who support Arq on multiple computers. If that’s you, please read on!

IAM and Arq

If you need to install Arq on many computers using the same S3 account but you don’t want Arq to see the other computers’ backup data, use Amazon’s IAM (Identity and Access Management) to restrict what Arq sees.

The easiest way to do this is as follows:

  1. Use your main keys to install and configure Arq on a computer.
  2. Quit Arq and quit Arq Agent.
  3. Create an IAM user and capture its access key ID and secret access key.
  4. Look in (home)/Library/Arq/config/app_config.plist for the localS3BucketName and localComputerUUID values.
  5. Set up an IAM user with a policy that allows full access only to /<localComputerUUID> in the localS3BucketName, as well as “ListBucket” access (see example IAM policy below).
  6. Open the Keychain Access app and change the “Arq S3” entry’s Account and Password fields to the access key ID and secret access key of that IAM user.
  7. Launch Arq.

Example IAM Policy

For computer with the following values:

  • localS3BucketName = akiaiyuk3n3tme6l4hfa.comhaystacksoftwarearq
  • localComputerUUID = 32D9D7A2-3B3E-4BE7-B85B-0605AF24F570

the IAM policy would look like this:

 {
   "Statement": [
     {
       "Sid": "Stmt1344522941209",
       "Action": [
         "s3:ListBucket"
       ],
       "Effect": "Allow",
       "Resource": [
         "arn:aws:s3:::akiaiyuk3n3tme6l4hfa.comhaystacksoftwarearq"
       ],
       "Condition": {
         "StringLike": {
           "s3:prefix": "32D9D7A2-3B3E-4BE7-B85B-0605AF24F570/*"
         }
       }
     },
     {
       "Sid": "Stmt1344522997713",
       "Action": [
         "s3:*"
       ],
       "Effect": "Allow",
       "Resource": [
         "arn:aws:s3:::akiaiyuk3n3tme6l4hfa.comhaystacksoftwarearq/32D9D7A2-3B3E-4BE7-B85B-0605AF24F570/*"
       ]
     }
   ]
 }

The first part gives “s3:ListBucket” permission on the bucket, but only for keys with a prefix matching 32D9D7A2-3B3E-4BE7-B85B-0605AF24F570/* (the computer’s UUID).

The second part gives permission for all actions (“s3:*”) on resources starting with akiaiyuk3n3tme6l4hfa.comhaystacksoftwarearq/32D9D7A2-3B3E-4BE7-B85B-0605AF24F570/*.
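If you’re provisioning many computers, the policy above is easy to generate. Here’s a minimal Python sketch (the helper name is mine, not part of Arq; Sid values are omitted since they’re optional):

```python
import json

def arq_iam_policy(bucket, computer_uuid):
    """Build the two-statement policy described above: ListBucket limited
    to the computer's prefix, plus full access to objects under it."""
    prefix = computer_uuid + "/*"
    return {
        "Statement": [
            {
                "Action": ["s3:ListBucket"],
                "Effect": "Allow",
                "Resource": ["arn:aws:s3:::" + bucket],
                "Condition": {"StringLike": {"s3:prefix": prefix}},
            },
            {
                "Action": ["s3:*"],
                "Effect": "Allow",
                "Resource": ["arn:aws:s3:::" + bucket + "/" + prefix],
            },
        ]
    }

print(json.dumps(
    arq_iam_policy("akiaiyuk3n3tme6l4hfa.comhaystacksoftwarearq",
                   "32D9D7A2-3B3E-4BE7-B85B-0605AF24F570"),
    indent=2))
```

Paste the printed JSON into the IAM user’s policy in the AWS Management Console.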

Answer Files and IAM

For information on automating Arq configuration using answer files and IAM, please read the Arq manual’s Configuring Arq Using an Answer File section.

Arq plugin for Sidekick


April 28th, 2012

Arq Forum member jmah did some reverse-engineering of Arq and posted a message about a plugin he wrote for Sidekick which tells Arq to back up whenever he returns home.

The source code is on GitHub.

Really clever! I love it.

Arq 2.6.9 is out


April 28th, 2012

Arq version 2.6.9 is now available!

This minor update fixes several minor issues, including the issue where some backup sets weren’t appearing under “Other Backup Sets”.

It’s a free update for all Arq users. Pick “Check for Updates” from the Arq menu to get the update.

As always, full release notes for all Arq versions are on the release notes page.

7 facets of a good Mac backup strategy


January 23rd, 2012

I’ve been studying the computer backup industry for 3 years now and I’ve been selling my own online backup product, Arq, since February 2010. I’ve seen and heard lots of different approaches to backing up one’s computer. Here are some backup lessons I’ve learned.

1. Assume your hard drive will fail very soon

Expect imminent disk failure no matter how old or new your hard drive is. The other day a customer sent me email saying Arq was reporting input/output errors. I told him it was probably a hardware problem and he should replace his hard drive ASAP. He said it’s an SSD that he installed 2 days ago, so that can’t be it. A few days later he wrote back saying the SSD was the culprit.

In my opinion SSDs are worse than spinning drives on this score, because they seem to fail catastrophically more often. Spinning drives often fail more gradually, giving you a chance to copy your data off, which is especially good if you haven’t been doing backups — but you are doing backups, right?

2. Automate it

Any backup approach that requires you to remember something has one big problem: you’ll forget. If you have to plug in an external hard drive for your backup approach, you won’t do it. At least not often enough.

3. Keep it simple

Choose simple backup processes to minimize the opportunity for error. Apple’s Time Machine is a great example of a simple app. Arq asks almost no questions — the defaults are fine. SuperDuper is just as simple — you just click one button and it makes a clone of your hard drive. All of these apps have lots more options, but you can safely ignore them.

4. Use multiple backup systems

This goes against the “keep it simple” advice, but counting on just one backup strategy is risky. When it comes time to recover from failure, you want as many opportunities to get your stuff back as possible. You don’t want to wake up one morning to a disk failure and then find out that you’d accidentally deleted your one backup app 6 months ago and you’ve lost 6 months of work. Or find out that your one online backup provider lost your data, or disappeared altogether.

Speaking of online backup: make sure one of the backup systems you use is off-site, to protect against theft, fire, lightning strike, flood, etc. For example, rotate your clone backup drives keeping one at the office (if your office is in a different location than your home!) or use an online backup service. I use 2 systems — one local and one off-site (explained below).

5. Minimize recovery time if possible

If you need to recover your entire computer from a Time Machine backup, you’re supposed to use Apple’s Migration Assistant app. Migration Assistant can be very slow however, especially when restoring from a Time Capsule over the network. If you have a clone of your hard drive made with an app like SuperDuper, you’ll be back in business in a minute — just plug the clone drive in, hold down the Option key, and boot your computer from the clone.

One potential downside of recovering with a clone is that in your haste to get back to work you may forget all about the fact that you’ve got no clone anymore! This can easily happen if you use a desktop computer — you won’t even notice that you’re running off the external hard drive.

At your earliest convenience you need to get another hard drive and clone to it, in case your clone fails. Having multiple backup systems helps mitigate this problem too.

6. Protect against corruption and “user error”

One of your backup systems should be a “versioning” system. Time Machine and Arq are 2 examples of this. They keep hourly backups of your files for the past 24 hours, daily backups for the past month, and weekly backups until they reach your storage budget (Arq) or the target disk is full (Time Machine).
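The thinning schedule described above can be sketched in a few lines. This is an illustration only, not Arq’s or Time Machine’s actual implementation:

```python
from datetime import timedelta

def retention_class(age):
    """Which tier a backup of the given age falls into under an
    hourly-for-a-day, daily-for-a-month, weekly-after-that schedule."""
    if age <= timedelta(days=1):
        return "hourly"
    if age <= timedelta(days=30):
        return "daily"
    return "weekly"

print(retention_class(timedelta(hours=3)))   # hourly
print(retention_class(timedelta(days=10)))   # daily
print(retention_class(timedelta(days=90)))   # weekly
```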

Clones of your hard drive are great, but they’re only the latest version of your stuff. If a file becomes corrupt, the next time you clone your hard drive you’ll replace your old clone’s copy of the file with the new corrupt one.

One of your backup systems should keep multiple copies of your files over time to guard against corruption as well as the occasional what-was-I-thinking-when-I-deleted-half-that-document moments.

7. Avoid services whose interests aren’t aligned with yours

If you’re choosing an online backup provider, pay close attention to the data retention policies, especially with the “unlimited” offerings. Backblaze, for instance, will delete backups of your external drive if it hasn’t been connected within the past 30 days.

Also consider who has access to your stuff. With Backblaze you can pick your own encryption password, but if you need to restore your stuff you’ll have to give them your password; they decrypt your stuff and leave it in an unencrypted zip file on their servers; if you have them send you a disk with your stuff, your files will be sent through the mail unencrypted on that disk.

Also, any service that offers web access to your backups obviously has the ability to read your stuff (so that they can serve it to you through a web browser).

My Approach

I do all my work on 1 laptop (a MacBook Pro). I clone my laptop’s 2 internal hard drives (an SSD plus a spinning drive) using SuperDuper whenever I think of it. Arq backs up hourly all day long, from wherever I am, as long as there’s an internet connection. My computer doesn’t really go anywhere for very long that doesn’t have an internet connection, so this works for me.

If my SSD boot drive fails, I can’t boot from my Arq backups in S3, but I can get up and running quickly from the clone (which will probably be out-of-date) and then replace my key files with the latest versions from my Arq backups.

I feel good about my data at S3 not going anywhere. It’s in my own S3 account, and Amazon promises 99.999999999% (that’s 11 nines) durability over 12 months.

In the worst case, if both my computer and my clone are damaged/lost/stolen I can download all my stuff from S3 using Arq, but it’ll take a while.

(SuperDuper and Arq are Mac-only. If you’re on Windows, you could try Acronis True Image for cloning and CloudBerry Backup for backup to Amazon S3.)

I should probably add a third option. Any suggestions? Send me email or post a comment!

Arq 2.4 is out!


December 29th, 2011

Arq version 2.4 is now available!

This update includes support for the new “sa-east-1” (São Paulo, Brazil) S3 region.

It also now checks whether Amazon S3 is experiencing long “eventual consistency” delays, and aborts backup and budget-enforcement activities until the next backup interval to avoid potential data loss due to incorrect (old) values being returned from S3.

It’s a free update for all Arq users. Pick “Check for Updates” from the Arq menu to get the update.

As always, full release notes for all Arq versions are on the release notes page.

Arq 2.3 is out!


December 7th, 2011

Arq version 2.3 is now available!

This update includes support for the new “us-west-2” (Oregon) S3 region.

It’s a free update for all Arq users.

As always, full release notes for all Arq versions are on the release notes page.

Online Backup and Redundancy


June 21st, 2011

Do you use an online backup product/service? Ever wonder where your data are actually being stored? Ever wonder how safe and reliable that storage is?

It comes down to 1 question:


How much redundancy do you have?


Let’s look at the types of redundancy. But first a word about tape:

Disk vs. Tape Backup

In the past most backup systems used tape for storage. Tape was slow but it had much higher capacity than disk drives. Another killer feature was redundancy. Best practices for tape-based backup include keeping multiple historical tapes containing backups of your files at various points in history. Perhaps you needed to keep historical data for compliance reasons, but you also kept multiple tapes for redundancy.

This redundancy also helps protect you from data loss. If your most recent backup tape isn’t readable, you can always use the prior backup tape. You will lose the most recent items but that’s better than complete data loss.

RAID Is Not Backup

Most online backup offerings don’t use tape. They use disk. It’s cheaper now (and getting cheaper all the time), faster, and easier for the provider to use. Also, it’s “random access” — you don’t have to wind through the tape to get the file you want. But unlike tape there’s no extra disk with last week’s data.

Many providers use RAID arrays to protect against failure of an individual disk drive. RAID can be effective in mitigating that risk, but it can fail too.

How does your provider mitigate the risk of disk failure within their data center?

Multi-Site Redundancy

In addition to risk of disk failure, there’s the risk that a data center experiences some catastrophe. Does your provider replicate your data across multiple data centers? They may store your files in an underground former bank vault with armed guards, but what if the vault takes on water or suffers a lightning strike? Can they withstand the loss of one data center, or even more than one, without losing your data?

Ongoing Integrity Monitoring

Unlike paper or film which degrade gracefully (yellowing and fading but still readable), magnetic media (disks and tapes) often fail catastrophically — one minute they’re readable and the next they’re not. Corruption happens. If you’re going to keep your data on disk, you should periodically verify the data’s integrity. Does your provider verify your backups on your behalf?

Provider’s Recovery Strategy

If an online backup provider loses a customer’s data, the only option is to start uploading the current files from the customer’s computer and hope the upload finishes before the customer suffers a disk failure or other form of data loss (e.g. customer inadvertently deleting an important file). Historical data are gone forever; the history of changes to your files can’t be recreated.

You Get What You Pay For

Most consumer-oriented online backup offerings are focused on price. Consumers would rather pay $5/month for “unlimited” backup. (Many providers limit things in one way or another by excluding certain file types or deleting old backups of external drives, but that’s another blog post). Customers get some sort of data protection, but it often comes with one or more of the risks described above.

Amazon S3 (“Simple Storage Service”) takes a different approach. It focuses on durability. S3 is:

  • Designed to provide 99.999999999% durability and 99.99% availability of objects over a given year.
  • Designed to sustain the concurrent loss of data in two facilities.
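To put 11 nines in perspective, here’s a back-of-the-envelope calculation using that stated design target (the object count is an arbitrary example):

```python
# Expected object losses per year at an 11-nines durability design target.
durability = 0.99999999999
objects = 10_000
expected_losses_per_year = objects * (1 - durability)
print(expected_losses_per_year)  # ~1e-07: roughly one lost object per 10,000,000 years
```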

S3 is just a cloud storage system; it doesn’t come with backup software. That’s why I wrote Arq. Because Arq uses your own S3 account for storage, it’s a very reliable online backup solution.

Questions For Your Provider

Ask your online backup provider the following questions:

  • Where are my data stored?
  • How many data centers are my data redundantly stored at?
  • If you lose my data in one of your data centers, can you repair by retrieving it from another data center?
  • How many data centers can simultaneously lose some of my data without you permanently losing my data?
  • Do you regularly verify the integrity of my data and repair corruption using your redundant copies of my data?
  • What’s your durability design goal?

Then decide what price vs. redundancy trade-off is right for you.


A little less data loss in the world


December 13th, 2010

I’m passionate about building a software business as an indie Mac developer, but I’m equally passionate about helping people protect themselves from data loss.

Back in February 2010 I ran several online backup applications through a test suite called Backup Bouncer, hoping it would increase awareness among users and attract enough attention to get the providers of those applications to fix the issues. The results weren’t good. Backblaze failed 19 of 20 tests, Mozy failed 16, Carbonite failed all 20, Dropbox failed 19 and CrashPlan failed 12.

On June 30 someone tweeted at CrashPlan with a link to the Backup Bouncer test results, asking when they’d address the restore errors I had documented. CrashPlan replied that all the issues would be fixed in the next release.

Finally in early December they released a new version that passes all but 1 of the tests.

Data Safety for Everyone

I’m very happy that CrashPlan has fixed those issues, and I like to think I helped in a small way to make that happen. Of course I think everyone should use Arq ;) but even if people use a different product, no one should suffer from data loss.

Hopefully Mozy, Carbonite, Backblaze and Dropbox will fix their issues with restoring metadata as well.

How I recovered after an OS X reinstall


September 19th, 2010

The other day I reinstalled OS X. My computer had become extremely sluggish and I wanted to see if the performance would improve if I reformatted my hard disk and started over. Along the way I learned a few lessons about restoring using Arq. Here’s what I did:

Before Wiping Out My Data

Before I went through with it, I made sure I had all my data backed up. Arq had backed up the following:

  • ~/Library (excluding Logs and Caches)
  • ~/Documents
  • ~/Music
  • ~/Pictures/iPhoto Library (my photos)
  • ~/src (my source code)
  • /Applications
  • /Library/Application Support


I inserted the Snow Leopard installation disk, shut down the computer, and then started it holding down the Option key. I clicked on the DVD and the computer booted from it. I formatted the disk and installed OS X. I created a user with the same name as I was using before.

Next I downloaded and installed Arq. I launched Arq and entered the same S3 keys and encryption password I was using before.

Finally it was time to restore using Arq.

Initial Restore

Instead of waiting for absolutely everything to be restored from S3, I restored files in several steps.

Restoring ~/Library

The first step was to restore ~/Library from my “other computer” (the previous incarnation of my computer). I opened the triangle next to “Other Computers”, found my old computer, opened the triangle next to “Library”, and selected the latest backup.


Then I clicked “Restore…” and Arq restored the Library folder to ~/Restored by Arq/Library (because a Library folder already existed).

When that restore was done, I closed all open applications, deleted the contents of ~/Library, and dragged everything from ~/Restored by Arq/Library to ~/Library.

Back in Business

At that point I could use Mail, iCal and Address Book. I selected a few applications in Applications backup folder and restored them as well.

I also wanted to sync my calendars with my iPhone, so I plugged it in and it sync’d. Later I’ll delete the iTunes files in ~/Music and replace them with the backed-up files.

Restoring Everything Else

Now that the computer felt “back to normal”, I restored my “src” folder (where all my work files are). Then I got back to work, restoring the really large folders (Documents, Music and Pictures) at my leisure over the next few days.


The multi-step restore approach was a big time-saver and got me up and running fairly quickly. The Library folder was relatively small (really small in fact, with the exception of Mail).

I learned that reformatting the hard drive helped a little with sluggishness, but the long-term fix is likely the purchase of an Optibay and an SSD.

I also learned that restoring this way is fairly complicated. So I’m thinking about how to make a product that would restore more seamlessly while also allowing people to get back to work before absolutely everything is restored. There’ll be more to come on that.

Deleting other computers’ backups


September 18th, 2010

If you’ve transferred your work to a new computer and don’t need the old computer’s backups in your S3 account anymore, you’ll need to delete them. Arq does not currently provide a mechanism for deleting those backups, but you can delete them through the AWS Management Console. Here’s how to do that:

First, open the AWS Management Console in your browser.

Next, select the bucket that Arq uses for its backups (its name ends with “.com.haystacksoftware.arq”).

Now you’ll have to determine the computer UUID that you want to delete. To do this, look at the computerinfo file within each one:

  1. double-click on a computer UUID
  2. control-click on the file computerinfo and pick “Download”
  3. open the downloaded file with TextEdit
  4. if the “computer name” matches the one you want to delete, you’ve found the right computer UUID.

Here’s an example “computerinfo” file:

<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
    <dict>
        <key>computerName</key>
        <string>Stefan Reitshamer’s MacBook Pro</string>
    </dict>
</plist>

In that example, the computer name is “Stefan Reitshamer’s MacBook Pro”.
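If you have several computer UUIDs to check, you can read each downloaded file with Python instead of TextEdit. A sketch using the standard library’s plistlib (the “computerName” key is an assumption for illustration; check your actual file’s keys):

```python
import plistlib

# A hypothetical computerinfo file; the real one comes from your bucket.
data = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>computerName</key>
    <string>Stefan Reitshamer's MacBook Pro</string>
</dict>
</plist>"""

info = plistlib.loads(data)
print(info["computerName"])  # the computer name to match against
```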

Now that you’ve found the right computer UUID to delete, go back and select the bucket itself to see all the computer UUIDs again. Then control-click on the computer UUID you want to delete, and pick “Delete” from the pop-up menu. AWS Management Console will delete all the objects for that computer UUID.

WARNING: This delete cannot be undone! Please be careful when deleting.