Backups: Deja-Dup and Obnam

At this point, everyone in my social circle knows that I’m obsessed with backups. I value my data, and I want to know that I have another copy available to me in the event of theft, fire, flood, disk failure, etc. I also value my privacy as well as the integrity and authenticity of my data. In other words, I want a comprehensive encrypted backup of my digital assets.

Over the past 2 years, I’ve been using Deja-Dup (a GUI front-end to duplicity) to create automatic encrypted backups of my user data, and it’s been great. It’s easy to configure what files to include and exclude. It uses GPG to provide symmetric encryption, and optionally stores the passphrase in the Gnome keyring, which is itself kept in an encrypted format and only decrypted when you log in to your account. It also runs as a daemon in the background, so it can initiate backups automatically without manual intervention or user cronjobs.

My only complaint about Deja-Dup is that it’s tied to a single location. Over the past few months, I’ve wanted to rotate external hard drives, but Deja-Dup doesn’t have a mechanism to seamlessly allow this. While I’ve been communicating with the maintainer of the program, I’m not entirely sure that Deja-Dup will meet my current use case. While I plan to keep using Deja-Dup as one of my tools for local backups, I think I’ll have to use something else for my rotating backups.

I started interviewing friends and colleagues about their backup strategies, and that’s how I learned of Lars Wirzenius’s tool “Obnam”. It doesn’t have a GUI, but I adore the command-line, so it’s really all the same to me. It’s easy to configure using INI-style files. It uses GPG to provide asymmetric encryption (while this requires a GPG keypair and not just a passphrase, it’s less susceptible to brute force attacks than symmetric encryption). In other words, it’s comprehensive and encrypted.

My only complaint after looking into it was that you couldn’t automate it as easily as Deja-Dup. With Obnam, you could set up a cronjob to routinely back up your data. However, that would require you to use a GPG key without a passphrase, which is suboptimal. If someone got their hands on your GPG key and your encrypted backup, they’d have unfettered access to your data. Of course, you could argue that if they have your GPG key, they might already have access to your unencrypted data and not *need* your backups. Alternatively, you could probably use a GPG key with a passphrase, and just store that passphrase in a file and feed that into your cronjob. However, it’s the same problem. You have the keys to the kingdom written down, which makes it that much less secure. Of course, if you’re willing to sacrifice some security for convenience, then why not?

It’s a tough one. As one interviewee mentioned, you have to consider your threat model. Lars also talks about this in the Obnam manual: http://code.liw.fi/obnam/manual/obnam-manual.en.html#backup-strategies

In my mind, there are a few scenarios:

1) Access to your unencrypted data on your computer = insecure
2) Access to an encrypted backup and a GPG key without a passphrase = insecure
3) Access to an encrypted backup and a GPG key with a recorded passphrase = insecure
4) Access to an encrypted backup and no GPG key = secure
5) Access to an encrypted backup and GPG key with passphrase = (reasonably) secure

In the first case, the attacker already has access to your system, so any speculation about backups is moot.

In the latter cases, the attacker has gained access to your encrypted backup either by obtaining a physical copy (stealing a physical disk) or illegitimately accessing a backup server containing the encrypted data. As for how they have obtained a copy of your key, there are a few ways. Perhaps there was a copy of your key on the backup server. Perhaps they’ve found a backup of your key (although you should protect this with encryption as well). Suffice it to say, it’s possible that they have your GPG key and your encrypted backup. However, if you have a key with a passphrase and you haven’t written that down, you’re as secure as can be.

When I asked my interviewees (including Lars Wirzenius himself) about how they use Obnam, they stated that they often didn’t use automated backups with Obnam. They used automated alarms and other methods to remind themselves to run their Obnam backup scripts using their passphrase-protected GPG keys.

I think there’s some merit to this idea. First, as I’ve mentioned, it’s the most secure method. If your passphrase is just in your head, then it doesn’t matter if someone gets your private key and your encrypted backup. Second, it makes you much more active in backing up your data. While humans are more prone to forget manual backups than a computer is to forget an automatic backup, it’s useful to develop a habit of consciously backing up your data. This mindset translates well to the workplace and to other devices which you might not be backing up as you should.

In any case, I now have Obnam working using a GPG keypair with a strong passphrase. While I haven’t decided how I want to prompt myself to remember to initiate manual backups, I’m sure I’ll think of something.

Current ideas include:
1) Automated alerts via a calendar
2) A highly visible launcher on the desktop (or perhaps on the sidebar in Unity once I transition fully to Ubuntu)
3) A GUI pop-up reminder as part of a login or logout script
4a) A daemon or cronjob that runs automatically but requires manual intervention via gpg-agent to obtain the passphrase
4b) A daemon or cronjob that runs automatically which uses gpg-agent and another mechanism to provide the passphrase which has been saved but stored in an encrypted format
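
As a rough sketch of idea 3, a login script could check how long it has been since the last backup and nag only when it is overdue. Everything named here is an assumption for illustration: the stamp file path, the seven-day threshold, and the idea that the backup script touches the stamp file after each successful run. A real version would swap the print for whatever pop-up mechanism the desktop provides.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

# Hypothetical stamp file that the backup script would touch after a successful run.
STAMP_FILE = Path.home() / ".cache" / "last-backup-stamp"


def backup_is_overdue(last_backup: Optional[datetime], now: datetime,
                      max_age: timedelta = timedelta(days=7)) -> bool:
    """Return True if no backup is recorded or the last one is older than max_age."""
    if last_backup is None:
        return True
    return now - last_backup > max_age


def read_last_backup(stamp: Path = STAMP_FILE) -> Optional[datetime]:
    """Treat the stamp file's modification time as the time of the last backup."""
    try:
        return datetime.fromtimestamp(stamp.stat().st_mtime)
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    if backup_is_overdue(read_last_backup(), datetime.now()):
        print("Reminder: it has been a while since the last backup.")
```

The passphrase never enters the picture here, which is the point: the script only reminds, and the backup itself is still started by hand.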

In the end, I think it all comes down to convenience vs security. In any case, I’m quite liking Obnam so far. If you’re thinking about how to do your backups better on Debian or Ubuntu, you should give it a try!

Expanding the Programming Repertoire

Lately, I’ve been looking into Deja Dup, duplicity, the Gnome Keyring, and writing my own scripts to manage backups.

Deja Dup is written in Vala, duplicity is written in Python, the Gnome Keyring has a C API, and my own scripts… well… it would be most convenient to have them written as shell scripts.

When I first started programming, I took a 12-hour class in PHP, and experimented with that for a while. When I got my first systems librarian job, I wrote tools using PHP, and I started working with Perl. Now, I write much less PHP and much more Perl (also more HTML/CSS and JavaScript). On occasion, I find myself reading Java and C programs just to double-check what exactly is going on under the hood of applications like DSpace and Zebra.

So… at work I mostly use PHP, Perl, and JavaScript. Oh, and shell scripts. Can’t forget those.

At home… I don’t really program. If I do, it’s a shell script here or there.

So it’s been fun playing around a bit writing programs in Python and C. I’ve only read in Vala so far, although I’ve thought about contributing/forking Deja Dup, so you never know.

At the moment, I figure I’ll either write my backup scripts in Python, or create a tool in C to look up a password in the Gnome Keyring and supply that to a shell script which runs duplicity. Honestly, the latter option sounds preferable. I suppose I could also write a tool in Python and rely on the Python bindings for the Gnome Keyring library. Either option is fine with me really. Neither language is common to me, so I’m equally happy playing with one or the other (or both as I have so far).
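
To make the Python option a little more concrete, here is a minimal sketch under several assumptions. Rather than the Gnome Keyring Python bindings, it shells out to libsecret’s `secret-tool` command, which may or may not be installed on a given system. The attribute/value pair used for the lookup is hypothetical, and the bare `duplicity SOURCE TARGET` form is the older invocation; newer releases may expect an explicit action. The one detail it leans on is that duplicity reads its symmetric passphrase from the `PASSPHRASE` environment variable.

```python
import os
import subprocess
from typing import Dict, List, Mapping


def lookup_passphrase(attribute: str, value: str) -> str:
    """Ask the desktop keyring for a secret via the `secret-tool` CLI."""
    result = subprocess.run(
        ["secret-tool", "lookup", attribute, value],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.rstrip("\n")


def duplicity_env(passphrase: str, base: Mapping[str, str]) -> Dict[str, str]:
    """Copy the environment and add PASSPHRASE for duplicity to pick up."""
    env = dict(base)
    env["PASSPHRASE"] = passphrase
    return env


def duplicity_command(source: str, target_url: str) -> List[str]:
    """Build the command line; an assumption, adjust for your duplicity version."""
    return ["duplicity", source, target_url]


def run_backup(source: str, target_url: str, attribute: str, value: str) -> None:
    """Look up the passphrase and hand it to duplicity without writing it to disk."""
    passphrase = lookup_passphrase(attribute, value)
    subprocess.run(
        duplicity_command(source, target_url),
        env=duplicity_env(passphrase, os.environ),
        check=True,
    )
```

A call might look like `run_backup("/home/me", "file:///media/backup-a", "service", "my-backup")`, where every argument is a placeholder. Passing the passphrase through the child’s environment keeps it out of the command line and out of any file.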

Using a shell script seems the way to go, as it seems easiest to interact with the operating system itself that way. I suppose a person could use “system” or other functions to run “duplicity” and “mountpoint” from Python or C. It just often seems that shell scripts are the easiest. Plus, if I create a separate tool in Python or C, and then utilize that in my shell script, I can always utilize that tool later for another script.
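
For comparison, the “mountpoint” check does not strictly need a shell. A small sketch, assuming rotating drives that each mount at a known path (the paths in the usage note are made up), could use Python’s `os.path.ismount` to pick whichever drive is currently plugged in:

```python
import os
from typing import Callable, Iterable, Optional


def first_mounted(candidates: Iterable[str],
                  is_mounted: Callable[[str], bool] = os.path.ismount) -> Optional[str]:
    """Return the first candidate path that is a mount point, or None if none is."""
    for path in candidates:
        if is_mounted(path):
            return path
    return None
```

Something like `first_mounted(["/media/backup-a", "/media/backup-b"])` would then give the backup target for this run, and a result of `None` means no backup drive is attached. The `is_mounted` parameter exists only so the function can be tried out without real drives.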

Hmm… PHP, Perl, JavaScript, Python, Java, C, Vala. I suppose that is a growing repertoire of programming languages. I suppose the next thing I should play with is Ruby. It doesn’t seem especially useful, but it’s popular enough. I did a script or two with it a few years ago, but it didn’t really stand out for me. Node.js is another one that I want to play around with. I keep hearing more and more about it…

I suppose it’s a useful thing to know a little about a lot. Add Scala to the above list and that’s quite a few of the languages being used today. You hear occasionally about Erlang and Go, I suppose. But like I was saying… I don’t really program much at home. I figure most of my home computing needs can be met using existing programs. There are all sorts of tools for networking, security, media. Really it’s just backups that I’m not terribly happy about at the moment.

Of course, that’s easier said than done when it comes to fixing it up the way I want. After talking to Michael Terry, the maintainer for Deja Dup, I realize that simplicity is often actually complexity. Even now, when I’m thinking about writing up my own scripts, I suspect it’ll be more complex than I expect.

How much do I hard-code into my script? If I don’t hard-code the data, where do I store it? GSettings? A config file? How do I parse the config file? If I use C, I could use the GLib key-value parser. If I use Python, I could use the ConfigParser. Shell scripts… not so good for something like that. Then again, do I really need to abstract things that much? Probably not. But if I do make it more abstract, I could perhaps post it on GitHub or perhaps package it someday to be included in a distro… so then I don’t have to necessarily manage it myself (even if I were still the maintainer of the code).
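
To give a feel for the config file route in Python, here is a minimal sketch using the standard library parser (the module is named `configparser` in Python 3 and `ConfigParser` in Python 2). The `[backup]` section and its `source` and `targets` keys are invented for illustration, not any tool’s real format.

```python
import configparser
from typing import Dict, List, Union

# Hypothetical config layout: one source directory, several rotating targets.
EXAMPLE_CONFIG = """
[backup]
source = /home/me
targets =
    /media/backup-a
    /media/backup-b
"""


def load_settings(text: str) -> Dict[str, Union[str, List[str]]]:
    """Parse INI text and return the source path plus a list of target paths."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["backup"]
    # A multi-line value comes back as one string; split it into non-empty lines.
    targets = [line.strip() for line in section["targets"].splitlines() if line.strip()]
    return {"source": section["source"], "targets": targets}
```

Reading from a string keeps the sketch self-contained; a real script would more likely call `parser.read()` on a file somewhere under the home directory.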

Hmmm… things to think about… and things that I’ll be talking about in another post soon.