Online Technology Tutorials: Friend and Foe

I was working today on installing DSpace 4.x on Amazon Web Services using the official DSpace installation documentation alongside some of my own notes, which were informed by some online tutorials.

Long story short:

Online tutorials often do not provide the best instructions.

They often provide functional instructions, which is how one might get marked as a solution on a forum, but it’s worthwhile to take them apart, analyze the code, and do some experiments.

In the case of DSpace, I chatted to some actual PostgreSQL developers on IRC and read their official docs. That made me actually understand what I was doing with pg_hba.conf, so I could do it the right way and not a potentially dangerous way (mentioned in a tutorial) that got me my desired result at the expense of system security.

I poked around in the configuration and guts of Tomcat 7 and Maven 3, and now I understand that pointing to the desired version of Java in the alternatives system isn’t always going to do the trick. You might need to edit the application’s configuration files or export an environment variable.

I suppose my takeaway from all of this is that tutorials can be useful when you’re initially trying to get something to “work”, but you should probably take the time to figure out what each instruction means before trying to set up a production system.

I know that might sound obvious, but I suspect that taking that time is more often the exception than the norm.






GPG: Public Key Cryptography

I’m not really sure how to start talking about cryptography. We’ve heard the word for years, especially in the past year in response to PRISM/NSA surveillance.

A lot of people have a lot of different ideas about whether or not we need cryptography. Some examples include:

  • I have nothing to hide, so I don’t need it.
  • I have nothing to hide, but I don’t want people snooping.
  • I have all the things to hide, and I don’t want anyone knowing about them.
  • I would like to be able to verify the originator of a document.
  • Crypto-what?

Personally, I find cryptography interesting for myriad reasons including the fact that I’m just fascinated by locks.

Professionally, librarians and archivists are in the business of sharing accurate and authentic information, and this can help. By applying a digital signature to a document with your private key, you’re able to endorse that document as having come from you. People who have your public key are then able to confirm that the document has come from you and hasn’t been tampered with in any way.
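To make the sign-then-verify idea concrete, here is a minimal sketch in Python. It is not how GPG works internally: it uses textbook RSA with a toy key built from two well-known Mersenne primes, which is far too weak for real use, and every name in it (sign, verify, digest, the sample document) is made up for illustration. It only shows the shape of the idea: the private value creates the signature, the public values check it, and a changed document fails the check.

```python
import hashlib

# Toy key pair built from two known Mersenne primes. Far too small and too
# structured for real use; chosen only so the example needs no extra libraries.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can produce this value."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public values (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)


document = b"Annual report, final version"
signature = sign(document)

print(verify(document, signature))                 # the untouched document
print(verify(document + b" (edited)", signature))  # a tampered copy
```

In practice you would let GPG generate and manage the keys and do the signing; the sketch is only there to show why a signature ties a document to a key holder.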

In terms of encrypting information so that only designated individuals can read it, this can come in handy when you’re sharing private information or the network over which you’re communicating is owned/operated by someone who has a vested interest in spying on or preventing access to information.

We use encryption every day when we log in to our email accounts or buy things online. You don’t want just anyone seeing your passwords, credit card information, medical information, etc. It’s best to keep it safe so that only you and nominated individuals can see your data.

Now, GPG can be overwhelming for novice users, but I would recommend taking a read through “The GNU Privacy Handbook”. It explains how to use GPG and covers some of the broader cryptographic concepts.

You might find it useful to consult Wikipedia and internet forums for other information to support what you read here.

Personally, I’m excited at the prospect of being able to verify the authenticity of digitally signed software and documents! Perhaps I’ll send a few encrypted emails from time to time, but I think the signatures have some of the most obvious and convenient value.

Sharing SSH Keys between Cygwin and PuTTY

I’m rather pleased about this one.

I have a Windows laptop and a Linux server. I prefer using Linux and I have all sorts of server tasks to do, but I might not always want to do them at the actual work station. So, I use the SSH protocol to remotely work on my Linux server.

In Windows, I accomplish this using PuTTY (a small terminal emulator/SSH client) or Cygwin (a Linux-like environment for Windows).

PuTTY directly remotes into the server, while Cygwin provides a Linux-like interface in Windows (i.e. it gives me lots of nice Linux packages like rsync, grep, vim, openssh, etc) that can also remote in. The handy thing about Cygwin is that you can do Linux-like work on your Windows computer.

Anyway, as we know, when we’re SSHing into another server, we don’t want to use a password. We want to use SSH keys, because they’re way more difficult to brute force than a regular old password. That said, we also want to add a passphrase for decrypting our SSH private key so that an attacker doesn’t gain access to our server just by possessing our private key.

But…we don’t want to type this passphrase all the time, so we use an SSH key agent like Pageant to store our keys for us. We type the passphrase in once and then Pageant handles all the requests for the SSH private key for the rest of our OS session.

PuTTY and Pageant work perfectly together, especially when we automatically load up our keys upon login to Windows.

However, Cygwin needs some extra help. Fortunately, there is a Cygwin add-on for that called ssh-pageant. Download the binary file, copy it into Cygwin, add a line of code to your .bashrc file, and now Cygwin can share the keys stored in Pageant!

Enter your passphrase upon login, and then you have easy, yet secure, access to your server via SSH using PuTTY or Cygwin!

P.S. In my case, I’m thinking of scripting an automated rsync backup between my laptop and my server using Cygwin (actually, I already have a bash script that I can run manually to do this). Most tutorials suggest using a private key without a passphrase, but I don’t like that idea. So, I’m thinking that I’ll load Pageant at login, enter my passphrase, then hopefully either trigger an event or have a script waiting to start the bash script that initiates the rsync backup.

That might sound convoluted but it’s not really. It’s also free, cross-platform, and flexible.

How many of us actually perform backups? We all say it’s a good idea but how often do you think “I’ll do it eventually”? Why not do it now?

I figure there are 3 reasons to have a home server:

1) Central file storage/file serving/file sharing (Store all your important files, especially files that don’t often change such as music, photos, video, etc, in a central location. Store once, access many times from different devices.)

2) Backups (You’re not going to store everything on your server. You’re going to have a certain amount of content that lives on your laptop, or other mobile device. By keeping a synced backup on your server, you ensure that you keep your data even if you have a hardware failure, lose your device, etc.)

3) Remote access (Maybe you need a copy of a file from home, or you want to take care of something on your server. Remote in and do the work!)

Of course, you could argue the relative merits of having a home server versus a server on the cloud versus a cloud-based service, but I’ll save that for another time.

For now…I’m consolidating my media on a central server, syncing my mobile devices (i.e. keeping mirrored backups), and managing both of those via remote access (although only internally. No holes in my firewall to the outside world…yet.).



Regular Expressions: Genius/Idiot Dichotomy

Today, I was faced with a little bug that I introduced into some code nearly a month ago, while fixing another bug.

I had enabled the use of regular expressions for a JavaScript DataTables fnFilter API method.

Originally, I thought I was being quite clever, when I wrapped the string to search for (i.e. the “needle”) in start and end string anchors (e.g. ^needle$). It fixed a problem I was having when I searched for “D” but got results for “D” and “DD”. Done and dusted.

Or so I thought.

A few weeks later, people aren’t finding any results.

That’s because while my solution worked great for times when the string to search (i.e. the “haystack”) was a single value (e.g. “D” or “DD”), it didn’t work so great when the haystack string was “A<br/>B<br/>D”.

Suddenly, I felt like an idiot.

How had I missed that? It was so obvious, in hindsight, that my original solution wasn’t going to cut it. So, I started trying out a few different ideas, but I wasn’t entirely sure how DataTables was filtering the data, so it was more guesswork than I would’ve liked.

Then, I realized that my original solution was all right, if I added another option. I should look for the start or end of the string, OR an angle bracket. So, “^needle$” became “(^|>)needle(<|$)”. This meant that I was able to find “needle<br/>”, “<br/>needle”, “needle”, or “<br/>needle<br/>”.
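Here is a small sketch of the difference between the two patterns. It is written in Python rather than JavaScript, and it tests the patterns directly instead of going through DataTables, so treat it as an illustration of the regular expressions only. The helper name and the sample haystacks are made up for the example, and the call to re.escape is an extra precaution that wasn’t part of my original fix.

```python
import re


def exact_value_pattern(needle: str) -> re.Pattern:
    """Match needle as a whole value, bounded by the string edges or a tag bracket."""
    # re.escape keeps characters like "." or "+" in the needle literal.
    return re.compile(r"(^|>)" + re.escape(needle) + r"(<|$)")


anchored = re.compile(r"^D$")      # the first attempt
fixed = exact_value_pattern("D")   # the version with the angle-bracket alternative

haystacks = ["D", "DD", "A<br/>B<br/>D", "D<br/>E", "A<br/>DD"]

for haystack in haystacks:
    print(
        f"{haystack!r:18}",
        "anchored:", bool(anchored.search(haystack)),
        "fixed:", bool(fixed.search(haystack)),
    )
```

The anchored pattern only matches the haystack “D”, while the fixed pattern also matches “A<br/>B<br/>D” and “D<br/>E”, and both still reject “DD”.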

Perfection! Genius! That fixed it! I tested it with lots of different examples and it worked perfectly!

Amazing how just a few characters can make the difference between feeling like an idiot and feeling like a genius.

(PS: Of course, I’m neither a genius nor an idiot, but probably somewhere in between. Alas, it seems to me that programming is often about dancing between these two extremes.)

(PPS: Regular expressions are actually a lot of fun. I think the temptation is to avoid them because they can be difficult to understand, but they’re incredibly powerful and a valuable tool to anyone who needs to manipulate data.)