Questions about Linked Data (in libraries)

In the library world, there is a lot of buzz about “Linked Data” and “BIBFRAME”, but I haven’t really found much information about practical uses for either.

There’s information about BIBFRAME (http://www.loc.gov/bibframe/faqs/) and people who are looking at implementing it (http://www.loc.gov/bibframe/implementation/register.html), but it all seems rather vague.

The Oslo Public Library and the National Library of Sweden are both working on new library software systems that will rely on data stored in RDF, but those are still in development, and I haven’t yet found a concise summary of either effort.

While I haven’t explored it extensively, DSpace 5.x provides methods for converting its internal metadata into RDF, which is stored in a triple store and made accessible via a SPARQL endpoint. Here’s an example record: http://demo.dspace.org/data/handle/10673/6/xml?text. However, that record seems quite basic. If you look at its links, they point to actual files or to HTML pages; they’re not links to other records or resources described in RDF.

So, it seems to me that it’s straightforward to publish data as Linked Data: you serialize it as RDF, store it in a triple store, and provide a SPARQL endpoint. Your data is then open, accessible, and linkable.
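To make that concrete, here’s a minimal sketch (in Perl) of what querying such a SPARQL endpoint over HTTP might look like. The endpoint URL and the Dublin Core predicate are placeholders of my own, not any particular system’s API:

use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Hypothetical SPARQL endpoint; substitute your triple store's URL.
my $endpoint = 'http://example.org/sparql';

# Ask for ten titles, assuming the store uses Dublin Core terms.
my $query = <<'SPARQL';
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?resource ?title
WHERE { ?resource dcterms:title ?title }
LIMIT 10
SPARQL

# The SPARQL Protocol is just HTTP: the query goes in a "query" parameter.
my $uri = URI->new($endpoint);
$uri->query_form( query => $query );

my $ua = LWP::UserAgent->new;
my $response = $ua->get( $uri, 'Accept' => 'application/sparql-results+json' );
die 'SPARQL request failed: ' . $response->status_line unless $response->is_success;
print $response->decoded_content;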

However, how do you actually make “use” of Linked Data in a way that is useful to humans?

In the case of the DSpace record, it doesn’t contain many links, so you could just run it through an XSLT and get usable, human-readable information. However, look at a BIBFRAME record like this one: http://bibframe.org/resources/sample-lc-1/148862. It has a lot of links. How could this be made usable to a human?

My first guess is that you have a triple store with your Linked Data, and then some other sort of storage system to hold the dereferenced data. That is, once in a while, your server follows all of those links and stitches together a cached, human-readable copy of the record, which is then shown to the human. I can’t imagine this is done for every single web request, as it would be a lot of work for the server…
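As a rough sketch of the dereferencing half (in Perl, with a made-up cache directory and lifetime; the Accept header asks servers for RDF via content negotiation):

use strict;
use warnings;
use LWP::UserAgent;
use File::Path qw(make_path);
use Digest::MD5 qw(md5_hex);

my $cache_dir = '/tmp/ld-cache';
make_path($cache_dir);

# Fetch the RDF behind a linked IRI, reusing a cached copy (up to a
# week old) so that every page view doesn't hit the wider web.
sub dereference {
    my ($iri) = @_;
    my $cache_file = $cache_dir . '/' . md5_hex($iri);

    if ( -e $cache_file && -M $cache_file < 7 ) {
        open my $fh, '<', $cache_file or die $!;
        my $cached = do { local $/; <$fh> };
        return $cached;
    }

    my $ua = LWP::UserAgent->new;
    my $response = $ua->get( $iri, 'Accept' => 'text/turtle, application/rdf+xml' );
    return unless $response->is_success;

    my $rdf = $response->decoded_content;
    open my $out, '>', $cache_file or die $!;
    print {$out} $rdf;
    close $out;
    return $rdf;
}

A batch job could walk every link in a record through something like this and assemble the results into the cached display copy.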

Plus, how would you index Linked Data? You need to store human-readable/writeable terms in your indexes so that you can retrieve the correct record(s) for a user’s search. Do you index the cached human-readable copy that you generate periodically?
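My guess is that you do: pull the literal values out of the dereferenced graphs and hand those strings to a conventional search engine. A naive sketch, assuming $turtle holds Turtle returned by the dereferencing step above (a regex is not a real RDF parser, and the predicate list is my own guess):

# Collect the literal values of a few "label-ish" predicates so they
# can be fed to an indexer. Purely illustrative: a real system would
# use a proper RDF parser rather than a regex over Turtle.
my @predicates = ('rdfs:label', 'dcterms:title', 'skos:prefLabel');
my %index_terms;
for my $pred (@predicates) {
    while ( $turtle =~ /\Q$pred\E\s+"([^"]*)"/g ) {
        $index_terms{$1} = 1;
    }
}
# The keys of %index_terms are the human-readable strings to index.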

In the case of the BIBFRAME record, let’s say that the Library of Congress is storing that Linked Data record, dereferencing it, indexing it, and showing it to users. What happens when someone else wants to use that BIBFRAME record to describe a resource they have stored locally at their library? Do they download it using OAI-PMH, store the reference copy, dereference it in order to index it and show it to users, and maybe add a local item record in their own library system?

Linked Data seems strange for libraries. Let’s look at the DBpedia entry for Tim Berners-Lee: http://dbpedia.org/page/Tim_Berners-Lee. In this case, DBpedia isn’t claiming to store Tim Berners-Lee somewhere; it just has a metadata record describing him. DBpedia serves as a central repository for records.

So with libraries… wouldn’t it make sense for the Library of Congress (or some other significant library entity) to be a central repository of records, with libraries themselves only keeping simple records that point to that central repository? I suppose it couldn’t be that simple, as no central repository has #alltherecords, so you might need to do original cataloguing, and in that case you’re creating your own Linked Data record… although what significance would a Linked Data record created at some small library in the middle of nowhere really have?

Also, when you download BIBFRAME records from the Library of Congress, they would wind up being re-published via your SPARQL endpoints, no? Or, when downloading the record, would you re-write the main IRI for the record to be an IRI for your particular library and its database? Otherwise, aren’t you just hosting a copy of the original record? What happens if someone links to your copy of the BIBFRAME record… and you’re updating your copy from the original Library of Congress BIBFRAME record? Doesn’t that set up a really inefficient and silly chain of links?

I think that’s most of my questions about Linked Data… it mostly boils down to:

1. While Linked Data records are machine readable, how do you dereference them to create human readable and indexable records?

2. How do you copy catalogue with Linked Data? All the links point back to the original; do you need to re-write them to point to your local storage? Or do you just download a copy, dereference it to index and display it to humans, and then add local data separately to refer to physical items?

Forcing a stack trace in Perl

Have you ever tried to debug someone else’s Perl code and found it frustrating when it dies and the only error message is for a general function in a module that isn’t explicitly referenced at all in the code?

Do you curse the name of the person who wrote that code and wish that they had used Carp.pm, so that you could have something… anything… on which to base your debugging efforts?

Well, good news! You can trap the __DIE__ signal and use Carp::confess() to force a stack trace. Now you can trace that “die” back to the actual code at hand, so you don’t have to waste time dumping variables, printing your own error messages at different points of logic, or slowly tracing the class lineage of your objects back to that low-level API in some XS code which isn’t stored on CPAN or your local system anyway.

Here’s the code I added to the script I was debugging:

use Carp;
# Trap the __DIE__ pseudo-signal and re-throw via confess(),
# which appends a full stack backtrace to the error message.
$SIG{ __DIE__ } = sub { Carp::confess( @_ ) };

That’s it. I got my stack trace, and I was able to quickly find the problematic line of code and start troubleshooting the actual problem.

While the problem was what I had suspected from the moment I got the first support call, this way I had evidence to prove it and a suggestion for how the original developer could improve their work.
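For reference, here’s a minimal self-contained demonstration (the function names are made up) of what the handler buys you:

#!/usr/bin/perl
use strict;
use warnings;
use Carp;

# Same trap as above: every die() now reports a full backtrace.
$SIG{ __DIE__ } = sub { Carp::confess( @_ ) };

sub outer  { middle(); }
sub middle { inner();  }
sub inner  { die 'something broke'; }

# Without the trap, all you'd see is "something broke at ... line N".
# With it, confess() also reports that inner() was called by middle(),
# which was called by outer(), which was called from the main script.
outer();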

Using multiple network adapters on Windows

I imagine many of us don’t think too much about networking our computers. We care that we have Internet access, that we can connect our phone to the WiFi, and maybe that we can access files on a NAS. But after that, we sort of bury our heads in the sand a bit.

So my workstation has built-in network adapters for Ethernet and wireless access. For the most part, I just use the wired Ethernet connection. It’s fast and it connects to a gateway that gets me where I need to go. I rarely use the WiFi, but sometimes it’s useful for connecting to other devices wirelessly. Sometimes, I have active connections over both adapters.

That seems a bit weird though… how does that work? Surely I just need one connection to the Internet, yeah? How does my computer know which connection to use?

Well, the default is to use the Ethernet adapter. If I run “tracert koha-community.org” in the Windows Command Prompt, I’ll notice the first hop is through the gateway I connect to via the Ethernet adapter. Every time.

But if I want to SSH into another device on the wireless network, I can do that too, because my computer’s network routing table has an entry that says all requests to a certain IP address range should go through the wireless gateway instead of the Ethernet gateway.

That’s pretty cool.

Well, sometimes I need to use a third network adapter: a 3G modem I use to connect to another network. Unfortunately, I’ve noticed that it doesn’t add an entry to my routing table telling my computer to send requests for a certain IP address range through it instead of the default gateway. If I type “route print”, I see that the Ethernet and wireless interfaces handle destinations within their own IP address ranges, but there’s nothing for this modem. I’m not sure why, but it doesn’t matter too much, because the address I want to reach wouldn’t fall within that typical IP address range anyway.

The problem is that if I try to connect to the IP address I want, it’ll go through my Ethernet adapter, and it won’t get where it needs to go. So I need to add a route to tell my computer to send requests for that IP address through the 3G modem.

So I look up the interface number for the modem using the following command:

“netsh int ipv4 show interfaces”

I then type out something like:

“route add XX.XX.XXX.X mask 255.255.255.0 YY.YYY.Y.YY if Z”

In this case, the Xs are the destination range, and the Ys are the gateway I want to use. Z is the interface number that I found using that command starting with “netsh”.

And that’s it! Now when I try to reach XX.XX.XXX.X in my browser or through an SSH client, it connects using the 3G modem. However, if I try to SSH into someone else’s server or visit http://koha-community.org in my web browser, it’ll use the default Ethernet adapter.

Anyway, there are way better resources out there that explain everything, but this is just a little reminder to myself about how to do this (on Windows) and why it might be necessary.

Edit: If you want this route to persist after rebooting, you can add the “-p” flag between “route” and “add”, i.e. “route -p add XX.XX.XXX.X mask 255.255.255.0 YY.YYY.Y.YY if Z”.