Questions about Linked Data (in libraries)

In the library world, there is a lot of buzz about “Linked Data” and “BIBFRAME”, but I haven’t really found much information about practical uses for either.

There’s information about BIBFRAME (http://www.loc.gov/bibframe/faqs/) and people who are looking at implementing it (http://www.loc.gov/bibframe/implementation/register.html), but it all seems rather vague.

The Oslo Public Library and the National Library of Sweden are both working on new library software systems that will rely on data stored in RDF, but both are still in development, and I haven’t yet found a concise summary of either effort.

While I haven’t explored it extensively, DSpace 5.x can convert its internal metadata into RDF, store it in a triple store, and make it accessible via a SPARQL endpoint. Here’s an example record: http://demo.dspace.org/data/handle/10673/6/xml?text. That record seems quite basic, though: its links point to actual files or to HTML pages, not to other records or resources described in RDF.
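
For what it’s worth, querying such an endpoint is simple enough. Here’s a sketch in Python using SPARQLWrapper; both the endpoint URL and the record’s base URI are my guesses, since each DSpace installation configures its own:

```python
# A sketch of querying a DSpace-style SPARQL endpoint with SPARQLWrapper.
# Both the endpoint URL and the record's base URI are guesses; each
# DSpace installation configures its own.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://demo.dspace.org/sparql")  # hypothetical endpoint
sparql.setQuery("""
    SELECT ?p ?o
    WHERE { <http://demo.dspace.org/data/handle/10673/6> ?p ?o }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["p"]["value"], row["o"]["value"])
```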

So it seems straightforward to publish data as Linked Data: serialize it as RDF, store it in a triple store, and provide a SPARQL endpoint. Your data is then open, accessible, and linkable.
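
In code, that whole pipeline really is short. A minimal sketch with Python’s rdflib (the record IRI and the metadata values are made up):

```python
# A minimal sketch of publishing a record as RDF with rdflib.
# The record IRI and the metadata values are made up for illustration.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

g = Graph()
record = URIRef("http://example.org/records/1")  # hypothetical record IRI

g.add((record, DC.title, Literal("An Example Title")))
g.add((record, DC.creator, Literal("Example, Author")))

# Serializing and loading this into a triple store that fronts a SPARQL
# endpoint (Fuseki, Virtuoso, etc.) is essentially the whole publishing step.
print(g.serialize(format="turtle"))
```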

However, how do you actually make “use” of Linked Data in a way that is useful to humans?

In the case of the DSpace record, it doesn’t contain many links, so you could just run it through an XSLT and get usable, human-readable information. However, look at a BIBFRAME record like this one: http://bibframe.org/resources/sample-lc-1/148862. It has a lot of links. How could that be made usable for a human?
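
For the link-poor DSpace case, the XSLT route might look like this with lxml (the record file and stylesheet names are hypothetical):

```python
# A sketch of the XSLT route for a link-poor record, using lxml.
# "record.rdf" and "record-to-html.xsl" are hypothetical local files.
from lxml import etree

record = etree.parse("record.rdf")
transform = etree.XSLT(etree.parse("record-to-html.xsl"))
html_page = transform(record)
print(etree.tostring(html_page, pretty_print=True).decode())
```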

My first guess is that you have a triple store with your Linked Data, plus some other storage system for the dereferenced data. That is, every so often your server follows all of those links and stitches together a cached, human-readable copy of the record, which is what gets shown to the human. I can’t imagine this is done for every single web request, as that would be a lot of work for the server…
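
Something like the following is what I imagine, using rdflib: follow each linked URI, fetch RDF from it, and cache a human-readable label. This assumes the linked resources actually serve RDF via content negotiation, and real vocabularies may use skos:prefLabel or others rather than rdfs:label:

```python
# A sketch of the dereference-and-cache idea with rdflib: follow each
# linked URI, fetch RDF from it, and cache a human-readable label.
# Assumes the linked resources serve RDF via content negotiation.
from rdflib import Graph, URIRef
from rdflib.namespace import RDFS

label_cache = {}

def label_for(uri):
    """Fetch the remote resource once and cache whatever rdfs:label it has."""
    if uri not in label_cache:
        remote = Graph()
        remote.parse(uri)
        label_cache[uri] = remote.value(URIRef(uri), RDFS.label)
    return label_cache[uri]

record = Graph()
record.parse("http://bibframe.org/resources/sample-lc-1/148862")

# Stitch together a human-readable view: every URI object becomes a label.
for s, p, o in record:
    if isinstance(o, URIRef):
        print(p, "->", label_for(str(o)))
```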

Plus, how would you index Linked Data? You need human-readable/writeable terms in your indexes so that you can retrieve the correct record(s) for a user’s search. Do you index the cached, human-readable copy that you generate periodically?
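
My guess is you’d flatten the dereferenced labels into a plain document and hand that to a search engine such as Solr or Elasticsearch. A sketch, repeating the hypothetical label_for() helper from above so it runs standalone (the document shape and field names are made up):

```python
# A sketch of flattening a dereferenced record into a flat, indexable
# document. The field names and document shape are made up; you'd POST
# something like this to Solr or Elasticsearch.
import json
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDFS

label_cache = {}

def label_for(uri):
    """Dereference a linked URI and cache its rdfs:label (as sketched above)."""
    if uri not in label_cache:
        remote = Graph()
        remote.parse(uri)  # assumes the remote resource serves RDF
        label_cache[uri] = remote.value(URIRef(uri), RDFS.label)
    return label_cache[uri]

record_uri = "http://bibframe.org/resources/sample-lc-1/148862"
record = Graph()
record.parse(record_uri)

doc = {"id": record_uri, "text": []}
for s, p, o in record:
    if isinstance(o, Literal):
        doc["text"].append(str(o))      # literals can be indexed directly
    elif isinstance(o, URIRef):
        label = label_for(str(o))       # linked resources need dereferencing
        if label is not None:
            doc["text"].append(str(label))

print(json.dumps(doc, indent=2))
```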

In the case of the BIBFRAME record, let’s say the Library of Congress is storing that Linked Data record, dereferencing it, indexing it, and showing it to users. What happens when another library wants to use that BIBFRAME record to describe a resource it holds locally? Does it download the record via OAI-PMH, store the reference copy, dereference it in order to index it and show it to users, and perhaps add a local item record in its own library system?
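
If the download step really were OAI-PMH, the harvest might look like this with the Sickle library; note that the endpoint URL, identifier, and metadata prefix below are pure guesses on my part:

```python
# A sketch of the harvest step over OAI-PMH, using the Sickle library.
# The endpoint URL, identifier, and metadataPrefix are all hypothetical;
# I don't actually know of an LC endpoint serving BIBFRAME this way.
from sickle import Sickle

sickle = Sickle("http://lc.example.org/oai")
record = sickle.GetRecord(
    identifier="oai:lc.example.org:148862",
    metadataPrefix="bibframe",
)

# record.raw is the XML payload you'd keep as your reference copy before
# dereferencing it, indexing it, and attaching a local item record.
print(record.raw)
```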

Linked Data seems strange for libraries. Look at the DBpedia entry for Tim Berners-Lee: http://dbpedia.org/page/Tim_Berners-Lee. DBpedia isn’t claiming to store Tim Berners-Lee somewhere; it just has a metadata record describing him. DBpedia serves as a central repository for records.

So with libraries… wouldn’t it make sense for the Library of Congress (or some other significant library entity) to be a central repository of records, with individual libraries keeping only simple records that point to that central repository? I suppose it couldn’t be that simple, since no central repository has #alltherecords: you might still need to do original cataloguing, and in that case you’re creating your own Linked Data record… although why would it matter for some small library in the middle of nowhere to create its own Linked Data record?

Also, when you download BIBFRAME records from the Library of Congress, they would wind up being re-published via your SPARQL endpoint, no? Or, when downloading a record, would you re-write its main IRI to be an IRI for your particular library and its database? Otherwise, aren’t you just hosting a copy of the original record? And what happens if someone links to your copy of the BIBFRAME record while you’re updating that copy from the original Library of Congress record? Doesn’t that set up a really inefficient and silly chain of links?
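
My hunch is that you’d mint a local IRI, copy the statements under it, and keep an explicit owl:sameAs pointer back to the original so the provenance survives the copy. A sketch (the local IRI is made up; the LC record URI is the example from above):

```python
# A sketch of localizing a downloaded record: copy its statements under a
# locally-minted IRI and keep an owl:sameAs pointer back to the original.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

original = URIRef("http://bibframe.org/resources/sample-lc-1/148862")
local = URIRef("http://mylibrary.example.org/records/148862")  # hypothetical

downloaded = Graph()
downloaded.parse(str(original))

localized = Graph()
for p, o in downloaded.predicate_objects(subject=original):
    localized.add((local, p, o))              # re-home each statement
localized.add((local, OWL.sameAs, original))  # keep the provenance link

print(localized.serialize(format="turtle"))
```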

I think those are most of my questions about Linked Data… they mostly boil down to:

1. While Linked Data records are machine-readable, how do you dereference them to create human-readable, indexable records?

2. How do you copy catalogue with Linked Data? All the links point to the original record; do you need to re-write them to point to your local storage? Or do you just download a copy, dereference it to index it and show it to humans, and then add local data separately to refer to your physical items?