OAI-PMH (The Open Archives Initiative Protocol for Metadata Harvesting) and Koha

For those of you who don’t know, OAI-PMH is the Open Archives Initiative Protocol for Metadata Harvesting. In other words, it’s an “application-independent interoperability framework” used for harvesting and disseminating metadata.

For more information, see this link:

http://www.openarchives.org/OAI/openarchivesprotocol.html


While Koha has for years had the capacity to act as an OAI-PMH server (i.e. it’s been able to parse OAI-PMH requests and serve back records in MARC and DC formats), it hasn’t had a mechanism for harvesting records. However, with my latest project, I’m hoping to change that.

At Prosentient Systems, we often host Koha (the LMS) alongside DSpace (the digital repository), and interconnect them using home-grown scripts. Because those scripts are home-grown, they’re not easy to share, nor are they easy for non-technical people to implement. As a result, we’ve been looking at better ways of connecting the two systems, and OAI-PMH seems to be the solution.

I’m told that DSpace can act as both an OAI-PMH harvester and disseminator, so we would send OAI-PMH requests to DSpace, transform the returned records from Dublin Core to MARC21 using XSLT, then import them into Koha using established mechanisms.

That’s the plan!

At the moment, I’m still in the proof-of-concept stages, but I have set up a harvester (using the Perl module HTTP::OAI::Harvester), which retrieves records from Koha in Dublin Core format and prints them out.

That might not sound that impressive, but it took some time to become familiar with the HTTP::OAI modules and to realize that the example code on CPAN wasn’t 100% accurate. After some trial and error, I was able to get down to the XML::LibXML::Document object that I wanted, so that I could print that out as a string of Dublin Core XML.
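For the curious, the heart of that proof of concept looks roughly like this. Consider it a minimal sketch rather than the finished script: the base URL is a placeholder, and I’ve only kept the happy path.

#!/usr/bin/perl
use strict;
use warnings;
use HTTP::OAI;

# Placeholder endpoint; point this at a real OAI-PMH repository
my $harvester = HTTP::OAI::Harvester->new(
    baseURL => 'http://koha.example.com/cgi-bin/koha/oai.pl',
);

# Ask for everything in Dublin Core format
my $response = $harvester->ListRecords( metadataPrefix => 'oai_dc' );
die $response->message if $response->is_error;

while ( my $record = $response->next ) {
    next unless $record->metadata;
    # metadata->dom is the XML::LibXML::Document mentioned above;
    # serialize it as a string of Dublin Core XML
    print $record->metadata->dom->toString(1), "\n";
}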

Next steps?

Store the harvesting configuration (e.g. baseurl, last harvested date, sets, metadataPrefix, etc.) in the Koha MySQL database, store the harvested Dublin Core XML in the database as well, and ensure that the harvesting script can run successfully as a cronjob.
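As a rough sketch, the harvesting configuration might live in a table along these lines (entirely hypothetical; the real schema will be whatever survives the proof-of-concept stage):

-- Hypothetical table for OAI-PMH harvesting configuration
CREATE TABLE oai_harvest_config (
    id             INT AUTO_INCREMENT PRIMARY KEY,
    baseurl        VARCHAR(255) NOT NULL,           -- the repository's OAI-PMH endpoint
    metadataprefix VARCHAR(30)  NOT NULL DEFAULT 'oai_dc',
    oai_set        VARCHAR(80),                     -- optional set to restrict the harvest
    last_harvested DATETIME                         -- feeds the "from" argument on the next run
);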

After that, I will focus on writing the best Dublin Core => MARC21 XSLT ever, so that the most accurately translated records make it into Koha from the original data source (in Prosentient’s case: the client’s DSpace).
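I haven’t written that XSLT yet, of course, but the skeleton is straightforward. Here’s a minimal sketch that maps nothing but dc:title to MARC 245$a, just to show the shape of it:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal Dublin Core => MARC21 sketch: only maps dc:title to 245$a -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:marc="http://www.loc.gov/MARC21/slim">
    <xsl:template match="oai_dc:dc">
        <marc:record>
            <marc:datafield tag="245" ind1="0" ind2="0">
                <marc:subfield code="a">
                    <xsl:value-of select="dc:title"/>
                </marc:subfield>
            </marc:datafield>
        </marc:record>
    </xsl:template>
</xsl:stylesheet>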

I’m really excited about this project. I think it will help with linking Koha to external data providers. Client-owned digital repositories are one thing, but I noticed that the National Library of Medicine in the USA also has an OAI-PMH server for PubMed (http://www.pubmedcentral.nih.gov/oai/oai.cgi). More information can be found here: http://www.ncbi.nlm.nih.gov/pmc/tools/oai/.

Wouldn’t it be nice if your Koha users could search through PubMed records from within Koha, so that they could see what you have both locally and available electronically through PubMed?

Admittedly, having local copies of remote records might not always be ideal, if only because it could increase the size of your database substantially. However, aren’t we taught that data redundancy is a good thing? Updated records in an OAI-PMH repository (I just noticed that “repository” is the official term for an OAI-PMH server) can be detected, and the local versions updated as a result. There is also a “deleted” record status, so you can remove local copies of records that no longer point to an actual resource.

In any case, I think that this should be a fun and educational project for me, and I’m sure that lots of people will find a use for it in the future!

Cronjobs in Debian

When I first started developing and managing Koha, the most mysterious things in my mind were cronjobs and Zebra indexing.

Well, I’ve learned a lot about Zebra indexing over the past year. I think I’ve already shared quite a bit on this blog and I’m sure there is more to come. Maybe I’ll even re-visit some old entries to give better explanations sometime.

However, right now I’d like to talk about cronjobs. Cronjobs are basically commands that are scheduled to occur at specific times. They can look scary: 15 17 * * * COMMAND.

That would translate as: “Run this ‘COMMAND’ every day of every month at 5:15pm.”

Here is a graphic I stole from http://www.debian-administration.org/articles/56:


*     *     *     *     *  Command to be executed
-     -     -     -     -
|     |     |     |     |
|     |     |     |     +----- Day of week (0-7)
|     |     |     +------- Month (1 - 12)
|     |     +--------- Day of month (1 - 31)
|     +----------- Hour (0 - 23)
+------------- Min (0 - 59)

Another useful site I stumbled across was this: http://forums.hostsearch.com/showthread.php?2693-Crontab-explained

Now...if you use the koha-common packages for Koha...you should never ever have to touch anything related to cron or think about cronjobs. They'll just happen automagically (http://en.wiktionary.org/wiki/automagical). 

This is just for folks who either don't use the koha-common packages, or for those who just want to know how cron works.

In Debian, you'll notice that there are areas other than the crontab that can be used for scheduling cronjobs. These are cron.d, cron.daily, cron.hourly, cron.weekly, and cron.monthly (http://www.debian-administration.org/articles/56). In all cases except cron.d, any scripts that you put in these directories will be executed daily, hourly, weekly, or monthly (the exact times being determined in the crontab). cron.d is there for your more specific cronjobs. For instance, maybe you want COMMAND X to run every 5 minutes. You would create a file in /etc/cron.d (like the ones the koha-common and anacron packages install) containing a line like "*/5 * * * * user COMMAND X", as shown in the sketch below.
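Here's what such a file might look like (a made-up example; note that, unlike a personal crontab, entries in /etc/cron.d include a user field between the schedule and the command):

# /etc/cron.d/example (hypothetical)
# m    h    dom  mon  dow  user  command
*/5    *    *    *    *    root  /usr/local/bin/command-x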

Easy as pie! Easier, in fact.

There’s certainly more to cron than what I’ve laid out (you can get more sophisticated), but that’s what those other links are for ;).

Open Source Software for Libraries: Experiments

Since I first arrived at Prosentient Systems in January 2012, I’ve been working on and experimenting with open source software for libraries. Initially, I was somewhat hesitant, since my background is in English, French, and Library and Information Studies. However, the more I’ve learned and experimented, the more I’ve wanted to continue working with open source software!

To that end, I recently purchased a new desktop computer (my first since 2003). With a 2.6 GHz processor and 4 GB of RAM, it’s fast enough for whatever I might want to do.

Step 1: Install Debian (i.e. Linux)

DONE!
(This involved installing the base system, adding myself as a sudo user, getting the sound working [I like working to music] by manually editing alsa-base.conf to detect my integrated Intel sound card, installing Chromium as a browser, setting up some firewall software, and installing vim and vim-runtime [as Debian only comes with the vim-common and vim-tiny packages, which aren’t quite enough for my text editing needs].)

(I’ve also thought about setting up an SSH server so I can also remote into my Linux box from my Windows netbook using PuTTY, but…I’m willing to put that off for the time being.)

Step 2: Install Koha using the community-generated Debian packages

IN PROGRESS

(Well, not quite. It’s 9:11pm, so it’s Doctor Who time. However, later in the week, I’m going to give it a go using the instructions available at koha-community.org.

Since I’m already a fairly active Koha developer, I’m pretty familiar with the code, and I’ve already done quite a bit of troubleshooting, so I’m not really worried about this. I can handle MySQL (the database). I can handle Zebra (the indexing engine). Admittedly, to date, I’ve only done an assisted standard install and an assisted dev install (since I didn’t have root access to the server). However, I’m pretty confident that I can get Koha up and running. Actually, after I do a standard install via the packages, I might set up a regular dev install and a standard install (from the Git clone).

I might also try installing from a downloaded tarball, as well as setting up a dev install using “Koha Gitify”.

There are lots of different ways of obtaining Koha code and setting up an instance, and I want to try them all!)

So, thinking about the knowledge that I might need to set up Koha…I like to return to the LAMP acronym.

L->Linux (I’ve installed Debian, and I’m reasonably proficient using the command line)
A->Apache (I see how Koha uses Apache, I’ve gone through the files, and I’ve set up my own web server with Apache in the past, so…it might take a few tries, but it’ll be all right)
M->MySQL (it’s a relational database. While I typically interact with it using a GUI, I’m sure I can handle it from the command line as well. The GUI might be a bit faster and provide easier scrollback, but I can always install one if I really want to.)
P->Perl (While I still want to improve my design skills, I have yet to meet a Perl script/module in Koha that I haven’t been able to understand [thanks to my own persistence and the generous help of some very skilled and amicable Koha community developers]. Given time, all the code in Koha is understandable.)

I doubt that everyone setting up Koha needs in-depth knowledge of all these areas, but I’m sure it helps. Hopefully, it will mean I don’t have to hassle people in #koha too much ;).

Step 3: Install DSpace

EVENTUALLY

(While I have quite a bit of experience modifying the DSpace JSPUI and some of its Java classes, I don’t have extensive experience writing Java, compiling Java, working with PostgreSQL, or troubleshooting DSpace. So…this might be a project I leave for a little while. I’m keen, but I’m more active in the Koha community, and I find that Koha is much more relevant to the majority of libraries than DSpace.)

Step 4: Who knows?

I’m thinking of trying out lots of different systems. Here is a list that I’m pondering:

1) Archivematica (originating from Vancouver, BC – it is used for digital preservation)
2) VuFind (a PHP-based discovery layer for library applications)
3) WordPress (as a Content Management System)
4) Evergreen (the open source library management system/integrated library system)
5) Drupal (the CMS)
6) Islandora (digital library/archive)
7) Fedora Repository (digital library/archive)
8) Greenstone (digital library/archive)
9) Kete (digital library/wiki?)

Does anyone have any ideas about other open source software for libraries that I haven’t mentioned and that might be worth trying out?

While I only have Linux on this machine, I could create another partition for Windows XP (I still have my old desktop install disk lying around), or I could set up a Windows XP VM in VirtualBox. So…send me a message, post a comment, or give me a shout and let me know anything else I should try.

For now…Doctor Who Series 7 Finale!

Cataloguing Series…MARC 490 and MARC 830

When cataloguing an item from a series, are you ever confused by why you need to put the “series title” in both the MARC 490 and MARC 830 fields?

I remember being told to “just do it” by teachers and library staff, but in actuality…these fields have very different purposes. I was reminded of this today when someone asked if Koha supported MARC Authorities for Series. At first, I was confounded. Authority records for series? That seemed bizarre. Then I realized…they meant does Koha support MARC Authorities for uniform titles…

Here is the general info from the Library of Congress:

490 – Series Statement (R)

http://www.loc.gov/marc/bibliographic/bd490.html

830 – Series Added Entry-Uniform Title (R)

http://www.loc.gov/marc/bibliographic/bd830.html


The key to it all resides in a footnote on the 490 page:

Indicator 1 – Series tracing policy
1 – Series traced [REDEFINED, 2008]
Prior to 2009, series for which the transcribed form and the traced form were the same were in field 440, and field 490 was not used. If the transcribed form and the traced form were different, the transcribed form was in field 490 and Indicator 1 had value “1” (Series traced differently). The traced form was in an 8XX field. Beginning in 2009, field 440 is not used and the transcribed form of the series name is in field 490 with the traced form in 8XX, even if the names are the same.

In other words, if you have an authority record for the uniform title of a series, the title from that authority record (MARC 130 – Heading-Uniform Title (NR) in the authority record) will go in the 830 field, and the transcribed form (i.e. the series title that appears on the title page/source of information) will go in the 490 field. You’ll then change the first indicator for 490 to 1 (as mentioned above) and you’re all good!
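To make that concrete, here’s an invented example where the form on the title page differs slightly from the authorized form:

490 1_ $a The Dr. Who library ; $v 12
830 _0 $a Doctor Who library ; $v 12.

The 490 records what the item actually says; the 830 carries the traced, authority-controlled form.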

Error Messages in Koha

I noticed a while ago that Koha 3.8.0 showed a “Debug is on (level 1)” message in the member/patron entry template, but I had forgotten about it until someone brought it up again recently.

So I started researching the cause of this Debug level.

I checked the “Debug Level” system preference, but it was turned off (i.e. set to 0). From my research, it seems that the Debug Level system preference does very little in Koha; it might control varying levels of error messages for fatal errors, which I’ve never actually encountered first-hand.

Anyway…

What was causing this Debug level to be set to one?

Well, I found out that the script behind the memberentrygen.tt template was setting a local scalar variable from a global variable ($ENV{DEBUG}).

Ok. That makes sense. I can understand that there are different debug settings. But…where is this $ENV{DEBUG} being set?

Well, I Googled. I grepped. I eventually found a record of an email about an old Koha patch (http://lists.katipo.co.nz/public/koha/2010-December/026789.html) which talked about a directive called “SetEnv DEBUG 1” in the koha-httpd.conf file (the Apache web server configuration file that tells Apache how to serve and log Koha).

Sure enough…I found that very same command in all of my Apache configuration files! Awesome! If I turn that off, it’ll get rid of that original annoying message!

But…let’s step back for a second…that “SetEnv DEBUG 1” directive is probably there for a reason!

If we grep (a Linux/Mac OS command/utility) through our Koha files, we’ll notice that a variable called $debug is set from $ENV{DEBUG} in files other than just memberentry.pl. In fact, it is set in some pretty important Perl modules that can broadcast it across the entire Koha instance! So…if we grep for $debug (remembering to escape the $ sign with a backslash (\)), we’ll notice that this environment variable turns on A LOT of back-end system logging.
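If you want to try it yourself, something like this from the root of the Koha source tree should do (the single quotes keep the shell from interpolating, and the backslash stops grep from treating $ as an end-of-line anchor):

grep -r '\$debug' .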

We definitely don’t want to turn this off…

So…$ENV{DEBUG} is important for our system logs…and the Debug Level system preference is (potentially) important for fatal error logging…

But what about the “Software Error” messages that pop up when you make errors in your Perl scripts (we all make mistakes, which is why we test!)? Surely those have to come from somewhere…

Are they affected by this environment variable and system preference?

As it turns out…nope!

They are produced and handled by Carp.pm, or more precisely CGI::Carp, which is the Carp.pm that lives under the CGI directory of the core Perl5 lib (to the best of my knowledge). To find it…you can go to your Perl5 lib (probably in /usr/lib/perl5) and grep for Carp.pm. You might wind up with a few results. You can also try grepping for some of the text from the “Software Error” messages that you’re receiving in your browser. Note that not all of the text will be “greppable”, because the error message text is the product of concatenated (i.e. joined together) variables (i.e. dynamic storage containers) and strings (i.e. lines of text).

Anyway, you’ll find it in Carp.pm.


But wait…in your “Software Error” messages…you’ll notice that there is an email address! Where’s that coming from?

Carp.pm will tell you that it’s from $ENV{SERVER_ADMIN}.

Great! But…where is that from?

Well, these folks (http://www.perlmonks.org/bare/?node_id=456111) mention that it is set by the ServerAdmin directive (i.e. command) in the Apache configuration file, which we know is koha-httpd.conf.

Sure enough, we go there, and the address next to ServerAdmin is the same one that we see in our error messages.
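Putting the two discoveries together, the relevant bits of a koha-httpd.conf look something like this (illustrative values only):

<VirtualHost *:80>
    ServerName opac.example.com
    # This address is what Carp.pm reads from $ENV{SERVER_ADMIN}
    ServerAdmin webmaster@example.com
    # This is what turns on the back-end logging via $ENV{DEBUG}
    SetEnv DEBUG 1
    # ...the rest of the Koha virtual host configuration...
</VirtualHost>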

Cool, n’est-ce pas?

In all honesty, all these conclusions took some time, and a lot of grepping, Googling, guessing, and reading through lines of code.

But…I fixed the problem and learned a lot doing it!


Koha vs The World

Here’s a Marshall Breeding article about the library automation marketplace in 2010 (the article itself is from 2011). It talks a fair bit about Koha and quite a few other library management systems (or rather, their companies). It mostly looks at the numbers of new customers, new sales, and total installations (apparently a grand total, despite sitting next to the label for 2010).

As expected, SirsiDynix (Symphony, Horizon) and Ex Libris (Voyager, Aleph) are the biggest players. Innovative Interfaces (Millennium) was another expected powerhouse.

I was surprised to see EOS with ~1000 installs, since I was not very impressed with their software. Of course, that’s not to say I was impressed by Horizon, Voyager, or Millennium either. It’s just that those latter three are huge in academic and public libraries, while EOS markets mainly to special libraries. I’ve only heard of one client who uses EOS, and they’ve moved away from it. I know heaps about those other three systems.

Since Koha is supported by multiple vendors, it takes a bit more work to tally its installs, but its total is formidable as well: roughly 1000 across ByWater Solutions, Equinox Software, and PTFS – Liblime. Of course, those numbers only cover the three big US companies; there are lots of smaller vendors and institutions using Koha throughout the USA.

These numbers are also self-reported by the vendors, so…caveat emptor.

http://www.libraryjournal.com/lj/home/889533-264/automation_marketplace_2011_the_new.html.csp


Brendan Gallagher, CEO of ByWater Solutions, provided me with this link, which is much more contemporary and very interesting in terms of the migration toward ByWater Solutions from other systems (especially by users of the proprietary PTFS – Liblime version of Koha).

This report is also by Marshall Breeding, but these numbers are probably even less comprehensive since they are just compiled from one library listserv.

http://www.librarytechnology.org/ils-turnover.pl?Year=2013


Automated estimation of how much the Koha project has cost in terms of coding…

https://www.ohloh.net/p/koha/estimated_cost


Kuali is a very well-funded academic and research open-source library “environment”, but…it doesn’t look like they’ve gotten very far, and it doesn’t look like it was actually designed for librarians or archivists to use…

But it’s an interesting concept. I like that they’re trying to re-envision how “resources” should be handled by an automated system. Yet one problem with that is that users of this system might completely cut themselves off from many, if not all, other systems out there. While you could argue that systems that follow existing standards aren’t innovative, they are functional and interoperable.

Mind you, this system seems to want to take plugins and multiple data formats into account, so maybe it really can do it all.

Or rather…maybe it WANTS to do it all, but I think it is a very long way away from achieving that. Presently, this system seems more like an accounting system than a resource management system designed to describe and facilitate access to print and electronic materials…

http://www.kuali.org/ole

Koha in Canada || Origin: Koha

Wondering about the origin story of Koha?

Sure, you may have heard that it was originally created in New Zealand and that it is open source, but how much do you really know?

Check out this code4lib article: http://journal.code4lib.org/articles/1638

So…I’m living and working in Australia, but I’m originally from Canada. Koha has a pretty strong presence in Australia, New Zealand, the USA, Europe, India, Africa, and probably a few other places that I haven’t mentioned.

But not Canada.

Or at least…information about libraries using Koha in Canada is rather sparse!

inLibro is one company in Québec that offers hosted Koha services. I think there might be one other that advertises as well.

Other than that…I think most adoptions of Koha have been by individual institutions. For instance, check out this link about how Prince Edward Island (a petit province in Canada) uses Koha for all of its school libraries!

http://www.gov.pe.ca/index.php3/index.php3?number=news&newsnumber=7681&dept=&lang=E

Here’s another link that takes you to the PEI school Koha catalogue:

http://211.schoollibrary.edu.pe.ca/cgi-bin/koha/opac-detail.pl?biblionumber=183759

I would love to hear about more Koha projects in Canada, so leave comments if you know of any. I’ll continue to do research and try to promote it among folks that I know.

If you’re interested in taking a look at Koha for yourself, consider downloading the Live CD:

http://wiki.koha-community.org/wiki/Koha_LiveCD

I haven’t investigated it fully myself, but it should contain a self-contained Linux (Ubuntu) operating system, Koha, Zebra (the indexing software), and everything else you need to get started using Koha! It’s not generally recommended for production installs, but I imagine it’s a great way to try Koha out, and maybe it’s even suitable for a little library run by volunteers. I’m going to experiment with it at a later date ;).


Zebra indexing, Bib-1 Attributes, CCL, and more…

Unfortunately, I don’t have time to really elaborate today, but perhaps I will come back and explain later. Until then, here is a list of links…

Yaz-Client (for querying Z39.50 databases like Zebra)

http://www.indexdata.com/yaz/doc/yaz-client.html

Bib-1 Attributes

http://www.loc.gov/z3950/agency/defns/bib1.html

Bib-1 Attributes not supported in Zebra

http://www.indexdata.com/zebra/doc/querymodel-rpn.html#querymodel-bib1-nonuse

Zebra Query Model

http://www.indexdata.com/zebra/doc/querymodel-zebra.html

A bit of talk about how Koha and Zebra link together using CCL/PQF

http://koha.1045719.n5.nabble.com/Search-at-the-beginning-of-an-expression-td3364395.html

CCL Special Attribute Combos

http://www.indexdata.com/yaz/doc/tools.html#ccl.special.attribute.combos

In terms of adjusting Zebra/Koha settings, you’ll want to look at:

bib1.att

ccl.properties

record.abs

bib1.att lists and maps Bib-1 attributes (the defaults from the Library of Congress spec, plus special ones added for Koha)
ccl.properties maps CCL qualifiers to Bib-1/PQF
record.abs maps MARC fields to Bib-1 indexes
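To give a feel for how the three fit together, here are some illustrative lines in the style of those files (not copied from a real install, so treat them as a sketch):

# bib1.att: give Use attribute 4 a name
att 4    Title

# ccl.properties: map the CCL qualifier "ti" to Use attribute 4
ti u=4

# record.abs: send MARC 245$a to the Title index
melm 245$a    Title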

Also, since it’s really hard to find information about how to construct PQF queries using BIB1…

There are 6 types of Bib-1 attributes, and each type has a variety of codes within it. To create a query, you would type something like the following into yaz-client or whatever else you’re using that utilizes PQF:

f @attr 1=4 computer

f stands for find in yaz-client

@attr stands for attribute (you need to write one of these for each attribute you’re specifying)
1=4 means: attribute type 1 (Use) with a value of 4 (Title)
In other words, 1=4 -> use attribute = title
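Attributes can also be combined with prefix operators like @and and @or. For instance (an illustrative query; 1003 is the Bib-1 Use attribute for Author):

f @and @attr 1=4 computer @attr 1=1003 smith

That finds records with “computer” in the title AND “smith” as an author.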

It’s actually quite straightforward, but it’s quite rare to find it spelled out for you on the Web!

Koha and MarcEdit

I haven’t read much of “Terry’s Worklog”, but I came across it when I was doing some Koha-related research, and it seems like he’s up to some interesting work integrating Koha and MarcEdit.

Actually, it looks like he’s up to all sorts of interesting projects, but I just haven’t had the time to look too deeply…yet.

<edit date="18 February 2013">

After watching the video Using MarcEdit to Add Koha Items, I upgraded my MarcEdit instance and checked out some of the features. It looks like MarcEdit is even more awesome than I remember. There are all sorts of batch changes that you can do to MARC records using the tools in the MarcEditor.

I also noticed that MarcEdit can load MARC data from databases or via Z39.50/SRU or OAI.

It’s worth mentioning that the Koha API that Terry Reese mentions is the Koha HTTP API. He explains his interaction with the API in the following posts here and here.

It appears that the Koha API is primarily for retrieving, adding, and updating bibliographic and item records, and that searches can be done through a Zebra-based API where you can pass a query to Zebra via HTTP and have it return MARCXML records in response.
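If that Zebra-based API is the SRU interface, such a search might look something like this (a hypothetical host, port, and database name):

http://koha.example.com:9998/biblios?version=1.1&operation=searchRetrieve&query=title=computer&recordSchema=marcxml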

</edit>