Regular Expressions: Genius/Idiot Dichotomy

Today, I was faced with a little bug that I introduced into some code nearly a month ago, while fixing another bug.

I had enabled the use of regular expressions for the JavaScript DataTables fnFilter API method.

Originally, I thought I was being quite clever when I wrapped the string to search for (i.e. the “needle”) in start- and end-of-string anchors (e.g. “^needle$”). It fixed a problem I was having when I searched for “D” but got results for both “D” and “DD”. Done and dusted.

Or so I thought.

A few weeks later, people aren’t finding any results.

That’s because while my solution worked great for times when the string to search (i.e. the “haystack”) was a single value (e.g. “D” or “DD”), it didn’t work so great when the haystack string was “A<br/>B<br/>D”.

Suddenly, I felt like an idiot.

How had I missed that? It was so obvious, in hindsight, that my original solution wasn’t going to cut it. So, I started trying out a few different ideas, but I wasn’t entirely sure how DataTables was filtering the data, so it was more guesswork than I would’ve liked.

Then, I realized that my original solution was all right, if I added another option. I should look for the start and end of the string, OR an angle bracket. So, “^needle$” became “(^|>)needle(<|$)”. This meant that I was able to find “needle<br/>”, “<br/>needle”, “needle”, or “<br/>needle<br/>”.
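In my case the pattern was being handed to DataTables’ fnFilter in JavaScript, but the regular expression itself is language-agnostic. Here’s a quick Perl sketch of the same idea (with made-up needle and haystack values) showing why the extra alternatives make the anchoring work:

use strict;
use warnings;

my $needle = 'D';

# (^|>) matches the start of the string or a closing angle bracket;
# (<|$) matches an opening angle bracket or the end of the string.
my $re = qr/(^|>)\Q$needle\E(<|$)/;

for my $haystack ('D', 'DD', 'A<br/>B<br/>D') {
    print "$haystack => ", ($haystack =~ $re ? 'match' : 'no match'), "\n";
}

# Output:
# D => match
# DD => no match
# A<br/>B<br/>D => match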

Perfection! Genius! That fixed it! I tested it with lots of different examples and it worked perfectly!

Amazing how just a few characters can make the difference between feeling like an idiot and feeling like a genius.

(PS: Of course, I’m neither a genius nor an idiot, but probably somewhere in between. Alas, it seems to me that programming is often about dancing between these two extremes.)

(PPS: Regular expressions are actually a lot of fun. I think the temptation is to avoid them because they can be difficult to understand, but they’re incredibly powerful and a valuable tool to anyone who needs to manipulate data.)

Open Source Software for Libraries: Experiments

Since I first arrived at Prosentient Systems in January 2012, I’ve been working on and experimenting with open source software for libraries. Initially, I was somewhat hesitant, since my background is in English, French, and Library and Information Studies. However, the more I’ve learned and experimented, the more I’ve wanted to continue working with open source software!

To that end, I recently purchased a new desktop computer (my first since 2003). With a 2.6 GHz processor and 4 GB of RAM, it’s fast enough to do whatever I might want to do.

Step 1: Install Debian (i.e. Linux)

DONE!
(This involved installing the base system, adding myself as a sudo user, getting the sound working [I like working to music] by manually editing alsa-base.conf so that it would detect my integrated Intel sound card, installing Chromium as a browser, setting up some firewall software, and installing vim and vim-runtime [as Debian only comes with the vim-common and vim-tiny packages, which aren’t quite enough for my text editing needs].)

(I’ve also thought about setting up an SSH server so I can remote into my Linux box from my Windows netbook using PuTTY, but…I’m willing to put that off for the time being.)

Step 2: Install Koha using the community-generated Debian packages

IN PROGRESS

(Well, not quite. It’s 9:11pm, so it’s Doctor Who time. However, later in the week, I’m going to give it a go using the instructions available at koha-community.org.

Since I’m already a fairly active Koha developer, I’m pretty familiar with the code, and I’ve already done quite a bit of troubleshooting, so I’m not really worried about this. I can handle MySQL (the database). I can handle Zebra (the indexing engine). Admittedly, to date, I’ve only done an assisted standard install and an assisted dev install (since I didn’t have root access to the server). However, I’m pretty confident that I can get Koha up and running. Actually, after I do a standard install via the packages (the basic commands are sketched below), I might set up a regular dev install and a standard install (from the Git clone).

I might also try installing from a downloaded tarball, as well as setting up a dev install using “Koha Gitify”.

There are lots of different ways of obtaining Koha code and setting up an instance, and I want to try them all!)
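For the record, the package-based install itself is pleasantly short. From memory (so double-check the commands and the current repository line against the instructions at koha-community.org), it boils down to something like:

echo deb http://debian.koha-community.org/koha squeeze main | sudo tee /etc/apt/sources.list.d/koha.list
wget -O- http://debian.koha-community.org/koha/gpg.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install koha-common
sudo koha-create --create-db mylibrary

(“mylibrary” is just an example instance name; koha-create sets up the Apache configuration, the MySQL database, and the Zebra daemon for that instance.)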

So, thinking again about the knowledge that I might need to set up Koha…I like to come back to the LAMP acronym.

L->Linux (I’ve installed Debian, and I’m reasonably proficient using the command line)
A->Apache (I see how Koha uses Apache, I’ve gone through the files, and I’ve set up my own web server using Apache in the past, so…it might take a few tries, but it’ll be all right)
M->MySQL (it’s a relational database. While I typically interact with it using a GUI, I’m sure I can handle it from the command line as well. The GUI might be a bit faster and provide easier scrollback, but I can also install one if I really want to.)
P->Perl (While I still want to improve my design skills, I have yet to meet a Perl script/module in Koha that I haven’t been able to understand [thanks to my own persistence and the generous help of some very skilled and amicable Koha community developers]. Given time, all the code in Koha is understandable.)

I doubt that everyone setting up Koha is going to need to have an in-depth knowledge in all these areas, but I’m sure it helps. Hopefully, it will mean I don’t have to hassle people in #koha too much ;).

Step 3: Install DSpace

EVENTUALLY

(While I have quite a bit of experience modifying the DSpace JSPUI and some of its Java classes, I don’t have extensive experience writing Java, compiling Java, working with PostgreSQL, or troubleshooting DSpace. So…this might be a project I leave for a little while. I’m keen, but I’m more active in the Koha community, and I find that Koha is much more relevant to the majority of libraries than DSpace.)

Step 4: Who knows?

I’m thinking of trying out lots of different systems. Here is a list that I’m pondering:

1) Archivematica (originating from Vancouver, BC – it is used for digital preservation)
2) VuFind (a PHP-based discovery layer for library applications)
3) WordPress (as a Content Management System)
4) Evergreen (the open source library management system/integrated library system)
5) Drupal (the CMS)
6) Islandora (digital library/archive)
7) Fedora Repository (digital library/archive)
8) Greenstone (digital library/archive)
9) Kete (digital library/wiki?)

Does anyone have any ideas about other open source software for libraries that I haven’t mentioned and that might be worth trying out?

While I only have Linux on this machine, I could create another partition for Windows XP (I still have my old desktop install disk lying around), or I could set up a Windows XP VM in VirtualBox. So…send me a message, post a comment, or give me a shout and let me know what else I should try.

For now…Doctor Who Series 7 Finale!

Error Messages in Koha

I noticed a while ago that Koha 3.8.0 showed a “Debug is on (level 1)” message in the member/patron entry template, but I had forgotten about it until someone brought it up again recently.

So I started researching the cause of this Debug level.

I checked the “DebugLevel” system preference, but it was turned off (i.e. set to 0). From my research, it seems to me that DebugLevel does very little in Koha; it might control how much detail is shown for fatal errors, which I’ve never actually encountered first-hand.

Anyway…

What was causing this Debug level to be set to one?

Well, I found out that the script behind the memberentrygen.tt template was setting a local scalar variable from a global environment variable ($ENV{DEBUG}).

OK. That makes sense. I can understand that there are different debug settings. But…where is this $ENV{DEBUG} being set?

Well, I Googled. I grepped. I eventually found a record of an email about an old Koha patch (http://lists.katipo.co.nz/public/koha/2010-December/026789.html), which talked about a “SetEnv DEBUG 1” directive in the koha-httpd.conf file (the Apache web server configuration file that tells Apache how to serve and log Koha).

Sure enough…I found that very same directive in all of my Apache configuration files! Awesome! If I turn that off, it’ll get rid of that original annoying message!

But…let’s step back for a second…that “SetEnv DEBUG 1” directive is probably there for a reason!

If we grep (a Linux/macOS command/utility) our Koha files, we’ll notice that a variable called $debug is set from $ENV{DEBUG} in other files than just memberentry.pl. In fact, it is set in some pretty important Perl modules that can broadcast it across the entire Koha instance! So…if we grep for $debug (remembering to escape the $ sign with a backslash, i.e. \$debug), we’ll notice that this environment variable turns on A LOT of back-end system logging.
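To illustrate the pattern (a simplified sketch, not Koha’s exact code, and the variable names here are just for show), each module essentially does this:

use strict;
use warnings;

# Read the flag that Apache sets with "SetEnv DEBUG 1"
our $debug = $ENV{DEBUG} || 0;

# ...and then conditional logging is sprinkled throughout the code
my $borrowernumber = 42;    # hypothetical value, purely for illustration
warn "Fetching patron $borrowernumber\n" if $debug;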

We definitely don’t want to turn this off…

So…$ENV{DEBUG} is important for our system logs…and the DebugLevel system preference is (potentially) important for fatal error logging…

But what about the “Software Error” messages that pop up when you make errors in your Perl scripts (we all make mistakes, which is why we test!)? Surely those have to come from somewhere…

Are they affected by this environmental variable and system preference?

As it turns out…nope!

They are produced and handled by CGI::Carp, which is part of the core Perl 5 library (to the best of my knowledge). To find it…you can go to your Perl 5 lib (probably /usr/lib/perl5) and search for Carp.pm; the file you want is the Carp.pm inside the CGI directory, so you might wind up with a few results. You can also try grepping using some of the text from the “Software Error” messages that you’re receiving in your browser. Note that not all of the text will be “greppable”, because the error message text is the product of concatenating variables (i.e. dynamic storage containers) with fixed strings (i.e. lines of text).

Anyway, you’ll find it in CGI/Carp.pm.
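Here’s a minimal, self-contained sketch (not Koha code, just an illustration) of how a CGI script ends up producing one of those pages:

#!/usr/bin/perl
use strict;
use warnings;

# fatalsToBrowser is what turns a die() into the HTML
# "Software Error" page that you see in the browser
use CGI::Carp qw(fatalsToBrowser);

print "Content-type: text/html\n\n";
die "Oops, something went wrong";   # rendered in the browser, with the
                                    # server administrator's email appended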


But wait…in your “Software Error” messages…you’ll notice that there is an email address! Where’s that coming from?

CGI::Carp will tell you that it comes from $ENV{SERVER_ADMIN}.

Great! But…where is that from?

Well, these folks (http://www.perlmonks.org/bare/?node_id=456111) mention that it is set by the ServerAdmin directive (i.e. command) in the Apache configuration file, which we know is koha-httpd.conf.

Sure enough, we go there, and the address next to ServerAdmin is the same one that we see in our error messages.

Cool, n’est-ce pas?

In all honesty, all these conclusions took some time, and a lot of grepping, Googling, guessing, and reading through lines of code.

But…I fixed the problem and learned a lot doing it!


Koha vs The World

Here’s a Marshall Breeding article about the library automation marketplace in 2010 (the article itself is from 2011). It talks a fair bit about Koha and quite a few other library management systems (or rather, their companies). It mostly looks at the numbers of new customers, new sales, and total installations (apparently a grand total, despite sitting near the label for 2010).

As expected, SirsiDynix (Symphony, Horizon) and Ex Libris (Voyager, Aleph) are the biggest players. Innovative Interfaces (Millennium) was another expected powerhouse.

I was surprised to see EOS with ~1000 installs, since I was not very impressed with their software. Of course, that’s not to say that I was impressed by Horizon, Voyager, or Millennium either. It’s just that those latter three are huge in academic and public libraries, while EOS markets mainly to special libraries. I’ve only heard of one client who uses EOS, and they’ve since moved away from it, whereas I know heaps about those other three systems.

Since Koha is supported by multiple vendors, it takes a bit more work to see its net installs, but it’s formidable as well: ~1000 across ByWater Solutions, Equinox Software, and PTFS – Liblime. Of course, those numbers only cover the 3 big US companies; there are lots of smaller vendors and institutions using Koha throughout the USA.

These numbers are also self-reported by the vendors, so…caveat emptor.

http://www.libraryjournal.com/lj/home/889533-264/automation_marketplace_2011_the_new.html.csp


Brendan Gallagher, CEO of ByWater Solutions, provided me with this link, which is much more contemporary and very interesting in terms of the migration toward ByWater Solutions from other systems (especially from users of the proprietary PTFS – Liblime version of Koha).

This report is also by Marshall Breeding, but these numbers are probably even less comprehensive since they are just compiled from one library listserv.

http://www.librarytechnology.org/ils-turnover.pl?Year=2013


Automated estimation of how much the Koha project has cost in terms of coding…

https://www.ohloh.net/p/koha/estimated_cost


Kuali is a very well-funded academic and research open source library “environment”, but…it doesn’t look like they’ve gotten very far, and it doesn’t look like it was actually designed for librarians or archivists to use…

But it’s an interesting concept. I like that they’re trying to re-envision how “resources” should be handled by an automated system. Yet, one problem with that is that users of this system might completely cut themselves off from many, if not all, other systems out there. While you could argue that systems that follow existing standards aren’t innovative, they are functional and interoperable.

Mind you, this system seems to want to take plugins and multiple data formats into account, so maybe it really can do it all.

Or rather…maybe it WANTS to do it all, but I think it is a very long way away from achieving that. Presently, this system seems more like an accounting system than a resource management system designed to describe and facilitate access to print and electronic materials…

http://www.kuali.org/ole

Zebra indexing, Bib-1 Attributes, CCL, and more…

Unfortunately, I don’t have time to really elaborate today, but perhaps I will come back and explain later. Until then, here is a list of links…

Yaz-Client (for querying Z39.50 databases like Zebra)

http://www.indexdata.com/yaz/doc/yaz-client.html

Bib-1 Attributes

http://www.loc.gov/z3950/agency/defns/bib1.html

Bib-1 Attributes not supported in Zebra

http://www.indexdata.com/zebra/doc/querymodel-rpn.html#querymodel-bib1-nonuse

Zebra Query Model

http://www.indexdata.com/zebra/doc/querymodel-zebra.html

A bit of talk about how Koha and Zebra link together using CCL/PQF

http://koha.1045719.n5.nabble.com/Search-at-the-beginning-of-an-expression-td3364395.html

CCL Special Attribute Combos

http://www.indexdata.com/yaz/doc/tools.html#ccl.special.attribute.combos

In terms of adjusting Zebra/Koha settings, you’ll want to look at:

bib1.att

ccl.properties

record.abs

bib1.att lists and maps Bib-1 attributes (the defaults from the Library of Congress spec, plus special ones added for Koha)
ccl.properties maps CCL to Bib-1/PQF
record.abs maps MARC fields to Bib-1
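To give a flavour of what those mappings look like, here are some simplified sample lines (from memory, so treat them as illustrative and double-check them against your own files):

att 4 Title (bib1.att: attribute 4 is the Title index)

ti u=4 (ccl.properties: the CCL qualifier “ti” maps to Bib-1 Use attribute 4)

melm 245$a Title (record.abs: MARC 245$a gets indexed under Title)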

Also, since it’s really hard to find information about how to construct PQF queries using BIB1…

There are 6 types of Bib-1 attributes, and each type has a variety of values within it. To create a query, you type something like the following into yaz-client (or whatever else you’re using that speaks PQF):

f @attr 1=4 computer

f stands for find in yaz-client

@attr stands for attribute (you need to write one of these for each attribute you’re specifying)

1=4 stands for attribute type 1 (the Use attribute) with a value of 4 (title)
1=4 -> use attribute = title

It’s actually quite straightforward, but it’s quite rare to find it spelled out for you on the Web!
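For example, prefix operators like @and and @or let you combine terms, and a single term can carry several attributes at once. A couple of sketches (the attribute values come from the Bib-1 spec linked above):

f @and @attr 1=4 computer @attr 1=1003 smith

(finds records with “computer” in the title AND “smith” as an author)

f @attr 1=4 @attr 5=1 comp

(right truncation: finds titles beginning with “comp”)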

Koha and MarcEdit

I haven’t read much of “Terry’s Worklog”, but I came across it when I was doing some Koha-related research, and it seems like he’s up to some interesting work integrating Koha and MarcEdit.

Actually, it looks like he’s up to all sorts of interesting projects, but I just haven’t had the time to look too deeply…yet.

<edit date=”18 February 2013″>

After watching the video Using MarcEdit to Add Koha Items, I upgraded my MarcEdit instance and checked out some of the features. It looks like MarcEdit is even more awesome than I remember. There are all sorts of batch changes that you can do to MARC records using the tools in the MarcEditor.

I also noticed that MarcEdit can load MARC data from databases or via Z39.50/SRU or OAI.

It’s worth mentioning that the Koha API that Terry Reese mentions is the Koha HTTP API. He explains his interaction with the API in the following posts here and here.

It appears that the Koha API is primarily for retrieving, adding, and updating bibliographic and item records, and that searches can be done through a Zebra-based API where you can pass a query to Zebra via HTTP and have it return MARCXML records in response.

</edit>