Git Blame and Autocomplete=Off

 

Use Git as your version control system (VCS)? Want to know who edited a line of code? And when? And the hash of the commit that introduced that line of code?

Use “git blame”!

http://www.kernel.org/pub/software/scm/git/docs/git-blame.html

 

I noticed today that someone had added the autocomplete=off HTML attribute (http://www.htmlcodetutorial.com/forms/_INPUT_AUTOCOMPLETE.html) to a self-checkout input field in a form, and I wanted to know whose karma to increment in #koha, so I did a git blame on the template and found the original author’s name and the SHA-1 hash of their commit.

I used “git show” to look up that commit and there I saw the addition of that directive.
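Here’s roughly what that workflow looks like (the template path and hash below are stand-ins, not the actual ones):

git blame koha-tmpl/path/to/template.tt   # one line of output per source line: commit hash, author, date
git show abc1234                          # show the full commit (message and diff) for that hash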

 

(Explanation of autocomplete=off: Know how your browser will autocomplete some of your username and password fields? Or just cache a list of possibilities that you’ve tried before? Well, this turns that function off for that particular input field. Very handy for sensitive information being used for a web app on a public terminal!)
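In practice, it’s as simple as adding the attribute to the input element (the field name here is made up):

<input type="text" name="patronbarcode" autocomplete="off">

With that in place, the browser won’t offer or remember previously entered values for that particular field.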

Playing with JSON in PHP

Recently, I asked for a list of web URLs to use for some software I’m testing, and I ended up receiving a zipped file containing a bunch of JSON files nested in a heap of subdirectories alongside some image files.

Well… I just wanted one key-value pair from each JSON file within the subdirectories: the one containing the URL I was after.

I have access to a server with a PHP interpreter, so I decided to throw together a quick program to take care of looping through the files and digging out the value I wanted from the JSON.

In PHP…

1) I used the RecursiveDirectoryIterator class to create an object (e.g. “$di”) containing a bunch of SplFileInfo objects, which represent everything found in the top-level directory where everything was originally unzipped (i.e. I created a PHP representation of the directory).

2) Next, I used the RecursiveIteratorIterator class and a “foreach” loop to iterate through all the SplFileInfo objects within the “$di” object. This allowed me to look at all the files within the subdirectories.

3) Using regular expressions (in this case, the “preg_match” function), I created a condition that would only execute on files ending with the .json file extension.

4) Using the “file_get_contents” function, I retrieved the contents from the JSON files and decoded them into PHP objects using the “json_decode” function.

5) Then, I pulled out the “url” value using the $object->key syntax (in this case, $object->url).

At this point, you could do whatever you want with the URLs. I decided to store them in an array and then print them out, but I could’ve just as easily printed them as I went. The whole thing looks roughly like the sketch below.
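Here’s a minimal sketch in PHP (the directory path is a stand-in, and I’m assuming each JSON file has a top-level “url” key, which matched my files):

<?php
// Build a recursive representation of the unzipped directory (steps 1 and 2).
$di = new RecursiveDirectoryIterator('/path/to/unzipped');
$urls = array();

foreach (new RecursiveIteratorIterator($di) as $file) {
    // Step 3: only act on files ending with the .json extension.
    if (preg_match('/\.json$/i', $file->getFilename())) {
        // Step 4: read the file and decode the JSON into a PHP object.
        $object = json_decode(file_get_contents($file->getPathname()));
        // Step 5: pull out the "url" value, if it's there.
        if ($object !== null && isset($object->url)) {
            $urls[] = $object->url;
        }
    }
}

// Do whatever you want with them; I just printed mine.
foreach ($urls as $url) {
    echo $url, "\n";
}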

I’m certain that there are other ways to achieve the same effect, but it was nice being able to write a little program to look through a file system directory and extract the particular text that I wanted.

Now, I can use this as the basis for my next set of tests…

 

Cataloguing Series…MARC 490 and MARC 830

When cataloguing an item from a series, are you ever confused about why you need to put the “series title” in both the MARC 490 and MARC 830 fields?

I remember being told to “just do it” by teachers and library staff, but in actuality…these fields have very different purposes. I was reminded of this today when someone asked if Koha supported MARC Authorities for Series. At first, I was confounded. Authority records for series? That seemed bizarre. Then I realized…they meant does Koha support MARC Authorities for uniform titles…

Here is the general info from the Library of Congress:

490 – Series Statement (R)

http://www.loc.gov/marc/bibliographic/bd490.html

830 – Series Added Entry-Uniform Title (R)

http://www.loc.gov/marc/bibliographic/bd830.html

 

The key to it all resides in a footnote on the 490 page:

Indicator 1 – Series tracing policy
1 – Series traced [REDEFINED, 2008]
Prior to 2009, series for which the transcribed form and the traced form were the same were in field 440, and field 490 was not used. If the transcribed form and the traced form were different, the transcribed form was in field 490 and Indicator 1 had value “1” (Series traced differently). The traced form was in an 8XX field. Beginning in 2009, field 440 is not used and the transcribed form of the series name is in field 490 with the traced form in 8XX, even if the names are the same.

In other words, if you have an authority record for the uniform title of a series, the title from that authority record (MARC 130 – Heading-Uniform Title (NR) in the authority record) will go in the 830 field, and the transcribed form (i.e. the series title that appears on the title page/source of information) will go in the 490 field. You’ll then change the first indicator for 490 to 1 (as mentioned above) and you’re all good!
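For example, here’s a made-up pair of fields for a book whose title page reads “The Pelican history of art” but whose authorized series heading is “Pelican history of art” (“_” marks a blank indicator):

490 1_ $a The Pelican history of art ; $v 12
830 _0 $a Pelican history of art ; $v 12

The 490 transcribes what you see on the item; the 830 carries the traced, authorized form (its second indicator is the number of nonfiling characters).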

SQL For Finding Records From One Table That Don’t Exist In Another Table In Your Database

Here are some useful snippets of SQL to use when trying to find records in one table that don’t exist in another table in your database…
SELECT *
FROM Call
WHERE phone_number NOT IN (SELECT phone_number FROM Phone_book)
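(One caveat worth knowing: if Phone_book.phone_number can ever be NULL, the NOT IN version will return no rows at all, because comparisons against NULL are never true. The NOT EXISTS and LEFT JOIN versions below don’t suffer from that.)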

or

SELECT *
FROM Call
WHERE NOT EXISTS
(SELECT *
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number)

or

SELECT *
FROM Call
LEFT OUTER JOIN Phone_book
ON (Call.phone_number = Phone_book.phone_number)
WHERE Phone_book.phone_number IS NULL

Snippets are taken from this Stack Overflow thread:
http://stackoverflow.com/questions/367863/sql-find-records-from-one-table-which-dont-exist-in-another

Serial Cataloguing

Over the past 6 years, I’ve seen and experienced a few different approaches to cataloguing serials using MARC. After all, typically when you enter an organization, you adopt whatever cataloguing conventions they practise there. This makes a certain amount of sense, since there is always organizational acculturation whenever you begin a new job. Standards are often considered to be guidelines more than rules.

I’m in a very different position now than I have been over the past few years, though. At this point, I have people posing questions to me about how they’re “supposed” to catalogue. This is quite another animal altogether, and as a result it’s actually fairly difficult to answer.

If you consult the following links, you will notice a few different ideas about how people are “supposed” to catalogue serials.

Serial Cataloguing

http://special-cataloguing.com/node/1403

CONSER Cataloging Manual

http://www.itsmarc.com/crs/mergedprojects/conser/conser/contents.htm

University of Illinois at Urbana-Champaign: Serials Cataloging

http://www.library.illinois.edu/cam/procedures/serguide.html

Arizona State Museum: Instructions for Serials Cataloging

http://www.statemuseum.arizona.edu/library/cataloging_manual/serialscat.shtml

OK…but how are you ACTUALLY supposed to do it?

Well, it seems to me that the MARC 362 field (http://www.loc.gov/marc/bibliographic/bd362.html) is supposed to be used to record the “beginning” and “end” dates of a publication. This may also include the sequential designation (i.e. Vol., No., etc.) in the case of periodical publications.

Then, the MARC 863 field (http://www.loc.gov/marc/holdings/hd863865.html) is supposed to be used to record the actual holdings. In some cases, this might involve multi-part items that are not periodicals, but that’s outside the scope of this post. In regards to serials, there are various levels of enumeration, which allow you to specify your holdings at various levels of detail. Perhaps you just want one 863 entry per year. Perhaps you want one every month. I suppose this is where a certain amount of localized convention comes into the picture.
What I would like to point out is that the 853 and 863 fields seem to be directly linked (via the $8 field link and sequence number), and thus subfields should be used for their linked purpose. If you are marking an item as missing, use an $x (nonpublic note) or $z (public note) subfield to write that information out as a note. That’s where it belongs. Follow the examples specified on the Library of Congress webpage I have linked to above.
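For instance, here’s a made-up sketch of how those fields might fit together (the indicator and caption values are invented for illustration; check the LOC examples for your own case):

362 0_ $a Vol. 1, no. 1 (Jan. 1990)-
853 20 $8 1 $a v. $b no. $i (year) $j (month)
863 40 $8 1.1 $a 23 $b 1-12 $i 2012 $j 01-12 $z no. 5 missing

The 362 records the beginning of publication, the 853 defines the captions and pattern, and the 863 records the actual holdings (v.23, no.1-12, 2012), with the 853 and 863 linked through the $8 sequence number and the $z carrying the “missing” information as a public note.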

Without a doubt, serial cataloguing is a complicated beast, but hopefully this post will elucidate things a bit and prompt further research on the part of those doing serials cataloguing.

Error Messages in Koha

I noticed a while ago that Koha 3.8.0 showed a “Debug is on (level 1)” message in the member/patron entry template, but I had forgotten about it until someone brought it up again recently.

So I started researching the cause of this Debug level.

I checked the “Debug Level” system preference, but it was turned off (i.e. set to 0). After my research, it seems to me that this system preference does very little in Koha: it might handle varying levels of error messages for fatal errors, which I’ve never actually encountered first-hand.

Anyway…

What was causing this Debug level to be set to one?

Well, I found out that the script behind the memberentrygen.tt template was setting a local scalar variable from a global environment variable ($ENV{DEBUG}).

Ok. That makes sense. I can understand that there are different debug settings. But…where is this $ENV{DEBUG} being set?

Well, I Googled. I grepped. I eventually found a record of an email about an old Koha patch (http://lists.katipo.co.nz/public/koha/2010-December/026789.html) which talked about a “SetEnv DEBUG 1” directive in the koha-httpd.conf file (an Apache web server configuration file that tells Apache how to serve and log Koha).

Sure enough…I found that very same directive in all of my Apache configuration files! Awesome! If I turn that off, it’ll get rid of that original annoying message!

But…let’s step back for a second…that “SetEnv DEBUG 1” directive is probably there for a reason!

If we grep (a Linux/macOS command/utility) our Koha files, we’ll notice that a variable called $debug is set from $ENV{DEBUG} in more files than just memberentry.pl. In fact, it is set in some pretty important Perl modules that can broadcast it across the entire Koha instance! So if we grep for $debug (remembering to escape the $ sign with a backslash \), we’ll notice that this environment variable turns on A LOT of back-end system logging.
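For example, running something like this from the top of a Koha checkout (exact paths will vary):

grep -rn '\$ENV{DEBUG}' .    # where the environment variable gets read
grep -rn '\$debug' C4/       # modules that pass a $debug flag around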

We definitely don’t want to turn this off…

So…$ENV{DEBUG} is important for our system logs…and the Debug Level system preference is (potentially) important for fatal error logging…

But what about the “Software Error” messages that pop up when you make errors in your Perl scripts (we all make mistakes, which is why we test!)? Surely those have to come from somewhere…

Are they affected by this environmental variable and system preference?

As it turns out…nope!

They are produced and handled by CGI::Carp (specifically, its fatalsToBrowser handler), which is part of the core Perl 5 lib (to the best of my knowledge). To find it, you can go to your Perl 5 lib (probably in /usr/lib/perl5) and grep for Carp.pm. You might wind up with a few results. You can also try grepping for some of the text from the “Software Error” messages that you’re receiving in your browser. Note that not all of the text will be “greppable”, because the error message text is the product of concatenated (i.e. joined together) variables (i.e. dynamic storage containers) and strings (i.e. lines of text).

Anyway, you’ll find it in CGI/Carp.pm.

 

But wait…in your “Software Error” messages…you’ll notice that there is an email address! Where’s that coming from?

CGI::Carp will tell you that it comes from $ENV{SERVER_ADMIN}.

Great! But…where is that from?

Well, these folks (http://www.perlmonks.org/bare/?node_id=456111) mention that it is set by the ServerAdmin directive (i.e. command) in the Apache configuration file, which we know is koha-httpd.conf.

Sure enough, we go there, and the address next to ServerAdmin is the same one that we see in our error messages.
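In other words, koha-httpd.conf contains a line something like this (address invented):

ServerAdmin webmaster@library.example.org

Apache passes that value to CGI scripts as the SERVER_ADMIN environment variable, which is how it winds up in the error message.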

Cool, n’est-ce pas?

In all honesty, all these conclusions took some time, and a lot of grepping, Googling, guessing, and reading through lines of code.

But…I fixed the problem and learned a lot doing it!

 

Koha vs The World

Here’s a Marshall Breeding article about the library automation marketplace in 2010 (the article itself is from 2011). It talks a fair bit about Koha and quite a few other library management systems (or rather, their companies). It mostly looks at the number of new customers, new sales, and total installations (apparently an all-time grand total, despite sitting beside the label for 2010).

As expected, SirsiDynix (Symphony, Horizon) and Ex Libris (Voyager, Aleph) are the biggest players. Innovative Interfaces (Millennium) was another expected powerhouse.

I was surprised to see EOS with ~1000 installs, since I was not very impressed with their software. Of course, that’s not to say that I was impressed by Horizon, Voyager, or Millennium either. It’s just that those latter three are huge in academic and public libraries, while EOS markets mainly to special libraries. I’ve only heard of one client who uses EOS, and they’ve moved away from it. I know heaps about those other three systems.

Since Koha is supported by multiple vendors, it takes a bit more work to see its net installs, but it’s formidable as well: roughly 1000 across the three big US companies (ByWater Solutions, Equinox Software, and PTFS – Liblime). Of course, those numbers only cover those three companies; there are lots of smaller vendors and institutions using Koha throughout the USA.

These numbers are also self-reported by the vendors, so…caveat emptor.

http://www.libraryjournal.com/lj/home/889533-264/automation_marketplace_2011_the_new.html.csp

 

Brendan Gallagher, CEO of ByWater Solutions, provided me with this link, which is much more contemporary and very interesting in terms of the migration toward ByWater Solutions by other systems (especially users of the proprietary PTFS – Liblime version of Koha).

This report is also by Marshall Breeding, but these numbers are probably even less comprehensive since they are just compiled from one library listserv.

http://www.librarytechnology.org/ils-turnover.pl?Year=2013

 

Here’s an automated estimate of how much the Koha project’s code would have cost to develop…

https://www.ohloh.net/p/koha/estimated_cost

 

Kuali is a very well-funded open-source library “environment” for academic and research libraries, but…it doesn’t look like they’ve gotten very far, and it doesn’t look like it was actually designed for librarians or archivists to use…

But it’s an interesting concept. I like that they’re trying to re-envision how “resources” should be handled by an automated system. Yet, one problem with that is that users of this system might cut themselves off from many, if not all, of the other systems out there. While you could argue that systems that follow existing standards aren’t innovative, they are functional and interoperable.

Mind you, this system seems to want to take plugins and multiple data formats into account, so maybe it really can do it all.

Or rather…maybe it WANTS to do it all, but I think it is a very long way away from achieving that. Presently, this system seems more like an accounting system than a resource management system designed to describe and facilitate access to print and electronic materials…

http://www.kuali.org/ole

Koha in Canada || Origin: Koha

Wondering about the origin story of Koha?

Sure, you may have heard that it was originally created in New Zealand and that it is open source, but how much do you really know?

Check out this code4lib article: http://journal.code4lib.org/articles/1638

So…I’m living and working in Australia, but I’m originally from Canada. Koha has a pretty strong presence in Australia, New Zealand, the USA, Europe, India, Africa, and probably a few other places that I haven’t mentioned.

But not Canada.

Or at least…information about libraries using Koha in Canada is rather sparse!

inLibro is one company in Québec that offers hosted Koha services. I think there might be one other that advertises as well.

Other than that…I think most adoptions of Koha have been by individual institutions. For instance, check out this link about how Prince Edward Island (a small province in Canada) uses Koha for all of its school libraries!

http://www.gov.pe.ca/index.php3/index.php3?number=news&newsnumber=7681&dept=&lang=E

Here’s another link that takes you to the PEI school Koha catalogue:

http://211.schoollibrary.edu.pe.ca/cgi-bin/koha/opac-detail.pl?biblionumber=183759

I would love to hear about more Koha projects in Canada, so leave comments if you know of any. I’ll continue to do research and try to promote it among folks that I know.

If you’re interested in taking a look at Koha for yourself, consider downloading the Live CD:

http://wiki.koha-community.org/wiki/Koha_LiveCD

I haven’t investigated it fully myself, but it should contain a self-contained Linux (Ubuntu) operating system, Koha, Zebra (the indexing software), and everything else you need to get started using Koha! It’s not generally recommended for production installs, but I imagine it’s a great way to try Koha out, and maybe it’s even suitable for a little library run by volunteers. I’m going to experiment with that at a later date ;).

 

Ghostery, Adblock+, etc.

I had a colleague getting some JavaScript security errors when trying that HTML5 game…

Obviously, the first suspects were browser security plugins, which reminded me that I haven’t looked extensively at all the different ones out there, like Ghostery, Adblock Plus, NoScript, RequestPolicy, etc.

I’ll have to do that sometime down the line (hmm, so many ideas, so little time). Until then, here is a short discussion about Ghostery, Adblock Plus, and some others…

Ghostery

Looks like it might be an interesting blog for security in general…

Zebra indexing, Bib-1 Attributes, CCL, and more…

Unfortunately, I don’t have time to really elaborate today, but perhaps I will come back and explain later. Until then, here is a list of links…

Yaz-Client (for querying Z39.50 databases like Zebra)

http://www.indexdata.com/yaz/doc/yaz-client.html

Bib-1 Attributes

http://www.loc.gov/z3950/agency/defns/bib1.html

Bib-1 Attributes not supported in Zebra

http://www.indexdata.com/zebra/doc/querymodel-rpn.html#querymodel-bib1-nonuse

Zebra Query Model

http://www.indexdata.com/zebra/doc/querymodel-zebra.html

A bit of talk about how Koha and Zebra link together using CCL/PQF

http://koha.1045719.n5.nabble.com/Search-at-the-beginning-of-an-expression-td3364395.html

CCL Special Attribute Combos

http://www.indexdata.com/yaz/doc/tools.html#ccl.special.attribute.combos

In terms of adjusting Zebra/Koha settings, you’ll want to look at:

bib1.att

ccl.properties

record.abs

bib1.att lists and maps Bib-1 attributes (the defaults from the Library of Congress spec, plus special ones added for Koha)
ccl.properties maps CCL to Bib-1/PQF
record.abs maps MARC fields to Bib-1

Also, since it’s really hard to find information about how to construct PQF queries using BIB1…

There are six types of Bib-1 attributes (Use, Relation, Position, Structure, Truncation, and Completeness), and each type has a variety of values within it. To create a query, you would type something like the following into yaz-client or whatever else you’re using that understands PQF:

f @attr 1=4 computer

“f” stands for “find” in yaz-client.

“@attr” introduces an attribute (you need one “@attr” for each attribute you specify).
“1=4” means: set attribute type 1 (Use) to the value 4 (Title).
In other words, 1=4 -> “use attribute = title”, so this query finds records with “computer” in the title.

It’s actually quite straightforward, but it’s quite rare to find it spelled out for you on the Web!
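As a bonus, you can stack attributes in one query. For example (5=1 is right truncation in the Bib-1 spec; your server needs to support it):

f @attr 1=4 @attr 5=1 comput

This searches the title index for words beginning with “comput” (computer, computing, and so on).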