Git Blame and Autocomplete=Off


Do you use Git as your version control system (VCS)? Want to know who edited a line of code? And when? And the hash of the commit that introduced that line?

Use “git blame”!

http://www.kernel.org/pub/software/scm/git/docs/git-blame.html


I noticed today that someone had added the autocomplete=off HTML attribute (http://www.htmlcodetutorial.com/forms/_INPUT_AUTOCOMPLETE.html) to a self-checkout input field in a form, and I wanted to know whose karma to increment in #koha, so I did a git blame of the template and found the original author’s name and the SHA1 hash of their commit.

I used “git show” to look up that commit, and there I saw the addition of that attribute.
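
In case you want to try it yourself, it went roughly like this (the template path and hash below are placeholders, not the real ones):

# annotate each line of the file with the commit, author, and date that last touched it
git blame path/to/template.tt

# then inspect the full commit using the SHA1 hash from the blame output
git show <sha1-hash>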


(Explanation of autocomplete=off: Know how your browser will autocomplete some of your username and password fields? Or just cache a list of possibilities that you’ve tried before? Well, this turns that function off for that particular input field. Very handy for sensitive information being used for a web app on a public terminal!)
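
For reference, the attribute goes right on the input element itself (the field name here is made up):

<input type="text" name="patron_barcode" autocomplete="off" />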

Playing with JSON in PHP

Recently, I asked for a list of web URLs to use for some software I’m testing, and I ended up receiving a zipped file containing a bunch of JSON files nested in a heap of subdirectories alongside some image files.

Well… all I wanted was one key-value pair from each JSON file within the subdirectories: the one containing the URL I was after.

I have access to a server with a PHP interpreter, so I decided to throw together a quick program to take care of looping through the files and digging out the value I wanted from the JSON.

In PHP…

1) I used the RecursiveDirectoryIterator class to create an object (e.g. “$di”) containing a bunch of SplFileInfo objects, which represent everything found in the top-level directory where the archive was originally unzipped (i.e. I created a PHP representation of the directory).

2) Next, I used the RecursiveIteratorIterator class and a “foreach” loop to iterate through all the SplFileInfo objects within the “$di” object. This let me look at all the files within the subdirectories.

3) Using regular expressions (in this case, the “preg_match” function), I was able to create a condition that would only be executed on files ending with the JSON file extension.

4) Using the “file_get_contents” function, I retrieved the contents from the JSON files and decoded them into PHP objects using the “json_decode” function.

5) Then, I pulled out the “url” value using PHP’s $object->property syntax.

At this point, you could do whatever you want with the URLs. I decided to store them in an array and then print them all at the end, but I could’ve just as easily printed each one as I found it.
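
Putting all five steps together, a minimal sketch might look like this (the directory name and the “url” key reflect my particular case; adjust them for yours):

<?php
// Step 1: build a recursive representation of the unzipped directory
$di = new RecursiveDirectoryIterator('unzipped', RecursiveDirectoryIterator::SKIP_DOTS);

$urls = array();

// Step 2: iterate over every SplFileInfo object in every subdirectory
foreach (new RecursiveIteratorIterator($di) as $file) {
    // Step 3: only act on files with the .json extension
    if (!preg_match('/\.json$/', $file->getFilename())) {
        continue;
    }
    // Step 4: read the file and decode the JSON into a PHP object
    $object = json_decode(file_get_contents($file->getPathname()));
    // Step 5: pull out the "url" value
    if (isset($object->url)) {
        $urls[] = $object->url;
    }
}

print implode("\n", $urls) . "\n";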

I’m certain that there are other ways to achieve the same effect, but it was nice being able to write a little program to look through a file system directory and extract the particular text that I wanted.

Now, I can use this as the basis for my next set of tests…


Cataloguing Series…MARC 490 and MARC 830

When cataloguing an item from a series, are you ever confused about why you need to put the “series title” in both the MARC 490 and MARC 830 fields?

I remember being told to “just do it” by teachers and library staff, but in actuality…these fields have very different purposes. I was reminded of this today when someone asked if Koha supported MARC Authorities for Series. At first, I was confounded. Authority records for series? That seemed bizarre. Then I realized…they meant to ask whether Koha supports MARC Authorities for uniform titles…

Here is the general info from the Library of Congress:

490 – Series Statement (R)

http://www.loc.gov/marc/bibliographic/bd490.html

830 – Series Added Entry-Uniform Title (R)

http://www.loc.gov/marc/bibliographic/bd830.html


The key to it all resides in a footnote on the 490 page:

Indicator 1 – Series tracing policy
1 – Series traced [REDEFINED, 2008]
Prior to 2009, series for which the transcribed form and the traced form were the same were in field 440, and field 490 was not used. If the transcribed form and the traced form were different, the transcribed form was in field 490 and Indicator 1 had value “1” (Series traced differently). The traced form was in an 8XX field. Beginning in 2009, field 440 is not used and the transcribed form of the series name is in field 490 with the traced form in 8XX, even if the names are the same.

In other words, if you have an authority record for the uniform title of a series, the title from that authority record (MARC 130 – Heading-Uniform Title (NR) in the authority record) will go in the 830 field, and the transcribed form (i.e. the series title that appears on the title page/source of information) will go in the 490 field. You’ll then change the first indicator for 490 to 1 (as mentioned above) and you’re all good!
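
As a made-up illustration (not from any real record), suppose the series title transcribed from the title page differs from the uniform title in the authority record:

490 1_ $a The pocket poets series ; $v no. 4
830 _0 $a Pocket poets series (San Francisco, Calif.) ; $v no. 4

The 490 transcribes what the piece actually says, while the 830 carries the controlled form taken from the authority record.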

SQL For Finding Records From One Table That Don’t Exist In Another Table In Your Database

Here are some useful snippets of SQL to use when trying to find records in one table that don’t exist in another table in your database…
SELECT *
FROM Call
WHERE phone_number NOT IN (SELECT phone_number FROM Phone_book)

or

SELECT *
FROM Call
WHERE NOT EXISTS
(SELECT *
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number)

or

SELECT *
FROM Call
LEFT OUTER JOIN Phone_book
ON (Call.phone_number = Phone_book.phone_number)
WHERE Phone_book.phone_number IS NULL

One caveat: the NOT IN version can return no rows at all if Phone_book.phone_number ever contains a NULL, so the NOT EXISTS and LEFT JOIN forms are generally safer.

Snippets are taken from this Stack Overflow thread:
http://stackoverflow.com/questions/367863/sql-find-records-from-one-table-which-dont-exist-in-another

Serial Cataloguing

Over the past 6 years, I’ve seen and experienced a few different approaches to cataloguing serials using MARC. After all, typically when you enter an organization, you adopt whatever cataloguing conventions they practise there. This makes a certain amount of sense, since there is always organizational acculturation whenever you begin a new job. Standards are often considered to be guidelines more than rules.

I’m in a very different position now than I have been over the past few years, though. At this point, I have people posing questions to me about how they’re “supposed” to catalogue. This is quite another animal altogether and, as a result, is actually fairly difficult to answer.

If you consult the following links, you will notice a few different ideas about how people are “supposed” to catalogue serials.

Serial Cataloguing

http://special-cataloguing.com/node/1403

CONSER Cataloging Manual

http://www.itsmarc.com/crs/mergedprojects/conser/conser/contents.htm

University of Illinois at Urbana-Champaign: Serials Cataloging

http://www.library.illinois.edu/cam/procedures/serguide.html

Arizona State Museum: Instructions for Serials Cataloging

http://www.statemuseum.arizona.edu/library/cataloging_manual/serialscat.shtml

OK…but how are you ACTUALLY supposed to do it?

Well, it seems to me that the MARC 362 field (http://www.loc.gov/marc/bibliographic/bd362.html) is supposed to be used to record the “beginning” and “end” dates of a publication. This may also include the sequential designation (e.g. Vol., No.) in the case of periodical publications.

Then, the MARC 863 field (http://www.loc.gov/marc/holdings/hd863865.html) is supposed to be used to record the actual holdings. In some cases, this might involve multi-part items that are not periodicals, but that’s outside the scope of this post. In regards to serials, there are various levels of enumeration, which allow you to specify your holdings at various levels of detail. Perhaps you just want one 863 entry per year. Perhaps you want one every month. I suppose this is where a certain amount of localized convention comes into the picture.

What I would like to point out is that the 853 and 863 fields seem to be directly linked, and thus subfields should be used for their linked purpose. If you are marking an item as missing, use an $x or $z subfield to write that information out as a “note”. That’s where it belongs. Follow the examples specified in the Library of Congress webpage I have linked to above.
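
To make that concrete, here is a made-up 853/863 pair (the caption pattern in the 853 is linked to the values in the 863 through subfield $8):

853 $8 1 $a v. $b no. $i (year) $j (month)
863 $8 1.1 $a 15 $b 3-5 $i 2013 $j 03-05 $z no. 4 missing

Read together, these say the library holds v. 15, nos. 3 through 5 (March to May 2013), with the gap recorded as a public note in subfield $z instead of being worked into the enumeration itself.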

Without a doubt, serial cataloguing is a complicated beast, but hopefully this post will elucidate things a bit and prompt further research on the part of those doing serials cataloguing.

Error Messages in Koha

I noticed a while ago that Koha 3.8.0 showed a “Debug is on (level 1)” message in the member/patron entry template, but I had forgotten about it until someone brought it up again recently.

So I started researching the cause of this Debug level.

I checked the “Debug Level” system preference, but it was turned off (i.e. set to 0). From what I can tell, that system preference does very little in Koha; it might handle varying levels of error messages for fatal errors, which I’ve never actually encountered first-hand.

Anyway…

What was causing this Debug level to be set to one?

Well, I found out that the script behind the memberentrygen.tt template was setting a local scalar variable from a global variable ($ENV{DEBUG}).
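
In other words, something along these lines (a paraphrase, not the exact Koha source):

# set a local debug flag from the DEBUG environment variable
my $debug = $ENV{DEBUG};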

Ok. That makes sense. I can understand that there are different debug settings. But…where is this $ENV{DEBUG} being set?

Well, I Googled. I grepped. I eventually found a record of an email about an old Koha patch (http://lists.katipo.co.nz/public/koha/2010-December/026789.html), which talked about a “SetEnv DEBUG 1” directive in the koha-httpd.conf file (an Apache web server configuration file that tells Apache how to serve and log Koha).

Sure enough…I found that very same directive in all of my Apache configuration files! Awesome! If I turn that off, it’ll get rid of that original annoying message!

But…let’s step back for a second…that “SetEnv DEBUG 1” directive is probably there for a reason!

If we grep (a Linux/macOS command/utility) our Koha files, we’ll notice that a variable called $debug is set by $ENV{DEBUG} in other files than just memberentry.pl. In fact, it is set in some pretty important Perl modules that can broadcast it across the entire Koha instance! So…if we grep for $debug (remembering to escape the $ sign with a backslash \), we’ll notice that this environment variable turns on A LOT of back-end system logging.
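
For example (the paths here are illustrative; your Apache and Koha layouts may differ):

# find where the DEBUG environment variable gets set in the Apache configuration
grep -rn 'SetEnv DEBUG' /etc/apache2/

# find every place Koha reads the $debug flag (note the escaped $ sign)
grep -rn '\$debug' /usr/share/koha/lib/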

We definitely don’t want to turn this off…

So…$ENV{DEBUG} is important for our system logs…and the Debug Level system preference is (potentially) important for fatal error logging…

But what about the “Software Error” messages that pop up when you make errors in your Perl scripts (we all make mistakes, which is why we test!)? Surely those have to come from somewhere…

Are they affected by this environment variable and system preference?

As it turns out…nope!

They are produced and handled by Carp.pm (specifically, CGI::Carp), which is part of the core Perl5 lib (to the best of my knowledge). To find it…you can go to your Perl5 lib (probably in /usr/lib/perl5) and grep for Carp.pm. You might wind up with a few results. You can also try grepping for some of the text from the “Software Error” messages that you’re receiving in your browser. Note that not all of the text will be “greppable”, because the error message text is the product of concatenated (i.e. joined together) variables (i.e. dynamic storage containers) and strings (i.e. lines of text).

Anyway, you’ll find it in Carp.pm.


But wait…in your “Software Error” messages…you’ll notice that there is an email address! Where’s that coming from?

Carp.pm will tell you that it comes from $ENV{SERVER_ADMIN}.

Great! But…where is that from?

Well, these folks (http://www.perlmonks.org/bare/?node_id=456111) mention that it is set by the ServerAdmin directive (i.e. command) in the Apache configuration file, which we know is koha-httpd.conf.
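
That directive amounts to a single line in the configuration file (the address here is a placeholder):

# whoever is listed here shows up in the "Software Error" messages
ServerAdmin webmaster@example.org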

Sure enough, we go there, and the address next to ServerAdmin is the same one that we see in our error messages.

Cool, n’est-ce pas?

In all honesty, all these conclusions took some time, and a lot of grepping, Googling, guessing, and reading through lines of code.

But…I fixed the problem and learned a lot doing it!


Koha vs The World

Here’s a Marshall Breeding article about the library automation marketplace in 2010 (the article itself is from 2011). It talks a fair bit about Koha and quite a few other library management systems (or rather their companies). It mostly looks at the numbers of new customers, new sales, and total installations (apparently a grand total, despite appearing near the label for 2010).

As expected, SirsiDynix (Symphony, Horizon) and ExLibris (Voyager, Aleph) are the biggest players. Innovative Interfaces (Millennium) was also another expected powerhouse.

I was surprised to see EOS with ~1000 installs, since I was not very impressed with their software. Of course, that’s not to say that I was impressed by Horizon, Voyager, or Millennium either. It’s just that those latter three are huge in academic and public libraries, while EOS markets mainly to special libraries. I’ve only heard of one client who uses EOS, and they’ve moved away from it. I know heaps about those other three systems.

Since Koha is supported by multiple vendors, it takes a bit more work to see its net installs, but the total is formidable as well: roughly 1000 across ByWater Solutions, Equinox Software, and PTFS – Liblime. Of course, those numbers only cover the three big US companies; there are lots of smaller vendors and institutions using Koha throughout the USA.

These numbers are also self-reported by the vendors, so…caveat emptor.

http://www.libraryjournal.com/lj/home/889533-264/automation_marketplace_2011_the_new.html.csp


Brendan Gallagher, CEO of ByWater Solutions, provided me with this link, which is much more contemporary and very interesting in terms of the migration toward ByWater Solutions by other systems (especially users of the proprietary PTFS – Liblime version of Koha).

This report is also by Marshall Breeding, but these numbers are probably even less comprehensive since they are just compiled from one library listserv.

http://www.librarytechnology.org/ils-turnover.pl?Year=2013


An automated estimate of how much the Koha project has cost in terms of coding…

https://www.ohloh.net/p/koha/estimated_cost


Kuali is a very well-funded, open-source academic and research library “environment”, but…it doesn’t look like they’ve gotten very far, nor does it look like it was actually designed for librarians or archivists to use…

But it’s an interesting concept. I like that they’re trying to re-envision how “resources” should be handled by an automated system. Yet one problem with that is that users of this system might cut themselves off from many, if not all, of the other systems out there. While you could argue that systems that follow existing standards aren’t innovative, they are functional and interoperable.

Mind you, this system seems to want to take plugins and multiple data formats into account, so maybe it really can do it all.

Or rather…maybe it WANTS to do it all, but I think it is a very long way away from achieving that. Presently, this system seems more like an accounting system than a resource management system designed to describe and facilitate access to print and electronic materials…

http://www.kuali.org/ole