March | 2016 | Libraries && Technology

First of all, I love the openness of open source. When you’re a curious person, you’re able to actually find the reasons why the tools you’re using behave in the way that they do.

Second, I love that I can decide to teach myself how to program in C, and then just go off and do it. The Internet is a wonderful resource, basic hardware is available cheaply, and free software like the GNU Compiler Collection (GCC) means that you can code without opening your wallet.

Long story short, I was writing a C function to read arbitrarily long lines from a file (i.e. a function that automatically reallocates memory for the buffer as needed whilst reading from the file), and the snippets I was looking at online showed that people were doing things like “fgets(buffer + last, length, file)”. I found “buffer + last” to be utterly confusing.
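To make that concrete, here is a minimal sketch of the kind of function I mean. It is not my actual code: the name “read_line”, the starting size of 16 bytes, and the choice to double the buffer each time are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name and sizes are illustrative assumptions).
 * Reads one line of any length from `file`. Returns a malloc'd string
 * that the caller must free(), or NULL at end of file or on failure. */
char *read_line(FILE *file)
{
    assert(file != NULL);

    size_t size = 16;   /* current buffer capacity (arbitrary starting size) */
    size_t last = 0;    /* number of characters stored so far */
    char *buffer = malloc(size);
    if (buffer == NULL)
        return NULL;

    /* Each call writes into the unused tail of the buffer, which
     * starts `last` characters past the beginning. */
    while (fgets(buffer + last, (int)(size - last), file) != NULL) {
        last += strlen(buffer + last);

        /* A trailing newline means the whole line has been read. */
        if (last > 0 && buffer[last - 1] == '\n')
            return buffer;

        /* Otherwise grow the buffer and keep reading the same line. */
        size_t new_size = size * 2;
        char *bigger = realloc(buffer, new_size);
        if (bigger == NULL) {
            free(buffer);
            return NULL;
        }
        buffer = bigger;
        size = new_size;
    }

    /* fgets returned NULL: read error, or end of file. */
    if (ferror(file) || last == 0) {
        free(buffer);
        return NULL;
    }
    return buffer;      /* final line with no trailing newline */
}
```

Calling it in a loop until it returns NULL, and calling free() on each returned line, walks through a whole file one line at a time.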

“last” is an integer while “buffer” is a char pointer, which starts to make sense when you look into pointer arithmetic and the fgets source code: http://mirror.fsf.org/pmon2000/3.x/src/lib/libc/fgets.c
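A small sketch helped me convince myself that “buffer + last” is just the address of the element “last” positions into the array, i.e. the same thing as “&buffer[last]”. The helper name “append_at” is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `text` into `buffer`, starting `last`
 * characters in, and returns the new length of the string.
 * The caller must make sure the buffer has enough room. */
size_t append_at(char *buffer, size_t last, const char *text)
{
    assert(buffer != NULL && text != NULL);

    char *destination = buffer + last;      /* pointer arithmetic */
    assert(destination == &buffer[last]);   /* same address, different spelling */

    strcpy(destination, text);
    return last + strlen(text);
}
```

With a 32-byte buffer holding “Hello, ”, calling “append_at(buffer, strlen(buffer), "world")” writes “world” straight after the comma and space, which is the same trick the fgets() snippets rely on.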

It all seemed straightforward enough to me except for two lines:

*p++ = c;

*p = 0;

“c” is a character obtained from a file via fgetc, and “p” is a copy of the buffer pointer.

It was clear that “*p++ = c;” was adding the “c” character to the character array, but how? Postfix “++” binds more tightly than “*”, so the statement parses as “*(p++) = c;”. Because the postfix increment evaluates to the pointer’s old value, the effect is the same as “*p = c; p++;”: it dereferences “p” and stores the value of “c” at that memory location, then advances the pointer to the next memory address.
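Here is a minimal sketch of the two spellings side by side. The function names are hypothetical. Each one stores a character and returns where the pointer ends up.

```c
#include <assert.h>
#include <stddef.h>

/* Compact form, as written in the fgets source. */
char *store_compact(char *p, char c)
{
    assert(p != NULL);
    *p++ = c;       /* store c at the old address, then advance p */
    return p;
}

/* Expanded form: the same two steps written out separately. */
char *store_expanded(char *p, char c)
{
    assert(p != NULL);
    *p = c;         /* store c where p currently points */
    p++;            /* move p on to the next char */
    return p;
}
```

Both functions leave the character in the original slot and hand back a pointer one element further along.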

That’s where “*p = 0;” comes in. That last “p++” has moved us on to the next address, so now we’re setting the next byte of memory to 0, which (if we look at our ASCII table at http://www.asciitable.com/) is the decimal value of the NUL character. Since C strings are null-terminated, we’re indicating that’s the end of our string.

I found it confusing that printing “buffer” after the fgets() call showed the full string, since I thought I had moved the pointer to a different memory address. Then I realized that it was the “p” pointer which had been subject to the pointer arithmetic, not the original “buffer” pointer. Of course “buffer” was still pointing to the same place, which allowed me to read the whole string out of the buffer, rather than just the last character assigned.
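Putting those pieces together, a stripped-down stand-in for fgets might look something like the sketch below. This is my own simplified version for illustration, not the PMON source linked above, and “simple_fgets” is a made-up name.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified, hypothetical stand-in for fgets (not the real source).
 * Reads at most n - 1 characters into buffer, stopping after a newline. */
char *simple_fgets(char *buffer, int n, FILE *fp)
{
    assert(buffer != NULL && fp != NULL);
    if (n <= 0)
        return NULL;

    char *p = buffer;   /* working copy; buffer itself never changes */

    while (p < buffer + n - 1) {
        int c = fgetc(fp);
        if (c == EOF)
            break;
        *p++ = (char)c; /* store the character, then advance p */
        if (c == '\n')
            break;
    }

    /* Nothing could be read: end of file or a read error. */
    if (p == buffer && n > 1)
        return NULL;

    *p = 0;             /* NUL terminator marks the end of the string */
    return buffer;      /* still points at the first character */
}
```

Only “p” walks along the array. The pointer that comes back is the same “buffer” that went in, which is why printing it shows the whole string.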

Now I understand how my C program is reading lines of arbitrary length from a file and printing them out in the terminal window.

With the help of “Valgrind”, I can also see that I’m not leaking memory, as I’m making sure to free() my pointers where needed. I really like Valgrind, actually, as it also points out other mistakes which don’t necessarily lead to segfaults and such.

Now if I were really keen, I could probably optimise how much memory is reallocated when reading really long lines (in better ways than how others describe online), but there are probably better ways to spend one’s time.

Now that I better understand pointers, pointer arithmetic, strings, and reading from files, I could in theory create a library for handling MARC records in C, or I could make a really basic encryption program, or both! At the very least, I’ll have a better understanding of how to read the source code of the Zebra indexing engine used by the Koha LMS. In fact, at this point, I might be able to start contributing patches to Zebra!

When I started working on Koha back in 2012, I had never coded in Perl. I had never used Template Toolkit. I had never used Git. I thought Zebra was just an animal, and that Apache referred to helicopters and some people indigenous to America. Now I can use these tools in my sleep. Considering that Perl, Git, and Zebra are all written in C, perhaps this is the next step in understanding those tools, how best to use them, how to fix them, and how to improve them.


Technology tips, tricks, and tidbits from a Systems Librarian


Lessons in C: teaching myself pointers and pointer arithmetic