RTÉ recently carried a news piece announcing the release of a large volume of historical digital images by the National Library of Ireland. One detail in the piece caught my eye: it quoted the library’s Digitisation Programme Manager, Sara Smyth, as saying
Since 2010, we have overhauled our digitisation workflows and put in place key technical infrastructures. We achieved this with limited full time technical resources and a very restricted budget by collaborating on international open source projects.
(emphasis added). This sounded like a great thing, and I wondered which projects these were, so I emailed Ms Smyth. She and a colleague kindly wrote back with plenty of detail, which this post reports.
Engagement with open source projects generally
Perhaps unsurprisingly, Systems Librarian Eoghan Ó Carragáin says that the library community has meshed well with the Free / Open-Source Software community for some time:
There is a long tradition of open-source within the library domain and a very active community. One of the main channels for communication is the code4lib.org mailing list and conference, as well as other more targeted groups such as the annual Open Repositories conference.
From my outsider’s point of view, there seems to be a natural alignment of philosophy between free-as-in-freedom software and a library’s role in preservation. For an archived digital object to be useful, we need to be able to interpret it, and this means having appropriate software. If that software is free software, then we are guaranteed access to it, and therefore to the archived digital object, indefinitely. The freedom to study the software’s source code means we have the knowledge required to interpret the digital object. The same applies to metadata in digital form, for example a library’s catalogue.
Eoghan explains the evaluation strategy used, and the strengths of free-software solutions:
We evaluate open and closed-source options equally when planning a new implementation. In many cases it makes sense to adopt an existing commercial system. However, in the case of some of the NLI’s core library systems, open source products such as VuFind offered functionality which simply weren’t offered by commercial solutions, and provided a degree of flexibility which met the NLI’s requirements more closely.
It seems likely to me that this flexibility comes from the freedom the NLI enjoys to adapt the software to its (and other libraries’) needs.
In the particular case of VuFind, the NLI’s involvement was extensive:
The main project to date has been the VuFind discovery interface, an Apache Solr based search interface led by Villanova University in the United States. The NLI was an early adopter for the software and has played an active role in terms of contributing new features, project administration, road-map planning, as well as community support via mailing list contribution and conference presentations. When developing new functionality and features of general interest to the VuFind community, the NLI adopted a policy of developing directly against the upstream/trunk code-base thereby enhancing the core product while also distributing the maintenance cost of new features across the community.
Other free / open-source software at the NLI
The library makes use of free / open-source software in many areas, and they gave me several examples. I am not familiar with the world of library and archival software, so I had to look most of them up. There is some great work going on:
Image courtesy of the National Library of Ireland.
- The VuFind project aims to be a replacement for the traditional online library catalogue; it has a demo system you can experiment with.
- The IIPImage system makes high-resolution, explorable images available on the web. One example is the NLI’s presentation of an engraving of W. R. Hamilton, shown to the right.
- The Internet Archive BookReader is a system for presenting scanned-in books on the web: see their demo of a bird book.
- Solr is the Apache project’s search and indexing system. (This one I had heard of.) SolrMarc allows you to feed MARC records into your Solr system; the MARC formats are standards for machine-readable bibliographic and related information.
- OpenRefine (previously known as Google Refine) is a system for working with and cleaning up messy raw data. It looks useful for the sort of data where fully automated processing is not possible, but where a semi-automatic approach, driven by human intelligence, is efficient.
- JHOVE is the JSTOR/Harvard Object Validation Environment, which addresses the problem of identifying, verifying, and characterising digital objects.
- Fedora Commons is not, as I first thought, something to do with the Red Hat / Fedora Linux distribution, but rather a system for ensuring ‘durable, persistent access to digital data’.
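To make the SolrMarc entry above a little more concrete, here is a minimal Python sketch of the general idea: turning a MARC record into a flat document that a search index such as Solr can store. It is not SolrMarc’s actual code or configuration format. The record is assumed to be a plain list of (tag, subfields) pairs, and the MAPPING table and marc_to_solr_doc function are hypothetical names chosen for illustration. The MARC tags themselves are standard: 245 is the title statement, 100 the personal-name main entry, and 650 a topical subject heading.

```python
from typing import Dict, List, Tuple

# Simplified MARC data field: (tag, [(subfield code, value), ...]).
# Real MARC records also have a leader, indicators and control fields;
# those are left out of this sketch.
Field = Tuple[str, List[Tuple[str, str]]]

# Hypothetical mapping: Solr field name -> (MARC tag, subfield codes to keep).
MAPPING: Dict[str, Tuple[str, str]] = {
    "title": ("245", "ab"),   # 245 = Title Statement ($a title, $b remainder)
    "author": ("100", "a"),   # 100 = Main Entry, Personal Name
    "topic": ("650", "a"),    # 650 = Subject Added Entry, Topical Term
}


def marc_to_solr_doc(record_id: str, fields: List[Field]) -> Dict[str, object]:
    """Build a flat, Solr-style document from a simplified MARC record."""
    doc: Dict[str, object] = {"id": record_id}
    for solr_field, (tag, codes) in MAPPING.items():
        values: List[str] = []
        for field_tag, subfields in fields:
            if field_tag != tag:
                continue
            # Join the wanted subfields of this field occurrence into one string.
            parts = [value for code, value in subfields if code in codes]
            if parts:
                values.append(" ".join(parts))
        # Multi-valued: a record may repeat a tag (e.g. several 650 subjects).
        if values:
            doc[solr_field] = values
    return doc


if __name__ == "__main__":
    import json

    # A hand-written illustrative record, not taken from any real catalogue.
    record: List[Field] = [
        ("100", [("a", "Hamilton, William Rowan")]),
        ("245", [("a", "Lectures on quaternions"), ("c", "by W. R. Hamilton")]),
        ("650", [("a", "Quaternions")]),
    ]
    # Solr's JSON update handler accepts a list of documents shaped like this.
    print(json.dumps([marc_to_solr_doc("rec-1", record)], indent=2))
```

Note that the 245 $c statement of responsibility is dropped because only $a and $b are mapped. As I understand it, tools such as SolrMarc drive this kind of mapping from configuration files rather than hand-written code, which is what makes them reusable across libraries.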
A final example, where the NLI’s involvement is at an earlier stage:
… [I]n the case of our digital asset management system we have begun working with the Hydra Framework which is also being used by the Digital Repository of Ireland.
The Hydra Framework is a system for the deployment of ‘robust and durable digital repositories (the body) supporting multiple “heads”’.
Confirming the technology in use, a job currently open for a software engineer at the Digital Repository of Ireland lists required and/or beneficial technical skills that are heavily free-software / open-source: Ruby on Rails, Linux, Perl, Python, MySQL, Solr, Vagrant, VirtualBox.
Many thanks to Sara Smyth and Eoghan Ó Carragáin for their time in providing the above information.
[Updated 20140606 to include engraving of W. R. Hamilton.]