Monday, December 3rd, 2007

MARCThing: A simple, self-contained MARC and Z39.50 application

Over the past couple of weeks, LibraryThing has been rolling out major improvements to our cataloging system—a new system for retrieving and parsing book information we’re calling “MARCThing.”

MARCThing is a major advance for LibraryThing. We’ve sunk months of development time into it, but we’re not going to keep it to ourselves. We will be releasing all the code for non-commercial use in libraries and elsewhere.

When the dust settles, LibraryThing members will be able to draw on nearly 700 data sources worldwide, with greatly improved foreign character support and better data manipulation behind the scenes. With MARCThing underneath we will be able to introduce many new features and to reach a truly global audience. But we are confident that developers outside of LibraryThing will find many other, equally compelling uses for MARCThing, and make useful changes and extensions.

What it is. When I was given the task of improving LibraryThing’s cataloging system and other features involving library data, I immediately thought of Solr, one of the most influential pieces of software to come out in the past couple of years. The big idea behind Solr is that it provides a “magic box”: an easy, self-contained interface to some very powerful but complex technology, the Lucene search engine. Solr hides the messy details of Lucene from the developer and provides all sorts of extra goodies in a self-contained package. The net result is that you can instantly stick an extremely powerful search engine into your project with almost no work. This combination of power and ease of use has quickly made it a developer favorite, and spawned all sorts of interesting projects that never would’ve come out without Solr.

I wanted my own magic box that would handle the two main standards libraries use to transfer cataloging data, the MARC record format and the Z39.50 search protocol, without anyone having to go into the details of how they work. And since I didn’t want to have to find or build another magic box, ever, I wanted something that could be easily used from any programming language.

Writing it was pretty easy—I used Django for the web part, Pymarc for MARC, and PyZ3950 for the Z39.50 support. With a good software library, working with Z39.50 or MARC records isn’t hard. The hard (or at least time-consuming) part of MARCThing was tracking down servers and dealing with oddball cases. There are many lists of Z39.50 servers out there, but the data is often incomplete, incorrect, or out of date. When you do find a Z39.50 server, oftentimes it’s non-standard in some way, or only has limited functionality. So the process of connecting to libraries using Z39.50 is fraught with guesswork and manual fiddling. That’s bad. The whole point of a standard should be to free you from guesswork.

How to use it. Using MARCThing is simple. Send it either some MARC records, or the Z39.50 server you want to search and what you want to search for, and get back XML (or one of a variety of other formats) that you can use in applications without knowing a lick about library cataloging. All the messy details (and there are a lot of them) are hidden from view. Everything just works. You don’t need to know what a nonfiling indicator or a use attribute is, or the difference between MARC-8 and UTF-8. You just need to know how to make an HTTP request.
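To give a feel for what "just an HTTP request" could look like from the client side, here is a rough Python sketch. Everything specific in it is an assumption made for illustration: the base URL, the parameter names (server, field, term, format) and the shape of the XML response are invented, since the real interface has not been published yet. The sketch only builds the request URL and parses a canned response, so it runs without a server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical address: the real MARCThing URL is not given in this post.
BASE_URL = "http://localhost:8000/search"


def build_search_url(server, field, term, fmt="xml"):
    """Build a GET URL for an assumed MARCThing-style search endpoint.

    The parameter names here are illustrative guesses, not the real API.
    """
    query = urlencode(
        {"server": server, "field": field, "term": term, "format": fmt}
    )
    return f"{BASE_URL}?{query}"


# Made-up response shape, standing in for whatever XML the service returns.
SAMPLE_RESPONSE = """<records>
  <record>
    <title>The Name of the Rose</title>
    <author>Eco, Umberto</author>
  </record>
</records>"""


def parse_records(xml_text):
    """Turn each <record> element into a plain dict of tag -> text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in record}
        for record in root.findall("record")
    ]


url = build_search_url("loc", "title", "name of the rose")
records = parse_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

In a real client you would fetch the URL (for example with urllib.request.urlopen) and pass the response body to parse_records. The point is that the caller only ever handles a URL and some simple XML, never a MARC record or a Z39.50 session.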

What I hope is that this inspires and allows people outside the library world to do cool things with library data. It’s sad that working with library data is such a hassle; there are so many underused resources out there. I won’t go too much into the technical problems with Z39.50 and MARC, but I do have a recommendation for anybody involved in implementing a standard or protocol in the library world. Go down to your local bookstore and grab 3 random people browsing the programming books. If you can’t explain the basic idea in 10 minutes, or they can’t sit down and write some basic code to use it in an hour or two, you’ve failed. It doesn’t matter how perfect it is on paper: it’s not going to get used by anybody outside the library world, and even in the library world, it will only be implemented poorly.

Open source plans. LibraryThing was already the only major cataloging site that used any library data. (The rest use Amazon’s data exclusively, a severe hurdle to book lovers in the US and an absolute barrier to those in most other countries.) MARCThing took us a long time to develop, and we have limited resources. We are not eager to give our competitors such a valuable tool; they can get their own library geeks. At the same time, we are eager to encourage non-profit use and to license non-competing commercial use for a token amount.

We’re thinking of releasing the code under the Creative Commons Attribution-Noncommercial-Share Alike license, but it will depend on what people want to do with it. If you were bitten by a radioactive librarian and suddenly gained the power to search 700 libraries worldwide, what would you do?

Stay tuned; code is coming soon!

Labels: django, librarything for libraries, marcthing
