Monday, July 20th, 2009

LTFL: Non-ISBN Matching

Short Story. We’ve been going through so many big changes at LibraryThing lately that we let a pretty substantial improvement go by without giving it the fanfare it deserves: the LibraryThing for Libraries (LTFL) Cataloging Enhancements now pick up many non-ISBN items. All LibraryThing for Libraries libraries will see better coverage (5-15%), and academic libraries with older materials should be especially pleased.

Some examples:


The coolest thing about the LibraryThing office: Need a photo of an old book? Grab an iPhone, swivel the chair 180 degrees, and shoot. Second coolest thing: We’re the only hot Web 2.0 company with a 1774 edition of Terence.

Long Story. Our enhancements usually run on the basis of the ISBN. ISBNs are easy to pick out of the HTML without knowing the structure of the page (/[0-9Xx]{10,13}/, if you speak regular expressions*), and most books have them, so they’re our primary way of knowing what content to load for a particular page.
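For the regex-curious, here is a minimal Python sketch of that idea. It is not the LTFL code: the function names are made up for illustration, the pattern follows the stricter form given in the footnote, and the check-digit test is an extra step assumed here to weed out phone numbers and other stray digit runs.

```python
import re

# ISBN-13 (978/979 prefix) or ISBN-10 (nine digits plus a digit or X).
# The lookarounds keep us from matching inside a longer run of digits.
ISBN_RE = re.compile(r"(?<![0-9Xx])(97[89][0-9]{10}|[0-9]{9}[0-9Xx])(?![0-9Xx])")


def is_valid_isbn(candidate: str) -> bool:
    """Check the ISBN-10 or ISBN-13 check digit.

    Assumes the candidate already has the shape matched by ISBN_RE.
    """
    s = candidate.upper()
    if len(s) == 10:
        # Weights 10..1; a trailing X counts as 10; total must divide by 11.
        total = sum((10 - i) * (10 if ch == "X" else int(ch))
                    for i, ch in enumerate(s))
        return total % 11 == 0
    if len(s) == 13:
        # Weights alternate 1, 3, 1, 3, ...; total must divide by 10.
        total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(s))
        return total % 10 == 0
    return False


def find_isbns(html: str) -> list:
    """Return ISBN-shaped strings found anywhere in the page that pass the check."""
    return [m.group(0) for m in ISBN_RE.finditer(html) if is_valid_isbn(m.group(0))]
```

Calling `find_isbns(page_html)` on a raw catalog page returns the candidates in page order, without needing to know anything about the page layout.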

However, as a part of our reviews enhancement, we developed a JavaScript library called the LibraryThing Connector that, among other things, screen-scrapes the title and author of the book out of the HTML. This is what allows our reviews to work on any item a library owns, whether or not it is in LibraryThing or has an ISBN. It’s tricky stuff, because it requires specific code for every type of library software that we provide reviews for.
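To give a feel for why each type of library software needs its own code, here is a rough sketch of the per-system approach. The real Connector is JavaScript and its internals are not shown in this post; the Python below is only an illustration, and the system names (`vendor_a`, `vendor_b`), the markup they look for, and every function name are hypothetical.

```python
import re
from html import unescape
from typing import Callable, Dict, Optional, Tuple

TitleAuthor = Optional[Tuple[str, str]]


def _first(pattern: str, page: str) -> Optional[str]:
    """Return the first capture of pattern with tags stripped, or None."""
    m = re.search(pattern, page, re.IGNORECASE | re.DOTALL)
    if not m:
        return None
    return unescape(re.sub(r"<[^>]+>", "", m.group(1))).strip()


def scrape_vendor_a(page: str) -> TitleAuthor:
    # Hypothetical system that marks up the record with CSS classes.
    title = _first(r'<h1[^>]*class="bib-title"[^>]*>(.*?)</h1>', page)
    author = _first(r'<span[^>]*class="bib-author"[^>]*>(.*?)</span>', page)
    return (title, author) if title and author else None


def scrape_vendor_b(page: str) -> TitleAuthor:
    # Hypothetical system that lays the record out as label/value table rows.
    title = _first(r"<th>\s*Title\s*</th>\s*<td>(.*?)</td>", page)
    author = _first(r"<th>\s*Author\s*</th>\s*<td>(.*?)</td>", page)
    return (title, author) if title and author else None


# One scraper per supported system; adding a system means adding an entry.
SCRAPERS: Dict[str, Callable[[str], TitleAuthor]] = {
    "vendor_a": scrape_vendor_a,
    "vendor_b": scrape_vendor_b,
}


def scrape_title_author(system: str, page: str) -> TitleAuthor:
    """Dispatch to the scraper for this system; None if unsupported or not found."""
    scraper = SCRAPERS.get(system)
    return scraper(page) if scraper else None
```

The point of the dispatch table is that the fragile, layout-specific part stays isolated in one small function per system, while everything downstream only ever sees a title and an author.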

To get title matching, therefore, we take the title and author extracted by the Connector and feed them to our own “What Work” fuzzy-matching API. Of course, this method is far from foolproof, so we err on the side of caution, only loading enhancement data if we’ve got a strong match on both the title and the author. We haven’t seen any false positives yet, and even with pretty strict matching, real-world stats show that we’re able to provide around 5-15% more content in the catalog. Academic libraries will get more of a boost out of this, because they tend to have a lot more non-ISBN items than public libraries.
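The "strong match on both" rule can be sketched in a few lines of Python. This is not the What Work API, whose matching logic isn't described here: the sketch swaps in the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the function names, the candidate record shape, and the two thresholds are all assumptions picked for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def strong_match(title: str, author: str, candidates: Iterable[Dict],
                 title_min: float = 0.9, author_min: float = 0.85) -> Optional[Dict]:
    """Return the best candidate that clears BOTH thresholds, else None.

    The thresholds are arbitrary illustrative values, not LibraryThing's.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        t = similarity(title, cand["title"])
        a = similarity(author, cand["author"])
        # Erring on the side of caution: a great title match alone is not enough.
        if t >= title_min and a >= author_min and t + a > best_score:
            best, best_score = cand, t + a
    return best
```

Returning `None` whenever either score falls short is what keeps false positives down: the page simply loads without enhancements rather than showing data for the wrong book.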

We did this because it’s fun and useful and kind of magic, but more importantly because we want to constantly improve our products. LibraryThing for Libraries is a subscription service. Every year when it is time for a library to renew with us, we want it to be clear that they’re getting something better from us than they were a year ago, and that even better things are in store for the future. It’s more fun and challenging for us that way, but it’s also something we know works pretty well as a business strategy.

In my mind a big reason why LibraryThing.com has succeeded is that a membership comes with an expectation of improvement. We don’t call a membership an investment, but you get to expect that you will be able to do more and better and cooler things with LibraryThing over time, and that it will become more valuable to you. As a result of this, our members become deeply involved in the site and how it works, and if a LibraryThing membership is a great investment, members end up making an even greater investment of their knowledge and enthusiasm right back. It’s a great thing to be a part of, so I hope it’s a philosophy we can keep bringing to the library world as well. — Casey

*Pace Casey, who wrote this post, ISBNs are /([0-9]{9}[0-9X]|97[89][0-9]{10})/i !

Labels: librarything for libraries, ltfl, new features
