Archive for December, 2007

Sunday, December 16th, 2007

Library of Congress report comments

Midnight was the deadline for public comment on the Library of Congress report*. As the time-stamp on the auto-reply attests, I submitted my comments at 11:59:53.** I could have used those seven seconds!

I’m not satisfied with my letter. But others have posted theirs, and I ought to do the same. (For starters, see the list collected by Karen Schneider.) I had planned a point-by-point analysis of the report, but found myself increasingly drawn into and depressed by the larger issues.

Meetings and committees bore and irritate me–who am I to pretend? I have no faith in a committee to solve the data problems of library-land. The problems are too deep, and time is running out. If the problems are to be solved, it will be on a less lofty level. I have more faith in what might come from a startup or a conversation at Code4Lib than I have in any committee, no matter how well intentioned. All the outsiders need is the freedom to act. Above all that means open data.


In general, I applaud your report. You identify important issues and think creatively about them. I’m sure some good will come of it.

Unfortunately, I am cynical about the ability of your committee, or any similar committee, to bring about meaningful, timely change. Bibliographic data is a big ball of rope with the ends hidden away inside–a Gordian Knot, if you will.

Like the Gordian Knot, the current situation is a powerful reminder of greatness won and deserved–and a total mess. The data formats are inadequate, but insofar as few systems have made full use of them anyway, fixing the format is not enough. The ILS vendors are barriers to change, but libraries’ purchasing cycles and priorities made them so. Library culture is a pillar of democratic culture, but does not appropriately value initiative or rapid change; too many forward-thinking librarians I know are waiting on their boss’ retirement. The most important library institution, OCLC, subsists upon the continuance of the current regime, and is powerful enough to maintain that need. And–for all its virtues–the LC has no clear mandate in this area.

When a system is broken, but self-reinforcing, you need something outside the system to effect change. So the library world needs outsiders. Some will be true outsiders. LibraryThing is one of those, I suppose. But most will be librarians and other library professionals acting outside the culture and institutional structures of the library world.

What the outsiders want is freedom of action. They don’t want the “sharing” of data, but open data. Open up your data fully, and the change you plan for will spring up when you’re not looking.

Change across an entire profession and industry is hard. But you can be the change yourself. The Library of Congress’ data is already legally free, and rightly so. The constraints are all technical–the big red books, unavailable in digital form, the periodic and expensive CDs of MARC data, the OPAC no search engine can spider, and so forth. You are just a half-step away from true openness.

You can’t force the future, but you can lead it. If the LC were to throw its weight behind true, radical openness, that would really be something.

I wish you all the best.

Best,

Tim Spalding
LibraryThing Founder

*The “Draft Final Report of the Working Group on the Future of Bibliographic Control.” What an ungainly phrase.
**I submitted a version with a spelling correction at 12:03. However, it was still yesterday somewhere, right?

Labels: library of congress report

Saturday, December 15th, 2007

Headless body in topless bar*

I love book covers; I designed one of my wife’s—not the best, but it broke a long publisher/author stalemate—and marveled as the rest went by. Cover design is a dark art. The right cover is crucial to the success of a novel, but designers can’t afford to spend too much time or money on them, and seek safety in herds.

Book Cannibal has a funny piece on The Mystery of the Decapitated Cover Models: “There are certain trends in publishing that baffle me. … [W]hat’s with all these covers that feature half of some girl’s face?” They start with the Gossip Girl books, but they’re everywhere, including LibraryThing author Elizabeth Bear’s novels.

Book Cannibal wonders about the phenomenon, providing one good explanation, via her editor:

“The editor explained that B&N wants covers with live models (as opposed to scenery, or abstract painting, or an icon). Sometimes, the models aren’t quite the right age (I’m guessing this is the case re: Gossip Girls), but if you cut off part of their face, voila! Youth. You can slice away the years.”

Readers provide some others. I’m minded to look a few years back, when feet and shoes were all the rage, as Trashionista reminds us (see also GalleyCat’s dissection). First it was just feet, now we’re up to the jaw. This is progress, I suppose.

Anyway, I spent an enjoyable hour surfing book covers. This ended in 20 minutes of uncontrollable laughing at Smart Bitches Who Love Trashy Books, including this cover, with the nice observation “Why is the executive wearing a prep school jacket?”

*Famous New York Post headline, also the title of a book subtitled “The Best Headlines from America’s Favorite Newspaper.”

Labels: book covers, fun

Tuesday, December 11th, 2007

Open data and the Future of Bibliographic Control

We’ve got until December 15th to submit comments on the draft report produced by the Working Group on the Future of Bibliographic Control.

No—keep reading! This is important. People in the library profession need to be involved in this stuff. Further, people outside the profession need to be involved too. As the report notices, library data is used by many outside the library world, starting with library patrons, and extending even to Amazon.com. It shouldn’t go unnoticed, for example, that the draft report mentions LibraryThing four times. For while LibraryThing uses library data, it was invented by and is mostly used by non-librarians.

Aaron Swartz, the dynamo behind Open Library, sent me a note about one important aspect of the draft report, namely what it’s missing: It doesn’t mention open data. There is serious discussion about sharing, but also the alarming proposal that the LC attempt to recoup more money from the sale of its data. That’s a shame. I’m not alone in believing that open access to library data is the future. A report about the future should confront the future.

The economy of library records is a complex one but not primarily a free one. By and large libraries pay the Dublin, Ohio-based OCLC for their records, even if the records were created at government expense. That model looks increasingly dated. And it is killing innovation.

It hasn’t killed LibraryThing yet, but the specter has always hung over our head. It’s why LibraryThing has—so far—not pitched itself to small libraries. OCLC doesn’t care about personal cataloging, and the libraries we use are—in every conversation I’ve had—enthusiastic about what we do. They want their data out there; they’re libraries for Pete’s sake! But if we offered data to public libraries we’d be cutting into the OCLC profit model. That could be dangerous.

Aaron invited me to sign onto a list of people interested in the issue. I did so. I invite you—any of you—to do so as well. The text says it perfectly:

“Bibliographic records are part of our shared cultural heritage and should be made available to the public for re-use without restriction. This will allow libraries to share records more efficiently, but will also make possible more advanced online sites for book-lovers, easier analysis by social scientists, interesting visualizations and summary statistics by journalists and others, as well as many other possibilities we cannot predict in advance.”

“Government agencies and public institutions are increasingly making data open. We strongly encourage the Library of Congress to join this movement by recommending that more bibliographic data is made available for access, re-use and re-distribution without restriction.”

The petition is here: http://www.okfn.org/wiki/OpenBibliographicData .

Labels: library of congress, open data, open library, Working Group on the Future of Bibliographic Control

Monday, December 10th, 2007

SantaThing: Social networking goes Santa

Shameless cross-post from the main blog, but I want all my Lib 2.0 chums to come and join the new Secret Santa system, SantaThing, I cooked up last night. Secret Santa for booklovers. Can you resist?

Labels: new feature, santathing, secret santas

Friday, December 7th, 2007

Someone poke Syria

News reports* indicate that Syria is now blocking Facebook, alleging Israeli spies were infiltrating Syrian social networks. One can doubt that given some of the other sites on Syria’s blocked list—YouTube, Blogspot, Hotmail, Skype and—wait for it—Amazon.

But not LibraryThing! Syria bans sites, China bans sites. Heck, the UAE just banned Twitter!** But we never get banned, by those guys or anyone else; our competitors don’t get banned either. I’m almost sorry about it. YouTube might someday bring down a government, but people talking about books does it all the time.


*See: SeattlePI, Washington Post/Reuters, Mashable, Fox/AP, Jerusalem Post.
**As a certified member of the Web/Lib 2.0 set I’m supposed to think Twitter is a serious thing. I don’t. I don’t fall for the argument that only books matter, or that blogs are giving us the attention spans and intellectual perspicacity of squirrels. And I don’t think social networking is a vain shadow of real-life connection. But there is some lower limit to the length of an idea and the depth of a connection—and Twitter is it!

Labels: censorship, syria

Wednesday, December 5th, 2007

Here comes another bubble!

It’s funny because it’s true.

Ultimately I think it’s all about one thing. The hype isn’t misplaced, but a lot more value is created than captured. Facebook touches more people than Ford does, and, I suspect, brings them more joy. But Facebook can’t get at that value the way Ford does.

Google got at a tiny slice of the value they create with AdWords, and it’s making them insanely wealthy.

LibraryThing also makes a lot of people insanely happy. While we capture some value—it helps that our engineers are few and pay for their own beer—we’re never going to capture most of the value we create.

Good, I think.

Labels: Uncategorized

Monday, December 3rd, 2007

MARCThing: A simple, self-contained MARC and Z39.50 application

Over the past couple of weeks, LibraryThing has been rolling out major improvements to our cataloging system—a new system for retrieving and parsing book information we’re calling “MARCThing.”

MARCThing is a major advance for LibraryThing. We’ve sunk months of development time into it, but we’re not going to keep it to ourselves. We will be releasing all the code for non-commercial use in libraries and elsewhere.

When the dust settles, LibraryThing members will be able to draw on nearly 700 data sources worldwide, with greatly improved foreign character support and better data manipulation behind the scenes. With MARCThing underneath we will be able to introduce many new features and to reach a truly global audience. But we are confident that developers outside of LibraryThing will find many other, equally compelling uses for MARCThing, and make useful changes and extensions.

What it is. When I was given the task of improving LibraryThing’s cataloging system and other features involving library data, I immediately thought of Solr, one of the most influential pieces of software to come out in the past couple of years. The big idea behind Solr is that it provides a “magic box”—an easy, self-contained interface to some very powerful but complex technology, the Lucene search engine. Solr hides the messy details of Lucene from the developer and provides all sorts of extra goodies in a self-contained package. The net result is you can instantly stick an extremely powerful search engine into your project with almost no work. This combination of power and ease-of-use has quickly made it a developer favorite, and spawned all sorts of interesting projects that never would’ve come out without Solr.

I wanted my own magic box that would handle the two main standards libraries use to transfer cataloging data, the MARC record format and the Z39.50 protocol, without anyone having to go into the details of how they work. And since I didn’t want to have to find or build another magic box, ever, I wanted something that could be easily used from any programming language.

Writing it was pretty easy—I used Django for the web part, Pymarc for MARC, and PyZ3950 for the Z39.50 support. With a good software library, working with Z39.50 or MARC records isn’t hard. The hard (or at least time-consuming) part of MARCThing was tracking down servers and dealing with oddball cases. There are many lists of Z39.50 servers out there, but the data is often incomplete, incorrect, or out of date. When you do find a Z39.50 server, oftentimes it’s non-standard in some way, or only has limited functionality. So the process of connecting to libraries using Z39.50 is fraught with guesswork and manual fiddling. That’s bad. The whole point of a standard should be to free you from guesswork.
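To give a feel for what a MARC library takes off your hands, here is a minimal hand-rolled sketch of the binary record layout (a 24-byte leader, a directory of 12-byte entries, then the field data). It is not Pymarc's API and not MARCThing's code; the function names are made up for illustration, and it handles only a plain ASCII/UTF-8 record with a single title field.

```python
# Hand-rolled illustration of the MARC 21 (ISO 2709) record layout.
# NOT Pymarc's API and NOT MARCThing's code; names here are hypothetical.

FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator
SUBFIELD = b"\x1f"    # subfield delimiter


def build_record(fields):
    """Serialize [(tag, data_bytes), ...] into one binary MARC record."""
    directory = b""
    body = b""
    for tag, data in fields:
        data += FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(data), len(body))
        body += data
    directory += FIELD_END
    base = 24 + len(directory)        # where field data begins
    total = base + len(body) + 1      # +1 for the record terminator
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + body + RECORD_END


def parse_record(raw):
    """Return {tag: data_bytes} for one record (last field wins per tag)."""
    base = int(raw[12:17])            # leader bytes 12-16: base address
    directory = raw[24:base - 1]      # drop the directory's terminator
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Slice out the field, minus its trailing terminator.
        fields[tag] = raw[base + start: base + start + length - 1]
    return fields


def title_parts(field_245):
    """Split a 245 (title) field into (display title, sort title)."""
    nonfiling = int(field_245[1:2])   # second indicator: characters to skip
    subfields = field_245[2:].split(SUBFIELD)[1:]
    title = next(s[1:] for s in subfields if s[:1] == b"a").decode("utf-8")
    return title, title[nonfiling:]


# Indicators "1" and "4": the 4 tells sorters to skip the leading "The ".
record = build_record([("245", b"14" + SUBFIELD + b"aThe Hobbit")])
display, sort_key = title_parts(parse_record(record)["245"])
```

Here `display` comes back as "The Hobbit" and `sort_key` as "Hobbit". Real records add repeated tags, control fields without indicators, and MARC-8 text, which is exactly the kind of detail a library like Pymarc is there to absorb.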

How to use it. Using MARCThing is simple. Either send it some MARC records, or tell it which Z39.50 server you want to search and what you want to search on, and you get back XML (or a variety of other formats) that you can use in applications without having to know a lick about library cataloging. All the messy details (and there are a lot of them) are hidden from view. Everything just works. You don’t need to know what a nonfiling indicator or a use attribute is, or the difference between MARC8 and UTF-8. You just need to know how to make an HTTP request.
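As a rough sketch of what "just make an HTTP request" could look like from the caller's side, the snippet below builds a search URL and reads titles out of an XML reply. Everything specific in it is an assumption for illustration: the base URL, the parameter names, the example Z39.50 server address, and the shape of the XML are all invented, since the real interface has not been published yet.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "http://localhost:8000/marcthing/search"


def build_search_url(server, field, term, fmt="xml"):
    """Compose a GET URL naming the Z39.50 server and the search to run."""
    query = urlencode(
        {"server": server, "field": field, "term": term, "format": fmt}
    )
    return BASE_URL + "?" + query


def titles_from_xml(xml_text):
    """Pull title strings out of a (made-up) XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("title")]


# Invented sample reply; the real schema may look quite different.
SAMPLE_RESPONSE = """<records>
  <record><title>The Hobbit</title><author>Tolkien, J. R. R.</author></record>
  <record><title>The Silmarillion</title><author>Tolkien, J. R. R.</author></record>
</records>"""

# Placeholder server address; any host:port/database string would do here.
url = build_search_url("z3950.example.org:210/catalog", "author", "Tolkien")
titles = titles_from_xml(SAMPLE_RESPONSE)
```

Against a live service, the URL would be fetched with something like `urllib.request.urlopen(url)` and the body passed to `titles_from_xml`. The point is that nothing in the caller mentions indicators, use attributes, or character encodings.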

What I hope is that this inspires and allows people not in the library world to do cool things with library data. It’s sad that working with library data is such a hassle — there are so many underused resources out there. I won’t go too much into the technical problems with Z39.50 and MARC, but I do have a recommendation for anybody involved in implementing a standard or protocol in the library world. Go down to your local bookstore and grab 3 random people browsing the programming books. If you can’t explain the basic idea in 10 minutes, or they can’t sit down and write some basic code to use it in an hour or two, you’ve failed. It doesn’t matter how perfect it is on paper — it’s not going to get used by anybody outside the library world, and even in the library world, it will only be implemented poorly.

Open source plans. LibraryThing was already the only major cataloging site that used any library data. (The rest use Amazon’s data exclusively, a severe hurdle to book lovers in the US and an absolute barrier to those in most other countries.) MARCThing took us a long time to develop, and we have limited resources. We are not eager to give our competitors such a valuable tool — they can get their own library geeks. At the same time, we are eager to encourage non-profit use and to license its non-competing commercial use for a token amount.

We’re thinking of releasing the code under the Creative Commons Attribution-Noncommercial-Share Alike license, but it will depend on what people want to do with it. If you were bitten by a radioactive librarian and suddenly gained the power to search 700 libraries worldwide, what would you do?

Stay tuned; code is coming soon!

Labels: django, librarything for libraries, marcthing