Archive for the ‘open data’ Category

Tuesday, May 15th, 2012

Harvard University’s 12 million records now in LibraryThing

Short version. Our “Overcat” search now includes 12.3 million records from Harvard University!

Long version. On April 24 the Harvard Library announced that more than 12 million MARC records from across its 73 libraries would be made available under the library’s Open Metadata policy and a Creative Commons 0 public domain license. The announcement stunned the library world, because Harvard went against the wishes of the shared-cataloging company OCLC, who have long sought to prevent libraries from releasing records in this way. (For background on OCLC’s efforts see past blog posts.)

It took a while to process, but we’ve finally completed adding all 12.3 million MARC records (3.1GB of bibliographic goodness!) to LibraryThing. They’ve gone into OverCat, our giant index of library records from around the world—now numbering more than 51 million records! As a result, when searching OverCat under “Add books,” you’ll now see results “from Harvard OpenMetadata.”

This release (“big data for books,” as David Weinberger calls it) is, to put it mildly, a Very Big Deal. Harvard’s collections are both deep and broad, covering a wide variety of languages, fields, and formats. The addition of these 12 million records to OverCat has significantly improved our capacity for the cataloging of scholarly and rare books, and greatly enhanced our coverage generally.

Kudos to Harvard for making this metadata available, and we hope that other libraries will follow suit.

For more on the metadata release, see Quentin Hardy’s New York Times blog post, the Dataset description, or the Open Metadata FAQ. And happy cataloging!

Come discuss here.


Harvard requests and we’re happy to add: The “Harvard University Open Metadata” records in OverCat contain information from the Harvard Library Bibliographic Dataset, which is provided by the Harvard Library under its Bibliographic Dataset Use Terms and includes data made available by, among others, OCLC Online Computer Library Center, Inc. and the Library of Congress.

Labels: cataloging, open data

Friday, October 29th, 2010

Better German cataloging from open data


University of Konstanz (Wikimedia Commons)

Casey has just finished loading 1.38 million library MARC records from Konstanz University into LibraryThing’s search index, Overcat.

While Overcat isn’t the only way to find German items–you can search libraries directly–it has become many members’ first source. At 35.2 million items, it’s now considerably larger than any remote source, as well as faster and more diverse. The Konstanz University records jump it up significantly as a German-language source.

Adding the records was possible because Konstanz chose to release the records as “CC-0,” essentially “public domain.” Inasmuch as OCLC has convinced (or intimidated) much of the library world into acting as if library records were private property, this was a brave move.(1) You can read more about the release on the Open Knowledge Foundation blog. It’s notable they originally opted for a more restricted, non-commercial license, but, under prompting from German librarians, opened it up all the way.

And what will we do with these records? Evil things! Hardly. LibraryThing has never sold library records and we never will. But the records will make a small percentage of members happy, as their German books suddenly get easier to catalog. These records, in turn, will serve as a scaffold to add other cataloging-like data (what we call Common Knowledge, or CK), all of which is released under a Creative Commons Attribution license. In this way open data improves open data, and everyone is the richer.


1. Their action is especially notable in that German governmental agencies aren’t required to disclaim copyright, as US ones are. Locking up free US government or government-funded library data, as OCLC does, is obnoxious and legally dubious, but Germany has different rules–including a true “database copyright” the United States lacks.

Labels: cataloging, open data, openness

Thursday, September 17th, 2009

The Amazon policy change, and how we’re responding.

“Amazon Cardboard Boxes” by Flickr member Akira Ohgaki (Attribution 2.0 Generic)

Summary: Amazon is requiring us to remove links to other booksellers on work pages. We’re creating a new “Get it Now” page, with links to other booksellers, especially local bookstores and libraries, and a host of new features. Talk about it here.

The challenge. We’re days away from releasing a series of changes to our book pages, both forced and intentional. Amazon is requiring all websites, as a condition of getting any data from them, to have the primary page link to Amazon alone. Links to other booksellers are prohibited. Secondary pages—pages you go to from the primary page—can have non-Amazon links.

Everyone at LibraryThing disagrees with this decision. LibraryThing is not a social cataloging and social networking site for Amazon customers but for book lovers. Most of us are Amazon customers on Tuesday, and buy from a local bookstore or get from a library on Wednesday and Thursday! We recognize Amazon’s value, but we certainly value options.

Importantly, the decision is probably not even good for Amazon. Together with a new request-monitoring system, banning iPhone applications that use Amazon data, and much of their work on the Kindle, Amazon is retreating from its historic commitment to simplicity, flexibility and openness. They won through openness. Their data is all over the web, and with it millions of links to Amazon. They won’t benefit from a retreat here.

But agree or not, we have to follow their terms. We thought long and hard about giving up Amazon data entirely, converting to library data only, in concert with a commercial provider, like Bowker or Ingram, and with help from publishers and members. Unlike our competitors, who are exclusively based on Amazon and who don’t “catalog” so much as keep track of which Amazon items you have, that option is available to us. But we’d lose a lot, particularly book covers. Ultimately, we’ve decided the disadvantages outweigh the benefits.

The response. Most of all, we think we’ve found a way to give Amazon what they require, and continue to provide members with options: We’re going to cut back our primary-page links to Amazon alone, and give people the best, most diverse secondary pages we can make. We are allowed to link to other booksellers, like IndieBound and Barnes and Noble, on secondary pages, and we’re going to do it far better than we ever have. We’re going to take something away, but also make something better, something that goes way past what we did before, in features and in diversity of options.

The upcoming “Get it Now” page will go far beyond our current “Buy, borrow, swap” links, with a live new and used price-comparison engine, as well as sections for ebooks, audiobooks and swap sites. The page will be edition-aware, and draw on feeds or live data (so the links work). Many members have wanted live pricing data for the books they already own, and these features can be used for that purpose too. We’ll also be doing some stuff with libraries that nobody else has done, or can do.

Key to the upcoming Get it Now page is a “Local” module, drawing on LibraryThing Local, showing all the libraries and bookstores near you. Where possible, this list will incorporate holdings data and links to buy—the sort of information you never get from a Google search on a book. If not, we’ll give you their telephone numbers and show you where they are on a map. We’ll make the page customizable, and let members add sources to it.

We think the new page will make a lot of members happy. For one thing, LibraryThing has never been about buying books, so having all these links on a separate page won’t be a great loss. And if the new format doesn’t make members happy, we’ll listen, and together we can plan to take LibraryThing on a truly independent course.

Post your comment here, or come talk about this on Site Talk.

Labels: amazon, apis, google, open data

Thursday, February 19th, 2009

Seeing parallels

Steve Lawson wrote this wonderful piece for his blog See also…, reprinted here (by permission) in full:

There is a large organization whose main business isn’t producing information, but instead hosting and aggregating information for many thousands of users on the web. Users upload content, and use the service to make that content public worldwide, and, likewise, to find other users’ content. Then one day the large organization decides to change the rules about how that information is shared, giving the organization more rights–to the point where it sounds to some people like the organization is trying to claim ownership of the users’ content, rather than simply hosting it and making it available on the web.

A small but vocal and influential group of users object to the policy change. The organization protests that it isn’t their intent to fundamentally change their relationship with their users and that legal documents tend to sound scarier than they really are. Most customers are either unaware or unconcerned by the change in policy, but the outcry continues until the organization backs down a bit, sticking with the old policy for the time being. The future, though, is up in the air.

Facebook? Or OCLC?

Perfect, just perfect.

Labels: facebook, oclc, open data, steve lawson

Monday, December 22nd, 2008

LCSH.info, RIP

LCSH.info, Ed Summers’ presentation of Library of Congress Subject Headings data as Linked Data, has ended. As Ed explained:

“On December 18th I was asked to shut off lcsh.info by the Library of Congress. As an LC employee I really did not have much choice other than to comply.”

I am not as up on, or as enthusiastic about, Ed’s Semantic-Web intentions, but the open-data implications are clear: the Library of Congress just took down public data. I didn’t think things could get much worse after the recent OCLC moves, but this is worse. The Library of Congress is the good guy.

Jenn Riley put it well:

“I know our library universe is complex. The real world gets in the way of our ideals. … But at some point talk is just talk and action is something else entirely. So where are we with library data? All talk? Or will we take action too? If our leadership seems to be headed in the wrong direction, who is it that will emerge in their place? Does the momentum need to shift, and if so, how will we make this happen? Is this the opportunity for a grass-roots effort? I’m not sure the ones I see out there are really poised to have the effect they really need to have. So what next?”

The time has come to get serious. The library world is headed in the wrong direction. It’s wrong for patrons—and taxpayers. And it’s wrong for libraries.

By the way, Ed, we’re recruiting library programmers. The job description includes wanting to change the world.

See also: Panlibus.

Labels: library of congress, open data