Archive for the ‘open data’ Category

Tuesday, May 15th, 2012

Harvard University’s 12 million records now in LibraryThing

Short version. Our “OverCat” search now includes 12.3 million records from Harvard University!

Long version. On April 24 the Harvard Library announced that more than 12 million MARC records from across its 73 libraries would be made available under the library’s Open Metadata policy and a Creative Commons Zero (CC0) public domain license. The announcement stunned the library world, because Harvard went against the wishes of the shared-cataloging company OCLC, which has long sought to prevent libraries from releasing records in this way. (For background on OCLC’s efforts see past blog posts.)

It took a while to process, but we’ve finally completed adding all 12.3 million MARC records (3.1GB of bibliographic goodness!) to LibraryThing. They’ve gone into OverCat, our giant index of library records from around the world—now numbering more than 51 million records! As a result, when searching OverCat under “Add books,” you’ll now see results “from Harvard OpenMetadata.”

This release (“big data for books,” as David Weinberger calls it) is, to put it mildly, a Very Big Deal. Harvard’s collections are both deep and broad, covering a wide variety of languages, fields, and formats. Adding these 12 million records to OverCat has significantly improved our ability to catalog scholarly and rare books, and greatly broadened our coverage in general.

Kudos to Harvard for making this metadata available, and we hope that other libraries will follow suit.

For more on the metadata release, see Quentin Hardy’s New York Times blog post, the Dataset description, or the Open Metadata FAQ. And happy cataloging!

Come discuss here.

Harvard requests and we’re happy to add: The “Harvard University Open Metadata” records in OverCat contain information from the Harvard Library Bibliographic Dataset, which is provided by the Harvard Library under its Bibliographic Dataset Use Terms and includes data made available by, among others, OCLC Online Computer Library Center, Inc. and the Library of Congress.

Labels: cataloging, open data

Thursday, September 17th, 2009

The Amazon policy change, and how we’re responding.

“Amazon Cardboard Boxes” by Flickr member Akira Ohgaki (Attribution 2.0 Generic)

Summary: Amazon is requiring us to remove links to other booksellers on work pages. We’re creating a new “Get it Now” page, with links to other booksellers, especially local bookstores and libraries, and a host of new features. Talk about it here.

The challenge. We’re days away from releasing a series of changes to our book pages, both forced and intentional. Amazon is requiring all websites, as a condition of getting any data from them, to have the primary page link to Amazon alone. Links to other booksellers are prohibited. Secondary pages—pages you go to from the primary page—can have non-Amazon links.

Everyone at LibraryThing disagrees with this decision. LibraryThing is not a social cataloging and social networking site for Amazon customers but for book lovers. Most of us are Amazon customers on Tuesday, and buy from a local bookstore or get from a library on Wednesday and Thursday! We recognize Amazon’s value, but we certainly value options.

Importantly, the decision probably isn’t even good for Amazon. Together with a new request-monitoring system, a ban on iPhone applications that use Amazon data, and much of their work on the Kindle, this decision is part of Amazon’s retreat from its historic commitment to simplicity, flexibility and openness. They won through openness. Their data is all over the web, and with it millions of links to Amazon. They won’t benefit from a retreat here.

But agree or not, we have to follow their terms. We thought long and hard about giving up Amazon data entirely and converting to library data only, in concert with a commercial provider like Bowker or Ingram, and with help from publishers and members. Unlike our competitors, who rely exclusively on Amazon and who don’t “catalog” so much as keep track of which Amazon items you have, we actually have that option. But we’d lose a lot, particularly book covers. Ultimately, we’ve decided the disadvantages outweigh the benefits.

The Response. Most of all, we think we’ve found a way to give Amazon what they require and continue to provide members with options: we’re going to cut back our primary-page links to Amazon alone, and give people the best, most diverse secondary pages we can make. We are allowed to link to other booksellers, like IndieBound and Barnes and Noble, on secondary pages, and we’re going to do it far better than we ever have. We’re going to take something away, but also make something better, something that goes way past what we did before, in features and in diversity of options.

The upcoming “Get it Now” page will go far beyond our current “Buy, borrow, swap” links, with a live price-comparison engine for new and used copies, as well as sections for ebooks, audiobooks and swap sites. The page will be edition-aware and will draw on feeds or live data, so the links actually work. Many members have wanted live pricing data for the books they already own, and these features can be used for that purpose too. We’ll also be doing some things with libraries that nobody else has done, or can do.

Key to the upcoming Get it Now page is a “Local” module, drawing on LibraryThing Local, showing all the libraries and bookstores near you. Where possible, this list will incorporate holdings data and links to buy—the sort of information you never get from a Google search on a book. If not, we’ll give you their telephone numbers and show you where they are on a map. We’ll make the page customizable, and let members add sources to it.

We think the new page will make a lot of members happy. For one thing, LibraryThing has never been about buying books, so having all these links on a separate page won’t be a great loss. And if the new format doesn’t make members happy, we’ll listen, and together we can plan to take LibraryThing on a truly independent course.

Post your comment here, or come talk about this on Site Talk.

Labels: amazon, apis, google, open data

Thursday, August 7th, 2008

A million free covers from LibraryThing

A few days ago, just before hitting thirty million books, we hit one million user-uploaded covers. So, we’ve decided to give them away—to libraries, to bookstores, to everyone.

The basics. The process, patterned after the cover service, is simplicity itself:

  1. Take an ISBN, like 0545010225
  2. Put your Developer Key and the ISBN into a URL, like so:
  3. Put that in an image tag, like so:
    <img src="">
  4. And your website, library catalog or bookstore has a cover.

Easy details. Each cover comes in three sizes. Just replace “medium” with “small” or “large.”
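Steps 1 through 3, plus the size swap, can be sketched in a few lines of Python. This is an illustration, not LibraryThing code: the helper names `normalize_isbn` and `cover_img_tag` are hypothetical, and because the example URL from step 2 isn’t reproduced here, the URL format is a caller-supplied placeholder template (the `example.invalid` address below is made up) that you would swap for the real pattern. The check-digit rules themselves (mod 11 for ISBN-10, mod 10 for ISBN-13) are the standard ones.

```python
import html

# The three sizes described above; "medium" is the one used in the example.
COVER_SIZES = ("small", "medium", "large")


def normalize_isbn(raw: str) -> str:
    """Strip hyphens/spaces and verify an ISBN-10 or ISBN-13 check digit.

    Hypothetical helper; raises ValueError for anything that isn't a valid ISBN.
    """
    isbn = raw.replace("-", "").replace(" ", "").upper()
    if len(isbn) == 10 and isbn[:9].isdigit() and (isbn[9].isdigit() or isbn[9] == "X"):
        # ISBN-10: weights 10 down to 1; "X" stands for 10; total must be divisible by 11.
        digits = [int(c) for c in isbn[:9]] + [10 if isbn[9] == "X" else int(isbn[9])]
        total = sum(w * d for w, d in zip(range(10, 0, -1), digits))
        if total % 11 == 0:
            return isbn
    elif len(isbn) == 13 and isbn.isdigit():
        # ISBN-13: alternating weights 1 and 3; total must be divisible by 10.
        total = sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(isbn))
        if total % 10 == 0:
            return isbn
    raise ValueError(f"not a valid ISBN: {raw!r}")


def cover_img_tag(url_template: str, dev_key: str, isbn: str, size: str = "medium") -> str:
    """Build an <img> tag from a placeholder URL template with {key}, {size}, {isbn} slots.

    The template format is an assumption for illustration, not the service's documented URL.
    """
    if size not in COVER_SIZES:
        raise ValueError(f"size must be one of {COVER_SIZES}")
    url = url_template.format(key=dev_key, size=size, isbn=normalize_isbn(isbn))
    return f'<img src="{html.escape(url, quote=True)}" alt="Book cover">'
```

Validating the ISBN first catches typos before you spend a request against your key. The HTML escaping only matters if the template or key could contain characters such as `&` or quotes.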

As with Amazon, if we don’t have a cover for the book, we return a transparent 1×1 pixel GIF image. So you can put the cover image on OPAC pages without knowing whether we have it. If we have it, it shows; if we don’t, it doesn’t.
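If you cache covers, or simply want to know whether a real cover came back, you can detect that placeholder yourself. A minimal sketch, assuming the placeholder is an ordinary GIF whose header declares a 1×1 logical screen, as described above; `is_placeholder_gif` is a hypothetical helper name. It relies only on the GIF header layout: a 6-byte signature followed by the width and height as 16-bit little-endian integers.

```python
import struct


def is_placeholder_gif(data: bytes) -> bool:
    """Return True if `data` is a GIF whose logical screen is 1x1 pixels."""
    # Signature is "GIF87a" or "GIF89a"; anything else (JPEG, PNG, empty) is a real image or an error.
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        return False
    # Bytes 6-9: logical screen width and height, unsigned 16-bit little-endian.
    width, height = struct.unpack("<HH", data[6:10])
    return width == 1 and height == 1
```

A catalog that caches covers could use this to skip storing placeholders, so it can check again later once someone uploads a cover.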

The Catch? To get covers, you’ll need a LibraryThing Developer Key; any member can get one. The key caps the number of covers you can retrieve per day, currently at 1,000. In fact, we only count a request against that limit when a cover has to be made from the original file, so your actual limit will be much higher. We encourage you to cache the files locally.

You also agree to some very limited terms:

  • You do not make LibraryThing cover images available to others in bulk. But you may cache bulk quantities of covers.
  • Use does not involve or promote a LibraryThing competitor.
  • If covers are fetched through an automatic process (i.e., not by people hitting a web page), you may not fetch more than one cover per second.
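The caching advice and the one-cover-per-second rule fit naturally into a small wrapper. The Python sketch below is hypothetical: the class name and the `fetch` callable are assumptions, so plug in whatever HTTP client you use. It counts every network fetch against a 1,000-per-day budget, which is more conservative than the counting described above, and it approximates “per day” as a rolling 24-hour window starting from the first fetch. The clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable, Dict, Optional


class ThrottledCoverCache:
    """Cache covers and space automated fetches at least `min_interval` seconds apart.

    Hypothetical sketch: `fetch` maps a URL to image bytes; the cache is in memory.
    """

    def __init__(self, fetch: Callable[[str], bytes], min_interval: float = 1.0,
                 daily_limit: int = 1000,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._daily_limit = daily_limit
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, bytes] = {}
        self._last_fetch: Optional[float] = None
        self._window_start: Optional[float] = None
        self._fetches_in_window = 0

    def get(self, url: str) -> bytes:
        # Cached copies never touch the network or count against the budget.
        if url in self._cache:
            return self._cache[url]
        now = self._clock()
        # Start a fresh 24-hour window (an approximation of "per day").
        if self._window_start is None or now - self._window_start >= 86400:
            self._window_start, self._fetches_in_window = now, 0
        if self._fetches_in_window >= self._daily_limit:
            raise RuntimeError("daily cover budget used up; try again later")
        # Honor the one-cover-per-second rule for automated fetching.
        if self._last_fetch is not None:
            wait = self._min_interval - (now - self._last_fetch)
            if wait > 0:
                self._sleep(wait)
        data = self._fetch(url)
        self._last_fetch = self._clock()
        self._fetches_in_window += 1
        self._cache[url] = data
        return data
```

The cache here lives in memory to keep the sketch short; writing the files to disk, as encouraged above, would let them survive restarts.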

You will note that unlike the new API to our Common Knowledge data, you are not required to link back to LibraryThing. But we would certainly appreciate it.

Caveats. Some caveats:

  • At present only about 913,000 covers are accessible, the others being non-ISBN covers.
  • Accuracy isn’t guaranteed (this is user data), and coverage varies.
  • Some covers are blurrier than we’d like, particularly at the “large” size. This is sometimes about original files and sometimes about our resizing routines. We’re working on the latter.

Why are you doing this? The goal is half promotional and half humanitarian.

First, some background. This service “competes” with Amazon’s cover service, now part of Amazon Web Services. Amazon’s service is, quite simply, better. They have far more covers, and no limit on the number of requests. By changing the URL you can do amazing things to Amazon covers.

The catch is that Amazon’s Terms of Service require a link-back. If you’re trying to make money from Amazon Affiliates, this is a good thing. But libraries and small bookstores have been understandably wary about linking to Amazon. Recent changes in Amazon’s Terms of Service have deepened this worry.

Meanwhile, there are a number of commercial cover providers. They too are probably better, on average. But they cost money. Not surprisingly, many libraries and bookstores skip covers, or paste them in manually from publisher sites.

That’s too bad. Publishers and authors want libraries and bookstores to show their covers. Under U.S. law, displaying covers to show off books for sale, rental or commentary falls under Fair Use in most circumstances. (We are not lawyers and make no warranty that your use will be legal.) We’ve felt for years that selling covers was a fading business. Serving the files is cheap and getting cheaper. It was time for someone to step up.*

So we’re stepping up. We’re hoping that by encouraging caching and limiting requests, we can keep our bandwidth charges under control. (If it really spikes, we’ll limit new developer keys for a while; if you submit this to Slashdot, we will be Slashdotted for sure!) And it will be good for LibraryThing—another example of our open approach to data. Although none of our competitors do anything like this—indeed our Facebook competitors don’t even allow export although, of course, they import LibraryThing files!—we think LibraryThing has always grown, in part, because we were the good guys—more “Do occasional good” than “Do no evil.”

If we build it, they will come. If the service really picks up, we’re going to add a way for publishers, bookstores and authors to get in on it. We’d be happy to trade some bandwidth for what publishers know: high-quality covers, author photos, release dates and so forth. We’ve already worked with some publisher data, but we’d love to do more with it.

*In the past, we had been talking to the Open Library project about a joint effort. We even sent them all our covers and a key to the identifiers that linked them. But nothing came of it. To some extent that was our fault, and to some extent not. (I think we and they would differ on the blame here.) In any case, I was tired of the time and transactional friction, and wanted to try a different approach.

Labels: apis, book covers, covers, open data

Friday, February 15th, 2008

Take our files, raw.

Short. Here’s a page of our raw graphics files. If you find that fun, have some. If you make an interesting change, all the better.

Long. We believe in openness. But openness is a process. It’s not so much that openness is difficult or painful;* it’s that openness is non-obvious. You don’t see each successive layer until you remove the one above it.

Since the site started, we’ve enjoyed kibitzing about how it should look. We’d talk about layout and design. We’d throw up an image and sit back for reactions. Occasionally a user would get inspired and post what they thought something should look like. We just concluded a great exchange about the new “Author” and “Legacy” badges. Members helped us refine the wording and the colors enormously.

Open, right? But wait! Why didn’t we post our raw images for members to play with, if they wanted? You can talk about a GIF, but that’s like asking people to have conversations about a prepared speech.

Frankly, until now, I never even thought of the idea. I’ve never heard of a company that did it. And although it happens on open source projects, it’s not universal. The Open Library project, for example, is a model of openness. You can download both code and data; but you won’t find any design files on the site.

So, why not? We don’t lose trademark or copyright by posting a raw Photoshop file, with layers and alternate versions, any more than we lose them by posting GIFs and JPEGs. What is the potential downside? Just in case there’s any confusion, we’ve posted a notice about copyright and trademark, but also granted explicit permission to make changes and blog about them.

So, here’s a wiki page for us to post our raw graphics files, and for users to view, edit and remix them. It’s a very selective list so far, mostly because I started with what was lying around on my desktop.**

More, much deeper openness coming next week…

*Although maintaining the “What I did today?” page proved too much work, and it helps that I have very thick skin for most criticism.
**There’s a side-benefit to putting all the files up on the wiki. Last time I lost my hard drive I lost almost no work—it’s all up on the “cloud” these days—except for my Photoshop files.

Labels: love, member input, open data, openness