Archive for the ‘oclc’ Category

Sunday, November 2nd, 2008

OCLC Policy Change

Here it is:

No comment, as of now. Frankly, I haven’t even read it. February is a long time away. Long enough to discover we’re okay, make a deal or copy OCLC from scratch using nothing but periwinkle ink and passionate book lovers’ time.

Update: Depressing analysis: Terry’s Worklog. Wow.

Update #2: The non-legal page remains up, but the legalese page was taken down very early this morning:

“We are reconsidering some aspects of the policy. More information will be available in the near future.”

Damn. I wish I had remembered to copy and paste. Does anyone have the original text? (For example, in your browser cache? I browse cache-less, unfortunately.)

Update #3: See Inkdroid pointing out the “viral” nature of the policy. Over a few years the libraries that now get their data from the Library of Congress, bypassing OCLC, will find uninfected records increasingly scarce. They’ll be forced to join OCLC—or do all their own original cataloging.

Update #4: A librarian-blogger managed to take a snapshot before OCLC took it down, here.

Update #4: Does anyone get Publishers Lunch Plus? Apparently it has an article called “WorldCatFight.” I don’t know the terms on forwarding that, but if it’s legal, can someone send me a copy?

It would certainly be good if publishers got into this. In my fantasy, publishers “pull a reverse-OCLC” and require unlicensed distribution of records derived from their data. Publishers want their data out there, not restricted, and since OCLC records often start at publishers, this would shut down OCLC’s data-monopoly plans.

Update #5: The terms kill off the Open Library project completely. Not only does it involve viral terms—terms that OL could never accept—but OCLC libraries are prohibited from participating in anything that “substantially replicates the function, purpose, and/or size of WorldCat, for example for the purpose of providing cataloging services to libraries or other organizations.”

I think that means it kills Talis too.

Update #6: Edward Corrado has an excellent summary of some of the issues.

Update #7: Jonathan Rochkind wrote a good explanation of the difference between an open source viral license—designed to keep things open—and an OCLC viral license—designed to keep them closed. He also suggests a remedy—give OCLC a virus instead, by adding an Open Data license to everything your library catalogs!

Labels: oclc

Saturday, June 14th, 2008

OCLC’s non-profit status

The New York Times ran an interesting story on non-profits that act like businesses. Apparently a number of states are taking a hard look at charities that “give nothing away,” or have amassed vast wealth. A lot of day-care centers are worried, as is Harvard, where the endowment tops the GDP of more than 100 countries.*

Of course, my mind went to OCLC, the Dublin, Ohio-based global library-data organization.

OCLC’s core business involves maintaining a central database of cataloging records, largely created by others, which member libraries pay to access. That OCLC was a great invention can hardly be denied. Personally, I think it has become a relic and a danger to the future of libraries. Agree with me on this or not, there’s no question it is highly profitable—driving a steady stream of acquisitions—and in its fee structure calls into question the core idea of the non-profit.

So, why hasn’t someone taken away OCLC’s non-profit status?

I Googled it up, and discovered that someone DID! In 1984 Ohio state courts stripped OCLC of its charitable status on those very grounds:

“(A)lthough OCLC’s service may greatly enhance the ability of libraries to better serve the public, OCLC essentially offers a product to charitable institutions, for a fee exceeding its cost, and, as the board concluded, is not itself a charitable organization.”

So, what happened?

It seems the Ohio legislature passed some sort of private bill removing Ohio organizations involved in “library technology development” (and starting with the letter “O”?) from the court’s requirements. Well, I guess that’ll do it.

UPDATE: I’m working up a presentation on why OCLC’s (also unfree) Dewey Decimal System needs to be killed off, and what distributed, open classification could replace it. I’m all ears for anti-Dewey examples. And if any bright young cataloger with no love of Dewey wants to talk to me about heading up the effort, I’d love to hear from you.

*$35 billion, doing a quick check against Wikipedia. Of course, GDP is wiggly as heck.

Labels: DDC, dewey decimal, oclc, tax exemption

Friday, February 15th, 2008

ThingISBN adds LCCNs, OCLC numbers

ThingISBN, our popular ISBN-based API, supports and returns data for two more identifiers: LCCN and OCLC.

At core, ThingISBN—blogged before here and here—takes an ISBN and returns a simple XML list of other ISBNs, corresponding to other “editions” of the work, e.g.

Now, if you add &allids=1 to the ISBN, the XML will include relevant LCCN and OCLC numbers, e.g.

You can also feed ThingISBN both numbers, e.g.,

If you feed it an LCCN or an OCLC number you don’t need to add “&allids=1” to get back these identifiers.
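
A minimal Python sketch of how a client might build such a request. The base URL is a placeholder, not the real endpoint, and the function name is made up for illustration; only the “&allids=1” suffix comes from the description above.

```python
# Placeholder endpoint: an assumption for illustration, not the real ThingISBN URL.
BASE_URL = "https://example.org/thingISBN/"


def thingisbn_url(isbn, all_ids=False, base_url=BASE_URL):
    """Build a lookup URL for an ISBN, optionally asking for LCCN and OCLC numbers."""
    # ISBNs are often written with hyphens or spaces; strip them first.
    cleaned = isbn.replace("-", "").replace(" ", "")
    url = base_url + cleaned
    if all_ids:
        # Per the post, the flag is appended directly after the ISBN.
        url += "&allids=1"
    return url


print(thingisbn_url("999107371X", all_ids=True))
```

How an LCCN or OCLC number is written in the URL is not shown here, so the sketch only handles ISBNs.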

What’s next?

  • I haven’t added LCCNs and OCLC numbers to the ThingISBN feed, yet.
  • Although there are some details to be worked out, this advance looks forward to adding support for LCCNs and OCLC numbers to LibraryThing for Libraries.

Tell us what’s going on. I know that ThingISBN gets a lot of use, some of it even in accordance with its Terms of Use. If you’re using ThingISBN, I’d love to hear how on a new wiki page I’ve created, Projects Currently Using ThingISBN.

Caveat. ThingISBN is free for non-commercial use. Commercial use requires our say-so. Read more here.

In the news! Coincidentally, LCCNs are in the news this week. Yesterday, the Library of Congress announced an “LCCN Permalink,” a smart bid to convert a vital but underused set of permanent, unique IDs, the LCCN (Library of Congress Control Number), into the regnant permanent, unique ID, the URL. See Catalogablog for the announcement.

Labels: apis, lccn, lccns, oclc, oclc numbers, thingisbn

Tuesday, January 8th, 2008

While you were sleeping, ThingISBN got better.

LibraryThing does a lot of cool things nobody else does. And, as we grow, we do them better and better.

I’ve got a very good example for today: the ThingISBN service. It was good when it was launched more than a year ago, becoming LibraryThing’s first API, and it’s been getting better ever since. (And where its competitor became a paid service, ThingISBN is still free for non-commercial use.)

The ThingISBN service provides something called “edition disambiguation.” Give it an ISBN and it will shoot back a list of “related” ISBNs—other editions, other media, and translations. Edition disambiguation is valuable stuff. Retailers use it to aggregate reviews and other data across editions, and to sell you something when the book you searched for is no longer available. Libraries use it to make sure a patron leaves with a copy of a book, even if the edition the patron searched for is checked out.

You can get ThingISBN in two ways:

  • As a REST-based API. Just change the ISBN in this URL as needed.
  • As a complete feed (thingISBN.xml.gz in /feeds). We ask that people not hit the API more than 1,000 times per day. Instead, pick up the full feed.
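
One way a client could respect the 1,000-per-day limit is to count its own calls and switch to the feed once the budget is spent. A rough Python sketch; the class and method names are invented for illustration and nothing here is part of LibraryThing’s service.

```python
import datetime
from typing import Optional


class DailyBudget:
    """Count API calls per calendar day and refuse once the limit is reached."""

    def __init__(self, limit: int = 1000):
        self.limit = limit
        self.day: Optional[datetime.date] = None
        self.used = 0

    def try_spend(self, today: Optional[datetime.date] = None) -> bool:
        """Return True if another call is allowed today, recording it if so."""
        today = today or datetime.date.today()
        if today != self.day:
            # First call of a new day: reset the counter.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


budget = DailyBudget(limit=1000)
if budget.try_spend():
    print("OK to make one API request")
else:
    print("Over budget: read from the downloaded feed instead")
```

The counter lives in memory, so a long-running script would need to persist it between runs to be accurate.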

What’s cool here? LibraryThing isn’t the only supplier of this data. The other supplier, OCLC, the Dublin, Ohio-based library data organization, compiles its data through clever automated analysis of OCLC’s billion-plus records. Their data and algorithms do a great job. Unfortunately, they charge for the service, called xISBN.

LibraryThing does it differently, relying instead on members, who add, combine and separate editions by the thousands every day. For doing this, LibraryThing members get better connections with other users. That is, you gain connections and enhanced recommendations by connecting your edition with others. The result is a detailed set of correspondences between editions, assembled by thousands and improving every day.

You’ve got to admit it’s getting better. If you improve every day, you can get pretty good, and that’s what’s happened to ThingISBN. OCLC still beats LibraryThing in quantity, but LibraryThing is closer, and, it seems to me, has a clear advantage for paperbacks.

I want to revisit some of the examples I gave when ThingISBN debuted:

  • OCLC’s canonical example is Frank Herbert’s Dune. I don’t have the exact counts, but LibraryThing originally trailed OCLC. (I know because I used it as an example in a number of talks.) As of now, however, LibraryThing has passed OCLC, with 89 ISBNs to OCLC’s 80.
  • Peter Green, Alexander of Macedon. When ThingISBN started, both LibraryThing and OCLC knew the recent hardback, and one other edition. That is, LibraryThing knew the paperback and OCLC knew the 1974 first edition. Since then, LibraryThing has discovered the first edition, giving it three ISBNs; OCLC still doesn’t know about the paperback.
  • Lee Strobel, The Case for a Creator. OCLC knew of two editions, LibraryThing eight. OCLC now knows three, LibraryThing eleven. It’s about paperbacks, obviously.
  • Emily Bronte, Wuthering Heights. Originally LibraryThing had 92 ISBNs, OCLC a commanding 326 ISBNs. OCLC is still in the lead, with 424 ISBNs, but LibraryThing has more than tripled its count, to 285.

Now, I’m quite sure that, overall, OCLC’s xISBN service still beats LibraryThing in coverage. LibraryThing only covers 2.7 million ISBNs. OCLC must cover more.

But LibraryThing is gaining. It’s getting better faster.

And while OCLC continues to sink resources, including staff, into the project, now a paid service for all but minimal use and part of its Peace-is-War-ish Openly division, I can tell you honestly that I haven’t touched ThingISBN in six months. I haven’t made it better, even a little. Members made it better.

Now as then, that’s pretty revolutionary stuff.

See you next January, OCLC.

Labels: apis, frbr, oclc, thingisbn, work disambiguation, xisbn

Friday, May 18th, 2007

Why I joined OCLC …

… is the title of a short Library Journal piece by Roy Tennant. In it, Roy, a popular and much respected library speaker and author, explains his decision to leave the California Digital Library and take a job at OCLC.

Roy’s decision drew some flak among anti-OCLC librarians and related pundits who view OCLC as—in Steve Oberg’s phrase—the “Microsoft of the library world.”

I’m in that camp, as Roy knows well. After we presented in the same session at Computers in Libraries, Roy and I went out to dinner with another prominent librarian. I subjected them both to a long, Greek-food-fueled rant about open data and the problems with OCLC and its approach to the web. I had my shot. A day or two later, he announced he was moving to OCLC. Apparently I didn’t convince him! :)

OCLC needs people like Roy—passionate librarians with a vision for the future. If OCLC is to change, people like Roy are going to be the ones to do it. I have great hopes for him there.

But they’re not going to do it alone. People on the outside are going to change OCLC too. They’re going to keep the pressure on. I applaud that Roy took the time to explain his move, and his vision for OCLC and the future of libraries. He was eloquent and persuasive. But I’m also glad he felt he had to.

Labels: oclc, roy tennant

Thursday, April 12th, 2007

WorldCat: Think locally, act globally

OCLC just announced a “pilot” of WorldCat Local. In essence, WorldCat Local is OCLC providing libraries with an OPAC.

That’s the news. Here’s the opinion. Talis’ estimable Richard Wallis writes:

“Yet another clear demonstration that the library world is changing. The traditional boundaries between the ILS/LMS, and library and non-library data services are blurring. Get your circulation from here; your user-interface from there; get your global data from over there; your acquisitions from somewhere else; and blend it with data feeds from here, there and everywhere is becoming more and more a possibility.”

I think this is exactly wrong. OCLC isn’t creating a web service. They’re not contributing to the great data-service conversation. They’re trying to convert a data licensing monopoly into a services monopoly. If the OCLC OPAC plays nice with, say, the Talis Platform, I’ll eat my hat. If it allows outside Z39.50 access I’ll eat two hats.

They will, as the press release states, “break down silos.” They’ll make one big silo and set the rules for access. The pattern is already clear. MIT thought that its bibliographic records were its own, but OCLC shut them down when they tried to act on that. The fact is, libraries with their data in OCLC are subject to OCLC rules. And since OCLC’s business model requires centralizing and restricting access to bibliographic data, the situation will not improve.

As a product, WorldCat Local will probably surpass the OPACs offered by the traditional vendors. It will be cleaner and work better. It may well be cheaper and easier to manage. There are a lot of good things about this. And—lest my revised logo be misunderstood—there are no bad people here. On the contrary, OCLC is full of wonderful people—people who’ve dedicated their lives to some of the highest ideals we can aspire to. But the institution is dependent on a model that, with all the possibilities for sharing available today, must work against these ideals.

Keeping their data hidden, restricted and off the “live” web has hurt libraries more than we can ever know. Fifteen years ago, libraries were where you found out about books. One would have expected that to continue on the web–that searching for a book would turn up libraries alongside bookstores, authors and publishers.

It hasn’t worked out that way. Libraries are all-but-invisible on the web. Search for the “Da Vinci Code” and you won’t get the Library of Congress–the greatest collection of books and book data ever assembled–not even if you click through a hundred pages. You do get WorldCat, seventeen pages in!

The causes are multiple, and discussed before. But a major factor is how libraries deal with book data, and that’s largely a function of OCLC’s business model. Somehow institutions dedicated to the idea that knowledge should be freely available to all have come to the conclusion that knowledge about knowledge—book data—should not, and traditional library mottos like Boston‘s “Free to All” and Philadelphia‘s Liber Libere Omnibus (“Free books for all!”) have given way to:

“No part of any Data provided in any form by WorldCat may be used, disclosed, reproduced, transferred or transmitted in any form without the prior written consent of OCLC except as expressly permitted hereunder.”

We now return you to our regularly-scheduled blogging.

Labels: library of congress, oclc, open data, worldcat local

Thursday, March 15th, 2007

thingISBN data in one file

thingISBN is a simple API for discovering related editions. Give it an ISBN and it returns a list of other ISBNs—different formats, translations, etc. We offer the API free for non-commercial use. Today we’re releasing thingISBN in one giant feed, under the same conditions.*

thingISBN is based on LibraryThing’s first-of-its-kind “work” system, by which regular people—LibraryThing members, mostly—combine and separate editions. Members run over 2,000 work-combination actions per day. Although some do it for pure altruism, combining editions helps LibraryThing users by improving the quality of their connections.

LibraryThing’s results compare very favorably with its competition, OCLC’s xISBN service (also free for non-commercial use). xISBN’s coverage is better, but where LibraryThing is built on the collective judgment of humans, xISBN is just a computer algorithm. As the fella says, xISBN is “based on a world which is built on rules and because of that, [it] will never be as strong or as fast as [thingISBN] can be.”**

APIs, while nifty, can be a pain. Both thingISBN and xISBN have a 1,000-per-day limit. So, starting today, thingISBN is also available in feed format—one giant XML file with all the data from over two million unique ISBNs.

Here’s a sample file with just 1000 ISBNs:

As you can see, the format is not ISBN-to-ISBNs. This would involve too much repetition—the full XML file is already 96MB! Instead, it goes work by work, listing the ISBNs inside them:

<work workcode="183">
<isbn uncertain="true">999107371X</isbn>
</work>
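
To make the shape concrete, here is a small Python sketch that reads work-by-work XML of this kind and flattens it into rows. The wrapping root element (`feed`) is a guess, since the sample does not show it; the code relies only on the `work`, `isbn`, `workcode` and `uncertain` names that do appear above.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the feed. The <feed> root element is an assumption.
SAMPLE = """<feed>
  <work workcode="183">
    <isbn uncertain="true">999107371X</isbn>
  </work>
</feed>"""


def rows_from_feed(xml_text):
    """Yield (workcode, isbn, uncertain) tuples, one per <isbn> element."""
    root = ET.fromstring(xml_text)
    for work in root.iter("work"):
        workcode = int(work.get("workcode"))
        for isbn in work.iter("isbn"):
            # The attribute is absent on ordinary ISBNs, so default to False.
            uncertain = isbn.get("uncertain") == "true"
            yield workcode, isbn.text.strip(), uncertain


for row in rows_from_feed(SAMPLE):
    print(row)
```

For the full file, a streaming parser such as `ET.iterparse` would avoid holding the whole document in memory.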

This format should go into a database well, e.g.,

CREATE TABLE isbn_to_work (
  itw_workcode mediumint(8) unsigned NOT NULL,
  itw_isbn char(13) NOT NULL,
  itw_uncertain tinyint(4) NOT NULL default '0',
  PRIMARY KEY (itw_workcode,itw_isbn)
);
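
As a quick way to try the idea without a MySQL server, the same layout can be approximated in SQLite from Python. This is only a sketch: the column types are SQLite stand-ins for the MySQL ones, the helper name is invented, and the single sample row reuses the one ISBN shown in the XML snippet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE isbn_to_work (
        itw_workcode  INTEGER NOT NULL,
        itw_isbn      TEXT    NOT NULL,
        itw_uncertain INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (itw_workcode, itw_isbn)
    )
    """
)

# (workcode, isbn, uncertain) rows, e.g. produced by parsing the feed.
rows = [(183, "999107371X", 1)]
conn.executemany("INSERT INTO isbn_to_work VALUES (?, ?, ?)", rows)
conn.commit()


def works_for_isbn(connection, isbn):
    """Return the workcodes an ISBN appears under, in ascending order."""
    cursor = connection.execute(
        "SELECT itw_workcode FROM isbn_to_work "
        "WHERE itw_isbn = ? ORDER BY itw_workcode",
        (isbn,),
    )
    return [workcode for (workcode,) in cursor]


print(works_for_isbn(conn, "999107371X"))
```

Because the primary key leads with the workcode, a lookup by ISBN like this one would probably want its own index on `itw_isbn` at full feed size.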

As you can see, some ISBNs are listed as “uncertain.” This happens when an ISBN crosses works. In a perfect world, these works would be combined, but LibraryThing doesn’t do it automatically. There are a couple of ways that can go wrong. For example, “great books” sets often sport a single ISBN across volumes. It wouldn’t do to combine “Pride and Prejudice” with “Moby Dick” just because their publisher wouldn’t pony up for two ISBNs.

So, you can use the “uncertains” if you are willing to accept more errors. Otherwise, ignore them.
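
In code, that choice can be a single flag. A hedged Python sketch working on (workcode, ISBN, uncertain) tuples; the function name is invented, and the workcodes and ISBN strings below are made-up placeholders, not real feed data.

```python
from collections import defaultdict


def related_isbns(rows, isbn, include_uncertain=False):
    """Return the other ISBNs that share a work with `isbn`, sorted."""
    by_work = defaultdict(set)
    for workcode, row_isbn, uncertain in rows:
        if uncertain and not include_uncertain:
            continue  # strict mode: skip ISBNs that cross works
        by_work[workcode].add(row_isbn)

    related = set()
    for members in by_work.values():
        if isbn in members:
            related |= members
    related.discard(isbn)
    return sorted(related)


# Placeholder data: "SET" stands for one ISBN shared across a multi-volume set.
rows = [
    (1, "AAA", False),
    (1, "SET", True),
    (2, "BBB", False),
    (2, "SET", True),
]
print(related_isbns(rows, "AAA"))
print(related_isbns(rows, "AAA", include_uncertain=True))
```

Even with the flag on, the lookup stops at works the queried ISBN belongs to, so a shared set ISBN does not pull two unrelated works together.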

The feed itself is called “thingISBN.xml.gz”. It is 16MB compressed.

We’d love to hear what people are doing with the data.

*Commercial use requires our permission.
**Okay, the comparison is inexact, but OCLC does have a “Matrix” feel to it.

Labels: apis, frbr, oclc, thingisbn, works, xisbn