Archive for the ‘works’ Category

Thursday, February 10th, 2011

LibraryThing gets work-to-work relationships!

Today we’ve launched some new ways to display relationships between works.

The concept covers works that contain other works, or are contained by them. It also covers retellings, abridgments, parodies, commentaries and so forth.

Thus, LibraryThing members will be able to add relationships that show:

A core concept here is that this is only for work-level relationships. Therefore, we are not doing “translation of,” “facsimile edition of,” etc. Members are asked to connect only existing works, not make up new, so-far uncataloged works.

Come discuss rules, concepts and ideas in the Talk topic.

We’ve got a lot more coming that builds and expands on these capabilities, so stay tuned!

Many thanks to the members of the Board for Extreme Thing Advances group, who’ve been helping us develop and refine this feature. They have already added some 4,500 contains/contained-in relationships across LibraryThing.

Labels: cataloging, work pages, works

Thursday, May 22nd, 2008

Works, editions, ISBNs and cocktails


We got your Harry Potter and the Angus an Orchloch right here!

Short version. I’ve just completed a major change in the “substructure” of LibraryThing’s data, the “works system” that links different editions together. The system is better and will allow more betterness down the road. It was the reason we were down most of last night. We regret that, but think the change will prove worth it.

Long version—What are “Works?” LibraryThing’s work system brings users together around the books they’ve read, not the peculiarities of publisher, format or even language. Works are created and tended by members, who “combine” editions together into works. Anyone can do it, but the die-hards created a large and active group—Combiners!—to trade tips, debate philosophy, muster effort—and complain about the system!

Combiners is a remarkable community, and one that has gone without a nod from me for some time. I hope these changes encourage them, as should the prospect of future improvements built on surer footing.


The Combiners! know the stakes, as their group logo tells us.

Since the beginning I’ve promoted the idea of the “cocktail party” test.* This test answers whether two books belong to the same work by asking whether their readers would, in casual conversation, own up to reading the same book or not. So, for example, in such a context it wouldn’t matter if you had read a book in its hardcover or paperback edition, or listened to it on CD. If the cute girl with the backless dress mentions she’s fond of the Unbearable Lightness of Being, the edition is immaterial (but see this link). I also suspect that title differences occasioned by marketing considerations, e.g., Harry Potter and the Philosopher’s Stone (UK) vs. Harry Potter and the Sorcerer’s Stone (US), wouldn’t matter. Nor should language itself matter; few would turn a cold shoulder to a Finnish Tolkien fan merely because he read Tolkien in Finnish.**

What’s Changed? The core concept used to be that a work consisted of a discrete set of title-author pairs. We chose title-author to emphasize the loose, verbal nature of the cocktail party test, and because ISBNs are much less perfect than many believe.*** These title-author pairs we called “editions.”

Unfortunately, there are a small number of works that can’t be identified based on title and author alone. This happens particularly in science fiction and graphic novels. (Apparently the Fantasy Hall of Fame currently entombs two distinct works: same title, same authors but different contents and publisher. Someone should be punished for that.) My bête noire is Cliff’s Notes filed in with the works they “interpret.” No appletini for you, “Great Expectations”!

The system still automatically assigns new editions based on author and title. But I’ve added ISBNs to the mix, so members can now combine and separate editions according to their ISBNs.

Other changes:

  • Title-author-ISBN bundles are now distinguished by the smallest details, so you can separate “Hard Times” from “Hard times” from “Hard times” with a period at the end. This has vastly increased the number of editions in the system. (There are now more than 1,200 editions of the Hobbit!) This was mostly a technical decision.
  • The original system produced a few “hash collisions,” utterly different books thrown in together unhappily. This has been a long-running defect—and complaint. The new system will allow their separation, although existing ones will need to be separated.
  • The Combination and Debris (renamed “Editions”) pages should be faster. Some will start—and stall!—on a message about updating edition information. Once the editions have been calculated, the page will be faster.
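To make the “smallest details” point concrete, here is a minimal sketch. It is not LibraryThing’s actual code: the EditionKey tuple and both helper functions are hypothetical names, and the “loose” key is included only for contrast. It shows how an exact title-author-ISBN key keeps the three “Hard Times” variants apart, where a normalized key would lump them together.

```python
import re
from collections import namedtuple

# Hypothetical structure: one key per title-author-ISBN bundle.
EditionKey = namedtuple("EditionKey", ["title", "author", "isbn"])

def exact_key(title, author, isbn=None):
    """Keep every detail: case, punctuation and ISBN all count."""
    return EditionKey(title, author, isbn)

def loose_key(title, author):
    """Looser key, for contrast only: ignores case and punctuation."""
    def squash(s):
        return re.sub(r"[^\w\s]", "", s).lower().strip()
    return (squash(title), squash(author))

titles = ["Hard Times", "Hard times", "Hard times."]
exact = {exact_key(t, "Charles Dickens") for t in titles}
loose = {loose_key(t, "Charles Dickens") for t in titles}

print(len(exact), len(loose))  # 3 exact keys, 1 loose key
```

The trade-off is the one described in the list: exact keys multiply the number of editions, but they never force two different things into the same bucket.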

As mentioned above, the new system was responsible for our extended downtime last night. Between a few mistakes and a database just shy of 27 million books, it took longer than we thought. I hope that the changes prove worthwhile in and of themselves.

Being much better designed, the new system should enable:

  • Edition-level pages
  • Edition-to-edition and work-to-work relationships
  • Member and book matching that takes editions into account
  • An end to the “dead languages” exception to the cocktail party test.
  • More opportunities for me to discuss the Pop-Up Kama Sutra at library conferences.

I’ve created a Talk thread for members who want to discuss the changes.


*Perhaps wishing I’d get invited to a few more cocktail parties! Speaking of which, are you going to Book Expo America 2008 in Los Angeles? We are.
**Whether you choose to avoid the Finnish Tolkien fan at cocktail parties is, of course, up to you.
***In fact, publishers recycle ISBNs, steal ISBNs, make up ISBNs, print wrong ISBNs, apply ISBNs to large sets of seemingly discrete items and otherwise abuse the system all the time. Most of the time they work in a bookstore context. They aren’t really fit for a project of LibraryThing’s size and scope.

Labels: frbr, library science, new feature, new features, work pages, works

Thursday, March 15th, 2007

thingISBN data in one file

thingISBN is a simple API for discovering related editions. Give it an ISBN and it returns a list of other ISBNs—different formats, translations, etc. We offer the API free for non-commercial use. Today we’re releasing thingISBN in one giant feed, under the same conditions.*

thingISBN is based on LibraryThing’s first-of-its-kind “work” system, by which regular people—LibraryThing members, mostly—combine and separate editions. Members run over 2,000 work-combination actions per day. Although some do it for pure altruism, combining editions helps LibraryThing users by improving the quality of their connections.

LibraryThing’s results compare very favorably with its competition, OCLC’s xISBN service (also free for non-commercial use). xISBN’s coverage is better, but where LibraryThing is built on the collective judgment of humans, xISBN is just a computer algorithm. As the fella says, xISBN is “based on a world which is built on rules and because of that, [it] will never be as strong or as fast as [thingISBN] can be.”**

APIs, while nifty, can be a pain. Both thingISBN and xISBN have a 1,000-per-day limit. So, starting today, thingISBN is also available in feed format—one giant XML file with all the data from over two million unique ISBNs.

Here’s a sample file with just 1000 ISBNs:
http://www.librarything.com/feeds/thingISBN_small.xml

As you can see, the format is not ISBN-to-ISBNs. This would involve too much repetition—the full XML file is already 96MB! Instead, it goes work by work, listing the ISBNs inside them:

<work workcode="183">
<isbn>0802150845</isbn>
<isbn>0802143008</isbn>
<isbn>2020006014</isbn>
<isbn>0745300359</isbn>
<isbn>0394179900</isbn>
<isbn>9867574397</isbn>
<isbn uncertain="true">999107371X</isbn>
</work>
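If you want to read this format from code, something like the following Python sketch should do. It uses only the standard library. The wrapping <feed> root element is an assumption made so the fragment parses on its own; the function name parse_works is likewise made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The <work> element shown above, wrapped in a made-up <feed> root
# (the real root element name is an assumption here).
SAMPLE = """<feed>
<work workcode="183">
<isbn>0802150845</isbn>
<isbn>0802143008</isbn>
<isbn>2020006014</isbn>
<isbn>0745300359</isbn>
<isbn>0394179900</isbn>
<isbn>9867574397</isbn>
<isbn uncertain="true">999107371X</isbn>
</work>
</feed>"""

def parse_works(xml_text):
    """Return {workcode: [(isbn, uncertain), ...]} for every <work>."""
    works = defaultdict(list)
    root = ET.fromstring(xml_text)
    for work in root.iter("work"):
        code = int(work.get("workcode"))
        for isbn in work.findall("isbn"):
            uncertain = isbn.get("uncertain") == "true"
            works[code].append((isbn.text.strip(), uncertain))
    return dict(works)

works = parse_works(SAMPLE)
print(len(works[183]))   # 7 ISBNs in work 183
print(works[183][-1])    # ('999107371X', True)
```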

This format should go into a database well, e.g.,

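-- One row per (work, ISBN) pair from the feed (MySQL syntax)
CREATE TABLE isbn_to_work (
  itw_workcode mediumint(8) unsigned NOT NULL,    -- workcode attribute of <work>
  itw_isbn char(13) NOT NULL,                     -- text of <isbn>
  itw_uncertain tinyint(4) NOT NULL default '0',  -- 1 when uncertain="true"
  PRIMARY KEY (itw_workcode, itw_isbn)
);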
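Once the rows are in a table like that, the ISBN-to-ISBNs lookup is a self-join. Here is a rough sketch using Python’s built-in sqlite3 module as a stand-in for MySQL, so the column types are SQLite’s rather than the ones above. The three rows come from the sample work, and the function names are invented for illustration.

```python
import sqlite3

# (workcode, isbn, uncertain) rows taken from the sample work above.
ROWS = [
    (183, "0802150845", 0),
    (183, "0802143008", 0),
    (183, "999107371X", 1),
]

def build_db(rows):
    """Create an in-memory table shaped like isbn_to_work and fill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE isbn_to_work (
               itw_workcode  INTEGER NOT NULL,
               itw_isbn      TEXT    NOT NULL,
               itw_uncertain INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (itw_workcode, itw_isbn)
           )"""
    )
    conn.executemany("INSERT INTO isbn_to_work VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def related_isbns(conn, isbn):
    """Other ISBNs filed under the same work(s) as `isbn`."""
    cur = conn.execute(
        """SELECT DISTINCT other.itw_isbn
             FROM isbn_to_work AS mine
             JOIN isbn_to_work AS other
               ON other.itw_workcode = mine.itw_workcode
            WHERE mine.itw_isbn = ? AND other.itw_isbn <> ?
            ORDER BY other.itw_isbn""",
        (isbn, isbn),
    )
    return [row[0] for row in cur]

conn = build_db(ROWS)
print(related_isbns(conn, "0802150845"))  # ['0802143008', '999107371X']
```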

As you can see, some ISBNs are listed as “uncertain.” This happens when an ISBN crosses works. In a perfect world, these works would be combined, but LibraryThing doesn’t do it automatically. There are a couple of ways that can go wrong. For example, “great books” sets often sport a single ISBN across volumes. It wouldn’t do to combine “Pride and Prejudice” with “Moby Dick” just because their publisher wouldn’t pony up for two ISBNs.

So, you can use the “uncertains” if you are willing to accept more errors. Otherwise, ignore them.
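In code, that choice can be a single flag. The sketch below is only an illustration: the function name and the include_uncertain parameter are made up, and the second work (9999) and the ISBN 0000000001 are invented to show an ISBN crossing works.

```python
# (workcode, isbn, uncertain) rows. Work 183 is from the sample above;
# work 9999 and ISBN 0000000001 are invented for illustration.
ROWS = [
    (183, "0802150845", False),
    (183, "0802143008", False),
    (183, "999107371X", True),
    (9999, "999107371X", True),
    (9999, "0000000001", False),
]

def related(rows, isbn, include_uncertain=False):
    """ISBNs sharing a work with `isbn`, optionally trusting uncertain rows."""
    usable = [r for r in rows if include_uncertain or not r[2]]
    codes = {code for code, i, _ in usable if i == isbn}
    return sorted({i for code, i, _ in usable if code in codes and i != isbn})

print(related(ROWS, "0802150845"))                          # strict
print(related(ROWS, "0802150845", include_uncertain=True))  # more hits, more risk
print(related(ROWS, "999107371X", include_uncertain=True))  # pulls in both works
```

The last call shows the risk: trusting an uncertain ISBN links everything in both works it touches.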

The feed itself is in http://www.librarything.com/feeds/ and is called “thingISBN.xml.gz”. It is 16MB compressed.
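A file that size is best read as a stream rather than loaded whole. Here is a minimal sketch with Python’s gzip and xml.etree.ElementTree modules. To stay self-contained it writes a tiny stand-in file first; the <feed> root element and the demo file name are assumptions, not the real feed’s.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

def iter_works(path):
    """Yield (workcode, [isbn, ...]) one work at a time from a gzipped feed."""
    with gzip.open(path, "rb") as fh:
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag == "work":
                isbns = [e.text for e in elem.findall("isbn")]
                yield int(elem.get("workcode")), isbns
                elem.clear()  # drop the finished work's children to save memory

# Tiny stand-in for the real feed (root element name is assumed).
DEMO = (b'<feed><work workcode="183">'
        b'<isbn>0802150845</isbn><isbn>0802143008</isbn>'
        b'</work></feed>')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "thingISBN_demo.xml.gz")
    with gzip.open(path, "wb") as out:
        out.write(DEMO)
    result = list(iter_works(path))

print(result)  # [(183, ['0802150845', '0802143008'])]
```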

We’d love to hear what people are doing with the data.

*Commercial use requires our permission. See http://www.librarything.com/api.php.
**Okay, the comparison is inexact, but OCLC does have a “Matrix” feel to it.

Labels: apis, frbr, oclc, thingisbn, works, xisbn

Wednesday, June 14th, 2006

Introducing thingISBN

UPDATE: thingISBN is now also available in feed format.

Many of you are familiar with OCLC’s xISBN service. Give it an ISBN and it returns a list of “associated” ISBNs from WorldCat. So—xISBN’s canonical example goes—give it an ISBN for one edition of Dune, and it will return a list of ISBNs of other editions, in XML format. This is red meat for mashups. (Speaking of which, did you know about Talis’ Mashing up the Library competition?)

Today I’m releasing “thingISBN,” LibraryThing’s “answer” to xISBN. Under the hood, xISBN is a test of FRBR, a highly-developed, well thought-out way for librarians to model bibliographic relationships. By contrast, thingISBN is based on LibraryThing’s “everyone a librarian” idea of bibliographic modeling. Users “combine” works as they see fit. If they make a mistake, other users can “separate” them. It’s a less nuanced and more chaotic way of doing things, but can yield some useful results.

To use thingISBN, point your browser at a URL like this, replacing the ISBN as appropriate:

To compare xISBN and thingISBN add &compare=1

thingISBN vs. xISBN.
UPDATE: OCLC has disallowed comparison.
I’ve done some preliminary comparisons between the two services. The results are pretty interesting. For starters, OCLC has much broader ISBN coverage. The dataset is orders of magnitude larger, and “regular people” just don’t own certain books. Where the data sets overlap, however, LibraryThing can contribute a lot, particularly when it comes to paperbacks and non-US editions.

Examples:

  • 031228884 (Elizabeth Cook, Achilles). Recently-published novel. OCLC and LibraryThing know about two ISBNs. LibraryThing adds two others, a UK hardback and a UK paperback.
  • 0553212583 (Wuthering Heights). OCLC and LibraryThing share 60 editions. OCLC alone knows 266. LibraryThing alone knows 32.
  • 0520071654 (Peter Green, Alexander of Macedon…). OCLC and LibraryThing both know this hardcover ISBN. LibraryThing knows the paperback, but OCLC includes the 1974 first edition.*
  • 0310241448 (Lee Strobel, The Case for a Creator). OCLC and LibraryThing know of one hardcover edition. OCLC knows of no other editions. LibraryThing knows of seven others. Wow.**
  • 0393049841 (Jason Epstein, The Book Business). OCLC and LibraryThing share two ISBNs. OCLC knows one by itself. LibraryThing also knows one by itself, but it’s to Simple Pineapple Crochet. Yes, you read that right. I’m not sure where the error is from, but it’s either a pitfall of the “everyone is a librarian” system, or of LibraryThing’s occasionally ratty data.
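Comparisons like these boil down to set arithmetic on the two ISBN lists. A minimal sketch follows; the function name is made up, and the two example lists are invented splits of ISBNs, not real output from either service.

```python
def compare(ours, theirs):
    """Split two ISBN lists into shared, ours-only and theirs-only sets."""
    ours, theirs = set(ours), set(theirs)
    return {
        "shared": ours & theirs,
        "only_ours": ours - theirs,
        "only_theirs": theirs - ours,
    }

# Invented example lists, for illustration only.
librarything = ["0802150845", "0802143008", "0745300359"]
oclc = ["0802150845", "0802143008", "0394179900", "2020006014"]

report = compare(librarything, oclc)
for name in ("shared", "only_ours", "only_theirs"):
    print(name, len(report[name]), sorted(report[name]))
```

With the Wuthering Heights counts above (60 shared, 266 OCLC-only, 32 LibraryThing-only), that works out to 326 ISBNs on the OCLC side, 92 on the LibraryThing side and 358 in the union.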

Mashups? I brought out thingISBN in part to provide more grist for Talis’ Mashing up the Library competition. I was careful to make thingISBN’s output follow the conventions of xISBN, so that existing xISBN code could be reused. I’m looking forward to seeing whether anyone does anything with it. (One obvious application would be as an addition to LibX, an open-source Firefox extension that leverages xISBN to help you find things in your library. Here’s an excellent screencast of it at work.)

As usual, comments, criticisms, bug reports and feature requests are asked for and gratefully received.

The fine print. By using thingISBN you agree to the following terms and conditions:

  • thingISBN is available for non-commercial use only.
  • You cannot hit thingISBN more than once per second.
  • If you’re going to hit thingISBN more than 1,000 times/day, you must notify LibraryThing (we’d love to hear what you’re doing). This is the current policy. If thingISBN turns out to be a success I’ll optimize the code more, put it on my second server and allow it to be hit as hard as people want to hit it.
  • ThingISBN is provided “as is,” without any promises or guarantees. LibraryThing is not responsible for any errors in the data, damages resulting from its use, your teenager’s attitude or the state of the world generally.
  • We reserve the right to change these terms and generally make things up as we go.
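For anyone scripting against thingISBN, a client-side throttle is an easy way to stay inside the limits above. This is a sketch, not anything LibraryThing provides: the Throttle class is hypothetical, and treating “per day” as a rolling 24-hour window is an assumption. The clock and sleep functions are injectable so the demo runs instantly with a fake clock.

```python
import time

class Throttle:
    """At most one call per `min_interval` seconds and at most
    `daily_cap` calls per rolling 24 hours (the window is an assumption)."""

    def __init__(self, min_interval=1.0, daily_cap=1000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.daily_cap = daily_cap
        self.clock = clock
        self.sleep = sleep
        self.stamps = []  # times of calls made in the last 24 hours

    def wait(self):
        """Block until the next call is allowed, then record it."""
        now = self.clock()
        self.stamps = [t for t in self.stamps if now - t < 86400]
        if len(self.stamps) >= self.daily_cap:
            raise RuntimeError("daily cap reached; notify LibraryThing first")
        if self.stamps:
            delay = self.min_interval - (now - self.stamps[-1])
            if delay > 0:
                self.sleep(delay)
                now = self.clock()
        self.stamps.append(now)

class FakeTime:
    """Test double: sleeping just advances the fake clock."""
    def __init__(self):
        self.now = 0.0
        self.slept = []
    def clock(self):
        return self.now
    def sleep(self, seconds):
        self.slept.append(seconds)
        self.now += seconds

fake = FakeTime()
throttle = Throttle(daily_cap=3, clock=fake.clock, sleep=fake.sleep)
for _ in range(3):
    throttle.wait()  # call this right before each request
print(fake.slept)    # [1.0, 1.0]
```

In real use you would construct Throttle() with the defaults and call wait() before every request.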

*Scratch that. LibraryThing knows it now too. A user had it, but it wasn’t combined; I went ahead and combined it. Actually, Green changed a lot between editions, but they still qualify as one “work.” (This edition, with another ISBN, may also be the same work, but I’m not sure, so I left it.)
**I started looking around to see if this disparity was true in general of religious books. I think it isn’t, or at least the effect isn’t as striking.

Labels: apis, frbr, thingisbn, works, xisbn