Archive for March, 2007

Wednesday, March 28th, 2007

Will libraries die?

Note: The opinions in this post are mine alone, and contain generalizations about libraries from a non-librarian. Abby (a librarian) and John (not) probably don’t share them. And I might not agree with them tomorrow. Go easy on me.

Every profession has its party question–the one strangers ask when they find out what you do. Doctors get “What about those insurance companies?” My wife, a novelist, gets “Are you published?”* My question is “Are books going to die? Are libraries going to die?”

Meh. I’m not too afraid. I don’t see the perfect ebook arriving any time soon, and all the book lovers and libraries hauling their collections to the dumpster. A thousand interesting, transformative things are happening to books and to libraries, but death-by-ebook seems very far off.

But then it hit me. To me, libraries are about books. But libraries today are about much more, with CDs and DVDs high on the list**. Those media ARE dying, being replaced by digital downloads. In my own life, I’ve almost stopped buying CDs, and recently my wife and I have seriously cut back on DVD rentals. We get both on iTunes now.***

I’m not taking everything to the dumpster yet, but CD and DVD racks no longer have a central place in our living room. This stuff is on the way out. Technological adoption, habit and the fact that library borrowing is free will slow things down, but the trend is clear. Books are better than ebooks, even if you have to go to the library to get them. CDs and DVDs aren’t better than downloads.

So, let’s all stop imagining a library without books, and imagine a library without CDs and DVDs. Let’s imagine a library with books, and hope for one with more of them. Maybe it’s just me, but I’m excited by that prospect.

*She is. Both she and my friend Kevin Shay have discovered another common question. When people hear they write novels, an amazing number are moved to ask “fiction?”
**Also internet access and serials. To keep this post short, I won’t get into them, although I think both are on the same downward escalator as CDs and DVDs.
***We watch on my laptop. We don’t own a TV. I know, smell us.

Labels: Uncategorized

Tuesday, March 27th, 2007

No more User Generated Content on LibraryThing

Did that get your attention? I mean no more using the term “User Generated Content.”

I hated “users” already, and have largely dropped it in favor of “members,” “people” or “you.” “Users” is too impersonal, and as some anonymous genius* said, the only other industry that calls its customers “users” is not one we want to emulate.

Anyway, there’s an excellent IT Conversations podcast with Doc Searls (Cluetrain Manifesto co-author), run by Phil Windley, where Doc expands on his hatred of the term “User Generated Content.”

Doc Searls: One reason is I’m not just a user. I’ve never liked the term “user” either. I realize there’s no better term. It’s like “content.” You need an encompassing word that stands for everybody who’s sitting at a computer or using a telephone or whatever the “usage” happens to be.

But on top of that, I don’t like the term “generated.” I don’t generate what I create–I write it. “Generating” is something that an inanimate device does. It’s not something that a person does.

And I don’t produce “content.” I never sit down at the keyboard or pick up a camera or draw something thinking “I’m going to generate some content here.” Nobody is motivated to generate content. Content is a measure of volume. It’s packing material. It’s container cargo. It’s not creative work.

And “user generated content” is the kind of thing only an exclusive, controlling producer can say. And to hear people in the Web 2.0 world or the online world saying “Oh, we need more user-generated content here!” It’s that you’re adopting the language of the old world when you do that. …

It’s not just about packing stuff into a vehicle that’s a medium. I don’t even like the term “medium” very much any more.

Phil Windley: Or “delivering information”—that’s another one.

Doc Searls: Yeah again, it’s the container cargo shipping version of the world–that assumes a distance. It assumes that you’re way over there and I’m way over here, and I need to “scale” a whole pile of you and I got to scale it up in a way that I can package it up and I’m going to pack a lot of advertising around it because I can sell that shit. Oh, come on.

I mean, there’s nothing wrong with doing a business with that. But at least know what you’re doing. What you’re doing is to some degree diminishing the profoundly individual and deeply personal and socially transforming nature of the best of what that stuff is. …

When you say “user generated content” you are now subtracting out all the value of everything everybody’s doing.

The relevance to LibraryThing is obvious. We should never adopt the “container shipping” model of what our members are doing, even in how we talk about it.

But I think there’s some special relevance to libraries too. Uncertainty about “user generated content” among librarians centers around issues of authority, certainly. But I suspect the mixture of impersonal technology and impersonal personality is also toxic. After all, most librarians have jobs that put them in frequent, meaningful contact with their patrons**. Librarians value the patron’s role in the library, and I suspect that, like teachers and students, many librarians learn from their patrons every day. I suspect there would be less resistance to “user generated content” in the library if it sounded less like communal sausage production.

We in the “Lib 2.0” world gain nothing by using the language of container ships to describe the writing, knowledge and opinions of patrons.

*Help? Paul Graham?
**A good term.

Labels: cluetrain, doc searls, lib2.0, ugc, user generated content

Friday, March 23rd, 2007

xISBN and thingISBN compared

William Denton over at the FRBR Blog ran some interesting comparisons between xISBN and thingISBN, two services that allow you to send an ISBN and get back a list of “related” ISBNs.* Denton seems to have confirmed what we found—OCLC’s xISBN has more ISBNs, but our thingISBN’s paperback coverage is superior. And both can turn out better than the other for no apparent reason.

He concludes:

Upshot: If you have an ISBN in hand and want to find ISBNs of other manifestations of the same work, use both thingISBN and xISBN.

Considering that OCLC has a BILLION records, some 1,200 employees and more left-handed, green-eyed, vegetarian software engineers than LibraryThing has employees, we’ll take the tie. And, of course, it’s not because our engineers are smarter. It’s because social collaboration is a powerful thing.

*Both have APIs; thingISBN is also available as one big take-it-and-run file.
**Someone wrote us to object that LibraryThing should not devalue the work of librarians by calling xISBN “just” an algorithm–it’s an algorithm built on the painstaking work of librarians. That’s fair enough, but both xISBN and thingISBN rely on that labor. (It’s why LibraryThing decided, and I conjecture OCLC decided, to make the service free to libraries.) The question is what happens next, algorithm or crowdsourcing.***
***Also, it’s a little known fact, but OCLC’s xISBN algorithm requires a constant supply of dead kittens. When you use xISBN, a kitten dies.

Labels: Uncategorized

Monday, March 19th, 2007

Compare your library to LibraryThing, now in CSV

Our feed of all LT ISBNs (described in this blog post) is now available in CSV format as well. See http://www.librarything.com/feeds/ .

Labels: Uncategorized

Thursday, March 15th, 2007

thingISBN data in one file

thingISBN is a simple API for discovering related editions. Give it an ISBN and it returns a list of other ISBNs—different formats, translations, etc. We offer the API free for non-commercial use. Today we’re releasing thingISBN in one giant feed, under the same conditions.*

thingISBN is based on LibraryThing’s first-of-its-kind “work” system, by which regular people—LibraryThing members, mostly—combine and separate editions. Members run over 2,000 work-combination actions per day. Although some do it for pure altruism, combining editions helps LibraryThing users by improving the quality of their connections.

LibraryThing’s results compare very favorably with its competition, OCLC’s xISBN service (also free for non-commercial use). xISBN’s coverage is better, but where LibraryThing is built on the collective judgment of humans, xISBN is just a computer algorithm. As the fella says, xISBN is “based on a world which is built on rules and because of that, [it] will never be as strong or as fast as [thingISBN] can be.”**

APIs, while nifty, can be a pain. Both thingISBN and xISBN have a 1,000-per-day limit. So, starting today, thingISBN is also available in feed format—one giant XML file with all the data from over two million unique ISBNs.

Here’s a sample file with just 1000 ISBNs:
http://www.librarything.com/feeds/thingISBN_small.xml

As you can see, the format is not ISBN-to-ISBNs. This would involve too much repetition—the full XML file is already 96MB! Instead, it goes work by work, listing the ISBNs inside them:

<work workcode="183">
  <isbn>0802150845</isbn>
  <isbn>0802143008</isbn>
  <isbn>2020006014</isbn>
  <isbn>0745300359</isbn>
  <isbn>0394179900</isbn>
  <isbn>9867574397</isbn>
  <isbn uncertain="true">999107371X</isbn>
</work>
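If you do want the ISBN-to-ISBNs view, it is easy to rebuild from the work-by-work format. Here is a minimal Python sketch using only the standard library. It assumes the `<work>` elements sit inside a single root element; the name `<feed>` below is a placeholder for illustration, so check the actual file. It also treats every listed ISBN alike, ignoring the `uncertain` attribute.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample in the feed's work-by-work format. The <feed> root element name is
# an assumption for illustration; only <work> and <isbn> appear in the post.
SAMPLE = """<feed>
<work workcode="183">
<isbn>0802150845</isbn>
<isbn>0802143008</isbn>
<isbn uncertain="true">999107371X</isbn>
</work>
</feed>"""


def related_isbns(xml_text):
    """Map each ISBN to the set of other ISBNs listed in the same work(s)."""
    related = defaultdict(set)
    for work in ET.fromstring(xml_text).iter("work"):
        isbns = [el.text.strip() for el in work.iter("isbn") if el.text]
        for isbn in isbns:
            # Every other ISBN in this work counts as a related edition.
            related[isbn].update(other for other in isbns if other != isbn)
    return dict(related)


lookup = related_isbns(SAMPLE)
print(sorted(lookup["0802150845"]))
```

Expanding the whole feed this way repeats each work's ISBN list once per ISBN, which is exactly the repetition the work-by-work layout avoids.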

This format should go into a database well, e.g.,

CREATE TABLE isbn_to_work (
  itw_workcode mediumint(8) unsigned NOT NULL,
  itw_isbn char(13) NOT NULL,
  itw_uncertain tinyint(4) NOT NULL default '0',
  PRIMARY KEY (itw_workcode, itw_isbn)
);

As you can see, some ISBNs are listed as “uncertain.” This happens when an ISBN crosses works. In a perfect world, these works would be combined, but LibraryThing doesn’t do it automatically. There are a couple of ways that can go wrong. For example, “great books” sets often sport a single ISBN across volumes. It wouldn’t do to combine “Pride and Prejudice” with “Moby Dick” just because their publisher wouldn’t pony up for two ISBNs.

So, you can use the “uncertains” if you are willing to accept more errors. Otherwise, ignore them.
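To make that choice concrete, here is a minimal sketch in Python. The standard-library `sqlite3` module stands in for MySQL, an assumption made purely to keep the example self-contained. The table mirrors the isbn_to_work layout above. Work 183 comes from the sample; workcode 900001 and ISBN 1111111111 are made-up placeholders that show one ISBN crossing two works.

```python
import sqlite3

# (workcode, isbn, uncertain) rows. Workcode 900001 and ISBN 1111111111 are
# hypothetical placeholders, not real feed data.
ROWS = [
    (183, "0802150845", 0),
    (183, "0802143008", 0),
    (183, "999107371X", 1),
    (900001, "999107371X", 1),
    (900001, "1111111111", 0),
]


def build_db(rows):
    """Create an in-memory table shaped like isbn_to_work and load rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE isbn_to_work (
               itw_workcode INTEGER NOT NULL,
               itw_isbn TEXT NOT NULL,
               itw_uncertain INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (itw_workcode, itw_isbn)
           )"""
    )
    conn.executemany("INSERT INTO isbn_to_work VALUES (?, ?, ?)", rows)
    return conn


def related(conn, isbn, include_uncertain=False):
    """Return related ISBNs; skip uncertain links unless asked to keep them."""
    sql = """
        SELECT DISTINCT other.itw_isbn
        FROM isbn_to_work AS mine
        JOIN isbn_to_work AS other
          ON other.itw_workcode = mine.itw_workcode
        WHERE mine.itw_isbn = :isbn
          AND other.itw_isbn <> :isbn
          AND (:loose OR (mine.itw_uncertain = 0 AND other.itw_uncertain = 0))
        ORDER BY other.itw_isbn
    """
    params = {"isbn": isbn, "loose": int(include_uncertain)}
    return [row[0] for row in conn.execute(sql, params)]


conn = build_db(ROWS)
strict = related(conn, "0802150845")
loose = related(conn, "0802150845", include_uncertain=True)
print(strict, loose)
```

The strict lookup drops a link whenever either end of it is marked uncertain, so an ISBN that only ever appears as uncertain returns nothing in strict mode.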

The feed itself is in http://www.librarything.com/feeds/ and is called “thingISBN.xml.gz”. It is 16MB compressed.
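At 96MB uncompressed, the full feed is better streamed than loaded in one go. Here is a sketch in Python, assuming the file has been downloaded locally as thingISBN.xml.gz and follows the work-by-work format shown above. So that it runs without the real download, the demo writes a tiny stand-in file first; its `<feed>` root element name is an assumption.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET


def iter_rows(path):
    """Yield (workcode, isbn, uncertain) tuples from a gzipped feed file."""
    with gzip.open(path, "rb") as handle:
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag != "work":
                continue
            workcode = int(elem.get("workcode"))
            for isbn in elem.iter("isbn"):
                text = (isbn.text or "").strip()
                yield workcode, text, isbn.get("uncertain") == "true"
            elem.clear()  # drop the finished work's children to limit memory


# Tiny stand-in for the real download; the <feed> root name is hypothetical.
SAMPLE = b"""<feed>
<work workcode="183">
<isbn>0802150845</isbn>
<isbn uncertain="true">999107371X</isbn>
</work>
</feed>"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "thingISBN.xml.gz")
    with gzip.open(path, "wb") as out:
        out.write(SAMPLE)
    rows = list(iter_rows(path))

print(rows)
```

Each tuple lines up with the three columns of the isbn_to_work table, so the rows can be handed to a bulk insert in batches.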

We’d love to hear what people are doing with the data.

*Commercial use requires our permission. See http://www.librarything.com/api.php.
**Okay, the comparison is inexact, but OCLC does have a “Matrix” feel to it.

Labels: apis, frbr, oclc, thingisbn, works, xisbn

Thursday, March 1st, 2007

Percent who tag

Some bloggers* have talked about my statement in When tags work and when they don’t, that:

“Tags work best when they’re about memory, so tagging makes the most sense when you have a lot of something to remember. On LibraryThing, members with under 50 books seldom tag, but users with 200 or more usually do.** When you get right down to it, few of us need to remember 200 books on Amazon. For most of us, the “wishlist” feature is good enough. We don’t need to sub-segment out the “anthropology” books.”

Here’s some data on that issue. I compared the number of books a LibraryThing member has with whether they tag or not. The latter is defined as having at least one tag, so it over-represents taggers. But the trend is clear. The more you have to keep track of, the more you tag.

[Chart: percent of LibraryThing members who tag, by number of books]
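The bucketing behind a comparison like this is simple. Here is a sketch in Python of how one might compute it, not necessarily how the chart was produced. The member data below is made up for illustration and is NOT LibraryThing data; the 50-book bucket width is also an assumption.

```python
from collections import defaultdict

# Made-up (book_count, tag_count) pairs for illustration only.
MEMBERS = [(12, 0), (30, 0), (45, 3), (120, 0), (150, 8), (230, 15), (400, 60)]

BUCKET_SIZE = 50  # hypothetical bucket width, in books


def percent_who_tag(members, bucket_size=BUCKET_SIZE):
    """Return {bucket_start: percent of members with at least one tag}."""
    totals = defaultdict(int)
    taggers = defaultdict(int)
    for books, tags in members:
        bucket = (books // bucket_size) * bucket_size
        totals[bucket] += 1
        if tags >= 1:  # "tagger" means at least one tag, as defined above
            taggers[bucket] += 1
    return {b: 100.0 * taggers[b] / totals[b] for b in sorted(totals)}


result = percent_who_tag(MEMBERS)
for bucket, pct in result.items():
    print(f"{bucket}-{bucket + BUCKET_SIZE - 1} books: {pct:.0f}% tag")
```

Raising the bar from one tag to some share of books tagged only means changing the `tags >= 1` test.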

*Notably, Thomas Vanderwal, who coined “folksonomy.” Vanderwal is a giant. He makes some criticisms of my post that I agree with (I wrote pretty rapidly and off-the-cuff). And some I don’t. I have not been watching Amazon’s tagging month by month as Thomas has, but his points don’t change my mind. Even with half the time, and remembering that LibraryThing was dead-obscure for much of its life, and any other caveats and nuances one applies, Amazon’s tagging experiment hasn’t worked out. If it worked, they’d be swimming in high-quality tags. They aren’t, and they’ve been distracting customers and burning up valuable screen space to acquire a web of meaning so flimsy as to be largely useless for its ends. Anyway, I hope to get a reply out soon.
**You’ll note my qualifiers are a bit off. But my impressions match the facts better insofar as the numbers above overplay taggers. In theory, I could set a bar—X% of books are tagged. But that would miss some people who catalog first, tag second. I’ve seen that a lot—although I’d go mad if I delayed it like that! Those people are taggers too.

Post footnotem: Anyone have an explanation for why tagging dips at 200-250? People who hit the free-200 wall, get frustrated and leave before tagging?

Labels: Uncategorized