Archive for February, 2007

Tuesday, February 27th, 2007

SocialCatalogers: For people who make social cataloging applications

Introducing SocialCatalogers, a Ning-based social network for people who make social cataloging sites. No, this is not a joke. It’s funny, but it’s not a joke.

If you make social cataloging sites, or have a deep interest in them, join up here: http://socialcatalogers.ning.com.

Social cataloging has exploded. Today LibraryThing’s list of competitors—very broadly defined—hit forty.* That’s not counting the dozens or even hundreds of sites in other niches—movies, games, comics, programs, wine, beer, recipes. It’s not counting the swap sites, which catalog as a means to something else. Or the list sites like 43Things and Wordie, which catalog intangibles.** Or projects like John Blyberg’s “Social OPAC,” which bolts social cataloging onto “unsocial cataloging” (a service LibraryThing will be offering soon too).

For a while the social cataloging social network was me. I started LibraryThing by trying to get Bibliophil to join forces with me—they would provide the social, I’d provide the cataloging. No dice. Since then I’ve emailed or met with half the social cataloging developers out there, looking for synergies or just to talk shop. Sometimes something came out of it; LibraryThing has “also on” integration with a few, like Cork’d, “LibraryThing for wine.” (Some day, if we have enough shared users, LibraryThing can recommend books based on the wines you drink!) There should be more of that.

The time has come for a social cataloging watering hole. We’re an industry now, or a dwarf industry anyway. Some of us compete, but that doesn’t prevent automakers from getting together. We too can conspire to fix prices! Seriously, we ought to have some things to talk about. At the very least we can keep an eye on the competition.

I made the social network on Ning, which relaunched today. Ning*** is a “social network maker,” started and mostly funded by Marc Andreessen. I wasn’t that impressed with it until the relaunch. It’s really something now. I was able to create a basic social network in about ten minutes. It’s not what I would have designed, but I would have taken a month to do it, at least. 60% in ten minutes beats 100% in a month every time.

Also, by getting in early, SocialCatalogers hopes to become the dominant social network for people who make social cataloging applications. Take that, CatalogingSocial.com, SocietyofSocialCatalogers.com, SocialCatalogingThing.com, ThingSocialCatalogers.com, SocialCatalogersList.com, SocialCatalogersster.com, Joptwix, Flipto, Gropo and Fhtagn****.

*I found StashMatic, which is similar to Squirl and iTaggit. (Squirl is my pick, and I’m not just saying that because half the development team now works for LibraryThing.) And I found JunkLog, which brings minimalism in social cataloging to a new level. That’s not really a knock. It’s kind of cool to strip it down. I can’t tell if it’s developing or defunct.
**Early on in LibraryThing I was at my parents’ house for the weekend, and my dad came into my room at 5am, fresh from bed. He had an idea he was dying to tell me about: LibraryThing, but for people! Instead of cataloging your books, you list your friends. I let him down easy. (Still, a more catalogy social network would be an interesting project.)
***Mostly because it got so much press but has LibraryThing-level traffic. Then again, I’m Dan Quayle to Andreessen’s John Kennedy. I expect the relaunch to kick Ning into the clouds.
****As every lover of H. P. Lovecraft knows, Fhtagn comes from Ph’nglui mglw’nafh Cthulhu R’lyeh wgah’nagl fhtagn! (“In his house at R’lyeh dead Cthulhu lies dreaming”). Fhtagn.com, .net, .org, .de and .edu are taken. However, phngluimglwnafhcthulhurlyehwgahnaglfhtagn.com is still available, even if phngluimglwnafhcthulhurlyehwgahnaglfhtagn.net is not.

Labels: Uncategorized

Monday, February 26th, 2007

New feed: Compare your library with LibraryThing

Over on Next Generation Catalogs for Libraries, NCSU’s Emily Lynema asked me:

“Do you have any idea of the coverage of non-fiction, research materials in LT? Have you done any projects to look at overlap with a research institution (or with WorldCat)?”

No, we haven’t. And I’m dying to find out, both for academic and non-academic libraries.

So I put together a feed of all of LibraryThing’s unique ISBNs. With a little work, library programmers should be able to compare them against their holdings.

If you’re not up to the task, but still want to find out how LibraryThing compares to your library, you can send me a file with ISBNs—just ISBNs or a more detailed dump—and I’ll do the comparison.

See our Feeds and APIs page for the file, AllLibraryThingISBNs.xml.gz.

Complications and opportunities:

  • I included only valid ISBNs.
  • It’s a week or two old.
  • About 20% of LibraryThing books have invalid or no ISBN. Many of these have LCCNs. I suspect a high percentage are library-ish books.
  • I have turned all ISBN-13s in 978 format into ISBN-10s. There are a few bogus ones too, including the valid but numerically absurd 0000000000. (Bowker should auction that one off!)
  • There can be little doubt that LibraryThing is stronger in paperbacks and weaker in the formats libraries collect. It would therefore be very useful to run all ISBNs through OCLC’s xISBN service*. (By definition, they’re not going to be improved by running them through xISBN’s chief competitor alternative service provider, thingISBN.) Unfortunately, I can’t run them through xISBN on my own.
  • The feed is available for non-commercial use only. That basically means libraries and hobbyists. Other use is expressly prohibited.
  • I am guessing the overlap won’t always be that impressive as a percentage. But these are the books people think enough of to own. They’re going to move more than other library books.
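For library programmers who want to try the comparison, here is a minimal sketch in Python. It is not the code behind the feed. It assumes you have already pulled the ISBNs out of the feed file and out of your own holdings into plain lists of strings, and the function names are made up for illustration. It folds 978-prefixed ISBN-13s down to ISBN-10s, drops anything with a bad check digit, and reports how much of your collection also shows up in the feed.

```python
import re


def isbn10_check_digit(first9: str) -> str:
    """Compute the ISBN-10 check digit for the first nine digits."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn: str) -> bool:
    """True if the string is ten characters with a correct check digit."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        return False
    return isbn10_check_digit(isbn[:9]) == isbn[9]


def isbn13_to_isbn10(isbn13: str):
    """Convert a 978-prefixed ISBN-13 to an ISBN-10, or return None."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if not re.fullmatch(r"978[0-9]{10}", digits):
        return None  # 979 and malformed numbers have no ISBN-10 form
    core = digits[3:12]
    return core + isbn10_check_digit(core)


def normalize(raw: str):
    """Return a valid ISBN-10 for the input, or None if it is unusable."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 13:
        s = isbn13_to_isbn10(s)
    return s if s and is_valid_isbn10(s) else None


def overlap(holdings, feed_isbns):
    """Return (shared count, share of valid holdings found in the feed)."""
    mine = {n for n in map(normalize, holdings) if n}
    theirs = {n for n in map(normalize, feed_isbns) if n}
    shared = mine & theirs
    return len(shared), (len(shared) / len(mine) if mine else 0.0)
```

Because everything is reduced to ISBN-10 before comparing, a holdings file in ISBN-13 still matches the feed. Different editions of the same work will not match, which is exactly the gap a service like xISBN or thingISBN would close.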

I’m looking forward to what people find out!

*Which is moving, but will not break.

Monday, February 26th, 2007

New feed: Wikipedia citations

I parsed the English Wikipedia looking for ISBNs and came up with a public feed of ISBNs to articles.

Over on the LibraryThing blog, I posted about it. LibraryThing is now showing the results on work pages, but others should feel free to use the feed for their own ends. (Cause trouble. Put Wikipedia in your OPAC!)
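For the curious, pulling ISBNs out of article text can be sketched roughly like this. This is an illustration only, not the parser that produced the feed. It assumes the articles are already in hand as a dictionary of title to wikitext, the function names are hypothetical, and it checks only the length of each number, not its check digit.

```python
import re
from collections import defaultdict

# "ISBN" (any case), optional separators, then 10 to 13 digits that may be
# broken up by hyphens or spaces. The lookahead stops a following year or
# page number from being swallowed into the match.
ISBN_RE = re.compile(
    r"ISBN[\s:=]*((?:[0-9][- ]?){9,12}[0-9Xx])(?![0-9])",
    re.IGNORECASE,
)


def extract_isbns(wikitext: str) -> set:
    """Return the set of 10- or 13-character ISBNs mentioned in the text."""
    found = set()
    for match in ISBN_RE.finditer(wikitext):
        isbn = re.sub(r"[- ]", "", match.group(1)).upper()
        if len(isbn) in (10, 13):
            found.add(isbn)
    return found


def build_feed(articles: dict) -> dict:
    """Map each ISBN to the sorted list of article titles that cite it."""
    feed = defaultdict(set)
    for title, text in articles.items():
        for isbn in extract_isbns(text):
            feed[isbn].add(title)
    return {isbn: sorted(titles) for isbn, titles in feed.items()}
```

Feeding it two made-up articles that both cite the same book gives one ISBN pointing at both titles, which is the shape of data a work page needs.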

Enjoy.

Monday, February 26th, 2007

O’Reilly Radar on tagging

The O’Reilly Radar Blog does a nice post on my When tags work and when they don’t: Amazon and LibraryThing.

O’Reilly blogger Brady Forrest notes that much of what I said echoed Joshua Schachter of Del.icio.us:

“You have to understand the selfish user – user #1 has to find the system useful or you won’t get user #2. Systems that only become useful when lots of people are using them usually fail, because there’s no incentive for people to contribute themselves.”

So now would be a good time for me to say I’m not claiming my insights are all my own. Most of this stuff has been in the air for a while. While I’d never seen that piece by Schachter, I’ve seen similar statements by him and others. (And there’s a good short panel taped with him and David Weinberger.) I’d love to talk tags with these guys. No doubt like Wikipedia founder Jimmy Wales, Schachter’s so swamped he’s not even answering Bono’s email (source: net@night).*

A few corrections:

  • AllConsuming is already owned by Amazon. That is, Amazon now owns two of LibraryThing’s competitors!
  • Shelfari allows tagging. Honesty hurts.
  • As good as “CEO” sounds, I’m only the president. LibraryThing is an LLC.

Friday, February 23rd, 2007

Not the Ninja!

I’m loath to take the last post, When tags work and when they don’t: Amazon and LibraryThing, off the top. It got a *lot* of attention*, and I owe the commenters a follow-up.

The Shifted Librarian found this YouTube video “put together by Steven Reed’s students at Wilmington High School.”

It’s a fun video, no question. It’s an *amazing* demonstration of what kids can do these days. My high school had the best Super8 program in Massachusetts, and this level of professionalism would have been way past our capabilities. The book-throwing is great. The editing is quick and professional. The kids get an A. They rock.

But I can’t leave it at that. The kids are rock stars, but the message is all wrong—and it’s wrong in a very telling way.

The situation is completely false. I don’t mean the ninja—they’re increasingly common in libraries of all sizes—but the contest and its results.

Type Capital of Russia into Google and you get this:

You don’t even need to GO to a page—the answers are in the page titles themselves. Face it, the Web is *great* for this sort of thing. You’re not going to “defend” books by claiming they’re better for looking up trivial facts. They’re not. Breathe deep and repeat after me: They’re worse.**

The second false idea is that libraries and the web are rivals, two competing ways to get the same thing (which is mostly factoids). This is all too often how popular culture sees libraries, and it’s a disaster. If libraries are just low-tech search engines, they are bad ones. They should be defunded and closed.***

I’m not going to launch into a defense of libraries and books. Of course I love them. I started LibraryThing because I loved them so much. But I don’t love them because I hate computers, or because books are better than computers.**** I don’t see them as rivals. The web has supplanted a few things that books used to do, but not the important ones. And libraries can do things with computers they are only just starting to explore.

People who love books need to fight against these ideas. They’re a trap. They’re wrong, and they’re very dangerous to the things we love.*****

Yeah, I know, lighten up, Tim!

*Alexa is under the impression LibraryThing’s traffic doubled the day the post hit. That’s total nonsense and a great proof of Alexa’s failings. It makes me wonder how much they rely on new link creation, not traffic.
**Where did that guy find the books? Does he have the shelving memorized, or did he consult an OPAC to find Countries of the World?
***And don’t tell me it’s about not everyone having a computer. If so, libraries should be just computer centers.
****For starters, books are worse at email, worse at social networking, and they are hands-down a lousy way to blunder upon shocking new types of pornography.
*****Can anyone help me find a quote? I think it was from a random sci-fi movie or show, taking place in the near future. The quote was something like “Would you have wanted to shut down the internet just to keep the libraries open?” Don’t even try to Google it. (And I recommend not going to the library either.)

Update: This ninja movie has a good message.

Tuesday, February 20th, 2007

When tags work and when they don’t: Amazon and LibraryThing

This is an extensive post, revealing the results of a statistical comparison between Amazon and LibraryThing tags, and exploring why tagging has turned out relatively poorly for Amazon. I end by making concrete recommendations for ecommerce sites interested in making tagging work.

Both LibraryThing and Amazon allow users to tag books. But with a tiny fraction of Amazon’s traffic, LibraryThing appears to have accumulated *ten times* as many book tags as Amazon—13 million tags on LibraryThing to about 1.3 million on Amazon. (See below for the method I used to find this out.)

Something is going on here—something with broad implications for tagging, classification and “Web 2.0” commerce. There are a couple of lessons, but the most important is this: Tagging works well when people tag “their” stuff, but it fails when they’re asked to do it to “someone else’s” stuff. You can’t get your customers to organize your products, unless you give them a very good incentive. We all make our beds, but nobody volunteers to fluff pillows at the local Sheraton.

A tale of two tagging sites.

LibraryThing began on August 30, 2005. From the start, we allowed members to tag their books. We showed that people could embrace book tagging, much as they had photo and website tagging. But LibraryThing was a marginal player.

Three months later, Amazon unveiled its tagging feature. This was a big deal in certain quarters. To many, Amazon’s move signalled that tagging had “arrived.” As CNet blogger Daniel Terdiman wrote:

“[This] may well prove to be the most visible example of a company incorporating tags as a way to bring order to information. Outfits like Flickr are big and have tremendous followings, but nothing compared to Amazon’s.”

Amazon’s size was key. With something like 60 million registered customers, and one of the highest traffic sites out there, tagging at Amazon must have seemed like a sure bet. Its visitors were a firehose. Point them at tagging and KABOOM!

Amazon’s tagging was quick and easy—but would it work?

It didn’t work out that way. Amazon visitors have not taken to tagging Amazon’s books in significant numbers. With thousands of times the traffic, Amazon produced a tenth as many tags as LibraryThing. What’s going on?

In fairness, Amazon didn’t give tagging a lot of prominence. Tags were stuck in the middle of their ever-lengthening book page—one section for adding your own tags, another for showing others’ tags. They didn’t push them very hard.

It’s likely Amazon could have done better. A higher profile could have increased familiarity and comfort with the feature. Some user-interface tweaks could have enhanced its appeal. Maybe Amazon will make changes, and Amazon tags will get some traction.

But there’s a general message in this: If Amazon with its unsurpassed traffic is having trouble, can other ecommerce sites hope to make tagging work?

Numbers matter

Amazon’s shortfall matters. To do anything useful with tags, you need numbers. With only a few tags, you can’t conclude much. The tags could just be “noise.”

A web of meaning: LibraryThing’s tag cloud for Guns, Germs and Steel.

Take one example: LibraryThing users have applied over 3,900 tags to Jared Diamond’s Guns, Germs and Steel, including “apples,” “office” and “quite boring.” With just a few tags, it might be thought a dessert cookbook, a business book or—worst of all—a boring one. But these are all single-instance tags. With a larger number of tags, clear patterns emerge, with high-level descriptors like “history” (755 times) and “anthropology” (293 times) standing out clearly against the noise. Even lower-frequency tags, like “social evolution” (25 times) and “pulitzer prize” (20 times) can be trusted as relevant.

Large numbers are particularly important when looking for best examples for a given tag. Go by numbers alone and you just get what’s popular. By the numbers Guns, Germs and Steel, tagged “evolution” 39 times, is the number ten book on evolution. That’s crazy. By looking at “tag share,” LibraryThing can understand that Ernst Mayr’s What Evolution Is is a better choice. Although tagged “evolution” only 25 times, those constitute a much larger percentage of its tags. (See the LibraryThing tag page for “evolution.”)
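The “tag share” idea is easy to sketch. What follows is a rough illustration in Python, not LibraryThing’s actual ranking code. The counts for “evolution” and the 3,900 total come from the numbers above; the total tag count for What Evolution Is is a made-up placeholder, since the post does not give it, and the function names are hypothetical.

```python
def tag_share(tag_count: int, total_tags: int) -> float:
    """Fraction of a book's tags that are the tag in question."""
    return tag_count / total_tags if total_tags else 0.0


def rank_by_count(books):
    """Rank (title, tag_count, total_tags) rows by raw tag count."""
    return [title for title, count, _ in sorted(books, key=lambda b: -b[1])]


def rank_by_share(books):
    """Rank the same rows by tag share instead of raw count."""
    return [
        title
        for title, count, total in sorted(
            books, key=lambda b: -tag_share(b[1], b[2])
        )
    ]


# (title, times tagged "evolution", total tags on the book)
books = [
    ("Guns, Germs and Steel", 39, 3900),
    ("What Evolution Is", 25, 60),  # 60 is a hypothetical total
]
```

Raw counts put Guns, Germs and Steel on top; share puts Mayr on top, because about one in a hundred of Diamond’s tags say “evolution” while a far larger slice of Mayr’s do. A real system would also want a minimum number of tags before trusting the share, for the small-numbers reasons described above.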

Critical mass is important, even if we can’t pinpoint the line. Ten tags are never enough; a thousand almost always is. Unfortunately, Amazon’s low numbers translate into a broader failure to reach critical mass. With ten times as many tags overall, LibraryThing has fifteen times as many books with 100 tags, and 35 times as many with over 200 tags.

ISBN tag distribution for A Farewell to Arms. Doubles as an example of my Excel-fu.

The “problem of small numbers” is compounded by Amazon’s failure to aggregate tags across editions. In Amazon, the tags for the various paperback, hardback, British, French and German editions of a work are all in separate “buckets.” LibraryThing’s unique user-generated “works” concept combines editions and their data, compounding tag statistics. Thus, Amazon’s top edition of A Farewell to Arms has 28 tags, where LibraryThing’s has 716. But when all of LibraryThing’s editions are combined together under a work, LibraryThing has 1,914 tags—68 times as many! A Farewell to Arms is a very well-known book, but Amazon’s 28 tags can’t mean much. With 1,914 tags, LibraryThing has a truly extensive “web of meaning,” created by its members. You can do a lot more when the data is so rich.

Why Ecommerce Tagging Fails

Amazon’s tagging suffers a failure of incentive. The causes are multiple:

  • Tags work best when they’re about memory, so tagging makes the most sense when you have a lot of something to remember. On LibraryThing, members with under 50 books seldom tag, but users with 200 or more usually do. When you get right down to it, few of us need to remember 200 books on Amazon. For most of us, the “wishlist” feature is good enough. We don’t need to sub-segment out the “anthropology” books.
  • When you tag on LibraryThing, you’re putting your library in order. The pleasure and use is not unlike reshelving your books the way you want them, except that tags can draw together books that must otherwise reside separately on the shelves. And tagging on LibraryThing is connected to a social system—tag something “anthropology” and you’re connected to all the other anthropology buffs out there.
  • Amazon is a store, not a personal library or even a club. Organizing its data is as fun as straightening items at the supermarket. It’s not your stuff and it’s not your job.
  • Amazon underplays the social. Tagging really kicks into high gear when the personal blooms into the social, when organizing your web pages or your books turns into an hours-long exploration of others’ web pages and books. But Amazon doesn’t want you to hang out—they want you to buy! Tags on book pages do not list their taggers. You need to click around a lot before the tags turn into people. (The failure is particularly surprising in light of Amazon’s clear grasp of social software. Amazon got “social” years before it was trendy. What are reviews and Listmania but social sharing and user-generated content?)
  • Users don’t “own” their tags. There is no way to export them. Considering how central APIs are to Amazon—and to its success—this comes as a surprise. (I’m guessing they’ll add this eventually.)

The problem of opinion tags

Some of the tags from Ann Coulter’s Treason. But what is it about? (Compare with LibraryThing’s page.)

The limited utility of tagging on Amazon produces an unintended consequence—a surfeit of “opinion tags.” So, Daniel Silva’s The Unlikely Spy gets “wow what a book” and Nick Hornby’s High Fidelity gets “good” and “good book.” Not infrequently, opinion outnumbers other types of tags. Five of the seven tags applied to Bette Greene’s Summer of My German Soldier are opinion tags, including “aweful” (sic), “obnoxious,” “pathetic,” “stupid” and “wonderful story.”

The takeover is total with political books. Ann Coulter’s Treason gets a lot of tags like “craptacular,” “evil” and “brain dead.” Coulter’s tag-defenders weigh in with “you won’t disprove the facts,” “you can’t disprove the facts,” “no one has proven this book wrong” and “try and disprove this book.” (Well, I guess that settles it!) Finally, Coulter has also received “dildo” (elsewhere applied mostly to Bill O’Reilly books*), “vibrator,” “lunesta” and “xanax.” It seems the naughty teenagers and the pharmaceutical spammers have discovered Amazon tags!**

Tag-spam on Amazon

Amazon’s all-items tag cloud shows the impact of partisan tagging. After “DVD,” “Music” and “fiction,” the largest tag is “Defectivebydesign,” applied by a small, pitchfork-wielding mob of Microsoft DRM haters.***

Ultimately, I don’t care about the commercial side of things, but “opinion tagging” in a low-numbers environment holds commercial risks. Summer of My German Soldier is actually a pretty good book (I hear). Although Amazon won’t let me confirm it, one suspects all five negative tags come from one user. Is it fair to let one anonymous reader shape a book’s tag cloud so completely?

How to make Ecommerce Tagging Work

Big suggestions:

  • Figure out why your customers would want to tag your stuff. Don’t fool yourself.
  • Make tagging as easy as possible. (Amazon’s are quite easy to add, although registration is a pain.)
  • Understand that commercial tagging can turn people off. Avoid crass commercialism. Respect your taggers—these people are helping you out!
  • Make taggers feel like it’s “their” thing. Encourage users to give out their tag URLs—people love to show off—and let them export their tags any way they want.
  • Keep tagging social. Stop selling and start connecting. If you connect people up right, the selling will follow. Think Tupperware!
  • Consider whether a non-commerce site has the data you need. Back when LibraryThing had a million tags, Amazon could have bought our data for the price of a cup of coffee. Now that we’re big and important and have three employees, that’ll be THREE cups of coffee, buster!

Small suggestions:

  • Put methods in place to fight spamming and tag-bombing. LibraryThing does this by considering both the number of times a tag has been applied and the number of users who use it. A single angry user can’t make a tag really big on the tag cloud.
  • Have logical URLs. Amazon’s tag URLs are full of junk, much of it rather crass attempts at search engine optimization (e.g., the book title is inserted into the tag URL, but it works without it). It seems getting a little search engine help trumped providing users with easy-to-remember URLs.
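The anti-tag-bombing suggestion above can be sketched in a few lines. To be clear, this is not LibraryThing’s formula, which the post does not give. It is one hypothetical way to combine the two signals: weight each tag by the geometric mean of how many times it was applied and how many distinct users applied it. The function name and sample data are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tag_weights(applications):
    """Weight tags from (user, tag) pairs.

    Weight is sqrt(times applied * distinct users), an assumed formula
    chosen so that heavy use by very few people counts for little.
    """
    uses = Counter()
    users = defaultdict(set)
    for user, tag in applications:
        uses[tag] += 1
        users[tag].add(user)
    return {tag: math.sqrt(uses[tag] * len(users[tag])) for tag in uses}


# One angry user applies "stupid" 100 times; 20 users apply "fiction" once.
applications = [("angry_user", "stupid")] * 100
applications += [(f"user{i}", "fiction") for i in range(20)]
weights = tag_weights(applications)
```

On raw counts “stupid” would be five times the size of “fiction” in the cloud. With the weighting, “fiction” comes out ahead, so a single angry user cannot make a tag really big.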

Methods

To my knowledge, Amazon doesn’t release any total tag statistics. So I tried a statistical sample:

  • I picked 1000 random entries from LibraryThing libraries, and retrieved their ISBNs.
  • I ran the ISBNs through LibraryThing and Amazon, counting tag numbers. I did it by hand through 100 before I decided to write a quick scraper.
  • I compiled the results and did some simple math. You can find my Excel file to the right.

The final results were 56,185 tags on LibraryThing, 5,528 on Amazon. Extrapolating on the sample, I conclude that Amazon has something like 1,337,388 tags in total, to LibraryThing’s 13,593,069.
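The arithmetic can be sketched as follows. The post does not spell out the exact calculation, so this assumes the simplest version: take LibraryThing’s total as known and scale it by the Amazon-to-LibraryThing ratio in the sample. The function name is hypothetical. The result lands within a few dozen tags of the figure above but not exactly on it, so the original spreadsheet presumably rounded somewhere along the way.

```python
SAMPLE_LT_TAGS = 56_185      # tags on LibraryThing for the sampled ISBNs
SAMPLE_AMAZON_TAGS = 5_528   # tags on Amazon for the same ISBNs
LT_TOTAL_TAGS = 13_593_069   # LibraryThing's total, as given in the post


def estimate_total(sample_other: int, sample_known: int, known_total: int) -> int:
    """Scale a known total by the ratio observed in the sample."""
    return round(known_total * sample_other / sample_known)


# How many LibraryThing tags per Amazon tag in the sample (about ten).
ratio = SAMPLE_LT_TAGS / SAMPLE_AMAZON_TAGS

# Estimated Amazon book tags overall (roughly 1.34 million).
amazon_estimate = estimate_total(SAMPLE_AMAZON_TAGS, SAMPLE_LT_TAGS, LT_TOTAL_TAGS)
```

The estimate is only as good as the sample, which is why the caveats under “Problems with my method” matter more than the last few digits.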

If anyone wants to duplicate the test, let me know. By default, LibraryThing doesn’t think of tags ISBN-by-ISBN, so I’d need to give you an API to that data.

Problems with my method

  • It only covers books. Maybe DVD tagging is a different phenomenon. I note, however, that Amazon’s page for bananas—yes, Amazon sells bananas—is overrun with Borat-themed tags.
  • The random books were drawn from LibraryThing. Maybe LibraryThing’s ISBNs are unrepresentative of Amazon’s ISBNs as a whole, and the sort of books that get tagged on LibraryThing are not the ones that get tagged on Amazon. There may be some truth to this insofar as LibraryThing includes a lot of older books, while Amazon focuses on the new and in-print.
  • I only sampled Amazon’s US site. LibraryThing has a fair number of non-US editions.

Let me know what you think

As usual, I’m dying to hear what people think about this post. I know it’s imperfect—I bit off more than I could chew. But it says a lot of things I’ve been keeping in my head for months. Leave comments here. If enough interest develops, we can start something on Talk.

*Shouldn’t it be “falafel”? And YES! O’Reilly’s Culture Warrior IS tagged “falafel”! I swear I did NOT do it.
**Out of 60 unique tags applied to the book, I can spy only four that read like subject tags.
***Small numbers also mean Amazon is open to manipulation. One of the larger tags on their tag cloud is for “bards and minstrels,” applied to 4,200 products by six taggers. The tag has never been used on LibraryThing, Flickr or Del.icio.us. I suspect a conspiracy.

Monday, February 19th, 2007

WorldCat Registry: Join up!

OCLC has introduced WorldCat Registry, a one-stop place for libraries, library consortia, library vendors, funders and suchnot to put contact info, link URLs and other “identity” data. Every institution gets its own page—it’s like MySpace but with libraries and minus friends, comments, tacky background images and all the drunken photos. Here’s LibraryThing’s page. We hate being a “vendor,” but there was no category for “The OCLC of Lilliput.”

OCLC is being generous with the entry requirements. Personal libraries* are out, but small institutional ones are not. Their FAQs note “no restrictions prevent a smaller physical entity such as a church library, or a ‘virtual’ entity such as a digital library, from representing itself in the Registry.” So, if you’re on an institutional membership, go ahead and take them at their word—join up!

When you join up, you can give your catalog URL as:

http://www.librarything.com/catalog/YOURUSERNAME

Your ISBN and ISSN URLs are:

http://www.librarything.com/catalog.php?view=YOURUSERNAME&searchbox=ISBNORISSN&searchType=Books

LibraryThing is not currently listed among their vendors. Until it is, select “other.”

*Which reminds me, we recently had an application by a coven. They were uncertain if they were a family or an institution. OCLC is silent on the coven issue.

Thursday, February 15th, 2007

Borges and women entrepreneurs

An alert LibraryThing member sent me an Amazon email, a textbook case of collaborative filtering gone wrong. (LibraryThing makes these kinds of mistakes too.) The fact that it’s Borges, who has such fun things to say about how books relate to each other, is just icing on the cake.

Dear Amazon.com Customer,

We’ve noticed that customers who have expressed interest in Borges: A Life by Edwin Williamson have also ordered How She Does It: How Women Entrepreneurs Are Changing the Rules of Business Success by Margaret Heffernan. For this reason, you might like to know that Margaret Heffernan’s How She Does It: How Women Entrepreneurs Are Changing the Rules of Business Success is now available. You can order your copy for just $17.13 ($8.82 off the list price) by following the link below.

How She Does It: How Women Entrepreneurs Are Changing the Rules of Business Success by Margaret Heffernan
List Price: $25.95
Price: $17.13
You Save: $8.82 (34%)

Tuesday, February 13th, 2007

Blyberg’s SOPAC

This isn’t breaking news at this point, but it’s still cool. John Blyberg has announced what he calls “SOPAC”:

It’s basically a set of social networking tools integrated into the AADL catalog. It gives users the ability to rate, review, comment-on, and tag items.

Tags, ratings, and reviews in an OPAC! I think it’s great that he’s done it—it’s no surprise that we’d love to put some of LT’s features into OPACs, and to see a big library like AADL take on social stuff legitimates the point.*

Anyway, you should check out his post about it, which even has a nifty screencast.

*Tim has blogged before about putting LT and OPACs… We’re thinking it’ll be a sort of OPAC widget, so hold onto your hats.

Tuesday, February 13th, 2007

Introducing the book
