Tuesday, February 20th, 2007

When tags work and when they don’t: Amazon and LibraryThing

This is an extensive post, revealing the results of a statistical comparison between Amazon and LibraryThing tags, and exploring why tagging has turned out relatively poorly for Amazon. I end by making concrete recommendations for ecommerce sites interested in making tagging work.

Both LibraryThing and Amazon allow users to tag books. But with a tiny fraction of Amazon’s traffic, LibraryThing appears to have accumulated *ten times* as many book tags as Amazon—13 million tags on LibraryThing to about 1.3 million on Amazon. (See below for the method I used to find this out.)

Something is going on here—something with broad implications for tagging, classification and “Web 2.0” commerce. There are a couple of lessons, but the most important is this: Tagging works well when people tag “their” stuff, but it fails when they’re asked to do it to “someone else’s” stuff. You can’t get your customers to organize your products, unless you give them a very good incentive. We all make our beds, but nobody volunteers to fluff pillows at the local Sheraton.

A tale of two tagging sites.

LibraryThing began on August 30, 2005. From the start, we allowed members to tag their books. We showed that people could embrace book tagging, much as they had photo and website tagging. But LibraryThing was a marginal player.

Three months later, Amazon unveiled its tagging feature. This was a big deal in certain quarters. To many, Amazon’s move signalled that tagging had “arrived.” As CNet blogger Daniel Terdiman wrote:

“[This] may well prove to be the most visible example of a company incorporating tags as a way to bring order to information. Outfits like Flickr are big and have tremendous followings, but nothing compared to Amazon’s.”

Amazon’s size was key. With something like 60 million registered customers, and one of the highest-traffic sites out there, tagging at Amazon must have seemed like a sure bet. Its visitors were a firehose. Point them at tagging and KABOOM!

Amazon’s tagging was quick and easy—but would it work?

It didn’t work out that way. Amazon visitors have not taken to tagging Amazon’s books in significant numbers. With thousands of times the traffic, Amazon produced a tenth as many tags as LibraryThing. What’s going on?

In fairness, Amazon didn’t give tagging a lot of prominence. Tags were stuck in the middle of their ever-lengthening book page—one section for adding your own tags, another for showing others’ tags. They didn’t push them very hard.

It’s likely Amazon could have done better. A higher profile could have increased familiarity and comfort with the feature. Some user-interface tweaks could have enhanced its appeal. Maybe Amazon will make changes, and Amazon tags will get some traction.

But there’s a general message in this: If Amazon with its unsurpassed traffic is having trouble, can other ecommerce sites hope to make tagging work?

Numbers matter

Amazon’s shortfall matters. To do anything useful with tags, you need numbers. With only a few tags, you can’t conclude much. The tags could just be “noise.”

A web of meaning: LibraryThing’s tag cloud for Guns, Germs and Steel.

Take one example: LibraryThing users have applied over 3,900 tags to Jared Diamond’s Guns, Germs and Steel, including “apples,” “office” and “quite boring.” With just a few tags, it might be thought a desert cookbook, a business book or—worst of all—a boring one. But these are all single-instance tags. With a larger number of tags, clear patterns emerge, with high-level descriptors like “history” (755 times) and “anthropology” (293 times) standing out clearly against the noise. Even lower-frequency tags, like “social evolution” (25 times) and “pulitzer prize” (20 times) can be trusted as relevant.

Large numbers are particularly important when looking for best examples for a given tag. Go by numbers alone and you just get what’s popular. By the numbers Guns, Germs and Steel, tagged “evolution” 39 times, is the number ten book on evolution. That’s crazy. By looking at “tag share,” LibraryThing can understand that Ernst Mayr’s What Evolution Is is a better choice. Although tagged “evolution” only 25 times, those constitute a much larger percentage of its tags. (See the LibraryThing tag page for “evolution.”)
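The “tag share” idea can be sketched in a few lines of Python. The ~3,900-tag total and the two “evolution” counts come from the discussion above; the total for What Evolution Is is invented for illustration:

```python
# "Tag share": rank books for a tag by the fraction of the book's tags
# that the tag represents, not by the raw count.
# The totals below: Guns, Germs and Steel's is from the post;
# What Evolution Is's total is an invented placeholder.
books = {
    "Guns, Germs and Steel": {"evolution": 39, "total": 3900},
    "What Evolution Is": {"evolution": 25, "total": 180},
}

def tag_share(book, tag):
    """Fraction of a book's tags accounted for by one tag."""
    return books[book].get(tag, 0) / books[book]["total"]

# Raw counts would rank Guns, Germs and Steel first (39 > 25);
# tag share ranks What Evolution Is first, since "evolution"
# dominates its (much smaller) tag cloud.
ranked = sorted(books, key=lambda b: tag_share(b, "evolution"), reverse=True)
```

The same reweighting generalizes to any per-item popularity measure where raw counts just reward overall popularity.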

Critical mass is important, even if we can’t pinpoint the line. Ten tags are never enough; a thousand almost always is. Unfortunately, Amazon’s low numbers translate into a broader failure to reach critical mass. With ten times as many tags overall, LibraryThing has fifteen times as many books with 100 tags, and 35 times as many with over 200 tags.

ISBN tag distribution for A Farewell to Arms. Doubles as an example of my Excel-fu.

The “problem of small numbers” is compounded by Amazon’s failure to aggregate tags across editions. In Amazon, the tags for the various paperback, hardback, British, French and German editions of a work are all in separate “buckets.” LibraryThing’s unique user-generated “works” concept combines editions and their data, pooling their tag statistics. Thus, Amazon’s top edition of A Farewell to Arms has 28 tags, where LibraryThing’s has 716. But when all of LibraryThing’s editions are combined under a work, LibraryThing has 1,914 tags—68 times as many! A Farewell to Arms is a very well-known book, but Amazon’s 28 tags can’t mean much. With 1,914 tags, LibraryThing has a truly extensive “web of meaning,” created by its members. You can do a lot more when the data is so rich.
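The “works” pooling amounts to mapping many edition records onto one shared record and summing their tag counts. A minimal sketch, with placeholder ISBN keys and invented counts:

```python
from collections import Counter

# Map editions (keyed here by placeholder ISBN strings) to a shared
# "work", then pool each edition's tag counts into one cloud.
# All identifiers and counts below are invented for illustration.
work_of = {
    "isbn-paperback": "A Farewell to Arms",
    "isbn-hardback": "A Farewell to Arms",
}
edition_tags = {
    "isbn-paperback": Counter({"fiction": 20, "war": 8}),
    "isbn-hardback": Counter({"war": 5, "classics": 3}),
}

def work_tags(work):
    """Sum tag counts across all editions belonging to one work."""
    pooled = Counter()
    for isbn, w in work_of.items():
        if w == work:
            pooled += edition_tags.get(isbn, Counter())
    return pooled
```

Each edition’s cloud stays sparse and noisy on its own; the pooled cloud is what reaches critical mass.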

Why Ecommerce Tagging Fails

Amazon’s tagging suffers a failure of incentive. The causes are multiple:

  • Tags work best when they’re about memory, so tagging makes the most sense when you have a lot of something to remember. On LibraryThing, members with under 50 books seldom tag, but users with 200 or more usually do. When you get right down to it, few of us need to remember 200 books on Amazon. For most of us, the “wishlist” feature is good enough. We don’t need to sub-segment out the “anthropology” books.
  • When you tag on LibraryThing, you’re putting your library in order. The pleasure and use is not unlike reshelving your books the way you want them, except that tags can draw together books that must otherwise reside separately on the shelves. And tagging on LibraryThing is connected to a social system—tag something “anthropology” and you’re connected to all the other anthropology buffs out there.
  • Amazon is a store, not a personal library or even a club. Organizing its data is as fun as straightening items at the supermarket. It’s not your stuff and it’s not your job.
  • Amazon underplays the social. Tagging really kicks into high gear when the personal blooms into the social, when organizing your web pages or your books turns into an hours-long exploration of others’ web pages and books. But Amazon doesn’t want you to hang out—they want you to buy! Tags on book pages do not list their taggers. You need to click around a lot before the tags turn into people. (The failure is particularly surprising in light of Amazon’s clear grasp of social software. Amazon got “social” years before it was trendy. What are reviews and Listmania but social sharing and user-generated content?)
  • Users don’t “own” their tags. There is no way to export them. Considering how central APIs are to Amazon—and to its success—this comes as a surprise. (I’m guessing they’ll add this eventually.)

The problem of opinion tags

Some of the tags from Ann Coulter’s Treason. But what is it about? (Compare with LibraryThing’s page.)

The limited utility of tagging on Amazon produces an unintended consequence—a surfeit of “opinion tags.” So, Daniel Silva’s The Unlikely Spy gets “wow what a book” and Nick Hornby’s High Fidelity gets “good” and “good book.” Not infrequently, opinion outnumbers other types of tags. Five of the seven tags applied to Bette Green’s Summer of My German Soldier are opinion tags, including “aweful” (sic), “obnoxious,” “pathetic,” “stupid” and “wonderful story.”

The takeover is total with political books. Ann Coulter’s Treason gets a lot of tags like “craptacular,” “evil” and “brain dead.” Coulter’s tag-defenders weigh in with “you won’t disprove the facts,” “you can’t disprove the facts,” “no one has proven this book wrong” and “try and disprove this book.” (Well, I guess that settles it!) Finally, Coulter has also received “dildo” (elsewhere applied mostly to Bill O’Reilly books*), “vibrator,” “lunesta” and “xanax.” It seems the naughty teenagers and the pharmaceutical spammers have discovered Amazon tags!**

Tag-spam on Amazon

Amazon’s all-items tag cloud shows the impact of partisan tagging. After “DVD,” “Music” and “fiction,” the largest tag is “Defectivebydesign,” applied by a small, pitchfork-wielding mob of Microsoft DRM haters.***

Ultimately, I don’t care about the commercial side of things, but “opinion tagging” in a low-numbers environment holds commercial risks. Summer of My German Soldier is actually a pretty good book (I hear). Although Amazon won’t let me confirm it, one suspects all five negative tags come from one user. Is it fair to let one anonymous reader shape a book’s tag cloud so completely?

How to make Ecommerce Tagging Work

Big suggestions:

  • Figure out why your customers would want to tag your stuff. Don’t fool yourself.
  • Make tagging as easy as possible. (Amazon’s tags are quite easy to add, although registration is a pain.)
  • Understand that commercial tagging can turn people off. Avoid crass commercialism. Respect your taggers—these people are helping you out!
  • Make taggers feel like it’s “their” thing. Encourage users to give out their tag URLs—people love to show off—and let them export their tags any way they want.
  • Keep tagging social. Stop selling and start connecting. If you connect people up right, the selling will follow. Think Tupperware!
  • Consider whether a non-commerce site has the data you need. Back when LibraryThing had a million tags, Amazon could have bought our data for the price of a cup of coffee. Now that we’re big and important and have three employees, that’ll be THREE cups of coffee, buster!

Small suggestions:

  • Put methods in place to fight spamming and tag-bombing. LibraryThing does this by considering both the number of times a tag has been applied and the number of users who use it. A single angry user can’t make a tag really big on the tag cloud.
  • Have logical URLs. Amazon’s tag URLs are full of junk, much of it rather crass attempts at search engine optimization (e.g., the book title is inserted into the tag URL, but it works without it). It seems getting a little search engine help trumped providing users with easy-to-remember URLs.
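The anti-spam suggestion above—counting distinct users rather than raw applications—can be sketched in a few lines. The tagging events here are invented:

```python
from collections import defaultdict

def tag_weights(taggings):
    """Weight each tag by how many DISTINCT users applied it, so one
    user repeating a tag can't dominate the cloud.
    `taggings` is an iterable of (user, tag) pairs for one item."""
    users_per_tag = defaultdict(set)
    for user, tag in taggings:
        users_per_tag[tag].add(user)  # a set ignores repeats per user
    return {tag: len(users) for tag, users in users_per_tag.items()}

# Invented events: three users apply "history" once each, while one
# user hammers "awful" three times.
events = [
    ("alice", "history"), ("bob", "history"), ("carol", "history"),
    ("mallory", "awful"), ("mallory", "awful"), ("mallory", "awful"),
]
```

By raw count the two tags would tie at 3; by distinct users, “history” outweighs “awful” three to one.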

Methods

To my knowledge, Amazon doesn’t release any total tag statistics. So I tried a statistical sample:

  • I picked 1,000 random entries from LibraryThing libraries, and retrieved their ISBNs.
  • I ran the ISBNs through LibraryThing and Amazon, counting tag numbers. I did it by hand through 100 before I decided to write a quick scraper.
  • I compiled the results and did some simple math. You can find my Excel file to the right.

The final results were 56,185 tags on LibraryThing, 5,528 on Amazon. Extrapolating on the sample, I conclude that Amazon has something like 1,337,388 tags in total, to LibraryThing’s 13,593,069.

If anyone wants to duplicate the test, let me know. By default, LibraryThing doesn’t think of tags ISBN-by-ISBN, so I’d need to give you an API to that data.

Problems with my method

  • It only covers books. Maybe DVD tagging is a different phenomenon. I note, however, that Amazon’s page for bananas—yes, Amazon sells bananas—is overrun with Borat-themed tags.
  • The random books were drawn from LibraryThing. Maybe LibraryThing’s ISBNs are unrepresentative of Amazon’s ISBNs as a whole—that the sort of books that are tagged on LibraryThing are not tagged on Amazon. There may be some truth to this insofar as LibraryThing includes a lot of older books, while Amazon focuses on the new and in-print.
  • I only sampled Amazon’s US site. LibraryThing has a fair number of non-US editions.

Let me know what you think

As usual, I’m dying to hear what people think about this post. I know it’s imperfect—I bit off more than I could chew. But it says a lot of things I’ve been keeping in my head for months. Leave comments here. If enough interest develops, we can start something on Talk.

*Shouldn’t it be “falafel”? And YES! O’Reilly’s Culture Warrior IS tagged “falafel”! I swear I did NOT do it.
**Out of 60 unique tags applied to the book, I can spy only four that read like subject tags.
***Small numbers also mean Amazon is open to manipulation. One of the larger tags on their tag cloud is for “bards and minstrels,” applied to 4,200 products by six taggers. The tag has never been used on LibraryThing, Flickr or Del.icio.us. I suspect a conspiracy.


One Comment:

  1. China says:

    I was just doing some research on Amazon tags and found this old blog post. A bit dated now in 2010, but fascinating nonetheless, especially compared with the evolution of Amazon's tags over the past 3 years. I think all I have to add now is that independent authors rely on Amazon's tags for visibility on the site as a substitute for lack of sales/customer reviews. Tags offer unknown authors a fighting chance at being noticed in otherwise crowded genres. I'm sure once this catches on, Amazon will tweak the system, just to keep the little guy down, but for now, I think tags are more important than ever. Thanks for the data!