Archive for the ‘tagging’ Category

Wednesday, May 7th, 2008

The Long Tail of Ann Coulter

Here are two charts showing the distribution of customer tags on Amazon.com for Ann Coulter’s Godless: The Church of Liberalism. The first shows tags 1-25; the second all 881 tags.


The distribution is not too far from the classic “long tail” pattern common to social data. Although the common tags are common, fully 75% of the tags are used only once.
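For anyone who wants to check a "used only once" figure against their own tag data, here is a minimal sketch. The function name and the sample counts are made up for illustration; they are not the Amazon numbers behind the charts.

```python
from collections import Counter


def singleton_share(tag_counts):
    """Return the fraction of distinct tags that were applied exactly once."""
    if not tag_counts:
        return 0.0
    singletons = sum(1 for n in tag_counts.values() if n == 1)
    return singletons / len(tag_counts)


# Hypothetical counts, NOT the real Amazon data for Godless.
sample = Counter({
    "conservative": 40, "politics": 25, "liberalism": 9,
    "rant": 1, "overrated": 1, "drive-by": 1, "gift idea": 1, "skip": 1,
})
share = singleton_share(sample)  # 5 of the 8 distinct tags appear once
```

A long-tail distribution shows up as a high share here alongside a handful of very large counts at the head.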

It’s an even better example of another characteristic of social data, that “user generated content” is all about context, not just object. LibraryThing members and Amazon customers are tagging the same book. On LibraryThing, where you have to have a book to tag it, Godless has a fairly unremarkable tag cloud, touching on its subject matter and point of view. On Amazon, the tagging has devolved into a shouting match. I don’t think the people who tagged the book “asshat,” “vomit” or “w h o r e” are using tagging as a memory aid (“I forget—what books did I think are ‘asshat’ anyway?”). They’re using tagging as a sort of drive-by review.

Now, a case can be made that Amazon’s tags are signaling something important—this is a “controversial” book indeed! The LibraryThing tag cloud doesn’t show that as starkly. On balance, however, I think opinion tags corrupt the value of tagging. 

Either way, I think this example demonstrates that tagging isn’t a simple matter of putting users in front of taggable stuff.

Labels: tagging

Sunday, January 27th, 2008

Tagging: People-Powered Metadata for the Social Web

“Walk into the public library in Danbury, Connecticut, and you’ll find the usual shelves stacked with books, organized into neat rows. Works of fiction are grouped alphabetically by the author’s last name. Nonfiction titles are placed into their proper Dewey Decimal categories just like they are at tens of thousands of other libraries in North America.

But visit the Danbury Library’s online catalog, and you’ll find something rather unlike a typical library.

“A search for The Catcher in the Rye brings up not just a call number but also a list of related books and tags—keywords such as “adolescence,” “angst,” “coming of age,” and “New York”—that describe J. D. Salinger’s classic novel … Click the tag “angst,” and you’ll find a list of angsty titles such as The Bell Jar, The Stranger, and The Virgin Suicides.”

So begins Gene Smith’s newly released book Tagging: People-Powered Metadata for the Social Web (New Riders). That’s right. The first book dedicated to tagging begins with LibraryThing—specifically our LibraryThing for Libraries project!

Library 2.0 people pause a second. How about that: a book about new developments in social media starts by talking about new things going on in a library? Not a social networking site, not a photo sharing site. A dream come true.

That’s all I have to say for now. I knew the book was coming; Gene interviewed me for it (selections on page 134). But I haven’t finished it yet.

My first impression is that it’s rich and detailed, covering everything from what tagging is and why it matters, to how to implement it at the level of user interface and even technically. But, as is my wont, I’m already scribbling little objections and expansions in the margins. That’s the sign of a good book, right?

I’ve created a discussion group on Talk for people reading the book. Come join me to talk about it.

Labels: gene smith, librarything, librarything for libraries, social media, tagging, tags

Saturday, November 10th, 2007

An academic take on LibraryThing tags

I just discovered Tiffany Smith’s “Cataloging and You: Measuring the Efficacy of a Folksonomy for Subject Analysis”.* It’s the first detailed academic study of LibraryThing tagging—and a very sympathetic one.

The article focuses on five books, comparing their tags with their Library of Congress Subject Headings (LCSH). The books are Harry Potter and the Half-Blood Prince, Susanna Clarke’s Jonathan Strange and Mr. Norrell, Ian McEwan’s Atonement, Marjane Satrapi’s Persepolis and John Hodgman’s The Areas of My Expertise.

LibraryThing doesn’t “win” every comparison, but it comes out pretty well. I’ve already coopted her observations on two titles into my talks, namely Persepolis and Areas of my Expertise, both of which rate a single, very general subject. On the latter:

“How do you identify the subject of a fictionalized almanac, which, according to the Library Journal blurb on the back cover, is ‘a handy desk reference for those needing a dose of nonsense’? If you’re the Library of Congress, you call it ‘American wit and humor’, and move on to the next item on your book cart. You’d be accurate, because Hodgman is American and the book is witty and humorous, but you wouldn’t have captured the specificity of this item.”

Smith contrasts this with LibraryThing’s florid tag cloud, sporting such terms as almanac, hoboes, alchemy, cheese, cryptozoology, eels, omens, portents and absurdities. Record-by-record these tags may only serve to amuse, but if you can’t recall the title, Hodgman’s strange work can be easily retrieved by looking for books tagged both “eels” and “humor” or “hoboes” and “almanac”. By contrast, I would not recommend wading through the American Wit and Humor subject!

I was also gratified to see the author notice an effect I’ve mentioned periodically but which has found no echo in other examinations of the topic and in the whole tired expert-vs-amateur polemic. As she writes, LibraryThing members pick up on the Napoleonic Wars element in Jonathan Strange, which LCSH misses:

“This may speak to the problem of the physical impossibility of the library cataloger reading the entirety of this roughly 800 page book to get to all of the detail. The Napoleonic element is not evident for the first third of the book and is not represented in the chapter titles, although it plays a pivotal role in the plot development.”

Fundamentally, I’m willing to concede the virtues of expertise, but there’s a lot to be said for reading the book all the way through, and library catalogers are not often able to do that.

In this connection, I’ve previously noted how my wife’s third novel, Love in the Asylum, acquired an erroneous “Alcoholism” subject, derived ultimately from bad publisher flap copy. Clearly neither the librarian nor the publicist had read the book. (My wife caught the copy before it went to print, but not before it had acquired Cataloging in Publication LCSHs.) And the LCSH team also missed the topic of American Indians (Abenakis), a major presence in the book, but not touched on in the first third or the flap copy.

Anyway, it’s an interesting read. Since Smith did her research, LibraryThing has grown almost 100%, and there are a few things I’d quibble with,** but it’s a very good outside examination of why LibraryThing members’ tags should not be dismissed by librarians interested in cataloging quality.


*“in”—as they say in academia—Lussky, Joan, Eds. Proceedings 18th Workshop of the American Society for Information Science and Technology Special Interest Group in Classification Research, Milwaukee, Wisconsin.
**For example, Smith was confused about why some LibraryThing works had subjects that were not present in the Library of Congress record, which she believes is our source. In fact, we get our Library of Congress Subject Headings (LCSH) from many libraries. Libraries are free to augment the LC’s headings, and many do; we pick up anything in the 600s of all the MARC records that make up a work.

Labels: academics, LCSH, LIS, tagging, tags

Sunday, September 23rd, 2007

Tagging innovations, from the government

Has anyone seen click-based tag clouds? These are tag clouds in which the size of each word depends not on the number of times something has been tagged, but on the number of times the tag is clicked.

I never had, but Abby just spotted one on the website of the State of Delaware. Apparently site visitors are interested in employment.

It’s a pretty cool idea, and one I’d love to try out on LibraryThing. It wouldn’t work on work pages, but it might on the home page. And I’m impressed that it was on a state-government site. While these sites are increasingly competent, they’re not usually thought of as hotbeds of web innovation.
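To make the idea concrete, here is a rough sketch of how click counts could drive word sizes. I have no idea how Delaware actually does it; the linear scaling, the pixel range and the click numbers below are all assumptions for illustration.

```python
def cloud_sizes(clicks, min_px=12, max_px=36):
    """Map each tag's click count to a font size by linear scaling.

    `clicks` maps tag -> number of times visitors clicked it.
    The pixel range is an arbitrary choice, not taken from any real site.
    """
    if not clicks:
        return {}
    lo, hi = min(clicks.values()), max(clicks.values())
    span = hi - lo
    sizes = {}
    for tag, n in clicks.items():
        # If every tag has the same count, fall back to the midpoint size.
        frac = (n - lo) / span if span else 0.5
        sizes[tag] = round(min_px + frac * (max_px - min_px))
    return sizes


# Hypothetical click counts.
clicks = {"employment": 900, "taxes": 300, "parks": 100}
sizes = cloud_sizes(clicks)
```

A conventional tag cloud would use exactly the same scaling, only fed with "times tagged" instead of "times clicked."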

Labels: tagging, tags

Thursday, August 23rd, 2007

What does tagging do to knowledge?

Back when David Weinberger’s Everything is Miscellaneous was published, LibraryThing ordered a box of copies to give out at conferences and so forth. (Although LibraryThing is mentioned only in passing, the book is, in a way, the intellectual justification for much of what we do.)

We ended up with a dozen or so left over, so I held a contest to get rid of them: Say something about what tagging means, or what it “does” to knowledge, and you might win a copy. I figured that it was time to stop pontificating about what people were doing with tags, and get them to pontificate instead.

The Talk topic eventually accumulated 170 comments, almost all interesting and some quite lengthy and involved. I found it thrilling stuff. We picked ten random winners, and sent out the books.

The whole discussion is newly relevant in light of our new Tag Mirror feature, discussed on the main blog and in great detail on Talk.

Here are some selections from the full discussion:

I think the most interesting aspects of tagging, in a social networking context, are that: (1) All tagging is personal and (2) All tagging is public (ssd7)

So what does tagging do to knowledge? It classifies it in a fuzzy, family-resemblance kind of way, doing justice to multiple topics and interdisciplinary books in a way that the Dewey Decimal System could only do if it worked in four or five dimensions at once. (MyopicBookworm)

I like fun tags that are so personal or unique that nobody else uses them. A friend of mine, for example, has tags like “Detectives with gimmicks”, “Elaborate crimes”, “Witty people being clever”, and my favorite “Fangirlin’”. I myself want to use a tag for “Farm boys with magical destinies” but it’s apparently too long. (saturnine13)

Tags capture individual perceptions of a work, data, and add that information to our knowledge of the work. That’s a useful enhancement, but the variety offered becomes a disadvantage if they are used to find other works. Tags lack the precoordination necessary for efficient comprehensive searching. For example, the tagmash search for libraries, –fiction includes libraries and bibliotecas, but not bibliothéques, etc. Related works may have been lost. That interferes with one of Ranganathan’s laws—it does not save the time of the reader. (notelinks)

One of the things I find most fascinating about tagging is what it reveals about the cognitive processes of the taggers. What makes one person tag Walden with “simplicity” and another person with “hermits”? It’s not a novel observation that we all experience books (for example) personally or subjectively. Tagging is a very simple way to turn that individual experience into universal information. (johnascott)

I’m always amazed at the different ways of viewing something when I see how differently others tagged something to which I have already assigned the most ‘correct’ or ‘appropriate’ tags. (bobngail)

Believe it or not, tags are actually more formal or structured than some similar systems. Consider the general WikiWiki idea of turning any word into a link if it’s in FunnyCaps. The effect is very similar, but the links appear anywhere in text. Tags isolate the linking to specific fields. The extreme free-form nature of Wikis drives some people off, just as the extreme formalisms of MARC, etc. do. So tags seem to be a widely accepted compromise. (JasonRiedy)

Tagging doesn’t so much affect knowledge as reveal it in unexpected places and from unexpected sources. We are all bent, but we’re bent in different directions, and so the sum of our deviances converges on reality quickly – and tagging taps into that. (xaglen)

I think the main point to remember is that tagging is NOT JUST an unstructured form of subject headings; it is a completely different way of viewing the world. Taxonomies and standardised subject heading vocab divide knowledge hierarchically according to set rules. Folksonomies allow knowledge to emerge through collaborative involvement. Tagging allows people to look at books in new ways, to share that knowledge, and to create tag clouds so that no one tag gets missed. (mrsradcliffe)

Tagging helps to both aggregate and splinter knowledge. By this I mean, tagging helps to navigate relationships among disparate “knowledge objects” while at the same time, splits the categorization of similar objects into much finer and/or more random collections. (stoberg)

Everything is Miscellaneous is one of 37 books I currently have tagged “included in the present classification” (there are none that look like flies from a great distance). (sabreuse)

First, tags really only seem to work for organizing stuff you have some sort of conceptual “ownership” of things that in some way you have an incentive to keep order within. People don’t seem to want to tag in enough quantity / detail to be useful when they don’t have a personal stake in sorting through the resultant mess. (cubeshelves)

I much rather spend my time reading a book! (bcobb)

From a library standpoint, my favorite thing about tags is that it allows natural language into the catalog. .. [A]nd what tagging does to knowledge? It gives you more access points. (e1da)

Tagging is getting awfully close, it seems, to the way our brains naturally work anyway – it “associates” and “retrieves” based on miscellaneous tags it has (subconsciously) attached to the idea or concept. (nicknich3)

The variety of tagging systems is amazing. You can tell a lot about a user’s interests by the complexity of tags relating to a specific concept. I am always a bit disappointed when I encounter a catalog without tags. Of course you can look at the books in that catalog, but you don’t get much indication of the user’s relationship to their books. (oregonobsessionz)

[SilentInAWay wrote an exceptional piece on the Deathly Hallows tag cloud and its common and uncommon tags, from fantasy (783) to Kleenex (1): — Ed]

[W]here there is a clear consensus on a tag, it is probably based on fairly broad considerations (and therefore constitutes relatively superficial knowledge). Conversely, the most intriguing tags (autistic-like character, Kleenex, the end of Pottermania) are almost inevitably used by only a single member. (SilentInAWay)

I remember being flabbergasted when I found out how long it took for the Library of Congress to change the subject heading “Vietnam Conflict” to “Vietnam War.” Now it doesn’t seem so ludicrous to me.

Even recognizing that LC Subject Headings and tagging achieve two different goals doesn’t ease my mind about this. I cannot stand the thought of how muddy and increasingly useless much of Library Thing’s tagging database will end up being in a very short time. (lmccoll)

Tagging permits me to see books as others see them. (kencf)

Labels: contests, everything is miscellaneous, tag mirror, tagging, weinberger

Tuesday, July 24th, 2007

Tagmash: Book tagging grows up

Tagmash: alcohol, history gets over the fact that almost nobody tags things history of alcohol

Short version: I’ve just gone live with a new feature called “tagmash,” pages for the intersections of tags. This is a fairly obvious thing to do, but it isn’t trivial in context. In getting past words or short phrases, tagmash closes some of the gap between tagging and professional subject classifications.

For example, there is no good tag for “France during WWII.” Most people just don’t tag that verbosely. Tagmash allows for a page combining the two: France, wwii. If you want to skip the novels, you can do france, wwii, -fiction. The results are remarkably good.
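In code terms, the core of a tagmash is just "books that carry all of these tags, and none of those." The sketch below is a toy version under stated assumptions: the function name, the data layout, the placeholder titles and counts, and the "rank by weakest tag" ordering are mine for illustration, not LibraryThing's actual implementation.

```python
def tagmash(books, include, exclude=()):
    """Return titles carrying every tag in `include` and none in `exclude`.

    `books` maps a title to a dict of {tag: times applied}.
    Results are ordered by the weakest included tag count, strongest
    first (an assumed ranking rule, chosen only to keep the sketch simple).
    """
    if not include:
        return []
    hits = []
    for title, tags in books.items():
        if any(tag in tags for tag in exclude):
            continue  # hard exclusion: any excluded tag drops the book
        if all(tag in tags for tag in include):
            weakest = min(tags[tag] for tag in include)
            hits.append((weakest, title))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in hits]


# Placeholder titles and counts, invented for the example.
books = {
    "History A": {"france": 30, "wwii": 45, "history": 60},
    "Novel B": {"france": 50, "wwii": 20, "fiction": 80},
    "Guidebook C": {"france": 70, "travel": 40},
}
both = tagmash(books, ["france", "wwii"])
no_novels = tagmash(books, ["france", "wwii"], exclude=["fiction"])
```

Here `both` keeps the history and the novel, and `no_novels` keeps only the history, which is the france, wwii, -fiction idea in miniature.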

Tagmash pages are created when a user asks for the combination, but unlike a “search” they persist, and show up elsewhere. For example, the tagmash for France, Germany shows France, wwii as a partial overlap, alongside others. Related tagmashes now also show up on select tag and library subject pages, as a third system for browsing the limitless world of books.

Booooring? Go ahead and play a bit:

That’s the short version. But stop here and you’ll never know what Zombie Listmania is!

Long version. LibraryThing has shown some of the things that book tags are good for, such as plain language, genre fiction, capturing identity and perspective, academic schools, staying current and changing over time. (Details and examples in footnote.*)

It also demonstrates some of the weaknesses, including:

  1. Idiots
  2. Bad actors (spammers, racists, anarchists)
  3. “Personal” tags clouding the tagosphere with junk (eg., “at the beach house”)
  4. The lack of a “controlled” vocabulary results in ambiguous terms (eg., classics, leather, magic)
  5. Tags lack the detail and focus available to a hierarchical subject system like the Library of Congress Subject Headings (LCSH), eg.,
    Great Britain — History — Elizabeth, 1558-1603 — Fiction
    , or
    Jews — Italy — Bologna — Conversion to Christianity — History — 19th century**

As I’ve argued elsewhere and in my Library of Congress talk, problems 1, 2 and 3 are mitigated by having LOTS of tags. Idiocy, malice and personal junk fall out statistically. A tag here or there can’t be trusted, but a large body of tags in agreement is different.

Problems 4 and 5 are harder to tackle. Flickr has shown the way with one solution, statistical clustering. The screen shot below shows this–clusters of images related to the tag “bow.”

Some day–when I become a better programmer?–I’m going to try this on LibraryThing data. It will help with ambiguity—the secondary tags on the various meanings of “leather” are surely wildly divergent! But I suspect it separates better than it clarifies. Flickr supposes that tags fall into discrete clusters, but subjects interact with books in extremely complex ways. On a more basic level, I am suspicious of the too-quick resort to algorithms against user data.*** After all, if computers are so good at figuring out meaning, why were users necessary in the first place? It smacks of technological revanchism.

So, where Flickr’s clusters are automated, tagmash is a semi-automated process. LibraryThing does the statistics, but users decide what the meaningful clusters are. Some mashes are interesting and useful. Some aren’t. By and large, uninteresting clusters won’t last.****

This certainly helps with ambiguity. Take the problematic tag leather, which divides easily into tagmashes like:

Now let’s take the “focusing” power of hierarchy. As mentioned above, there is no good way to get at “france during wwii.” The tag Vichy covers some of the ground, but not enough. Tagmash provides an answer.

The book list is good, and a simple union gets around an imposed hierarchy. Looking at the related LCSHs, for example, one is left in doubt whether France is part of World War II, or World War II part of France—or what:

Of course, both trees are equally artificial. David Weinberger writes how, in the real world, a leaf can be on many branches. But it’s equally true that what’s trunk and what’s branch are largely about where you start–dirt or pinecone. Either way, branching happens. The order of the branches isn’t necessarily important.

Even as it borrows some of the virtues of subject classification, tagmash keeps the strengths of tagging. Subject systems are pre-built things. Now and then they get larger, but it takes deliberation and effort. What gets “blessed” is often surprising. I would have never predicted the unusually staid LCSH would have embraced:

But tagging has no limits. Think of the tagmash “erotica” and “zombies” and there it is. (Tagmash: erotica, zombies). Want to know what chick lit takes place in Greece? (Tagmash: chick lit, greece.) Young adult books involving horses? (Tagmash: horses, young adult.) Poems from or about San Francisco? (Tagmash: poetry, san francisco). Slavery in Brazil? (Tagmash: brasil, slavery.) Non-fiction books about Narnia? (Tagmash: narnia, -fiction.) The options are endless.

Of course, tagmash only narrows the gap. It doesn’t eliminate it. Tagmash: poetry, San Francisco still can’t distinguish between poetry about and poetry from San Francisco–it involves whatever is tagged “San Francisco” and that’s probably a mixed bag.***** Well-planned and carefully executed subject systems have strengths that no ad hoc, regular-person system can match.

Lastly—let there be no doubt—tagmash needs a very large quantity of tags to work. For tagmash after tagmash, the data is simply insufficient.

You’ve made it to Zombie Listmania! There are some obvious directions this can go:

  • The syntax can improve, for example to allow alternates (eg., humor, cats/dogs)
  • The syntax can include non-tag factors, such as formal subject headings (Tag: zombies, LCSH: love stories), languages, dates, authors and so forth.
  • The syntax can include weights (eg., Zombies 50%, vampires 50%, love stories 90%). Abby and I experimented with just such a system, creating algorithmic proxies for BISAC (bookstore) headings. It isn’t that hard to do.
  • Complex mashes could acquire titles and other metadata.
  • Users could follow a tagmash, and be alerted whenever new material enters the list.

Amazon calls its static, or dead, lists “Listmania.” All these tend to create a “Zombie Listmania,” lists of books that “won’t stay dead.” Instead, they change over time, as the underlying social and non-social data change. There’s no reason you couldn’t create “Zombie” versions of formal subject headings—a series of tags and other markers which approximated the content of a professionally-assigned subject heading.

Pretty cool idea, I think. We’ll see what we can do about it.

Details.

  • Tagmashes can be made from any tagmash or tag page. Just search for a tag or two or more tags with a comma between them. The URLs are the same: /tag/ plus a tag or tags separated by commas.
  • The weighting of tags is wiggly. We’re trying to get at both raw numbers of tags on an item and the relative salience (number divided by total number of tags), and then cross this data tag-by-tag. There is no obvious answer. In an ideal world, some tags would be about salience (eg., humor) and others would be thresholds (eg., fiction)–that is, when you’re looking for humor, fiction you want the funniest fiction, not the most fictional humor.
  • You can enter the tags in any order, but it will reformat your URL in alphabetical order, with the minuses at the end, such that “wwii, france” is the same as “france, wwii.”
  • A single minus (-fiction) “discriminates” against items tagged “fiction.” A double minus (--fiction) disqualifies all books with the fiction tag.
  • Tagmashes don’t get built until someone builds them. The first time can take a while to generate. There is currently no system to expire older or underused tagmashes.
  • UPDATE: I’m seeing a lot of part/whole tagmashes. These rarely work. When you search for “Einstein, science” or “Manet, art” you’re not doing much more than putting a statistical cramp on the smaller of the two tags—a few Manet books won’t have an art tag, and that will be the end of them. Tagmashes work with different things, not a thing and its category.
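Two of the details above are easy to sketch: putting a tagmash into a canonical order, and the "salience" of a tag on a book. This is only an illustration under my own assumptions. The function names are invented, and the lowercasing and the exact separator handling are guesses rather than a description of the real code.

```python
def canonical_tagmash(query):
    """Normalize a comma-separated tagmash query.

    Tags are lowercased (an assumption) and sorted alphabetically,
    with minus-prefixed tags moved to the end, so that
    "wwii, france" and "france, wwii" come out identical.
    """
    terms = [t.strip().lower() for t in query.split(",") if t.strip()]
    plain = sorted(t for t in terms if not t.startswith("-"))
    minus = sorted(
        (t for t in terms if t.startswith("-")),
        key=lambda t: t.lstrip("-"),  # order by tag name, ignoring the minuses
    )
    return ",".join(plain + minus)


def salience(tag_counts, tag):
    """Times `tag` was applied to an item, divided by all tags on that item."""
    total = sum(tag_counts.values())
    return tag_counts.get(tag, 0) / total if total else 0.0


same = canonical_tagmash("wwii, france") == canonical_tagmash("france, wwii")
ordered = canonical_tagmash("-fiction, wwii, France")
# Hypothetical counts for one book.
humor_salience = salience({"humor": 30, "fiction": 70}, "humor")
```

How the raw count and the salience should then be crossed tag-by-tag is exactly the "wiggly" part, so the sketch stops short of it.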

Footnotes!

*What’s good about tagging:

  • Tags use everyday terms (the tag cooking vs. the subject cookery)
  • Tags are great for genre fiction that subject systems can’t keep up with as fast or as well as their readers (chick lit, cyberpunk, paranormal romance)
  • Tags often encode subtleties that “controlled vocabulary” irons out (lgbt, glbt, queer, gay, homosexuality)
  • Tags capture identity and perspective that subject systems can’t or won’t (queer, glbt, lgbt, christian living)
  • Tags are good for schools of thought (intelligent design, austrian economics)
  • Tags respond quickly to change (hurricane katrina)
  • Tags “keep happening” in a way that systems like LCSH do not, getting added to books where LCSH misses the “first wave” of anything new (memetics, sociobiology)

**I’ve left out one problem, not covered at the LC—how “democratic” weighting can put Angela’s Ashes at the top of the Ireland tag. I want to write a blog post on the topic sometime. I think there are ways around it, and algorithmic solutions that nobody has really tried.

Aside: Much LIS anti-tagging polemic focuses on the most trivial of problems—spelling mistakes and “incorrect” tags. The former underestimates technology, the latter insults our intelligence. LibraryThing has dealt with the spelling problem, and has seen very few “wrong” tags. In fact, there are some serious problems with tagging. But you have to understand tags before you can see the problems, and many refuse to get past the idea that people will spell “white” wrong, or tag white horses as black.
***This is half formed. I have a problem with the reflexive “turn” from people-centered data to algorithms. I see this pattern again and again in software. Something transformative happens–something human. But it’s imperfect, so programmers conclude that programs will fix humans. In a way, it’s a reassertion of importance. More often, humans fix humans. To adapt David Weinberger, the answer to user-generated data is MORE user-generated data.
****Probably there’s got to be some system to expire unused clusters.
*****UPDATE: After turning the feature loose I watched what new tagmashes would be created. One was children, cooking. Should I call the police?

Labels: new feature, tagging, tagmash

Sunday, July 1st, 2007

Tags and the Power of Suggestion

REMINDER: LibraryThing is offering $1,000 worth of books if you find us an employee!

As usually argued, tags have “low cognitive cost,” a high-cognitive cost way of saying “you dash them off.” You grab the book, you tag it “cooking” and move on.

That’s usually a good thing. If you thought about it, you might try to come up with the “perfect” phrase, like “food preparation,” to cover salad-making and other methods that involve no actual cooking, or “food preparation, presentation and related subjects” to cover that book about creating beautiful designs in coffee foam and the manual that came with the Salad Shooter. But coming up with the perfect phrase takes effort and time. You pay for it then and, more importantly, you pay for it when you come to search–for searching is even more about low cognitive effort than tagging.

This much is standard. It’s also clear that “dashed-off” terms cluster well socially. For most domains there are only a few simple terms (eg., cooking, cook books), but an almost endless number of complex ones.

There are problems with this. Indeed, all the “problems” with tagging stem from it. A careful, formal system would distinguish between books about “leatherworking” and books of “leather erotica”. On LibraryThing, both tend to get tagged leather. I won’t multiply examples I’ve discussed before, so I can get to a new one: the Power of Suggestion.

Yak, yak, yak, yak. Joke, joke, joke, joke! Now, what is the white of an egg called? Did you think “yolk”? I’ll bet you did. The children’s joke illustrates something about how the brain works. Rapid thought is open to the power of suggestion.

Now catalog and tag the book 9-11 by Noam Chomsky. I’ll bet you tag it “9-11.” The same goes for 9-11 emergency relief, 9-11 : artists respond and 9-11 : the world’s finest comic book writers and artists tell stories to remember. But elsewhere, “9/11” (with a slash) is by far the dominant tag.

All books
9/11 1179 times
9-11 173 times (13%)

Books with “9-11” in the title
9/11 28 times
9-11 32 times (53%)
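The percentages are each variant's share of the two spellings combined. A quick sketch of the arithmetic, using the counts from the table above (the function name is just for illustration):

```python
def variant_share(counts, variant):
    """Return one spelling's share of all uses across the listed spellings."""
    total = sum(counts.values())
    return counts[variant] / total if total else 0.0


all_books = {"9/11": 1179, "9-11": 173}
titled_9_11 = {"9/11": 28, "9-11": 32}

overall_pct = round(variant_share(all_books, "9-11") * 100)   # 173 / 1352
titled_pct = round(variant_share(titled_9_11, "9-11") * 100)  # 32 / 60
```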

Sometimes seeming synonyms actually encode a difference in nuance or perspective (eg., Shirky’s example of “film” vs. “cinema”). In this case, they don’t. There doesn’t appear to be any real difference between “9-11” and “9/11” that can’t be explained by the title. This is why LibraryThing users have “combined” the two tags, an operation we allow, and the combination has not been contested.

Titles influence how we tag things. Most of the books on birds and birding could be tagged with either term, but books with “birds” in the title rank higher on the “birds” tag.

Or take Heilbroner’s The Worldly Philosophers. As my brother, Oakes, once pointed out, Heilbroner’s book about the history of economics is almost invariably to be found in a used bookstore’s “Philosophy” section, not in “Economics.”* On LibraryThing the problem isn’t so acute, but it’s there–152 people have tagged it “economics,” 75 have tagged it “philosophy,” the second-largest tag. Of course, there is some legitimate cross-over between the two subjects. But I don’t think the content alone would merit so much “philosophy” tagging.

This isn’t a perfect example either. It would be interesting to know how many of the “philosophy” taggers had read the book, or what their other tags for it were. But I think it shows a pervasive effect.

The “Power of Suggestion” isn’t a major problem with tagging. But in showing us a flaw, it clues us in to what it’s all about.


*He showed me this when I was quite young, and it stuck. So when I’m in a new bookstore and passing the philosophy section, I often do a quick check to see if my old, confused friend is there again. I’m weird.

Labels: cognitive cost, tagging, tags