Archive for the ‘tagmash’ Category

Thursday, September 24th, 2009

Geeks vs. Nerds: Hard data

LibraryThing’s systems administrator, John Dalton, came up with this—using LibraryThing’s tagmash feature to demonstrate the difference between geeks and nerds:

See also:

Labels: geeks, humor, nerds, tagmash

Tuesday, September 15th, 2009

Tagmash, redux: Tim’s favorite feature

Tagmash. I’ve redone, improved and expanded my favorite feature, tagmashing.

Introduced back in 2007, tagmashes allow you to investigate what books satisfy two or more tags. It’s a great way to find books of a clear type, but for which no single tag really works.

For example, no one has yet used the tag “vegetarian Indian cooking” and there’s no Library of Congress Subject Heading for it either. But combine three tags, like vegetarian, India and cooking into the tagmash vegetarian, India, cooking and you get over 50 good matches.

Simple two-tag combinations can work wonders:

Some of my favorites are off-beat: all those books about knitting for your dog and—shiver—knitting with dog hair can be found at knitting, pets. erotic, zombies is 80% Laurell K. Hamilton. And who can say no to humor, pirates? (Did you know that this Saturday is Talk like a Pirate Day? You will.)

On the serious end, fairly complex topics also work:

You can also use - (minus) or -- (double minus) to mean “demote” or “remove” a tag. For example:

An important feature of tagmash is that it’s not just a “search.” Once created, a tagmash page stays there and enters the “swirl of relatedness.” Sometimes a tag page will suggest the perfect tagmash. Other times, a tagmash will suggest an unconsidered subject.

New Feature: Tagmash overlap. I’ve added a new feature that, I think, brings tagmash to a new level—the tagmash overlap.

It works something like tag mirrors. Instead of showing you how you tag things, it shows how others tag your stuff. Except instead of showing you individual tags, it finds tagmashes.

The result is, I think, a good list of topics you’re interested in: topics more complex than a single tag can express. In my case, it surfaces topics like Macedonia, history, Greek, divination, Ottoman Empire, travel and erotic, poetry (!). Abby is apparently interested in adventure, surreal, English, death, love and (what a winner) evil, love.
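
For the curious, here is one way an overlap like this could be computed. It is only a rough sketch under my own assumptions, not LibraryThing's actual code: the function name `mash_overlap`, the book ids and the sample data are all hypothetical. It takes the tags others have put on each of your books and counts, for every tagmash that already exists, how many of your books carry all of its tags.

```python
def mash_overlap(library_tags, known_mashes):
    """Rank existing tagmashes by how many of my books carry all their tags.

    library_tags: dict mapping a book id to the set of tags others gave it
    known_mashes: iterable of tuples of tags (tagmashes someone already made)
    Both structures are hypothetical stand-ins for the real data.
    """
    hits = {}
    for mash in known_mashes:
        # A book matches when every tag in the mash appears on it.
        count = sum(1 for tags in library_tags.values() if set(mash) <= tags)
        if count:
            hits[mash] = count
    # Most matches first; ties broken alphabetically for stable output.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


library = {
    "b1": {"history", "macedonia", "greek"},
    "b2": {"history", "macedonia"},
    "b3": {"erotic", "poetry"},
}
mashes = [("history", "macedonia"), ("erotic", "poetry"), ("humor", "pirates")]
overlap = mash_overlap(library, mashes)
for mash, count in overlap:
    print(", ".join(mash), count)
```

Mashes that match none of your books simply drop out, which is why the list reads as topics you are interested in rather than as a list of everything.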

You can find the feature from your profile statistics page. If you’re signed in, this link will take you to yours.

What do you think? Comment here or come over to the New Features Talk thread.

Labels: classification, tag mirror, tagging, tagmash

Monday, June 16th, 2008

Tagmashes for Readers Advisory

I’ve been thinking a lot about how booksellers and librarians can use LibraryThing for “readers advisory,” helping readers find books they’ll love. One answer, I think, is to promote and improve our “tagmashes” feature.

Readers Advisory is something of a discipline in librarianship, with a body of thinking behind it. There are also a number of well-known subscription RA tools, such as NoveList and FictionConnection, available in a very large number of US libraries. (See this page for a much larger list, which includes LibraryThing up with the big guys.)

LibraryThing can be used for Readers Advisory in a couple of ways:

  • Some libraries have used LibraryThing to highlight special topics (eg., new YA material at the Framingham Library)
  • Most LibraryThing works include recommendations—both automatic and member suggested, and with various summary and detailed lists—so you can get from a known book to a set of similar titles.  
  • Our fielded wiki Common Knowledge links books by series, places, awards and so forth.
  • LibraryThing tag pages provide relevancy-ranked lists for many topics, eg., chick lit, steampunk, memetics, cozy mysteries
  • Tagmashes

“Tagmashes,” introduced a year ago, are a variant on tags, for when a simple tag isn’t good enough.

By combining two or more tags, or excluding tags,  tagmashes extend tagging and nip away at some of the unique values of traditional subject classification—high granularity and hierarchy. Thus, although the tagmash France, wwii doesn’t have an explicit notion of hierarchy, it works something like the LCSH World War II, 1939-1945 — France. (And, of course, the LCSH tree is an artificial one—there’s nothing in the idea that makes France a branch of World War II more than World War II is a branch of France!)

Notably, the system doesn’t make tagmashes, users do. Once made, they “stick around,” and may appear on related tag and subject pages, with their overlap to that page listed, testimony that a particular combination of tags made sense to someone. The system could–but does not currently–track tagmashes for relevance and usage, pruning some and elevating others. And it could allow users to edit, rate or review them for usefulness and accuracy.

I have it in my head that tagmashes, particularly with these additions, are one stone in the bridge between “free tagging” and traditional classification, between algorithmic recommendations and hand-generated ones, between the physical past and the digital future.

I see a world of librarians and readers creating, spreading and editing book lists that don’t just “stay still”—depreciating over time, like a physical object—but shift and grow like a digital object can. And they wouldn’t be the same for everyone, like a physical object, but adapt to the reader, like only a digital object can.

Anyway, here are some tagmashes to play with:

Labels: ra, readers advisory, tagmash

Thursday, August 23rd, 2007

Tag Mirror: See your books the way others do

UPDATE: I’m really enjoying the Talk discussion of this feature. Also, at this point it’s better to talk about the feature than to use it. With everyone using it at once, the server that handles it is taxed rather seriously!

A major publisher recently asked us to show them a tag cloud of their books. It took a mental flip, but only a few lines of code to adapt this for individual use.

The result is Tag Mirror, available from your and everyone’s profile: here’s mine (and Abby’s, Altay’s, Giovanni’s and Casey’s*). If you’re signed in, here’s yours. (Please note: It takes serious processing power to analyze 22 million tags. Everyone is going to hit it at once, so be patient.)

Tag Mirror “holds a mirror” up to your books and to you. Instead of showing what you think about your books—what a regular tag cloud shows—it shows you what others think of them, in effect using LibraryThing’s twenty-two million tags to organize and surface interesting topics from within your own collection.** As with other tag clouds, size equals importance. When you click on a tag, you get a relevancy-ranked list of books tagged that way.
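
The idea can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the code behind the feature: `tag_mirror`, the `(user, book, tag)` tuples and the sample data are hypothetical. It counts every tag that members have put on books in one library; the optional `skip_user` argument leaves out one member's own tags, and its default of `None` keeps them in, matching the footnote that your own tags still have an effect.

```python
from collections import Counter


def tag_mirror(my_books, taggings, skip_user=None):
    """Count how often each tag was applied to the books in one library.

    my_books: set of book ids in the library (hypothetical ids)
    taggings: iterable of (user, book, tag) tuples from all members
    skip_user: optionally ignore this member's own tags
    """
    cloud = Counter()
    for user, book, tag in taggings:
        if book in my_books and user != skip_user:
            cloud[tag.lower()] += 1
    return cloud


taggings = [
    ("tim", "b1", "history"),
    ("abby", "b1", "gender studies"),
    ("casey", "b1", "Gender Studies"),
    ("abby", "b2", "patristics"),
    ("casey", "b3", "cooking"),  # b3 is not in the library below
]
cloud = tag_mirror({"b1", "b2"}, taggings, skip_user="tim")
for tag, count in cloud.most_common():
    print(tag, count)
```

The counts stand in for "size equals importance": a bigger count would mean a bigger word in the cloud.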

I can’t decide if it’s just the sort of cherry-on-top feature that makes LibraryThing unique or if it’s something genuinely new and interesting. I think it might be the latter. As Altay put it, it’s the sort of idea that seems obvious in retrospect.

I didn’t know I was interested in gender studies.

Here’s a for-example. I don’t use the tags gender studies, patristics or theory. They’re just not terms I use. To some extent, that reflects who I am. But I have a fair number of books that, to others, fall under those categories. It’s interesting to slice my books up in an alien way—to see them through other eyes. Maybe I’m more interested in gender studies than I thought.

More concretely, I do use the tag “alternate history,” but browsing my tag mirror page called up some alternate histories that I hadn’t tagged that way—useful stuff.***

Finally, Tag Mirror gives everyone a tag cloud, even those who don’t bother to tag anything. It seems almost unfair.

As our recent discussion of what tagging does to knowledge brought out so well, tagging is a complex mixture of private purpose and public good. I agree with those who say that we tag best when we tag for ourselves. But when everyone does that, a rich web of meaning is created.

I’ve done my best to push tagging in some new directions, tying subjects and tags together statistically, making book recommendations based on tag patterns, and with the tagmash feature. You can add Tag Mirror to that list. Little things. But they keep getting more interesting.

UPDATE: It’s 4:30am and, of course, I couldn’t finish blogging it before someone else started a thread about it (“Just noticed this on my profile”). Come talk about it.


*Casey has a surprising number of cookbooks! He’s coming up here in a few weeks—it’ll be the first time any of us have actually met him. We usually just order pizza. I think that plan’s changed.
**It doesn’t actually exclude your own tags. They still have an effect.
***It also brought up Howard Zinn’s People’s History of the United States. People tag unexpectedly, if humorously.

Labels: folksonomy, tag mirror, tagging, tagmash

Tuesday, July 24th, 2007

Tagmash!

Tagmash: alcohol, history gets over the fact that almost nobody tags things history of alcohol

Short version: I’ve just gone live with a new feature called “tagmash,” pages for the intersections of tags. This is a fairly obvious thing to do, but it isn’t trivial in context. In getting past words or short phrases, tagmash closes some of the gap between tagging and professional subject classifications.

For example, there is no good tag for “France during WWII.” Most people just don’t tag that verbosely. Tagmash allows for a page combining the two: France, wwii. If you want to skip the novels, you can do france, wwii, -fiction. The results are remarkably good.

Tagmash pages are created when a user asks for the combination, but unlike a “search” they persist, and show up elsewhere. For example, the tagmash for France, Germany shows France, wwii as a partial overlap, alongside others. Related tagmashes now also show up on select tag and library subject pages, as a third system for browsing the limitless world of books.

Booooring? Go ahead and play a bit:

That’s the short version. But stop here and you’ll never know what Zombie Listmania is!

(full post over at Thingology, “Tagmash: Book tagging grows up”)

Labels: new feature, new features, tagging, tagmash

Tuesday, July 24th, 2007

Tagmash: Book tagging grows up

Tagmash: alcohol, history gets over the fact that almost nobody tags things history of alcohol

Short version: I’ve just gone live with a new feature called “tagmash,” pages for the intersections of tags. This is a fairly obvious thing to do, but it isn’t trivial in context. In getting past words or short phrases, tagmash closes some of the gap between tagging and professional subject classifications.

For example, there is no good tag for “France during WWII.” Most people just don’t tag that verbosely. Tagmash allows for a page combining the two: France, wwii. If you want to skip the novels, you can do france, wwii, -fiction. The results are remarkably good.
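
To make the france, wwii, -fiction example concrete, here is a minimal sketch of how such a mash might rank books. It is a simplification under stated assumptions, not the real weighting: it scores only by salience (a tag's count divided by all tags on the book), requires every included tag to be present, and lets a minus tag push a book down without removing it. The names `mash_score` and `rank_mash`, the book titles and the counts are all made up.

```python
def mash_score(tag_counts, include, demote=()):
    """Score one book for a tagmash, or return None if it does not qualify.

    tag_counts maps each tag to how many members applied it to the book.
    """
    total = sum(tag_counts.values())
    if total == 0:
        return None
    score = 1.0
    for tag in include:
        count = tag_counts.get(tag, 0)
        if count == 0:
            return None  # every tag in the mash must be present
        score *= count / total  # salience of this tag on this book
    for tag in demote:
        # A single minus only pushes a book down; it does not remove it.
        score *= 1 - tag_counts.get(tag, 0) / total
    return score


def rank_mash(books, include, demote=()):
    """Return the titles that qualify, best match first."""
    scored = []
    for title, tag_counts in books.items():
        score = mash_score(tag_counts, include, demote)
        if score is not None:
            scored.append((score, title))
    return [title for score, title in sorted(scored, reverse=True)]


books = {
    "occupation history": {"france": 50, "wwii": 40, "history": 10},
    "occupation novel": {"france": 45, "wwii": 45, "fiction": 10},
    "paris guidebook": {"france": 20, "travel": 80},
}
plain = rank_mash(books, ["france", "wwii"])
no_novels = rank_mash(books, ["france", "wwii"], demote=["fiction"])
print(plain)
print(no_novels)
```

With these made-up counts the novel ranks first for france, wwii, but drops below the history once -fiction is added. The guidebook never appears because nobody tagged it wwii.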

Tagmash pages are created when a user asks for the combination, but unlike a “search” they persist, and show up elsewhere. For example, the tagmash for France, Germany shows France, wwii as a partial overlap, alongside others. Related tagmashes now also show up on select tag and library subject pages, as a third system for browsing the limitless world of books.

Booooring? Go ahead and play a bit:

That’s the short version. But stop here and you’ll never know what Zombie Listmania is!

Long version. LibraryThing has shown some of the things that book tags are good for, such as plain language, genre fiction, capturing identity and perspective, academic schools, staying current and changing over time. (Details and examples in footnote.*)

It also demonstrates some of the weaknesses, including:

  1. Idiots
  2. Bad actors (spammers, racists, anarchists)
  3. “Personal” tags clouding the tagosphere with junk (eg., “at the beach house”)
  4. The lack of a “controlled” vocabulary results in ambiguous terms (eg., classics, leather, magic)
  5. Tags lack the detail and focus available to a hierarchical subject system like the Library of Congress Subject Headings (LCSH), eg.,
    Great Britain -- History -- Elizabeth, 1558-1603 -- Fiction
    or
    Jews -- Italy -- Bologna -- Conversion to Christianity -- History -- 19th century**

As I’ve argued elsewhere and in my Library of Congress talk, problems 1, 2 and 3 are mitigated by having LOTS of tags. Idiocy, malice and personal junk fall out statistically. A tag here or there can’t be trusted, but a large body of tags in agreement is different.
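
A toy version of "falling out statistically" might look like the sketch below. The threshold, the function name `trusted_tags` and the data are assumptions for illustration only. A tag on a book is kept only when enough different members agree on it, so one person repeating a junk tag a hundred times still counts once.

```python
from collections import defaultdict


def trusted_tags(taggings, min_users=3):
    """Keep (book, tag) pairs that at least min_users distinct members agree on.

    taggings: iterable of (user, book, tag) tuples (hypothetical data)
    """
    users = defaultdict(set)
    for user, book, tag in taggings:
        # A set means repeated tagging by one member counts only once.
        users[(book, tag.lower())].add(user)
    return {pair for pair, who in users.items() if len(who) >= min_users}


taggings = [
    ("ann", "b1", "history"),
    ("bob", "b1", "History"),
    ("cat", "b1", "history"),
    ("dan", "b1", "at the beach house"),  # a personal tag
]
taggings += [("spammer", "b1", "buy pills")] * 100  # one bad actor, many repeats
kept = trusted_tags(taggings)
print(kept)
```

Only the tag that several members share survives. The personal tag and the spam both fall below the threshold.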

Problems 4 and 5 are harder to tackle. Flickr has shown the way with one solution, statistical clustering. The screen shot below shows this–clusters of images related to the tag “bow.”

Some day–when I become a better programmer?–I’m going to try this on LibraryThing data. It will help with ambiguity: the secondary tags on the various meanings of “leather” are surely wildly divergent! But I suspect it separates better than it clarifies. Flickr supposes that tags fall into discrete clusters, but subjects interact with books in extremely complex ways. On a more basic level, I am suspicious of the too-quick resort to algorithms against user data.*** After all, if computers are so good at figuring out meaning, why were users necessary in the first place? It smacks of technological revanchism.

So, where Flickr’s clusters are automated, tagmash is a semi-automated process. LibraryThing does the statistics, but users decide what the meaningful clusters are. Some mashes are interesting and useful. Some aren’t. By and large, uninteresting clusters won’t last.****

This certainly helps with ambiguity. Take the problematic tag leather, which divides easily into tagmashes like:

Now let’s take the “focusing” power of hierarchy. As mentioned above, there is no good way to get at “france during wwii.” The tag Vichy covers some of the ground, but not enough. Tagmash provides an answer.

The book list is good, and a simple intersection gets around an imposed hierarchy. Looking at the related LCSHs, for example, one is left in doubt whether France is part of World War II, or World War II part of France, or what:

Of course, both trees are equally artificial. David Weinberger writes how, in the real world, a leaf can be on many branches. But it’s equally true that what’s trunk and what’s branch are largely about where you start–dirt or pinecone. Either way, branching happens. The order of the branches isn’t necessarily important.

Even as it borrows some of the virtues of subject classification, tagmash keeps the strengths of tagging. Subject systems are pre-built things. Now and then they get larger, but it takes deliberation and effort. What gets “blessed” is often surprising. I would have never predicted the unusually staid LCSH would have embraced:

But tagging has no limits. Think of the tagmash “erotica” and “zombies” and there it is. (Tagmash: erotica, zombies). Want to know what chick lit takes place in Greece? (Tagmash: chick lit, greece.) Young adult books involving horses? (Tagmash: horses, young adult.) Poems from or about San Francisco? (Tagmash: poetry, san francisco). Slavery in Brazil? (Tagmash: brasil, slavery.) Non-fiction books about Narnia? (Tagmash: narnia, -fiction.) The options are endless.

Of course, tagmash only narrows the gap. It doesn’t eliminate it. Tagmash: poetry, San Francisco still can’t distinguish between poetry about and poetry from San Francisco–it involves whatever is tagged “San Francisco” and that’s probably a mixed bag.***** Well-planned and carefully executed subject systems have strengths that no ad hoc, regular-person system can match.

Lastly—let there be no doubt—tagmash needs a very large quantity of tags to work. For tagmash after tagmash, the data is simply insufficient.

You’ve made it to Zombie Listmania! There are some obvious directions this can go:

  • The syntax can improve, for example to allow alternates (eg., humor, cats/dogs)
  • The syntax can include non-tag factors, such as formal subject headings (Tag: zombies, LCSH: love stories), languages, dates, authors and so forth.
  • The syntax can include weights (eg., Zombies 50%, vampires 50%, love stories 90%). Abby and I experimented with just such a system, creating algorithmic proxies for BISAC (bookstore) headings. It isn’t that hard to do.
  • Complex mashes could acquire titles and other metadata.
  • Users could follow a tagmash, and be alerted whenever new material enters the list.

Amazon calls its static, or dead, lists “Listmania.” All these tend to create a “Zombie Listmania,” lists of books that “won’t stay dead.” Instead, they change over time, as the underlying social and non-social data change. There’s no reason you couldn’t create “Zombie” versions of formal subject headings—a series of tags and other markers which approximated the content of a professionally-assigned subject heading.

Pretty cool idea, I think. We’ll see what we can do about it.

Details.

  • Tagmashes can be made from any tagmash or tag page. Just search for a tag or two or more tags with a comma between them. The URLs are the same: /tag/ plus a tag or tags separated by commas.
  • The weighting of tags is wiggly. We’re trying to get at both raw numbers of tags on an item and the relative salience (number divided by total number of tags), and then cross this data tag-by-tag. There is no obvious answer. In an ideal world, some tags would be about salience (eg., humor) and others would be thresholds (eg., fiction)–that is, when you’re looking for humor, fiction you want the funniest fiction, not the most fictional humor.
  • You can enter the tags in any order, but it will reformat your URL in alphabetical order, with the minuses at the end, such that “wwii, france” is the same as “france, wwii.”
  • A single minus (-fiction) “discriminates” against items tagged “fiction.” A double minus (--fiction) disqualifies all books with the fiction tag.
  • Tagmashes don’t get built until someone builds them. The first time can take a while to generate. There is currently no system to expire older or underused tagmashes.
  • UPDATE: I’m seeing a lot of part/whole tagmashes. These rarely work. When you search for “Einstein, science” or “Manet, art” you’re not doing much more than putting a statistical cramp on the smaller of the two tags—a few Manet books won’t have an art tag, and that will be the end of them. Tagmashes work with different things, not a thing and its category.
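
The parsing and URL rules above can be sketched as follows. Treat it as an illustration under assumptions rather than the site's code: lowercasing the tags, putting single minuses before double minuses, and skipping URL escaping (for tags with spaces) are my guesses, and `parse_mash` and `mash_path` are hypothetical names.

```python
def parse_mash(query):
    """Split a comma-separated tagmash into included, demoted and excluded tags."""
    include, demote, exclude = [], [], []
    for part in query.split(","):
        term = part.strip().lower()  # lowercasing is an assumption
        if not term:
            continue
        if term.startswith("--"):
            exclude.append(term[2:].strip())  # double minus: disqualify
        elif term.startswith("-"):
            demote.append(term[1:].strip())  # single minus: discriminate against
        else:
            include.append(term)
    return sorted(include), sorted(demote), sorted(exclude)


def mash_path(query):
    """Build a canonical path: tags in alphabetical order, minuses at the end."""
    include, demote, exclude = parse_mash(query)
    terms = include + ["-" + tag for tag in demote] + ["--" + tag for tag in exclude]
    return "/tag/" + ",".join(terms)


print(mash_path("wwii, france"))
print(mash_path("france, wwii"))
print(mash_path("-fiction, wwii, France"))
```

Because the tags are sorted before the path is built, "wwii, france" and "france, wwii" end up at the same page.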

Footnotes!

*What’s good about tagging:

  • Tags use everyday terms (the tag cooking vs. the subject cookery)
  • Tags are great for genre fiction that subject systems can’t keep up with as fast or as well as their readers (chick lit, cyberpunk, paranormal romance)
  • Tags often encode subtleties that “controlled vocabulary” irons out (lgbt, glbt, queer, gay, homosexuality)
  • Tags capture identity and perspective that subject systems can’t or won’t (queer, glbt, lgbt, christian living)
  • Tags are good for schools of thought (intelligent design, austrian economics)
  • Tags respond quickly to change (hurricane katrina)
  • Tags “keep happening” in a way that systems like LCSH do not, getting added to books where LCSH misses the “first wave” of anything new (memetics, sociobiology)

**I’ve left out one problem, not covered at the LC: how “democratic” weighting can put Angela’s Ashes at the top of the books for the Ireland tag. I want to write a blog post on the topic sometime. I think there are ways around it, and algorithmic solutions that nobody has really tried.

Aside: Much LIS anti-tagging polemic focuses on the most trivial of problems—spelling mistakes and “incorrect” tags. The former underestimates technology, the latter insults our intelligence. LibraryThing has dealt with the spelling problem, and has seen very few “wrong” tags. In fact, there are some serious problems with tagging. But you have to understand tags before you can see the problems, and many refuse to get past the idea that people will spell “white” wrong, or tag white horses as black.
***This is half formed. I have a problem with the reflexive “turn” from people-centered data to algorithms. I see this pattern again and again in software. Something transformative happens–something human. But it’s imperfect, so programmers conclude that programs will fix humans. In a way, it’s a reassertion of importance. More often, humans fix humans. To adapt David Weinberger, the answer to user-generated data is MORE user-generated data.
****Probably there’s got to be some system to expire unused clusters.
*****UPDATE: After turning the feature loose I watched what new tagmashes would be created. One was children, cooking. Should I call the police?

Labels: new feature, tagging, tagmash