Sunday, December 18th, 2005

Combining tags (heresy!)

I’ve added a “combine tag” feature, allowing users to merge VERY similar tags on the global level. (No users’ tags are actually changed.) As with author disambiguation, LibraryThing users make the decision. The choice isn’t pushed very hard; most users won’t see it, even if they benefit from it.

You can combine when you see this below the list of related tags:

As blog readers know, I take a hard, idealistic line on tagging. Tags are about memory—your memory. Automated or suggested tags (other than your own) interfere with that process. If you’re gonna use someone else’s mental categories, use an expert’s, like, say, the Library of Congress’. I buy Clay Shirky’s essay/talk extolling the “signal in the noise” between tags like cinema and movies.

As the saying goes, “I believe. Help my unbelief.” Reworking the related tags feature got me thinking about “tag synonyms.” Is there any difference between wwii and ww2? What about world war two, world war ii and world war 2? Is some trivial nuance really worth the social loss—World War II buffs thinking they’re alone, worse recommendations, and so forth? After all, the top World War II tag (wwii) is used only 1,300 times, but all the tags together hit 3,100!

So, I came up with a “combine tags” feature. It works like the “combine author” feature, except that the combine page has half a page of “philosophy” on it, begging users not to combine merely similar tags. There is also a tag combination log, allowing finicky LibraryThing-arians to follow the action, and separate tags at need. Like a wiki, it’s easier to correct damage than to do it. The combination log records users who combine tags, but not those who separate them. Go ahead and separate a tag; nobody will know you did it!
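For the curious, here is a rough sketch of how a global combine layer with a log might look. It is not LibraryThing’s actual code; the class name `TagCombiner` and all of its methods are made up for illustration. The idea it shows is the one described above: users’ own tags are never rewritten, an alias map sits on top of them, combines are logged with the user’s name, and separations are logged without one.

```python
from collections import Counter


class TagCombiner:
    """Hypothetical global alias layer; users' own tags are never rewritten."""

    def __init__(self):
        self.canonical = {}  # alias tag -> canonical tag
        self.log = []        # (action, alias, canonical, user or None)

    def resolve(self, tag):
        # Uncombined tags resolve to themselves.
        return self.canonical.get(tag, tag)

    def combine(self, alias, target, user):
        target = self.resolve(target)
        if alias == target:
            return  # already the same global tag; nothing to do
        # Re-point any tags that were previously combined into `alias`.
        for other, current in list(self.canonical.items()):
            if current == alias:
                self.canonical[other] = target
        self.canonical[alias] = target
        self.log.append(("combine", alias, target, user))

    def separate(self, alias):
        target = self.canonical.pop(alias, None)
        if target is not None:
            # Separations are logged anonymously (user is None).
            self.log.append(("separate", alias, target, None))

    def global_counts(self, raw_counts):
        # Sum per-tag usage counts under each tag's canonical form.
        totals = Counter()
        for tag, n in raw_counts.items():
            totals[self.resolve(tag)] += n
        return totals
```

Because the alias map is separate from the tags themselves, separating a tag is just deleting one entry, which is what makes the damage easy to undo.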

I’ve already separated some. In my book, Farsi is not the same as Persian. Although Persian is a term for Farsi (perhaps more commonly applied to “old” and “middle” Persian than the modern language), Persian is also a general adjectival form of Persia (which, incidentally, has a totally different flavor than Iran). I also split to be read and unread. To be read implies intent to read. Unread does not.

Well, that was fun. Now back to the book-cover issue…

Algorithmic tangent: There are various ways of thinking of “relatedness” between tags. For the tag pages, I key it to “works” (Platonic books, as opposed to individual books). Tags are related to the extent they are applied to the same works. Using this model, one might think of synonymous tags as tags that often occur together by work but rarely by user or individual book. A little play found this to work okay, but not well enough to be definitive. So I’ve resorted to user control. In essence, I’m using one user-driven process to correct for occasional mistakes of another.
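To make the synonym idea concrete, here is a minimal sketch, not the code LibraryThing runs. It assumes the data is a list of (user, work, tag) triples, measures overlap with Jaccard similarity, and uses two arbitrary thresholds; the function name `synonym_candidates`, the similarity measure, and the threshold values are all assumptions of mine for illustration. A pair is flagged when the two tags land on many of the same works but are rarely used by the same people.

```python
from collections import defaultdict
from itertools import combinations


def synonym_candidates(taggings, min_work_overlap=0.5, max_user_overlap=0.2):
    """Flag tag pairs that share works but not users (hypothetical heuristic).

    taggings: iterable of (user, work, tag) triples.
    Thresholds are illustrative guesses, not tuned values.
    """
    works_by_tag = defaultdict(set)
    users_by_tag = defaultdict(set)
    for user, work, tag in taggings:
        works_by_tag[tag].add(work)
        users_by_tag[tag].add(user)

    def jaccard(a, b):
        # Size of the intersection over size of the union.
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    results = []
    for t1, t2 in combinations(sorted(works_by_tag), 2):
        work_sim = jaccard(works_by_tag[t1], works_by_tag[t2])
        user_sim = jaccard(users_by_tag[t1], users_by_tag[t2])
        # Synonym-like: same works, different people.
        if work_sim >= min_work_overlap and user_sim <= max_user_overlap:
            results.append((t1, t2, work_sim, user_sim))
    # Strongest work overlap first.
    return sorted(results, key=lambda r: r[2], reverse=True)
```

A heuristic like this only produces candidates. It cannot tell to be read from unread, which is exactly why the final call goes to users.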

Has any other tagging site ever done this?

Perhaps someone can direct me to where people talk about this stuff; I certainly haven’t found it. LibraryThing’s tag algorithms have all been ex nihilo. This is scary. I mean, if it were up to me, sorting would probably have never gone past the “bubble sort.” Hello? I studied Greek and Latin in college!


