Thursday, August 25th, 2011

New tag-based recommendations algorithm

Short version. I’ve just finished a new algorithm for calculating book recommendations based on tags. You can see the results on the “Recommendations” sub-page of every work page, under a special “Tags” heading (for a limited time only!), as shown to the right. When a work doesn’t yet have tag recommendations, the page will generate them. (Expect to wait 2-10 seconds.) The new recommendations will propagate through the system and, combined with the four other recommendation algorithms we use, work their way into your personal recommendations over time.

Come discuss it on talk here.

Long version. Recommendation algorithms are always tricky, but basing them on tags alone is particularly difficult. Do you consider total overlap? Overlap tag by tag? How do you rate a book that is missing an important tag? How do you weight up meaty, meaningful tags and downgrade meaningless, over-obvious or ephemeral ones? LibraryThing has long had tag-based work-to-work recommendations, but they were of uneven quality. I haven’t been making new ones for a while now, and have been letting them die out of the system; all the old tag-based recommendations have now been removed.

The new algorithm approaches the issue afresh, looking at all of a work’s tags and taking into account various factors that can trip it up. It considers how salient a tag is on a given work and across the system generally, how much the tags on two works agree, and which tags are low-value. It also favors works at similar levels of popularity. After deciding on the basic algorithm, I toyed with various “knobs” for days, on my own and with Jeremy, trying to get the best results for a set of sample and problem works. In my experience, you can’t do an algorithm like this without a sense of appropriateness, taste and proportion, and (I hope) this is one reason LibraryThing recommendations are generally so good.
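To make those factors concrete, here is a minimal sketch of how they could fit together. It is not LibraryThing's actual code: the TF-IDF-style weighting, the cosine comparison, the low-value tag list, the `popularity_knob` parameter and all the sample numbers are assumptions chosen for illustration.

```python
import math

# Assumed list of low-value tags; a real list would be curated by hand.
LOW_VALUE_TAGS = {"read", "owned", "to-read", "favorites"}


def tag_weights(tag_counts, works_with_tag, total_works):
    """Weight each tag by its salience on the work and its rarity overall."""
    total = sum(tag_counts.values())
    if total == 0:
        return {}
    weights = {}
    for tag, count in tag_counts.items():
        if tag in LOW_VALUE_TAGS:
            continue  # drop low-value tags entirely
        on_work = count / total  # salience on this work
        rarity = math.log(total_works / works_with_tag.get(tag, 1))  # salience generally
        weights[tag] = on_work * rarity
    return weights


def cosine(a, b):
    """Degree of agreement between two weighted tag sets (0.0 to 1.0)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def popularity_match(copies_a, copies_b):
    """1.0 for equally popular works, falling toward 0 as the gap widens."""
    gap = abs(math.log10(max(copies_a, 1)) - math.log10(max(copies_b, 1)))
    return 1.0 / (1.0 + gap)


def score(a, b, works_with_tag, total_works, popularity_knob=0.5):
    """Tag agreement, scaled by how close the two works are in popularity."""
    sim = cosine(
        tag_weights(a["tags"], works_with_tag, total_works),
        tag_weights(b["tags"], works_with_tag, total_works),
    )
    return sim * popularity_match(a["copies"], b["copies"]) ** popularity_knob


# Made-up sample data: tag counts per work and how many works carry each tag.
TOTAL_WORKS = 1000
WORKS_WITH_TAG = {"php": 20, "mysql": 15, "programming": 200, "cooking": 50, "read": 900}
php_book = {"tags": {"php": 30, "mysql": 25, "programming": 10, "read": 40}, "copies": 500}
mysql_book = {"tags": {"php": 12, "mysql": 20, "read": 5}, "copies": 300}
cookbook = {"tags": {"cooking": 40, "read": 30}, "copies": 450}

for name, other in [("mysql_book", mysql_book), ("cookbook", cookbook)]:
    print(name, round(score(php_book, other, WORKS_WITH_TAG, TOTAL_WORKS), 3))
```

In this sketch the cookbook scores zero against the PHP book even though both carry the “read” tag, because that tag is treated as low-value. The `popularity_knob` exponent stands in for the kind of knob that gets tuned by hand against sample works.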

Once a tag recommendation is generated, it takes a while for it to be incorporated into the “Combo recommendations” above. The combo algorithm incorporates tag recommendations to a greater or lesser degree, depending on its assessment of their quality and contribution. Your personal recommendations are based mostly on these “Combo recommendations.”
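One simple way to picture that blending is a weighted sum, where each algorithm's scores count in proportion to how much its results are trusted. This is only a sketch under that assumption: the `combine` function, the algorithm names and the quality weights are hypothetical, not the real combo algorithm.

```python
def combine(sources, quality):
    """Blend per-algorithm scores into a single ranked list of works.

    sources: {algorithm_name: {work: score between 0 and 1}}
    quality: {algorithm_name: trust weight between 0 and 1} (assumed estimate)
    """
    combined = {}
    for name, scores in sources.items():
        weight = quality.get(name, 0.0)  # unknown algorithms contribute nothing
        for work, value in scores.items():
            combined[work] = combined.get(work, 0.0) + weight * value
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores from the tag algorithm and one other algorithm.
sources = {
    "tags": {"Work A": 0.9, "Work B": 0.2},
    "other": {"Work B": 0.8, "Work C": 0.5},
}

# Trusting the tag results only half as much pushes "Work A" down the list.
print(combine(sources, {"tags": 0.5, "other": 1.0}))
print(combine(sources, {"tags": 1.0, "other": 1.0}))
```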

The tag recommendations are going to take a while to build for all works that need them. After that, we plan to do some sort of “Pepsi Challenge” test. We think LibraryThing recommendations are as good as any out there, and are eager to prove it.

Some examples. Tag recommendations work absolutely best on non-fiction titles about something very simple and clear-cut.

Works really “about” two or more things are harder, as are books with a specific point of view, which might seem separate to some extent from the tags on them. Some examples:

  • PHP and MySQL for dummies by Janet Valade — A successful example, where most of the books deal with both PHP and MySQL, with an orientation toward “entry level” programmers (e.g., the “Visual Quickstart” books)
  • Born Fundamentalist, Born Again Catholic by David B. Currie — A successful example that mostly recommends other Protestant-to-Catholic conversion stories (rather than Catholic-to-Protestant ones)
  • Goldwater by Barry Goldwater — A mixed but mostly decent result, surfacing some Goldwater-specific material, but also showing other biographies, especially from the period and involving senators. Ideally it wouldn’t have quite so many contender-bios, as I’m not sure the potential Goldwater reader is eager to dig into Edward’s bio.
  • Freakonomics by Steven D. Levitt and Stephen J. Dubner — Decent list, starting with a short shelf of popular economics-in-life books, followed by popular introductions to economics and economics-oriented thinky-think books

Fiction, especially non-“genre” fiction, is the big problem. Literary fiction isn’t “about” what it’s about in quite the same way that non-fiction is. Creating some separation between adult and youth titles is also hard. But we’ve made significant progress.

  • Watership Down by Richard Adams — A decent list of animal-centered chapter books, with classics like Redwall and The Wind in the Willows and no The Runaway Bunny or Knuffle Bunny.
  • The Book of Three by Lloyd Alexander — A decent list of magical fantasy books, angled to youth and toward Welsh mythology.
  • The Hobbit by J. R. R. Tolkien — A good list, but obviously centered quite strongly on Tolkien. Works with significant secondary literatures (cf. Harry Potter, Narnia, etc.) tend to be dominated by guides, atlases and so forth.
  • Little Women by Louisa May Alcott — Starts out well, putting March second, but then goes off the rails somewhat after a dozen. The Mother-Daughter Book Club wins because it apparently takes place in Concord, Massachusetts and involves mothers and daughters. The Secret Life of Bees is also about mothers and daughters, coming of age, sisterhood, and was made into a movie. Meh.
  • The Book Thief by Markus Zusak — Decent list, with other books, mostly fiction, about children during the Holocaust.
  • The Shack by William P. Young — I haven’t read it, but I think I have a sense of it. There are some winners here, like The Christmas List and Redeeming Love. The Year of Fog and The Deep End of The Ocean cover some of the same issues from a non-religious standpoint, and Where Is God When It Hurts covers them from a non-fiction, evangelical perspective, which might or might not be wanted. But C. S. Lewis and Paul Bunyan(!) aren’t winners, being rather different sorts of fictions. Tim LaHaye is winning almost solely on being “christian fiction,” and James Redfield for “religious fiction,” “inspirational,” etc.
  • Patient Zero by Jonathan Maberry — Zombies? We got zombies, focusing on plague-based zombie terror (no Pride and Prejudice and Zombies). Straight-up plague titles, like The White Plague, are also included. Preston’s The Demon in the Freezer probably shouldn’t be there, but it’s a great book.
  • Earth Abides by George R. Stewart — More apocalypse, this time with few zombies. The Brief History of the Dead is rather different, though I’m not sure how LibraryThing could know that. At least it doesn’t attempt to recommend other boring books, like Earth Abides.
  • Blueberry Girl by Neil Gaiman — Some good stuff. Some sound lousy. The tags are dragging the recommendations all around: children, picture book, Gaiman, mothers and daughters, charles vess. Twinkle, Twinkle, Little Star is winning on “poetry,” “rhymes” and child-associated tags.
  • Love in the asylum : a novel by Lisa Carey — My wife’s book. A decent list of insane-asylum fiction, with some memoirs.
  • Twilight by Stephenie Meyer — Largely okay, so far as I can tell, although with less secondary literature than I would have guessed.
  • Illyria by Elizabeth Hand is winning on tags like “forbidden love,” “teen romance,” “teen lit,” “romantic” and “contemporary fantasy.” I have no idea if the books are similar.

Come discuss it on talk here.

Labels: recommendations, tagging, tags

One Comment:

  1. Dianna says:

    The link for the Alexander the Great recommendations goes to what I assume is the testing server (nice name).
    http://librarything.com/work/44713/recommendations#tags
