Thursday, October 20th, 2005

Scheduled maintenance and books you should own…

LibraryThing will go down for scheduled maintenance tonight at 1am Eastern time (5am GMT and, alas, 10pm in California). I expect it to be down for 2-3 hours.

Second, if you got this far, you’re in a very small minority of LibraryThing people—the cream of the cream, perhaps. So, here’s the scoop:

I’ve got an algorithm that tells you what books you “ought” to own. Basically, it looks at people who have similar books, and figures out what books they have that you don’t, adjusting for how close their library is to yours and for how common a given book is generally (the Harry Potter effect). If you want to look at your list, email me. NOTE: TELL ME YOUR USER NAME. I CAN’T READ MINDS!
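For readers who like to tinker, here is a rough sketch of this kind of neighbor-based suggestion. It is not LibraryThing’s actual code. It assumes a simple in-memory model where each library is a set of book titles, measures how close two libraries are with Jaccard similarity (shared books divided by all books), and damps popular books with a log factor. Those choices, and the names `suggest` and `jaccard`, are hypothetical illustrations.

```python
import math
from collections import defaultdict


def jaccard(a: set, b: set) -> float:
    """Overlap between two libraries: shared books / all books (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest(user: str, libraries: dict[str, set[str]], top_n: int = 10) -> list[str]:
    """Hypothetical neighbor-based suggestions for `user`."""
    mine = libraries[user]
    n = len(libraries)

    # How many libraries hold each book (used to damp the "Harry Potter effect").
    popularity: dict[str, int] = defaultdict(int)
    for lib in libraries.values():
        for book in lib:
            popularity[book] += 1

    # Each book you lack earns credit from every overlapping library that holds it,
    # weighted by how close that library is to yours.
    scores: dict[str, float] = defaultdict(float)
    for other, theirs in libraries.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0.0:
            continue
        for book in theirs - mine:
            scores[book] += sim

    # Assumed popularity adjustment: rarer books get a larger multiplier.
    weighted = {b: s * math.log(1 + n / popularity[b]) for b, s in scores.items()}
    return sorted(weighted, key=weighted.get, reverse=True)[:top_n]


libraries = {
    "me": {"Dune", "Foundation", "Hyperion"},
    "alice": {"Dune", "Foundation", "Hyperion", "Neuromancer"},
    "bob": {"Dune", "Harry Potter"},
    "carol": {"Harry Potter", "Emma"},
    "dave": {"Harry Potter", "Persuasion"},
}
print(suggest("me", libraries))  # ['Neuromancer', 'Harry Potter']
```

Neuromancer ranks first because it comes from the closest library and few people own it; Harry Potter still shows up through the weaker overlap with bob, but its popularity pushes it down.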

The list is by email only for two reasons. First, each list currently takes about five minutes to create without breaking the server. (I’m taking the servers down tonight in part to speed such algorithms.) Second, I need feedback before I put the algorithm up.

It’s a lot harder to write a good library suggestion algorithm than I thought. If you like thinking about algorithms, this is an interesting one to think about.

The current algorithm has some flaws. First, it tells you about popular books you are actually avoiding. Thus, my brother isn’t a fan of Roger Zelazny, but when his sci-fi-heavy bookcase is matched to other sci-fi bookcases, the algorithm tells him he ought to own Zelazny’s books. Second, it doesn’t think about different categories of books. Everyone’s library has more than one special section, but a “democratic” algorithm favors the largest section. So, I have a special interest in Greco-Roman divination, but it would never suggest books on that topic because my divination section is dwarfed by my other sections.

I have a number of other algorithms to look at. I’d like to test Dewey clusters (popular books in a Dewey-number range that you have a lot of books in), library suggestions that “bubble up” from book-by-book suggestions, and so forth. I’m not too interested in algorithms based on user ratings. My belief is that, in the aggregate, a library is a fair representation of a given person’s likes and dislikes. Even a “bad” book should inform the algorithm—people don’t buy books randomly.
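The Dewey-cluster idea could look something like the sketch below. It assumes every book has a Dewey number stored as a string with at least three digits, buckets each number into its division (133.3 becomes 130), treats any division where you own at least `min_books` books as one of your sections, and suggests the most widely held books in each section that you don’t own. The bucketing rule, the thresholds, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def dewey_division(ddc: str) -> str:
    """Bucket a Dewey number such as '133.3' into its division, '130'."""
    return ddc[:2] + "0"


def dewey_cluster_suggestions(
    mine: set[str],
    catalog: dict[str, str],
    libraries: list[set[str]],
    min_books: int = 3,
    per_cluster: int = 2,
) -> dict[str, list[str]]:
    """Hypothetical: popular unowned books in each Dewey division you collect."""
    # Your "sections": divisions where you own at least `min_books` books.
    counts = Counter(dewey_division(catalog[b]) for b in mine if b in catalog)
    sections = {d for d, c in counts.items() if c >= min_books}

    # Popularity of every book across all libraries.
    popularity = Counter(b for lib in libraries for b in lib)

    # Walk books from most to least widely held, filling each section separately,
    # so a small section is not drowned out by a large one.
    picks: dict[str, list[str]] = defaultdict(list)
    for book, _ in popularity.most_common():
        if book in mine or book not in catalog:
            continue
        d = dewey_division(catalog[book])
        if d in sections and len(picks[d]) < per_cluster:
            picks[d].append(book)
    return dict(picks)


catalog = {"a": "133.3", "b": "133.5", "c": "133.4", "d": "133.3",
           "e": "133.8", "x": "813.54", "y": "813.6", "z": "641.5"}
mine = {"a", "b", "c", "x"}
libraries = [mine, {"a", "d", "e"}, {"d", "x", "y"}, {"d", "e", "z"}, {"y", "z"}]
print(dewey_cluster_suggestions(mine, catalog, libraries))  # {'130': ['d', 'e']}
```

Because each qualifying section gets its own quota of suggestions, a small special interest can still produce results instead of being outvoted by the largest section. Here the single book in the 810s falls below `min_books`, so that division is skipped.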

The goal is to produce a better selection engine than Amazon has. Think big, I say.


