Archive for May, 2007

Saturday, May 5th, 2007

Conversation = Excellence

LibraryThing has always depended on members to set development goals and refine (or ditch) features. But it’s amazing how well it’s worked with the new “affinities”* feature. We simply could not have anticipated how members would shape our thinking. (I will never ever develop another project in a small, closed group, with occasional trips to watch a “focus group” from behind smoked glass.) We’re still watching reactions on the blog, and on a now-130+ Talk topic, but we have some good ideas. When Altay returns from Boston, we’ll hammer out changes, including customization of the look, and the ability to turn it off.

I started another thread I want to highlight, about LibraryThing’s strategy and a hiring decision for the non-English LibraryThings. Do we hire someone, and what can they do? I’m hoping the thread gets some traction, at least among the users of our dozen-plus non-English sites. We need a non-English plan.

Part of the problem is technical, starting with better character support. But there’s a feedback loop. Right now, the non-English sites can’t be the coding priority because they’re not contributing as much to our growth, or to our finances. (Not that they’re small. Our non-English sites appear to have more action than our largest English-language competitor.) If we hired someone—and had something for that person to do—we’d have a stronger incentive to work on it.

*We called them “affinity percentiles,” but it got chipped down nicely by SilentInaWay. Case in point.

Labels: conversation, features, non-English

Friday, May 4th, 2007

Affinity percentiles and Altay

Altay (middle), John (sweatshirt), Tim (right), Abby (encased in her spherical “soul cage”)

We’re introducing an important new feature, but only just. The feature is called “affinity percentiles.” Basically, we show numbers next to other users’ names. These represent how “similar” your library is to theirs.

We’ve started it off on just one area of the site, the message pages in Talk (example). We plan to roll it out across the site, but not until we get a lot of feedback. I have a feeling some members will love it, but some won’t. This isn’t something we want to do lightly.

The number needs some explaining. (It may be too subtle, and we should fall back to a more straightforward “books shared.”) Basically, the higher the better. The person who shares the most books with you will have a 99%; the person who shares the least gets a 1%.

The percentage isn’t the number shared—65% does not mean a user shares 65% of their books; it means that the user shares more books than 65% of users. Two other factors come into play:

  • a member has to share five books to get an affinity percentile
  • “sharing” is weighted by book obscurity and library size. A user with 100 books who shares 20 obscure books with you ranks much higher than a user with 10,000 books who shares some very popular novels.
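The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not give the real weighting, so the formula here (each shared book counts 1/popularity, and the total is divided by the other member's library size) is hypothetical, as are the names `weighted_overlap`, `affinity_percentiles`, and `popularity`.

```python
from bisect import bisect_left

MIN_SHARED = 5  # from the post: fewer than five shared books means no percentile


def weighted_overlap(mine, theirs, popularity):
    """Score the overlap between two libraries (sets of book ids).

    ASSUMPTION: the weighting formula is invented for illustration.
    Rare books count for more, and large libraries are discounted.
    """
    shared = mine & theirs
    if len(shared) < MIN_SHARED:
        return None
    # popularity[book] = number of members who own the book (hypothetical input)
    rarity = sum(1.0 / popularity[book] for book in shared)
    return rarity / len(theirs)


def affinity_percentiles(me, libraries, popularity):
    """Map each qualifying member to a percentile from 1 (least) to 99 (most)."""
    scores = {}
    for user, books in libraries.items():
        if user == me:
            continue
        score = weighted_overlap(libraries[me], books, popularity)
        if score is not None:
            scores[user] = score

    ranked = sorted(scores.values())
    result = {}
    for user, score in scores.items():
        below = bisect_left(ranked, score)  # members with a strictly lower score
        if len(ranked) == 1:
            result[user] = 99
        else:
            # Stretch ranks so the lowest score maps to 1 and the highest to 99.
            result[user] = 1 + round(98 * below / (len(ranked) - 1))
    return result
```

With this scheme a percentile of 65 reads the way the post describes it: the member outranks roughly 65% of the other qualifying members, whatever the raw number of shared books.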

Other features:

  • If you hover over the percentile, you’ll get the shared books. We’ve thought of having it actually show the books.
  • The percentile box is colored in line with the number—the hotter the higher.

Some questions:

  • Are the percentiles too hard to understand? Would shared numbers be better?
  • Is the weighting confusing?
  • What should happen when you hover over it? When you click on it?
  • Where should it go? Where shouldn’t it go?

How? I’ve wanted to do something like this for months. It’s a surprisingly difficult technical problem. You can’t calculate it on the fly every time; that would be insane. But caching the data gets big quickly. Imagine a “Battleship” grid of users, 190,000 by 190,000. If you stored a single byte for each connection (the number of shared books), it would amount to about 18 gigabytes of data (190,000 squared/2). The solution I came up with involves efficient short-term caching, and ignoring members with fewer than five shared books. We’ve actually been running it on the Talk pages since last night, waiting to make it visible until we knew it wouldn’t melt our servers. (So far no melt!)
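To make the size problem concrete, here is a rough Python sketch. The first function runs the post's own arithmetic (190,000 squared/2, one byte per pair), which comes to about 18 billion bytes. The second shows one way to keep only pairs with five or more shared books. The post does not describe the actual caching scheme, so `dense_grid_bytes`, `sparse_shared_counts`, and the inverted-index approach are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def dense_grid_bytes(users, bytes_per_pair=1):
    """Bytes needed to store one value per pair of users (half the full grid)."""
    return users * users // 2 * bytes_per_pair


def sparse_shared_counts(libraries, min_shared=5):
    """Return {(user_a, user_b): shared_count} for pairs at or above min_shared.

    ASSUMPTION: a simple inverted index (book -> owners), not the site's method.
    Very popular books make the inner loop expensive, so a real system would
    need to cap or sample them.
    """
    owners = defaultdict(set)
    for user, books in libraries.items():
        for book in books:
            owners[book].add(user)

    counts = Counter()
    for users in owners.values():
        # Every pair of owners of this book shares one more book.
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1

    # Drop weak connections, as the post does for fewer than five shared books.
    return {pair: n for pair, n in counts.items() if n >= min_shared}


if __name__ == "__main__":
    print(dense_grid_bytes(190_000) / 1e9, "GB for the dense half-grid")
```

Most pairs of members share few or no books, so dropping everything under the five-book threshold is what keeps a structure like this small.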

You’ll notice the numbers aren’t there when you first hit the page. They come in a second or two later. This is “Ajax” at work, and was done to prevent the new feature from slowing Talk down.

The real benefits will come when the feature is distributed across the site. I’m particularly interested in seeing affinity percentiles on reviews, and sorting by them. Ultimately, I don’t care what 300 people think about The Da Vinci Code. I want to know what Tim-ish people think of it.

Why?
The crux of the idea is to highlight what makes LibraryThing’s social system work, so-called “social cataloging.” Vanilla social networking is structured around “friends.” That’s a powerful idea, but it has limits. It can be too “binary”; and the dynamics of “friending” a stranger are lost on many of us. At its best, social cataloging gets at something more nuanced. If I share 50 books about ancient history with you, there’s a degree, a nuance and a semantics to the connection that opens up a world of possibilities. Some are social and some aren’t. I might want to chat with you about the books we’ve read, or I might not. Either way, I benefit. The rest of your library is probably interesting to me. And your opinions have a claim on my attention no anonymous guy on Amazon gets.

This post also introduces Altay Guvench (username: Altay), who did the JavaScript work behind affinity percentiles. This was actually a toss-off, but Altay was the force behind the much more amazing JavaScript in LibraryThing for Libraries. That stuff is a work of art: JavaScript inserting JavaScript. It might actually be self-aware! Altay will be working on the site generally, with a tilt toward things that JavaScript can improve, like the widgets.

Altay in a nutshell: Portland native. Harvard undergrad. Bassist for the alt-country band Great Unknowns (toured with the Indigo Girls! Reviewed ecstatically. Listen to a free song!). Co-founder of Y-Combinator-funded startup AudioBeta. One of only three members on LibraryThing with Optical holography : principles, techniques, and applications. Scheme hacker. Nerd, but a nerd who rocks out.

Labels: affinity percentiles, altay, features, soul cages

Thursday, May 3rd, 2007

Combined blog feed available

I used Yahoo Pipes to make a combined feed for this blog and our Thingology blog. It was easy to do, and the result is pretty useful. The three feeds are as follows:

I also edited the employee list on the right, to add Altay. He is the magic behind the LibraryThing for Libraries JavaScript, but almost nobody’s seen that yet, so we’re waiting for his first user feature to give him a proper introduction.

Labels: 1

Wednesday, May 2nd, 2007

Many more Wikipedia citations

You’ll notice many more Wikipedia links from work pages. The total has increased by about 200%, and the coverage by at least that.

This improves what I did in February. That worked by looking for ISBN patterns. Of course, not all books cited in Wikipedia have ISBNs. And even when there is one, many Wikipedia contributors omit it. (As far as I’m concerned, ISBNs look chintzy in a bibliography anyway.)
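For readers wondering what “looking for ISBN patterns” might involve, here is a minimal Python sketch. It is not the code behind the February feature, which the post does not show. The regular expression and the names `ISBN_PATTERN`, `is_valid_isbn`, and `find_isbns` are assumptions, and only hyphenated or unbroken ISBNs are handled. The check-digit rules are the standard ISBN-10 and ISBN-13 ones.

```python
import re

# ASSUMPTION: citations write "ISBN" followed by digits and optional hyphens.
ISBN_PATTERN = re.compile(r"ISBN(?:-1[03])?:?\s*([0-9][0-9Xx-]{8,15}[0-9Xx])")


def is_valid_isbn(raw):
    """Check the ISBN-10 or ISBN-13 check digit of a candidate string."""
    digits = raw.replace("-", "").upper()
    if re.fullmatch(r"\d{9}[\dX]", digits):
        # ISBN-10: weights 10 down to 1, with X standing for 10; sum divisible by 11.
        total = sum(
            (10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(digits)
        )
        return total % 11 == 0
    if re.fullmatch(r"\d{13}", digits):
        # ISBN-13: weights alternate 1, 3; sum divisible by 10.
        total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(digits))
        return total % 10 == 0
    return False


def find_isbns(text):
    """Return the distinct valid ISBNs in text, hyphens removed, in order found."""
    found = []
    for match in ISBN_PATTERN.finditer(text):
        candidate = match.group(1).replace("-", "").upper()
        if is_valid_isbn(candidate) and candidate not in found:
            found.append(candidate)
    return found
```

Validating the check digit filters out most digit strings that only look like ISBNs, but it does nothing for citations that leave the ISBN out entirely.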

I’ve redone it, this time also looking for telltale title/author patterns, and running the matches against LibraryThing’s vast and usefully messy dataset. The logic is somewhat fuzzy and therefore imperfect. But I haven’t noticed any problems.
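A sketch of what fuzzy title/author matching could look like, in Python. The post says only that the logic looks for “telltale title/author patterns” and is “somewhat fuzzy”, so everything here is an assumption: the use of the standard library's `difflib.SequenceMatcher`, the 0.9 threshold, and the names `normalize` and `mentions_work`.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and drop a leading article."""
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return re.sub(r"^(the|a|an) ", "", text)


def mentions_work(text, title, author_surname, threshold=0.9):
    """Guess whether text refers to the work, even in running prose.

    ASSUMPTION: a mention needs the author's surname somewhere in the text
    plus a run of words that closely matches the normalized title.
    """
    words = normalize(text).split()
    if normalize(author_surname) not in words:
        return False

    target = normalize(title)
    size = len(target.split())
    # Slide a window of the title's word count across the text.
    for start in range(len(words) - size + 1):
        window = " ".join(words[start:start + size])
        if SequenceMatcher(None, window, target).ratio() >= threshold:
            return True
    return False
```

Requiring the surname keeps common phrases from matching on their own, and the similarity threshold tolerates small typos in the title. Both choices trade missed citations against false ones, which is the imperfection the post mentions.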

The number of citations expanded a lot.* Some entries exploded. Take Thomas Kuhn’s The Structure of Scientific Revolutions:

Notably, it caught casual references to books, not just structured ones. For example, the article on Science wars mentions Kuhn’s work in running prose, not in the bibliography or footnotes.

I haven’t updated our free Wikipedia citation feed. That maps articles to ISBNs, but the new data is work-based. If anyone wants to use the new data, let me know and I’ll tackle the problem. Cool as I think it would be, I haven’t seen any libraries adding Wikipedia links to their catalogs yet.

*The fact that it’s a new feed, and the somewhat fluid interactions between ISBN-based and work-based matching, make it tricky to estimate, but it looks like a 200% increase.

Labels: 1