
Monday, February 23rd, 2009

uClassify contest winner

After some delay, I can announce that the LibraryThing/uClassify contest has been won by Kelly Vista—the only entrant, but a worthy one. (Kelly gets a copy of Programming Collective Intelligence and $100 from Amazon or IndieBound.) She described her “LibraryThing classifier” as follows:

“My goal was to create a classifier that would automatically “tag” any book description based on actual LibraryThing tags. For example, if you paste the book description for “Truman” into UClassify, it should return to you LibraryThing tags that suit the book. This is one step more general than one of [Tim’s] ideas (fiction vs. non-fiction).”

In my testing, it does a pretty good job of hitting the top tags. Pasted descriptions of Harry Potter give “young adult” and “children’s.” John Adams gives “american history” and “biography.” It’s not perfect (Adams is also labelled “young adult”), but the initial results are good, and the whole point of uClassify is that accuracy improves as the classifier gets more training.

uClassify seems to be growing apace. They recently opened up public classifications for external access, so I’ll be looking into automatic text-language classification of LibraryThing reviews.

Labels: kelly vista, uclassify

Sunday, December 21st, 2008

uClassify library mashup? (with prize!)

I keep up with the Museum of Modern Betas* and today it found something wonderful: uClassify.

uClassify is a place where you can build, train and use automatic classification systems. It’s free, and can be handled either on the website or via an API. Of course, this sort of thing was possible before uClassify, but you needed specialized tools. Now anyone can do it—on a whim.

Their examples are geared toward the simple:

  • Text language. What language is some text in?
  • Gender. Did a man or a woman write the blog? It was made for genderanalyzer.com. (It’s right only 63% of the time.)
  • Mood.
  • Classical author. Which classical author is your text most like? Used on oFaust.com (this blog is Edgar Allan Poe).

Where did I lose the librarians? At mood? But wait, come back! The language classifier works very well. It managed to suss out Norwegian, Swedish and Dutch reviews of The Hobbit.** So what if the others are trivial? The idea is solid. Create a classification. Feed it data and the right answer. Watch it get better and better.
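
To make “feed it data and the right answer” concrete, here is a minimal sketch of a trainable text classifier in Python. It is not uClassify’s API or its actual algorithm. It assumes a plain naive Bayes model with add-one smoothing, and the class name, method names and sample sentences are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        """Feed it one text together with the right answer."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return the label with the highest log-probability for the text."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Prior: how common this label was in training.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training sentences, one per language.
clf = TinyNaiveBayes()
clf.train("the hobbit is a wonderful book about a dragon", "english")
clf.train("hobbiten er en fantastisk bok om en drage", "norwegian")
clf.train("de hobbit is een prachtig boek over een draak", "dutch")

print(clf.classify("een boek over een draak"))
```

Every extra call to `train` adds evidence, which is the “better and better” part. A real language classifier would need far more text per language than one sentence.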

Now, I’m a skeptic of automatic classification in the library world. There’s a big difference between spam/not-spam and, say, giving a book Library of Congress Subject Headings. But it’s worth testing. And, even if “real” classification is not amenable to automatic processes, there must be other interesting book- and library-related projects.

The Prize! So, LibraryThing calls on the book and library worlds to create something cool with uClassify by February 1, 2009 and post it here. The winner gets Toby Segaran’s Programming Collective Intelligence and a $100 gift certificate to Amazon or IndieBound. You can do it by hand or programmatically. If you use a lot of LibraryThing data, and it’s not one of the sets we release openly, shoot me an email about what you’re doing and I’ll give you the green light.

Some ideas. My idea list…

  • Fiction vs. Non-Fiction. Feed it Amazon data, Common Knowledge or LT tags.***
  • DDC. Train it with Amazon’s DDC numbers and book descriptions. Do ten thousand books and see how well it’s guessing the rest.
  • Do a crosswalk, e.g., DDC to LCC, BISAC to DDC, DDC to Cutter, etc.
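
The “train on some, see how well it guesses the rest” test from the DDC idea can be sketched roughly as below. This is only an illustration under loud assumptions: the word-vote classifier is a stand-in rather than anything uClassify does, the function names are invented, and the six book descriptions are made-up placeholders, not Amazon or LibraryThing data. The labels are top-level DDC classes (500 for science, 900 for history).

```python
from collections import Counter, defaultdict


def train(records):
    """Count, for each word, how many training texts of each label contain it."""
    counts = defaultdict(Counter)  # word -> label -> count
    for text, label in records:
        for word in set(text.lower().split()):
            counts[word][label] += 1
    return counts


def classify(counts, text):
    """Pick the label whose training texts share the most words with `text`."""
    votes = Counter()
    for word in set(text.lower().split()):
        votes.update(counts.get(word, {}))
    return votes.most_common(1)[0][0] if votes else None


def holdout_accuracy(records, train_size):
    """Train on the first `train_size` records, then score on the rest."""
    train_set, test_set = records[:train_size], records[train_size:]
    counts = train(train_set)
    correct = sum(1 for text, label in test_set if classify(counts, text) == label)
    return correct / len(test_set)


# Hypothetical (description, DDC class) pairs, invented for this sketch.
books = [
    ("a history of the roman empire and its wars", "900"),
    ("physics and chemistry experiments for students", "500"),
    ("the war years a history of europe", "900"),
    ("an introduction to chemistry and biology", "500"),
    ("a short history of the american wars", "900"),
    ("biology experiments for young students", "500"),
]

print(holdout_accuracy(books, train_size=4))
```

With real data you would shuffle the records before splitting and train on thousands of books, not four. The point is just that the held-out books are never shown to the classifier during training.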

Merry data-driven Christmas!


*A website that tracks new “betas.” Basically, it tracks new Web 2.0 apps. It also keeps tabs on their popularity, according to Delicious bookmarks. LibraryThing is now number 12, beating out Gmail. Life isn’t fair.
**Yes, we’re going to get it going for reviews on the site itself. Give us some time. Cool as it is, we’re pretty busy right now. Note: You can’t give it the URL alone. You have to give it the text of the review.
***We may do this with tags. We already do it very crudely, using it only for book recommendations.

Labels: Dewey Decimal Classification, open shelves classification, uclassify