Sunday, December 21st, 2008

uClassify library mashup? (with prize!)

I keep up with the Museum of Modern Betas* and today it found something wonderful: uClassify.

uClassify is a place where you can build, train and use automatic classification systems. It’s free, and can be handled either on the website or via an API. Of course, this sort of thing was possible before uClassify, but you needed specialized tools. Now anyone can do it—on a whim.

Their examples are geared toward the simple:

  • Text language. What language is some text in?
  • Gender. Did a man or a woman write the blog? It was made for genderanalyzer.com. (It’s right only 63% of the time.)
  • Mood.
  • Classical author. Which classical author does your text most resemble? Used on oFaust.com (this blog is Edgar Allan Poe).

Where did I lose the librarians? At mood? But wait, come back! The language classifier works very well. It managed to suss out Norwegian, Swedish and Dutch reviews of The Hobbit.** So what if the others are trivial? The idea is solid. Create a classification. Feed it data and the right answer. Watch it get better and better.

Now, I’m a skeptic of automatic classification in the library world. There’s a big difference between spam/not-spam and, say, giving a book Library of Congress Subject Headings. But it’s worth testing. And, even if “real” classification is not amenable to automatic processes, there must be other interesting book- and library-related projects.

The Prize! So, LibraryThing calls on the book and library worlds to create something cool with uClassify by February 1, 2009 and post it here. The winner gets Toby Segaran’s Programming Collective Intelligence and a $100 gift certificate to Amazon or IndieBound. You can do it by hand or programmatically. If you use a lot of LibraryThing data, and it’s not one of the sets we release openly, shoot me an email about what you’re doing and I’ll give you the green light.

Some ideas. My idea list…

  • Fiction vs. Non-Fiction. Feed it Amazon data, Common Knowledge or LT tags.***
  • DDC. Train it with Amazon’s DDC numbers and book descriptions. Do ten thousand books and see how well it’s guessing the rest.
  • Do a crosswalk, e.g., DDC to LCC, BISAC to DDC, DDC to Cutter, etc.
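
For anyone who wants to try the "train on some, test on the rest" idea before touching a real service, here is a rough sketch in plain Python. It does not use uClassify or its API at all. The NaiveBayes class, the holdout_accuracy helper and the six toy book descriptions are all made up for illustration, and naive Bayes is just one common text-classification technique, not necessarily what uClassify runs under the hood.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes text classifier (hypothetical, for illustration)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        """Feed it one text together with the right answer."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return the most probable label, or None if nothing was trained."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            # Add-one smoothing so unseen words do not zero out a label.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def holdout_accuracy(examples, n_train):
    """Train on the first n_train examples, then score guesses on the rest."""
    clf = NaiveBayes()
    for text, label in examples[:n_train]:
        clf.train(text, label)
    held_out = examples[n_train:]
    correct = sum(clf.classify(text) == label for text, label in held_out)
    return correct / len(held_out)


# Toy stand-ins for book descriptions; real data would come from elsewhere.
examples = [
    ("a history of the roman empire", "nonfiction"),
    ("a novel of love and dragons", "fiction"),
    ("a field guide to birds", "nonfiction"),
    ("a novel about a wizard and a ring", "fiction"),
    ("a history of birds", "nonfiction"),
    ("a novel of dragons", "fiction"),
]
print(holdout_accuracy(examples, n_train=4))
```

Swap the labels for DDC classes and the toy strings for real descriptions and the same loop applies: train on the first ten thousand, then see what fraction of the rest it guesses right.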

Merry data-driven Christmas!


*A website that tracks new “betas.” Basically, it tracks new web 2.0 apps. It also keeps tabs on their popularity, according to Delicious bookmarks. LibraryThing is now number 12, beating out Gmail. Life isn’t fair.
**Yes, we’re going to get it going for reviews on the site itself. Give us some time. Cool as it is, we’re pretty busy right now. Note: You can’t give it the URL alone. You have to give it the text of the review.
***We may do this with tags. We already do it very crudely, using it only for book recommendations.

Labels: Dewey Decimal Classification, Open Shelves Classification, uclassify
