Archive for the ‘Dewey Decimal Classification’ Category

Sunday, December 21st, 2008

uClassify library mashup? (with prize!)

I keep up with the Museum of Modern Betas* and today it found something wonderful: uClassify.

uClassify is a place where you can build, train and use automatic classification systems. It’s free, and can be handled either on the website or via an API. Of course, this sort of thing was possible before uClassify, but you needed specialized tools. Now anyone can do it—on a whim.

Their examples are geared toward the simple:

  • Text language. What language is some text in?
  • Gender. Did a man or a woman write the blog? It was made for genderanalyzer.com (It’s right only 63% of the time.)
  • Mood.
  • Classical author. Which classical author does your text most resemble? Used on oFaust.com (this blog is Edgar Allan Poe).

Where did I lose the librarians—mood? But wait, come back! The language classifier works very well. It managed to suss out Norwegian, Swedish and Dutch reviews of The Hobbit.** So what if the others are trivial? The idea is solid. Create a classification. Feed it data and the right answer. Watch it get better and better.
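For the curious, here is a minimal sketch of that "feed it data and the right answer" loop. It is a toy naive Bayes classifier written for illustration, not uClassify's actual code or API; the class name `TinyClassifier`, its methods and the two training sentences are all made up for this example.

```python
import math
from collections import Counter, defaultdict


class TinyClassifier:
    """A toy multinomial naive Bayes text classifier (hypothetical, not uClassify)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        """Feed it one piece of data together with the right answer."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return the label with the highest log-probability for the text."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Add-one smoothing so an unseen word does not zero out a label.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training sentences, one per language.
clf = TinyClassifier()
clf.train("the dragon guards the gold under the mountain", "english")
clf.train("draken vaktar guldet under berget", "swedish")

print(clf.classify("the gold"))             # english
print(clf.classify("guldet under berget"))  # swedish
```

Every extra call to `train` adds more word counts, which is the whole of the "better and better" part, at least in this toy version.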

Now, I’m a skeptic of automatic classification in the library world. There’s a big difference between spam/not-spam and, say, giving a book Library of Congress Subject Headings. But it’s worth testing. And, even if “real” classification is not amenable to automatic processes, there must be other interesting book- and library-related projects.

The Prize! So, LibraryThing calls on the book and library worlds to create something cool with uClassify by February 1, 2009 and post it here. The winner gets Toby Segaran’s Programming Collective Intelligence and a $100 gift certificate to Amazon or IndieBound. You can do it by hand or programmatically. If you use a lot of LibraryThing data, and it’s not one of the sets we release openly, shoot me an email about what you’re doing and I’ll give you the green light.

Some ideas. My idea list…

  • Fiction vs. Non-Fiction. Feed it Amazon data, Common Knowledge or LT tags.***
  • DDC. Train it with Amazon’s DDC numbers and book descriptions. Do ten thousand books and see how well it’s guessing the rest.
  • Do a crosswalk, e.g., DDC to LCC, BISAC to DDC, DDC to Cutter, etc.
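As a rough sketch of the crosswalk idea, assuming you have a pile of books that carry both a DDC and an LCC class: count which target class each source class lands in most often, then check the guesses against books you held back. The function names and the handful of sample pairs below are invented for illustration. This is plain co-occurrence counting, not uClassify and not real LibraryThing data.

```python
from collections import Counter, defaultdict


def build_crosswalk(pairs):
    """Map each source class to the target class it co-occurs with most often."""
    tallies = defaultdict(Counter)
    for source, target in pairs:
        tallies[source][target] += 1
    return {source: counts.most_common(1)[0][0] for source, counts in tallies.items()}


def accuracy(crosswalk, held_out):
    """Fraction of held-out books whose target class the crosswalk guesses."""
    hits = sum(1 for source, target in held_out if crosswalk.get(source) == target)
    return hits / len(held_out)


# Hypothetical (DDC class, LCC class) pairs, one per book that carries both.
training = [
    ("823", "PR"), ("823", "PR"), ("823", "PZ"),
    ("510", "QA"), ("510", "QA"),
    ("025", "Z"),
]
# Books kept out of training, to see how well it guesses "the rest".
held_out = [("823", "PR"), ("510", "QA"), ("823", "PZ"), ("025", "Z")]

crosswalk = build_crosswalk(training)
print(crosswalk)                      # {'823': 'PR', '510': 'QA', '025': 'Z'}
print(accuracy(crosswalk, held_out))  # 0.75
```

The same train-then-hold-out shape applies to the DDC idea: train on ten thousand books, then score the guesses on the rest.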

Merry data-driven Christmas!


*A website that tracks new “betas.” Basically, it tracks new web 2.0 apps. It also keeps tabs on their popularity, according to Delicious bookmarks. LibraryThing is now number 12, beating out Gmail. Life isn’t fair.
**Yes, we’re going to get it going for reviews on the site itself. Give us some time. Cool as it is, we’re pretty busy right now. Note: You can’t give it the URL alone. You have to give it the text of the review.
***We may do this with tags. We already do it very crudely, using it only for book recommendations.

Labels: Dewey Decimal Classification, open shelves classification, uclassify

Tuesday, August 5th, 2008

Open Shelves Classification: Welcome Laena and David

Back in July I proposed the Open Shelves Classification (OSC), a new, free, crowdsourced replacement for the Dewey Decimal System. I also created a group to start in on the project.

The proposal included a call for a volunteer to lead the group. I was happy to write the software, and members would create the OSC, but someone with a library degree was needed to shepherd the project and make the occasional tough decision.

I’ve found two: the LIS team of Laena McCarthy and David Conners. It turns out, I already knew them. Abby and I met with Laena and David, back at ALA 2007, when they were MLS students doing a joint LibraryThing-related project called Folksonomies in Action. They impressed us then. It was extraordinary to talk to librarians with a deep understanding and creative take on the ideas LibraryThing was exploring. Since then Laena and David have started promising careers as librarians and professors. So, after receiving word they were interested in the project, we are only too happy to bring them on.

Laena M. McCarthy (user: laena). Laena is currently an Assistant Professor and Image Cataloger at the Pratt Institute in NYC. Her bio contains the priceless bit:

“Previously, she worked in Antarctica as the world’s Southernmost librarian, where she provided a remote research station with access to information. She incorporated into the library the first permanent art gallery in Antarctica.”

Laena’s teaching and research focus on the application of bottom-up, usability-centric design and collaboration. She is currently researching image tagging, FRBR for works of art & architecture, and information architecture. Her work has been published in Library Journal and the forthcoming Magazines for Libraries 2008.

In her free time, among other things, she can be found making jam, competing in food competitions, scuba diving and writing.

David Conners (user: conners). David is the Digital Collections Librarian at Haverford College in Pennsylvania. At Haverford, David works to make the College’s unique materials, such as the first organized protest against slavery in the New World, available online. He also oversees the College’s oral history program and the audio component of Special Collections exhibits such as “A Few Well Selected Books.”

David’s research interests include subject analysis, FRBR, and, occasionally, doped ablators. His work has been published in Library Journal, The Serials Librarian, and Physics of Plasmas.

The torch is passed! From this point on, it’s their project to direct. But we’re in agreement on their role: They aren’t royalty, they’re facilitators. They’re there to listen and to encourage conversation. They’re there to guide things toward consensus. They’re there to see that the project stays on track and true to its goals. They’re there to propose forking the project or moving it elsewhere, if that’s what it needs and the community wants it.

Laena and David are doing this for fun and interest. OSC is a side project with no financial component (it is by definition public domain in every respect), so we can’t pay them. But we’ve promised to help pay their way to LIS conferences, if someone wants them to talk about it. (At least one group already does.) And there’s the hope that, if OSC can accomplish its goals, they will have helped create something highly beneficial for libraries and library patrons everywhere.

If you’re interested in the project, come join the group and find out more.

Labels: Dewey Decimal Classification, open data, open shelves classification, osc

Tuesday, July 8th, 2008

Build the Open Shelves Classification

This mural is said to depict Dewey and the railroad service he gave to Lake Placid, FL. It’s time to throw Dewey under the train.

I hereby invite you to help build the Open Shelves Classification (OSC), a free, “humble,” modern, open-source, crowd-sourced replacement for the Dewey Decimal System.

I’ve been speaking of doing something like this for a while, but I think it’s finally going to become a reality. LibraryThing members are into it and after my ALA panel talk, a number of catalogers expressed interest too. Best of all, one library director has signed on as eager to implement the system, when it becomes available. Hey, one’s a start!

The Call. I am looking for one-to-five librarians willing to take leadership on the project. LibraryThing is willing to write the (fairly minimal) code necessary, but not to lead it.

As leaders, you will be “in charge” of the project only as a facilitator and executor of a consensus. Like Wikipedia’s Jimmy Wales, your influence will depend on listening to others and exercising minimal direct power.

For a smart, newly-minted librarian, this could be a big opportunity. You won’t be paid anything, but, hey, there’s probably a paper or two in it, right?

Why it’s necessary. The Dewey Decimal System® was great for its time, but it’s outlived that. Libraries today should not be constrained by the mental models of the 1870s, doomed to tinker with an increasingly irrelevant system. Nor should they be forced into a proprietary system—copyrighted, trademarked and licensed by a single entity—expensive to adopt and encumbered by restrictions on publishing detailed schedules or coordinating necessary changes.

In recent years, a number of efforts have been made to discard Dewey in favor of other systems, such as BISAC, the “bookstore system.” But none have proved good enough for widespread adoption, and license issues remain.

The vision. The Open Shelves Classification should be:

  • Free. Free both to use and to change, with all schedules and assignments in the public domain and easily accessible in bulk format. Nothing other than common consent will keep the project at LibraryThing. Indeed, success may well entail it leaving the site entirely.
  • Modern. The OSC should map to current mental models–knowing these will eventually change, but learning from the ways other systems have and haven’t grown, and hoping to remain useful for some decades, at least.
  • Humble. No system–and least of all a one-dimensional shelf order–can get at “reality.” The goal should be to create something limited and humble–a “pretty good” system, a “mostly obvious” system, even a “better than the rest” system–that allows library patrons to browse a collection physically and with enjoyment.
  • Collaboratively written. The OSC itself should be written socially–slowly, with great care and testing–but socially. (I imagine doing this on the LibraryThing Wiki.)
  • Collaboratively assigned. As each level of OSC is proposed and ratified, members will be invited to catalog LibraryThing’s books according to it. (I imagine using LibraryThing’s fielded bibliographic wiki, Common Knowledge.)

I also favor:

  • Progressive development. I see members writing it “level-by-level” (DDC’s classes, divisions, etc.), in a process of discussion, schedule proposals, adoption of a tentative schedule, collaborative assignment of a large number of books, statistical testing, more discussion, revision and “solidification.”
  • Public-library focus. LibraryThing members are not predominantly academics, and academic collections, being larger, are less likely to change to a new system. Also, academic collections mostly use the Library of Congress System, which is already in the public domain.
  • Statistical testing. To my knowledge, no classification system has ever been tested statistically as it was built. Yet there are various interesting ways of doing just that. For example, it would be good to see how a proposed shelf-order matches up against other systems, like DDC, LCC, LCSH and tagging. If a statistical cluster in one of these systems ends up dispersed in OSC, why? 
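One way the "dispersed cluster" test could be sketched, purely as an assumption about how it might be done: for each class in an existing system, measure how scattered its books end up across the proposed OSC classes using Shannon entropy. Zero means the cluster stayed together; higher numbers mean it was split up. The function name, the OSC class labels and the sample assignments below are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def dispersion(assignments):
    """Shannon entropy (in bits) of proposed-class labels within each existing class."""
    groups = defaultdict(Counter)
    for existing, proposed in assignments:
        groups[existing][proposed] += 1
    result = {}
    for existing, counts in groups.items():
        total = sum(counts.values())
        # Sum of p * log2(1/p) over the proposed classes this cluster landed in.
        result[existing] = sum(
            (n / total) * math.log2(total / n) for n in counts.values()
        )
    return result


# Hypothetical (DDC class, proposed OSC class) pairs, one per book.
books = [
    ("641.5", "Cooking"), ("641.5", "Cooking"),
    ("133", "Religion"), ("133", "Psychology"),
    ("133", "Religion"), ("133", "Psychology"),
]

scores = dispersion(books)
print(scores["641.5"])  # 0.0 -> the DDC cluster stays together
print(scores["133"])    # 1.0 -> split evenly across two proposed classes
```

A high score is not automatically bad. It just flags the clusters worth asking "why?" about.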

I have started a LibraryThing Group, “Build the Open Shelves Classification.” Members are invited to join, and to start working through the basic decisions.

Labels: Dewey Decimal Classification, open shelves classification, osc

Saturday, June 14th, 2008

OCLC’s non-profit status

The New York Times ran an interesting story on non-profits that act like businesses. Apparently a number of states are taking a hard look at charities that “give nothing away,” or have amassed vast wealth. A lot of day-care centers are worried, as is Harvard, where the endowment tops the GDP of more than 100 countries.*

Of course, my mind went to OCLC, the Dublin, Ohio-based global library-data organization.

OCLC’s core business involves maintaining a central database of cataloging records, largely created by others, which member libraries pay to access. That OCLC was a great invention can hardly be denied. Personally, I think it has become a relic and a danger to the future of libraries. Agree with me on this or not, there’s no question it is highly profitable—driving a steady stream of acquisitions—and in its fee structure calls into question the core idea of the non-profit.

So, why hasn’t someone taken away OCLC’s non-profit status?

I Googled it up, and discovered that someone DID! In 1984 Ohio state courts stripped OCLC of its charitable status on those very grounds:

“(A)lthough OCLC’s service may greatly enhance the ability of libraries to better serve the public, OCLC essentially offers a product to charitable institutions, for a fee exceeding its cost, and, as the board concluded, is not itself a charitable organization.”

So, what happened?

It seems the Ohio legislature passed some sort of private bill removing Ohio organizations involved in “library technology development” (and starting with the letter “O”?) from the court’s requirements. Well, I guess that’ll do it.

UPDATE: I’m working up a presentation on why OCLC’s (also unfree) Dewey Decimal System needs to be killed off, and what distributed, open classification could replace it. I’m all ears for anti-Dewey examples. And if any bright young cataloger with no love of Dewey wants to talk to me about heading up the effort, I’d love to hear from you.


*$35 billion, doing a quick check against Wikipedia. Of course, GDP is wiggly as heck.

Labels: Dewey Decimal Classification, oclc, tax exemption