Thursday, August 7th, 2008

Kaboom!

Did you hear that? It was the sound of LibraryThing announcing A million free covers for your library or bookstore.

Won’t someone in the library world, um, blog this?

Labels: book covers, coverthing

Wednesday, August 6th, 2008

Open Shelves Classification: Update and Summary

Note: This post was created by David and Laena, but reposted by me for a stupid technical reason. (Tim)

Hello LibraryThingers, librarians and classification fans! We are happy to join you as facilitators of this exciting project. To learn more about us, see Tim’s introduction. Hadrian’s library (above) seemed an appropriate illustration, as we strive to create a new system upon the building blocks of the old.

To reiterate the initial goals of the project, Open Shelves Classification (OSC) is a free, “humble,” modern, open-source, crowd-sourced replacement for the Dewey Decimal System.

It will also be:

  • Collaboratively written. The OSC itself should be written socially–slowly, with great care and testing–but socially. (This is already underway via the group Build the Open Shelves Classification and the LibraryThing Wiki.)
  • Collaboratively assigned. As each level of OSC is proposed and ratified, members will be invited to catalog LibraryThing’s books according to it. (Using LibraryThing’s fielded bibliographic wiki, Common Knowledge.)

And include:

  • Progressive development. Written “level-by-level” (DDC’s classes, divisions, etc.), in a process of discussion, schedule proposals, adoption of a tentative schedule, collaborative assignment of a large number of books, statistical testing, more discussion, revision and “solidification.” This has already begun.
  • Public-library focus. LibraryThing members are not predominantly academics, and academic collections, being larger, are less likely to change to a new system. Also, academic collections mostly use the Library of Congress System, which is already in the public domain. This is also the place and audience that has demonstrated the most need for change (see BISAC and other non-Dewey conversions already underway).
  • Statistical testing. As far as we are aware, no classification system has ever been tested statistically as it was built. Yet there are various interesting ways of doing just that. For example, it would be good to see how a proposed shelf-order matches up against other systems, like DDC, LCC, LCSH and tagging. If a statistical cluster in one of these systems ends up dispersed in OSC, why?
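
As a rough illustration of the cluster question in the last bullet, here is one way dispersion might be measured. This is only a sketch under stated assumptions: the function name `dispersion_by_cluster`, the sample books and the OSC labels are all invented, and entropy is just one of several measures that could be used.

```python
from collections import Counter, defaultdict
from math import log2


def dispersion_by_cluster(assignments):
    """Map each source-system class to the entropy (in bits) of its OSC spread.

    `assignments` is an iterable of (source_class, osc_class) pairs, one per book.
    An entropy of 0.0 means every book in the source class landed in a single
    OSC class; higher values mean the cluster was scattered across several.
    """
    spread = defaultdict(Counter)
    for source_class, osc_class in assignments:
        spread[source_class][osc_class] += 1

    result = {}
    for source_class, counts in spread.items():
        total = sum(counts.values())
        entropy = 0.0
        for n in counts.values():
            p = n / total
            entropy -= p * log2(p)
        result[source_class] = entropy
    return result


# Hypothetical sample: a DDC class paired with an invented OSC label.
sample = [
    ("600", "Gardening"), ("600", "Gardening"),
    ("600", "Cooking"), ("600", "Health"),
    ("900", "History"), ("900", "History"),
]
scores = dispersion_by_cluster(sample)
```

In this made-up sample the "600" books are split across three OSC labels and score higher than the "900" books, which all stay together. A high score is not automatically a flaw; it only marks a cluster worth asking "why?" about.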


Where are we now? Since its inception, there has been consistent and productive discussion on the LibraryThing group Build the Open Shelves Classification, and circeus began an excellent wiki Open Shelves Classification that summarizes the current OSC consensus. The wiki is where the work will be staged as it is developed by all of us. So far, the wiki includes consensus on materials that must be included, call number requirements and proposed scheme, and the choice of top-level classes.

Where do we go from here? We feel that the most important issues to determine are:

  • Top-level classes. Findability is key. Terms need to be familiar and clear (not abstract) and relevant to the public library audience and its needs, with roughly 12-15 categories in all. Library data would be very helpful here! (In determining top-level terms, OSC is focusing more on task (what people find: history, gardening, sci-fi) than on audience (who is finding: children, women, dogs).)
  • Alpha-numeric decisions and punctuation. TBD. A numbered system that doesn’t require equal digits is so far the most popular format (10.6.245.20). As for punctuation, the debate continues–dots or dashes?
  • Factors to be determined locally or at a later stage of development. We need to be as focused and specific in our tasks as possible, and there are many decisions we will not be undertaking. (For example, Cutter numbers and possibly non-book materials.)
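
To make the call-number format concrete, here is a small sketch of how numbers like 10.6.245.20 could be parsed and put in shelf order. It rests on an assumption the discussion has not settled: that each segment is a whole number compared numerically (so 3 sorts before 245), not a Dewey-style decimal fraction. The function names are invented for illustration.

```python
def parse_call_number(call_number, separator="."):
    """Split a call number such as '10.6.245.20' into a tuple of integers."""
    return tuple(int(part) for part in call_number.split(separator))


def shelf_sort(call_numbers, separator="."):
    """Return call numbers in shelf order, comparing segment by segment.

    Assumes whole-number segments; a shorter number sorts before any
    longer number that begins with the same segments.
    """
    return sorted(call_numbers, key=lambda cn: parse_call_number(cn, separator))


shelf = shelf_sort(["10.6.245.20", "10.6.3", "9.12", "10.6", "10.10.1"])
dashed = shelf_sort(["10-6-3", "10-6", "9-12"], separator="-")
```

Because the separator is just a parameter, the dots-or-dashes question does not affect ordering. What does matter is that a plain alphabetical sort would get this wrong (it would put "10.10.1" ahead of "9.12"), so any software handling unequal-digit numbers has to compare segments as numbers.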

David and I are simply facilitators, and we need LibraryThing Members to help monitor threads and contribute valuable content. Please comment below if you want to volunteer to monitor a particular thread to make sure we do not miss anything. Also, people should continue to add content to the wiki as consensus emerges from the threads. Although theoretical discussion is fascinating, examples from your library or your personal experience are what will make the OSC usable.

We look forward to working with our fellow LibraryThing members!

Labels: Open Shelves Classification, OSC

Tuesday, August 5th, 2008

Open Shelves Classification: Welcome Laena and David

Back in July I proposed the Open Shelves Classification (OSC), a new, free, crowdsourced replacement for the Dewey Decimal System. I also created a group to start in on the project.

The proposal included a call for a volunteer to lead the group. I was happy to write the software, and members would create the OSC, but someone with a library degree was needed to shepherd the project and make the occasional tough decision.

I’ve found two: the LIS team of Laena McCarthy and David Conners. It turns out, I already knew them. Abby and I met with Laena and David, back at ALA 2007, when they were MLS students doing a joint LibraryThing-related project called Folksonomies in Action. They impressed us then. It was extraordinary to talk to librarians with a deep understanding and creative take on the ideas LibraryThing was exploring. Since then Laena and David have started promising careers as librarians and professors. So, after receiving word they were interested in the project, we are only too happy to bring them on.

Laena M. McCarthy (user: laena). Laena is currently an Assistant Professor and Image Cataloger at the Pratt Institute in NYC. Her bio contains the priceless bit:

“Previously, she worked in Antarctica as the world’s Southernmost librarian, where she provided a remote research station with access to information. She incorporated into the library the first permanent art gallery in Antarctica.”

Laena’s teaching and research focus on the application of bottom-up, usability-centric design and collaboration. She is currently researching image tagging, FRBR for works of art & architecture, and information architecture. Her work has been published in Library Journal and the forthcoming Magazines for Libraries 2008.

In her free time, among other things, she can be found making jam, competing in food competitions, scuba diving and writing.

David Conners (user: conners). David is the Digital Collections Librarian at Haverford College in Pennsylvania. At Haverford, David works to make the College’s unique materials, such as the first organized protest against slavery in the New World, available online. He also oversees the College’s oral history program and the audio component of Special Collections exhibits such as “A Few Well Selected Books.”

David’s research interests include subject analysis, FRBR, and, occasionally, doped ablators. His work has been published in Library Journal, The Serials Librarian, and Physics of Plasmas.

The torch is passed! From this point on, it’s their project to direct. But we’re in agreement on their role: They aren’t royalty, they’re facilitators. They’re there to listen and to encourage conversation. They’re there to guide things toward consensus. They’re there to see the project stays on track and true to its goals. They’re there to propose forking the project or moving it elsewhere, if that’s what it needs and the community wants it.

Laena and David are doing this for fun and interest. Because OSC is a side-project with no financial component (it is by definition public domain in every respect), we can’t pay them. But we’ve promised to help pay their way to LIS conferences, if someone wants them to talk about it. (At least one group already does.) And there’s the hope that, if OSC can accomplish its goals, they will have helped create something highly beneficial for libraries and library patrons everywhere.

If you’re interested in the project, come join the group and find out more.

Labels: DDC, dewey decimal, Dewey Decimal Classification, open data, Open Shelves Classification, OSC

Monday, August 4th, 2008

API to Common Knowledge

In case you don’t subscribe to the main blog, there’s a development there of interest to readers of this blog: We’ve unwrapped a free public API to all our Common Knowledge data—series, fictional places, characters, author educational histories, etc.

I’d love to see some of this data appear in library catalogs. The series coverage is really quite excellent.

At one point I made a series widget for LibraryThing for Libraries–listing other members of the series–but I didn’t deploy it. There was some concern that LT’s series data would fight with the libraries’ own series data. If an LTFL library wants to use it, however, let me know.

Labels: apis, common knowledge

Sunday, August 3rd, 2008

Crowdsourcing Dr. Horrible, Jeff Atwood

Do you recognize this man?


Yes, it’s Dr. Horrible, star of Dr. Horrible’s Sing-Along Blog (played by Neil Patrick Harris, the former Doogie Howser). Dr. Horrible is a quirky web-only super-hero musical comedy created by Joss Whedon (Firefly, Buffy the Vampire Slayer). Abby, Sonya and I are fans.

Anyway, I was re-watching the video and noticed two copies of Harry Potter and the Deathly Hallows on one of Dr. Horrible’s shelves. As a joke on our Legacy Libraries program, where members collaborate to catalog libraries by Jefferson*, Plath or Yeats, I suggested that members catalog Dr. Horrible’s other books.

So they, um, did. They didn’t start a catalog, but they figured them out even so. I was particularly impressed they were able to figure out the ones on the left, making a guess and then asking a member who had the book (Albert L. Lehninger’s Principles of Biochemistry) to check the spine. The guess was right.

Score one for crowdsourcing!

Jeff Atwood? Which brings me to the other Dr. Horrible, Jeff Atwood, programmer, podcaster and author of the influential blog Coding Horror.

Jeff published a great post, “Programmers Don’t Read Books — But You Should,” which included a shot of his “programming bookshelf.” They’re not just any books, but his enduring favorites. As he writes:

“The best programming books are timeless. They transcend choice of language, IDE, or platform. They do not explain how, but why. If you feel compelled to clean house on your bookshelf every five years, trust me on this, you’re buying the wrong programming books.”

With Jeff’s permission, I started him an account, and asked members to help catalog his books, using the photo he provided. Again, they were able to do it with ease.

Two ideas follow naturally from this:

  • I’d love to see LibraryThing members catalog people’s books from shelf-photos. As I wrote on the Atwood thread, I could see this being a paid service, with part of the proceeds going to charity.
  • Aren’t there sites where regular people take apart celebrity photos, identifying shoes and clothing so other people can copy them? Wouldn’t it be fun/ironic to do that for books, taking apart TV, movies and candids for the books in them? Of course, celebrities are not necessarily great readers, but people do occasionally read in movies, and some celebrities do too. For example, the word on the street is that Marilyn Monroe really was reading Ulysses.

Don’t worry. I’ve got a half-dozen bugs and important features to go through before toying with anything like this. 

Also, the freeze ray needs work.


*Not to be confused with Dr. Horrible villain Fake Jefferson.

Labels: dr. horrible, jeff atwood, legacy libraries

Wednesday, July 30th, 2008

Google goes after the Library of Congress for “mature content”

UPDATE: They relented. Woo-hoo!

LibraryThing shows Google Adsense ads on a small number of templates. The ads appear only if you’re not a member at all—paid or unpaid. They don’t make much money, but we’ve never had a problem with them.

Today I got a form letter from Google, alerting me that Google had detected “adult or mature content” on LibraryThing. They gave one example, the LibraryThing.fr page for the Library of Congress Subject Heading (LCSH) “Erotic stories.” No doubt some algorithm caught a few keywords, like “sex” or the common porn-word “Lolita” (it’s a book, guys).

Needless to say, they run ads against most of these books on Google Book Search. Our competitors, who all rely on Google Adsense for all their revenue, run ads against the same books, apparently without incident (although, I suppose, one can hope!). I must therefore conclude that the problem is the Library of Congress Subject Headings, and that it’s a good thing the Sandy Berman-inspired LCSH “Strap-on Sex” hasn’t made it into LibraryThing yet!

A follow-up email triggered another form-letter, including the helpful suggestion to remove content like:

“image or video content containing lewd or provocative poses, strategically covered nudity, see-through or sheer clothing, and close-ups of breasts, butts, or crotches.”

I have accordingly been consulting with Casey on how to remove all the butt-shots from the Yale University MARC records.

I have three days to comply or be terminated. So, what do I do? Clearly I’m not getting anywhere with their response system. And LibraryThing has something like 100 million pages. Should I start running pages against keyword lists before showing Google Ads?

That sounds like a big pain, I’ll tell you—and not worth it.
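
For the curious, a keyword screen of the kind floated above might look something like this sketch. Everything in it is an assumption: the word list is made up, the function name `should_show_ads` is hypothetical, and nothing is known here about what Google’s own filter actually looks for.

```python
import re

# Hypothetical blocklist; Google's real criteria are not known.
FLAGGED_WORDS = {"sex", "erotic", "lolita"}


def should_show_ads(page_text, flagged_words=FLAGGED_WORDS):
    """Return False if any flagged word appears as a whole word in the page."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return words.isdisjoint(flagged_words)


ok_page = should_show_ads("Gardening for beginners")
place_name = should_show_ads("A history of Sussex")
blocked_page = should_show_ads("Lolita, a novel by Vladimir Nabokov")
```

Matching whole words avoids tripping on "Sussex," but the last line shows the deeper problem: a page about a well-known novel is blocked just the same. The check is cheap to write; deciding what belongs on the list, across that many pages, is the painful part.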

Labels: ads, google, google book search

Tuesday, July 15th, 2008

Wikimania 2008 (Alexandria, Egypt)

In other news, I’m currently on a train to New York, from which I fly to Athens, with a day-long layover, and then Alexandria, Egypt, where I am due to talk at Wikimania 2008, the annual Wikipedia/Wikimedia conference. I’m talking on “LibraryThing and Social Cataloging.”

I plan to center my talk on how LibraryThing’s social production, or “Social Cataloging,” stacks up against the Wikipedia model and similar projects. I think there are some interesting similarities, and more interesting departures. I shall post a screencast, at a minimum.

Anyone know these people? I am particularly eager to mingle with the other attendees and speakers. Apart from Brewster Kahle (Internet Archive), I hardly see a name I recognize. But I’m sure there will be some interesting conversations.

When it comes to Wikipedia, I’m no expert. My account lists some 746 edits since 2004, which probably puts me in the top percent, but my output is spotty, and I have never been obsessed with the site as some have.

Things not to say around Jimmy Wales. Worse, I am not a true believer. Of course, I think Wikipedia is extraordinary. I use it every day. When it works, like most pop culture, it’s an unmatched resource. But from working mostly on topics of Greek history, I have acquired a sour perspective on Wikipedia’s ability to resolve conflicts, tamp down ignorance, and cover topics which, quite simply, require more than curiosity and popular secondary sources.

Alexander the Great, for example, has seen periodic, bitter warfare on national or sexual grounds and, although randomly wonderful, with extensive hyperlinking and some exceptional tidbits, has never grown into a decent summary. It’s lumpy, unbalanced, poorly written and poorly sourced—a bright fourteen-year-old child sitting next to you on a bus, telling you everything he knows.* Parts are good. Parts are bad. Parts are just off somehow—their correction requiring un-Wikipedia-esque virtues like restraint, proportionality and style. At one point I watched it closely and made substantial edits. I’ve moved on. In my opinion, if the Wiki culture and process were going to produce a good article on Alexander, they would have done so already.

If that’s too pessimistic, it’s surely true of bit players like Ada of Caria, Aristander of Telmessus or a work like the Geoponica. I think all three are passable now, but almost all the work is mine. Not only am I not scalable, but it shouldn’t work that way. Tim Spalding, a PhD drop-out whose knowledge of the Geoponica is mostly second hand, even if he does read Greek, should not be the almost sole author of the article on this rather important work.**

Anyone know Alexandria? I should have no trouble filling my layover in Athens. I’ve been a few times before, so I’ll be filling holes. But I’ve never been to Egypt.

I’ll have early mornings, nights and one day free in Alexandria. (I’m not going to try to get to Cairo and the pyramids.) I want to make the most of the time I have, and feel extremely ignorant. Although Hellenistic Alexandria was a research interest of mine, the ancient city is largely gone, and I know little about what came after. I love Cavafy, so I shall probably check out his house museum, but I am completely ignorant about Durrell, the usual touchstone. Nor is Alexandria what it was in their day–the Greeks, Jews, Albanians and other minorities have mostly left. What the modern city is like, I have no idea. I can’t count to ten in Arabic. I don’t even have a guidebook. This is the new, non-obsessive tourist me…

If you know the city, leave comments. Tell me where to go and I’ll tell you what I thought of it! Think of it as social production of tourist memories…


*My favorite Wikipedia criticism is surely Karen Schneider’s, best expressed with reference to Orson Scott Card’s page: 
“But if you read this blog you know I have written that Wikipedia often seems more like a Secret Treehouse Club than everyone’s encyclopedia. Card’s Wikipedia page isn’t a biography, it’s an encomium by true believers who maintain fierce control over Card’s myth.”

Labels: Alexandria Egypt, Social Cataloging, Wikimania 2008, Wikimania2008

Tuesday, July 8th, 2008

Build the Open Shelves Classification

This mural is said to depict Dewey and the railroad service he gave to Lake Placid, FL. It’s time to throw Dewey under the train.

I hereby invite you to help build the Open Shelves Classification (OSC), a free, “humble,” modern, open-source, crowd-sourced replacement for the Dewey Decimal System.

I’ve been speaking of doing something like this for a while, but I think it’s finally going to become a reality. LibraryThing members are into it and after my ALA panel talk, a number of catalogers expressed interest too. Best of all, one library director has signed on as eager to implement the system, when it becomes available. Hey, one’s a start!

The Call. I am looking for one-to-five librarians willing to take leadership on the project. LibraryThing is willing to write the (fairly minimal) code necessary, but not to lead it.

As leaders, you will be “in charge” of the project only as a facilitator and executor of a consensus. Like Wikipedia’s Jimmy Wales, your influence will depend on listening to others and exercising minimal direct power.

For a smart, newly-minted librarian, this could be a big opportunity. You won’t be paid anything, but, hey, there’s probably a paper or two in it, right?

Why it’s necessary. The Dewey Decimal System® was great for its time, but it’s outlived that. Libraries today should not be constrained by the mental models of the 1870s, doomed to tinker with an increasingly irrelevant system. Nor should they be forced into a proprietary system—copyrighted, trademarked and licensed by a single entity—expensive to adopt and encumbered by restrictions on publishing detailed schedules or coordinating necessary changes.

In recent years, a number of efforts have been made to discard Dewey in favor of other systems, such as BISAC, the “bookstore system.” But none have proved good enough for widespread adoption, and license issues remain.

The vision. The Open Shelves Classification should be:

  • Free. Free both to use and to change, with all schedules and assignments in the public domain and easily accessible in bulk format. Nothing other than common consent will keep the project at LibraryThing. Indeed, success may well entail it leaving the site entirely.
  • Modern. The OSC should map to current mental models–knowing these will eventually change, but learning from the ways other systems have and haven’t grown, and hoping to remain useful for some decades, at least.
  • Humble. No system–and least of all a one-dimensional shelf order–can get at “reality.” The goal should be to create something limited and humble–a “pretty good” system, a “mostly obvious” system, even a “better than the rest” system–that allows library patrons to browse a collection physically and with enjoyment.
  • Collaboratively written. The OSC itself should be written socially–slowly, with great care and testing–but socially. (I imagine doing this on the LibraryThing Wiki.)
  • Collaboratively assigned. As each level of OSC is proposed and ratified, members will be invited to catalog LibraryThing’s books according to it. (I imagine using LibraryThing’s fielded bibliographic wiki, Common Knowledge.)

I also favor:

  • Progressive development. I see members writing it “level-by-level” (DDC’s classes, divisions, etc.), in a process of discussion, schedule proposals, adoption of a tentative schedule, collaborative assignment of a large number of books, statistical testing, more discussion, revision and “solidification.”
  • Public-library focus. LibraryThing members are not predominantly academics, and academic collections, being larger, are less likely to change to a new system. Also, academic collections mostly use the Library of Congress System, which is already in the public domain.
  • Statistical testing. To my knowledge, no classification system has ever been tested statistically as it was built. Yet there are various interesting ways of doing just that. For example, it would be good to see how a proposed shelf-order matches up against other systems, like DDC, LCC, LCSH and tagging. If a statistical cluster in one of these systems ends up dispersed in OSC, why? 

I have started a LibraryThing Group, “Build the Open Shelves Classification.” Members are invited to join, and to start working through the basic decisions.

Labels: DDC, Dewey Decimal Classification, Open Shelves Classification, OSC

Monday, July 7th, 2008

LibraryThing JSON-based books API

Over on the main blog I posted news about the new LibraryThing JSON-based books API (see here). The new API, which supplements our works API, comes with a small library of functions to manipulate it–all open source.

The API should be of interest to the libraries, as there are a couple of cool things they can do with the API. For example, with a few tweaks, it should be possible for libraries that use LibraryThing to showcase new or selected titles—a very popular thing—to create a widget that links into their OPAC, not to Amazon or whomever.

I’ll probably write some basic functions to change linking along these lines, if someone doesn’t do it for me first…
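
The link rewriting described above might look roughly like the sketch below. It is written in Python purely for illustration, not in whatever language the API’s own helper library uses, and nearly everything in it is an assumption: the record fields (`title`, `isbn`), the catalog URL pattern and the function names are all hypothetical, not the API’s actual schema.

```python
from urllib.parse import quote

# Hypothetical OPAC search URL; a real catalog would supply its own pattern.
OPAC_TEMPLATE = "https://catalog.example.org/search?isbn={isbn}"


def opac_link(book, template=OPAC_TEMPLATE):
    """Build a catalog link for one book record, or None if it has no ISBN."""
    isbn = book.get("isbn")
    if not isbn:
        return None
    return template.format(isbn=quote(isbn))


def widget_links(books, template=OPAC_TEMPLATE):
    """Return (title, url) pairs for every book that can be linked."""
    links = []
    for book in books:
        url = opac_link(book, template)
        if url is not None:
            links.append((book.get("title", ""), url))
    return links


# Invented records; the field names are assumptions, not the API's real schema.
books = [
    {"title": "Example Book", "isbn": "0000000000"},
    {"title": "Untitled manuscript"},
]
links = widget_links(books)
```

Records without an ISBN are simply left out of the widget in this sketch; a library might prefer to fall back to a title search instead.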

Labels: apis, JSON

Thursday, July 3rd, 2008

Future of Cataloging

Part one. Part two is here.

On Sunday I participated in the ALA panel Creating the Future of the Catalog and Cataloging. My panel-mates were Diane Hillmann, Jennifer Bowen, Roy Tennant and Martha Yee. Robert Wolven moderated.

The whole panel was four hours long, with brief presentations by each of us and a lot of conversation. I recorded almost all of it, but the quality is very poor and I’d need everyone’s permission—including the questioners—to put it up. I can, however, put up my presentation. I had to re-record the screencasting part, which therefore isn’t click-perfect.

The second part is here: http://youtube.com/watch?v=hD2plk4vT3Y&feature=related.

Reading the Book. As usual, I neglected to underline just what all my evidence demonstrated, expecting the evidence to speak for itself. Thus my point in mentioning my wife’s book’s wrong LCSH’s was to point out that, while expert training is certainly valuable, the untrained taggers on LibraryThing often exceed the trained expert in having actually read the book. I should add that I say this to emphasize one way in which tagging is good, not to attack catalogers who have insisted, quite rightly, that they don’t have time to read the book, and aren’t being lazy or slapdash.

As you can imagine, this observation of mine has got me into some hot water. But I think it deserves saying, particularly as, despite all the discussions of cataloging vs. tagging out there, I have never seen this point mentioned.

To press my luck a bit, I’d also like to note that it sets the professional classification-vs.-tagging argument apart from similar arguments in related fields, e.g., real journalists vs. citizen journalists, real dentists vs. your dad with some string and a doorknob, etc.

But there’s an easy retort here too. Once cataloging is fully distributed—with librarians around the country able to take part—we can certainly imagine a future where, in addition to everyone else, at least one qualified, degreed library professional has also read the book and classified it. Wouldn’t that be the best of both worlds?

If I get some time—in short supply after letting emails pile up for a week!—I’ll blog about the panel in general. Despite its topic and length, it was very well attended—the police actually removed people from the room for overcrowding! And it spurred a lot of people to come by the LibraryThing booth to congratulate me or take me up on some point or another.

Incidentally, I forgot to name Jeremy Dibbell, who heads up Legacy Libraries now, and I referred to him as an archivist, not a librarian. I do my talks ad lib and make such mistakes. Mea Culpa!

Update: Diane Hillmann posted her slides here.

Labels: ala 2008, ala2008, future of cataloging