Wednesday, July 30th, 2008

Google goes after the Library of Congress for “mature content”

UPDATE: They relented. Woo-hoo!

LibraryThing shows Google Adsense ads on a small number of templates. The ads appear only if you’re not a member at all—paid or unpaid. They don’t make much money, but we’ve never had a problem with them.

Today I got a form letter from Google, alerting me that Google had detected “adult or mature content” on LibraryThing. They gave one example, the page for the Library of Congress Subject Heading (LCSH) “Erotic stories.” No doubt some algorithm caught a few keywords, like “sex” or the common porn-word “Lolita” (it’s a book, guys).

Needless to say, they run ads against most of these books on Google Book Search. Our competitors, who all rely on Google Adsense for all their revenue, run ads against the same books, apparently without incident (although, I suppose, one can hope!). I must therefore conclude that the problem is the Library of Congress Subject Headings, and that it’s a good thing the Sandy Berman-inspired LCSH “Strap-on Sex” hasn’t made it into LibraryThing yet!

A follow-up email triggered another form-letter, including the helpful suggestion to remove content like:

“image or video content containing lewd or provocative poses, strategically covered nudity, see-through or sheer clothing, and close-ups of breasts, butts, or crotches.”

I have accordingly been consulting with Casey on how to remove all the butt-shots from the Yale University MARC records.

I have three days to comply or be terminated. So, what do I do? Clearly I’m not getting anywhere with their response system. And LibraryThing has something like 100 million pages. Should I start running pages against keyword lists before showing Google Ads?
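For the curious, a keyword check of that kind might look something like the minimal sketch below. The keyword list and the `should_show_ads` function are made up for illustration; nothing here reflects how Google’s detection or LibraryThing’s templates actually work.

```python
import re

# Hypothetical keyword list, for illustration only. A real list would be
# much longer, and its contents are an assumption, not Google's criteria.
BLOCKED_KEYWORDS = {"erotic", "sex", "lolita"}


def should_show_ads(page_text, blocked=BLOCKED_KEYWORDS):
    """Return False if any whole word in the page text is on the blocked list."""
    # Lowercase the text and split it into alphabetic words, so that
    # "Sussex" is not mistaken for "sex".
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return words.isdisjoint(blocked)


print(should_show_ads("A history of Sussex"))
print(should_show_ads("Erotic stories"))
print(should_show_ads("Lolita, by Vladimir Nabokov"))
```

Note that the third call is refused ads even though the page is about a novel, which is exactly the false positive at issue here.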

That sounds like a big pain, I’ll tell you—and not worth it.

Tuesday, July 15th, 2008

Wikimania 2008 (Alexandria, Egypt)

In other news, I’m currently on a train to New York, from which I fly to Athens, with a day-long layover, and then Alexandria, Egypt, where I am due to talk at Wikimania 2008, the annual Wikipedia/Wikimedia conference. I’m talking on “LibraryThing and Social Cataloging.”

I plan to center my talk on how LibraryThing’s social production, or “Social Cataloging,” stacks up against the Wikipedia model and similar projects. I think there are some interesting similarities, and more interesting departures. I shall post a screencast, at a minimum.

Anyone know these people? I am particularly eager to mingle with the other attendees and speakers. Apart from Brewster Kahle (Internet Archive), I hardly see a name I recognize. But I’m sure there will be some interesting conversations.

When it comes to Wikipedia, I’m no expert. My account lists some 746 edits since 2004, which probably puts me in the top percent, but my output is spotty, and I have never been obsessed with the site as some have.

Things not to say around Jimmy Wales. Worse, I am not a true believer. Of course, I think Wikipedia is extraordinary. I use it every day. When it works, as it does for most pop culture, it’s an unmatched resource. But from working mostly on topics of Greek history, I have acquired a sour perspective on Wikipedia’s ability to resolve conflicts, tamp down ignorance, and cover topics which, quite simply, require more than curiosity and popular secondary sources.

Alexander the Great, for example, has seen periodic, bitter warfare on national or sexual grounds and, although randomly wonderful, with extensive hyperlinking and some exceptional tidbits, has never grown into a decent summary. It’s lumpy, unbalanced, poorly written and poorly sourced: a bright fourteen-year-old child sitting next to you on a bus, telling you everything he knows.* Parts are good. Parts are bad. Parts are just off somehow, their correction requiring un-Wikipedia-esque virtues like restraint, proportionality and style. At one point I watched it closely and made substantial edits. I’ve moved on. In my opinion, if the Wiki culture and process were going to produce a good article on Alexander, they would have done so already.

If that’s too pessimistic, it’s surely true of bit players like Ada of Caria or Aristander of Telmessus, or of a work like the Geoponica. I think all three are passable now, but almost all the work is mine. Not only am I not scalable, but it shouldn’t work that way. Tim Spalding, a PhD drop-out whose knowledge of the Geoponica is mostly second-hand, even if he does read Greek, should not be almost the sole author of the article on this rather important work.**

Anyone know Alexandria? I should have no trouble filling my layover in Athens. I’ve been a few times before, so I’ll be filling holes. But I’ve never been to Egypt.

I’ll have early mornings, nights and one day free in Alexandria. (I’m not going to try to get to Cairo and the pyramids.) I want to make the most of the time I have, and feel extremely ignorant. Although Hellenistic Alexandria was a research interest of mine, the ancient city is largely gone, and I know little about what came after. I love Cavafy, so I shall probably check out his house museum, but I am completely ignorant about Durrell, the usual touchstone. Nor is Alexandria what it was in their day–the Greeks, Jews, Albanians and other minorities have mostly left. What the modern city is like, I have no idea. I can’t count to ten in Arabic. I don’t even have a guidebook. This is the new, non-obsessive tourist me…

If you know the city, leave comments. Tell me where to go and I’ll tell you what I thought of it! Think of it as social production of tourist memories…

*My favorite Wikipedia criticism is surely Karen Schneider’s, best expressed with reference to Orson Scott Card’s page: 
“But if you read this blog you know I have written that Wikipedia often seems more like a Secret Treehouse Club than everyone’s encyclopedia. Card’s Wikipedia page isn’t a biography, it’s an encomium by true believers who maintain fierce control over Card’s myth.”

Tuesday, July 8th, 2008

Build the Open Shelves Classification

This mural is said to depict Dewey and the railroad service he gave to Lake Placid, FL. It’s time to throw Dewey under the train.

I hereby invite you to help build the Open Shelves Classification (OSC), a free, “humble,” modern, open-source, crowd-sourced replacement for the Dewey Decimal System.

I’ve been speaking of doing something like this for a while, but I think it’s finally going to become a reality. LibraryThing members are into it and, after my ALA panel talk, a number of catalogers expressed interest too. Best of all, one library director has signed on as eager to implement the system when it becomes available. Hey, one’s a start!

The Call. I am looking for one-to-five librarians willing to take leadership on the project. LibraryThing is willing to write the (fairly minimal) code necessary, but not to lead it.

As leaders, you will be “in charge” of the project only as a facilitator and executor of a consensus. Like Wikipedia’s Jimmy Wales, your influence will depend on listening to others and exercising minimal direct power.

For a smart, newly-minted librarian, this could be a big opportunity. You won’t be paid anything, but, hey, there’s probably a paper or two in it, right?

Why it’s necessary. The Dewey Decimal System® was great for its time, but it’s outlived that. Libraries today should not be constrained by the mental models of the 1870s, doomed to tinker with an increasingly irrelevant system. Nor should they be forced into a proprietary system—copyrighted, trademarked and licensed by a single entity—expensive to adopt and encumbered by restrictions on publishing detailed schedules or coordinating necessary changes.

In recent years, a number of efforts have been made to discard Dewey in favor of other systems, such as BISAC, the “bookstore system.” But none have proved good enough for widespread adoption, and license issues remain.

The vision. The Open Shelves Classification should be:

  • Free. Free both to use and to change, with all schedules and assignments in the public domain and easily accessible in bulk format. Nothing other than common consent will keep the project at LibraryThing. Indeed, success may well entail it leaving the site entirely.
  • Modern. The OSC should map to current mental models–knowing these will eventually change, but learning from the ways other systems have and haven’t grown, and hoping to remain useful for some decades, at least.
  • Humble. No system–and least of all a one-dimensional shelf order–can get at “reality.” The goal should be to create something limited and humble–a “pretty good” system, a “mostly obvious” system, even a “better than the rest” system–that allows library patrons to browse a collection physically and with enjoyment.
  • Collaboratively written. The OSC itself should be written socially–slowly, with great care and testing–but socially. (I imagine doing this on the LibraryThing Wiki.)
  • Collaboratively assigned. As each level of OSC is proposed and ratified, members will be invited to catalog LibraryThing’s books according to it. (I imagine using LibraryThing’s fielded bibliographic wiki, Common Knowledge.)

I also favor:

  • Progressive development. I see members writing it “level-by-level” (DDC’s classes, divisions, etc.), in a process of discussion, schedule proposals, adoption of a tentative schedule, collaborative assignment of a large number of books, statistical testing, more discussion, revision and “solidification.”
  • Public-library focus. LibraryThing members are not predominantly academics, and academic collections, being larger, are less likely to change to a new system. Also, academic collections mostly use the Library of Congress System, which is already in the public domain.
  • Statistical testing. To my knowledge, no classification system has ever been tested statistically as it was built. Yet there are various interesting ways of doing just that. For example, it would be good to see how a proposed shelf-order matches up against other systems, like DDC, LCC, LCSH and tagging. If a statistical cluster in one of these systems ends up dispersed in OSC, why? 
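
As a rough illustration of the statistical-testing idea, here is a minimal sketch of one possible dispersion check. It assumes each book carries a cluster label from an existing system (a tag, a DDC class and so on) plus a proposed OSC class, and it reports what fraction of each cluster lands in its single most common OSC class. The function name, the sample tags and the OSC class names are all invented for the example; no OSC schedule exists yet.

```python
from collections import Counter, defaultdict


def cluster_concentration(assignments):
    """Measure how concentrated each source cluster is under a proposed scheme.

    assignments: iterable of (source_cluster, osc_class) pairs, one per book.
    Returns {source_cluster: fraction of its books that fall in its most
    common OSC class}. Values near 1.0 mean the cluster stays together on
    the shelf; low values mean it ends up dispersed.
    """
    by_cluster = defaultdict(Counter)
    for source_cluster, osc_class in assignments:
        by_cluster[source_cluster][osc_class] += 1
    return {
        cluster: counts.most_common(1)[0][1] / sum(counts.values())
        for cluster, counts in by_cluster.items()
    }


# Invented sample data: (existing tag, hypothetical OSC class) per book.
sample = [
    ("tag:cooking", "Food"),
    ("tag:cooking", "Food"),
    ("tag:cooking", "Food"),
    ("tag:cooking", "Travel"),
    ("tag:wwii", "History"),
    ("tag:wwii", "Biography"),
]

scores = cluster_concentration(sample)
for cluster, score in sorted(scores.items()):
    print(cluster, score)
```

A cluster with a low score is a prompt for the "why?" question rather than proof of a flaw: a tag like "wwii" may legitimately span history and biography.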

I have started a LibraryThing Group, “Build the Open Shelves Classification.” Members are invited to join, and to start working through the basic decisions.

Monday, July 7th, 2008

LibraryThing JSON-based books API

Over on the main blog I posted news about the new LibraryThing JSON-based books API (see here). The new API, which supplements our works API, comes with a small library of functions to manipulate it–all open source.

The API should be of interest to the libraries, as there are a couple of cool things they can do with the API. For example, with a few tweaks, it should be possible for libraries that use LibraryThing to showcase new or selected titles—a very popular thing—to create a widget that links into their OPAC, not to Amazon or whomever.

I’ll probably write some basic functions to change linking along these lines, if someone doesn’t do it for me first…

Thursday, July 3rd, 2008

Future of Cataloging

Part one. Part two is here.

On Sunday I participated in the ALA panel Creating the Future of the Catalog and Cataloging. My panel-mates were Diane Hillmann, Jennifer Bowen, Roy Tennant and Martha Yee. Robert Wolven moderated.

The whole panel was four hours long, with brief presentations by each of us and a lot of conversation. I recorded almost all of it, but the quality is very poor and I’d need everyone’s permission, including the questioners’, to put it up. I can, however, put up my presentation. I had to re-record the screencasting part, which therefore isn’t click-perfect.

The second part is here:

Reading the Book. As usual, I neglected to underline just what all my evidence demonstrated, expecting the evidence to speak for itself. Thus my point in mentioning my wife’s book’s wrong LCSH’s was to point out that, while expert training is certainly valuable, the untrained taggers on LibraryThing often exceed the trained expert in having actually read the book. I should add that I say this to emphasize one way in which tagging is good, not to attack catalogers who have insisted, quite rightly, that they don’t have time to read the book, and aren’t being lazy or slapdash.

As you can imagine, this observation of mine has got me into some hot water. But I think it deserves saying, particularly as, despite all the discussions of cataloging vs. tagging out there, I have never seen this point mentioned.

To press my luck a bit, I’d also like to note that it sets the professional classification-vs.-tagging argument apart from similar arguments in related fields, e.g., real journalists vs. citizen journalists, real dentists vs. your dad with some string and a doorknob, etc.

But there’s an easy retort here too. Once cataloging is fully distributed—with librarians around the country able to take part—we can certainly imagine a future where, in addition to everyone else, at least one qualified, degreed library professional has also read the book and classified it. Wouldn’t that be the best of both worlds?

If I get some time—in short supply after letting emails pile up for a week!—I’ll blog about the panel in general. Despite its topic and length, it was very well attended—the police actually removed people from the room for overcrowding! And it spurred a lot of people to come by the LibraryThing booth to congratulate me or take me up on some point or another.

Incidentally, I forgot to name Jeremy Dibbell, who heads up Legacy Libraries now, and I referred to him as an archivist, not a librarian. I do my talks ad lib and make such mistakes. Mea Culpa!

Update: Diane Hillmann posted her slides here.

Tuesday, July 1st, 2008

Congrats to Otis

Congratulations to Otis Chandler and his new wife Elizabeth Khuri-Yakub, co-founders of Goodreads. (NYT piece!)

Otis and I met at the O’Reilly TOC conference. He is both very smart and very nice. I wish them all the best.

Tuesday, July 1st, 2008

Jason Griffey on conferences, library blogging and the death of the library

I decided to do a quick 30-minute podcast with Jason Griffey (member: griffey), the Head of Library IT at University of Tennessee at Chattanooga, and one of my favorite Library 2.0 people.

Jason was the organizer of this year’s BIGWIG Showcase, an innovative “camp”-style session at the American Library Association’s conference in Anaheim. He is also the co-author of the recent Library Blogging, with Karen Coombs (who gets the first-author love).

It’s my plan to talk with interesting people from all parts of the book “world.” Casual blog readers should be aware, though, that this is a very library-focused talk.

We spent the first 14 minutes talking about BIGWIG and about library conference talks generally. Then we got into his book and I tried to stir things up a bit by challenging him on library blogging. We closed with the death of the library—and what can prevent it.

I may need to sit down with Library Podcasting to figure out the best way to make podcasts available. Until then, I’m just going to throw the file up as an MP3 (here) and through this nifty Flash plug-in.

