Tuesday, February 13th, 2007

Blyberg’s SOPAC

This isn’t breaking news at this point, but it’s still cool. John Blyberg has announced what he calls “SOPAC”:

It’s basically a set of social networking tools integrated into the AADL catalog. It gives users the ability to rate, review, comment-on, and tag items.

Tags, ratings, and reviews in an OPAC! I think it’s great that he’s done it. It’s no surprise that we’d love to put some of LT’s features into OPACs, and seeing a big library like AADL take on social stuff legitimates the point.*

Anyway, you should check out his post about it, which even has a nifty screencast.

*Tim has blogged before about putting LT and OPACs… We’re thinking it’ll be a sort of OPAC widget, so hold onto your hats.

Labels: Uncategorized

Tuesday, February 13th, 2007

Introducing the book

Labels: Uncategorized

Monday, February 12th, 2007

Library of Congress Authority Files, Open!

So begins the PDF announcing and detailing a major new development for the library-data world. Simon Spero, library-geek extraordinaire, has released a nearly complete copy of the Library of Congress Authority Files.

Get them here:
http://www.ibiblio.org/fred2.0/authorities/

Simon assembled the files, available in MarcXML, by querying the Library of Congress’ Authorities website one-by-one over months. He’s a patient man.

As I’ve discussed before, Library of Congress data is both free and unfree. As a work of the US government, it cannot be copyrighted.* But the LC has traditionally restricted access, offering small amounts through public interfaces**, and selling larger amounts through its Cataloging Distribution Service. A small industry has developed where the CDS’s buyers resell it commercially. Until now, nobody has decided to just… let it go.

I anticipate that Simon’s action will draw some criticism. If the LC can’t make money selling its cataloging, how will it support this vital work? This sentiment will grow stronger when Casey Bisson releases the full LC MARC data, but whether for authorities or other cataloging data, I think this is short-sighted.

As I see it, the failure of the LC and other libraries to get their data “out there” on the open web has hurt them far more deeply than their catalog sales could ever recoup. It has made them seem irrelevant, standing silent and apart from the great conversation, which grows more interesting with each passing year.

The first culprits are the online catalogs***, ugly, backward things lamed with session-based URLs. If you want to link to the LC, you can’t. The URL you get will only work for you, for ten minutes. Linking–the very soul of the Web–is impossible.

The second culprit is how libraries have distributed the data itself. Amazon makes its book data accessible to all in a handy, universally-understood XML format. It’s so easy and appealing that over 140,000 developers have signed up to receive it. Libraries, by contrast, generally make their data available (if they make it available at all) over a tricky and obscure protocol known as Z39.50. And the data itself is in MARC, a rich but impenetrable spectrum of formats (e.g., DanMARC, the Danish MARC format!) used by, and largely only understood by, librarians.

With wretched web sites and unretrievable, unparseable data, libraries have lost vital ground. If the world worked right, Googling a book should turn up a library within the first few results. But libraries seldom make the top 100, and despite being the largest library on the planet and producing the lion’s share of original cataloging, the Library of Congress is completely absent. In its place are Amazon, its peers and sites that use Amazon data.**** Libraries may know a lot, but simplicity, attractiveness and ubiquitous data have won out.

It’s time to fight back. Libraries and library data can change the book web for the better. Three cheers to Simon for making a critical first step. Viva La Revolución, my brother.*****

*The LC reserves the right to copyright it outside of the United States. It’s unclear if they ever have.
**In LibraryThing’s case, through a Z39.50 connection. Although the limits are not clearly specified, we’ve been given to understand that large-scale mining will not be tolerated.
***What library-techs call OPACs (Online Public Access Catalogs). The fact that someone still needs to add “Public Access” to “Online” is the problem in miniature. Does Google call itself a Public Access Search Engine?
****Don’t get me wrong; Amazon is a great site, and should be up in the top results too.
*****In so far as both Simon and I blogged the death of Milton Friedman, I suspect we’re equally uneasy with revolutionary Spanish.

Labels: Uncategorized

Thursday, February 8th, 2007

Web 2.0 Video

Unless you’ve been on Mars, you have seen this. Chris Anderson put it best: “This is why I do what I do.”

Labels: Uncategorized

Monday, February 5th, 2007

Can subjects be relevancy ranked?

I wrote this up on the plane from San Francisco. (I was there on a secret, unbloggable mission!*) It’s a bit involved and it doesn’t “arrive” anywhere, but, if you’re interested in subjects and relevancy ranking, it might be worth thinking about.

There are a couple differences between user tagging (“free tagging,” “social tagging,” etc.) and traditional library classification. “Who does it?” is the most obvious difference, followed by whether or not the labeling action takes place within a predefined ontology, or is made up on the fly.

It’s easy to ignore a third, and very critical, difference. Subject classifications, like the Library of Congress Subject Headings (LCSH), are essentially binary. It’s non-overlapping buckets. Something either does or does not belong in a subject. There are no gradations of belonging.

The idea is, as Clay Shirky and David Weinberger have reminded us, rooted in the physical world. Subject classification escapes the physicality of shelf-order classification, in which a book must be shelved in a single place, but is still restrained by the physicality of the catalog card. A catalog card can only reference a certain number of subjects. Nobody wants a book to take up twenty cards. And the subject cards can only reference so many books. About 90% of all literature could fall under the LCSH subject Man-woman relationships. But it would make no sense to slot this 90% under that heading in a physical card catalog–the card catalog would instantly grow by 90%! And there seem to be very real differences in relevancy and “what-the-heck”-ness between real-life members of the “Man-woman relationships” LCSH: High Fidelity, Great Expectations, The Fountainhead, I Kissed Dating Goodbye, and The Official Hottie Hunting Guide.

If you’re very selective, you can keep the numbers down. But, apart from the rule that the first subject is generally the primary one, there’s no good way to relevancy rank the books belonging to a subject.

Tags can do it, because tens, hundreds or thousands of users applying tags creates a “statistics of meaning.” So, 1984 is tagged dystopia 549 times, torture six times and Great Britain two times. The numbers can be turned into a ranking, so 1984 shows up high on a list of books about “dystopia,” lower under “torture” and near the end of a list of books about Great Britain.
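To make the “statistics of meaning” concrete, here is a minimal sketch that ranks books under a tag by how many times the tag was applied. The 1984 counts are the ones quoted above; every other row, and the function name rank_books_for_tag, is invented for illustration and is not LibraryThing’s actual code.

```python
# Hypothetical tag counts: (book, tag) -> number of times the tag was applied.
# Only the three "1984" rows come from the post; the rest are made up.
TAG_COUNTS = {
    ("1984", "dystopia"): 549,
    ("1984", "torture"): 6,
    ("1984", "Great Britain"): 2,
    ("Brave New World", "dystopia"): 310,
    ("A Clockwork Orange", "torture"): 15,
    ("Great Expectations", "Great Britain"): 40,
}


def rank_books_for_tag(tag, tag_counts):
    """Return (book, count) pairs for `tag`, most-tagged first.

    Ties are broken alphabetically so the output is deterministic.
    """
    matches = [(book, n) for (book, t), n in tag_counts.items() if t == tag]
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for tag in ("dystopia", "torture", "Great Britain"):
        print(tag, rank_books_for_tag(tag, TAG_COUNTS))
```

Raw counts favor heavily tagged books. Dividing each count by the book’s total tag uses would be one way to correct for that, at the cost of rewarding obscure books with a single enthusiastic tagger.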

This is all well-worn territory. My question is this: Is there any way to relevancy-rank books within subjects?

I was reminded of the question when checking out OCLC’s new project, FictionFinder. I’ll blog about the whole thing later, but for now know that you can search for an LCSH subject and get back a list of books belonging to it. (I can’t link to the results, which are session based.**) Check out the LCSH “City and Town Life” and the top book is Red Badge of Courage. Lacking a better method, FictionFinder lets popularity (the number of OCLC libraries with a copy) stand in for relevance. LibraryThing does the same, using our popularity numbers instead. The results are not systematically better (in this case Ulysses wins).

I tried two solutions:

The first was to tie into LibraryThing’s tags. So, figure out what tags are most characteristic of books with the subject “Man-Woman Relationships,” and then use the presence and number of these tags to rank the subject results. So, for example, “Man-Woman Relationships” has a global correlation with “relationships,” “dating” and “romance,” none of which are very prominent among the tags applied to Great Expectations, so it can fall low on the list.
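A rough sketch of this first approach, under loud assumptions: the tag counts are invented, the two “Some ... Novel” titles are placeholders, and “characteristic” is taken to mean a tag that is over-represented inside the subject (lift above 1) and appears on at least two of its books. That definition is a stand-in chosen for illustration, not necessarily the statistic actually used.

```python
from collections import Counter

SUBJECT = "Man-woman relationships"

# Hypothetical data. Subject membership for the first three titles follows the
# post; the tag counts and the two placeholder novels are invented.
BOOK_SUBJECTS = {
    "High Fidelity": {SUBJECT},
    "I Kissed Dating Goodbye": {SUBJECT},
    "Great Expectations": {SUBJECT},
    "Some Whaling Novel": {"Whaling"},
    "Some Orphan Novel": {"Orphans"},
}
BOOK_TAGS = {
    "High Fidelity": {"relationships": 30, "romance": 10, "music": 40, "fiction": 20},
    "I Kissed Dating Goodbye": {"dating": 50, "relationships": 25, "christianity": 30},
    "Great Expectations": {"classics": 80, "victorian": 40, "fiction": 50, "romance": 2},
    "Some Whaling Novel": {"classics": 70, "fiction": 40, "whaling": 30},
    "Some Orphan Novel": {"classics": 60, "victorian": 35, "fiction": 30},
}


def characteristic_tags(subject, book_subjects, book_tags, min_books=2):
    """Return {tag: lift} for tags over-represented among the subject's books."""
    members = {b for b, subs in book_subjects.items() if subject in subs}
    inside, overall, books_with = Counter(), Counter(), Counter()
    for book, tags in book_tags.items():
        overall.update(tags)                 # tag uses across all books
        if book in members:
            inside.update(tags)              # tag uses within the subject
            books_with.update(tags.keys())   # number of member books per tag
    inside_total = sum(inside.values())
    overall_total = sum(overall.values())
    lifts = {}
    for tag, n in inside.items():
        if books_with[tag] < min_books:
            continue  # a tag on a single book says little about the subject
        lift = (n / inside_total) / (overall[tag] / overall_total)
        if lift > 1.0:
            lifts[tag] = lift
    return lifts


def rank_subject(subject, book_subjects, book_tags):
    """Rank a subject's books by the share of their tag uses that are characteristic."""
    signal = characteristic_tags(subject, book_subjects, book_tags)
    scored = []
    for book, subs in book_subjects.items():
        if subject not in subs:
            continue
        tags = book_tags.get(book, {})
        total = sum(tags.values())
        hits = sum(n for tag, n in tags.items() if tag in signal)
        scored.append((book, hits / total if total else 0.0))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    print(sorted(characteristic_tags(SUBJECT, BOOK_SUBJECTS, BOOK_TAGS)))
    for book, score in rank_subject(SUBJECT, BOOK_SUBJECTS, BOOK_TAGS):
        print(f"{score:.3f}  {book}")
```

On this toy data the characteristic tags come out as “relationships” and “romance,” and Great Expectations lands at the bottom because those tags are a sliver of what it is tagged with.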

I got far enough down this road to know it was going to help.

The second and more interesting algorithm was to see if books can be ranked within subjects without any other information. This would help OCLC, who are unlikely to pay for LibraryThing data, and any library that employs LCSH, most of which would have no “popularity” data to use either.

I hit upon the idea that subjects “reinforce” each other, and that this must leave a statistical signature. For example, it seems that “Love stories” and “Psychological fiction” are commonly applied to books about “Man-Woman Relationships,” but that “Androgynous robot alone on an island — Stories” is not. (Okay, that’s not real, but the point stands.) Can these “related subjects” relevancy rank the subject itself?
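One way the “reinforcement” idea might be written down, purely as a sketch: weight every other subject by how much more often it turns up on books in the target subject than on books in general, then score each book by the summed weights of its other subjects. The catalog below is a toy, the “Novel A” to “Novel G” titles are placeholders, and the min_books cutoff is an assumption added to keep one-off headings from dominating.

```python
from collections import Counter

TARGET = "Man-woman relationships"

# Toy catalog: book -> set of subject headings. Entirely invented.
CATALOG = {
    "Novel A": {TARGET, "Love stories", "Psychological fiction"},
    "Novel B": {TARGET, "Love stories"},
    "Novel C": {TARGET, "Love stories", "Psychological fiction"},
    "Novel D": {TARGET, "Orphans"},
    "Novel E": {"Orphans", "City and town life"},
    "Novel F": {"City and town life", "Psychological fiction"},
    "Novel G": {"Orphans"},
}


def reinforcement_weights(subject, book_subjects, min_books=2):
    """Return {other_subject: P(other | subject) / P(other)} for co-occurring subjects."""
    members = [subs for subs in book_subjects.values() if subject in subs]
    if not members:
        return {}
    inside = Counter(s for subs in members for s in subs if s != subject)
    overall = Counter(
        s for subs in book_subjects.values() for s in subs if s != subject
    )
    n_in, n_all = len(members), len(book_subjects)
    return {
        other: (count / n_in) / (overall[other] / n_all)
        for other, count in inside.items()
        if count >= min_books  # ignore headings seen only once inside the subject
    }


def rank_by_reinforcement(subject, book_subjects):
    """Rank the subject's books by the summed weights of their other subjects."""
    weights = reinforcement_weights(subject, book_subjects)
    scored = [
        (book, sum(weights.get(s, 0.0) for s in subs if s != subject))
        for book, subs in book_subjects.items()
        if subject in subs
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for book, score in rank_by_reinforcement(TARGET, CATALOG):
        print(f"{score:.2f}  {book}")
```

Here “Love stories” and “Psychological fiction” reinforce the subject, so Novel D, whose only other heading is “Orphans,” scores zero. A book with a single subject heading also scores zero, which may be part of why this kind of approach falls down on some topics.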

I wish so, but I can’t get it to work well enough. It works for some topics, but falls down for others, laughably.

Some ideas I’ve considered:

  • Treating subjects as links, and running some sort of “page-rank” style connection algorithm against them. Maybe this would bring out coincidences that simple statistics misses.
  • Using other library data, such as LCC and Dewey. This would be reminiscent of how I made LibraryThing’s LCSH/LCC/Dewey recommendations.
  • Doing statistics on other fields, such as the title. So, for example, there’s probably a statistical correlation between “Man-woman relationships” and books with “dating,” “men and women” and “proposal” in the title.
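The first idea in the list could look something like the sketch below: treat each book-subject assignment as a two-way link and run a random walk with restart (a personalized variant of PageRank) that keeps jumping back to the target subject. This is one guess at what “page-rank style” might mean here, not a tested solution; the catalog is invented and the damping factor of 0.85 is simply the conventional default.

```python
from collections import defaultdict

TARGET = "Man-woman relationships"

# Toy catalog: book -> set of subject headings. Entirely invented.
CATALOG = {
    "Novel A": {TARGET, "Love stories", "Psychological fiction"},
    "Novel B": {TARGET, "Love stories"},
    "Novel C": {TARGET, "Love stories", "Psychological fiction"},
    "Novel D": {TARGET, "Orphans"},
    "Novel E": {"Orphans", "City and town life"},
    "Novel F": {"City and town life", "Psychological fiction"},
    "Novel G": {"Orphans"},
}


def rank_by_random_walk(subject, book_subjects, damping=0.85, iterations=50):
    """Rank a subject's books by a random walk with restart at the subject node."""
    # Build an undirected bipartite graph of book nodes and subject nodes.
    neighbors = defaultdict(set)
    for book, subs in book_subjects.items():
        for s in subs:
            neighbors[("book", book)].add(("subject", s))
            neighbors[("subject", s)].add(("book", book))
    start = ("subject", subject)
    if start not in neighbors:
        return []

    # Power iteration: all probability mass begins on the target subject.
    rank = {node: 0.0 for node in neighbors}
    rank[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in neighbors}
        nxt[start] += 1.0 - damping          # restart mass returns to the subject
        for node, score in rank.items():
            share = damping * score / len(neighbors[node])
            for other in neighbors[node]:    # spread the rest evenly over links
                nxt[other] += share
        rank = nxt

    members = [
        (book, rank[("book", book)])
        for book, subs in book_subjects.items()
        if subject in subs
    ]
    return sorted(members, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for book, score in rank_by_random_walk(TARGET, CATALOG):
        print(f"{score:.4f}  {book}")
```

Books whose other headings lead mostly back to fellow members of the subject keep more of the walk’s probability, while a book like Novel D leaks it away through “Orphans.” Whether that separates Great Expectations from High Fidelity on real LCSH data is exactly the open question.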

None strike me as the silver bullet.

Anyway, my plane has landed–allowing me to do real work again–so I end in aporia. Ideas?

*I’m itching to blog it, but I have to hold off for now. I’ll throw some pictures up soon, however. I’d never been to San Francisco before. What a wonderful wonderful town.
**One can understand why OPACs made in 1996 are session based. How frustrating to see a new product with them.

Labels: Uncategorized

Monday, January 29th, 2007

Prolegomena to a Review of Everything is Miscellaneous

A week or two ago I received an advance reader’s copy of David Weinberger’s Everything is Miscellaneous, due to be published in May. Over the next four months I’m going to be mentioning Weinberger and his book a lot, culminating in some sort of “real” review timed to the release date.

Everything is Miscellaneous is about “what’s happening to knowledge” in the digital age–how structures of knowledge suited for the physical world are being transformed in the digital one, and what this means. The topic is of direct interest to what LibraryThing is “all about,” and I think, to the Library and Information Science and Information Architecture readers of Thingology generally. Tags, faceted classifications, Flickr, Del.icio.us and Dewey—need I say more?*

As will become clear, I’m a huge fan of Weinberger’s work (while still finding grounds for criticism). Everything is Miscellaneous is well-written and accessible, and not without intellectual depth. Weinberger is a philosophy PhD, radio commentator and business consultant, and his book has a shot at becoming the “next Tipping Point,” while also mentioning Heidegger far more than most business readers will expect (or want).**

In subsequent posts I’m going to spend some time exploring Weinberger’s ideas, drawing mostly on his many talks, some of which actually map to chapters of his book. But I won’t cite the book a lot. I once managed to review a Hollywood movie long before its release date, and wound up the first review online.*** But I think ARC etiquette dictates I hold off. Someone tell me if I’m wrong.

If you haven’t heard any of Weinberger’s “preview talks” to Everything is Miscellaneous, I’ve provided a guided list to all I’ve seen online. They differ a good deal, but clearly partake of a single Platonic talk.

Check one out and come over to LibraryThing’s Everything is Miscellaneous group. Weinberger started it himself, perhaps thinking it wouldn’t see action until the book came out. In this as everything else, I’m acting without coordination.****

  • Talk in Scotland, close to the UNC talk, but slower and more focused in presentation. The video is the only one I’ve seen that shows his creative animated slides (audio and video)
  • Weinberger at the LC, with some bits and pieces from his second book, Small Pieces Loosely Joined, in the mix (audio)
  • “Messiness is a virtue.” A lengthy, intellectual, messy slice from the upcoming book (audio)
  • Another slice (audio)
  • Short All Things Considered piece on tagging
  • At the University of North Carolina School of Information and Library Science (audio)
  • What’s happening to knowledge? Wikimania 2006. I can’t put my finger on it, but this is my least favorite one (video)

*And LibraryThing briefly and—very distressingly—wrongly. I’m going to see if he can at least eliminate the error.
**I thank Weinberger for finally giving me something to say about Heidegger at cocktail parties while allowing me to remain unsullied by his impenetrable prose and political villainies.
***My review of Oliver Stone’s Alexander the Great.
****Full disclosure: Apart from asking for and getting an ARC, I have nothing to disclose. He’s never bought me a beer and I don’t owe him money. He once let me look at an essay he was writing, and I made pedantic objections related to Greek oleiculture. I ate some free food at one of his talks, but I don’t think he paid for it.

Labels: Uncategorized

Sunday, January 7th, 2007

We’re hiring!

See the post on the main blog.

Labels: Uncategorized

Sunday, December 17th, 2006

Person of the Year: Me

Okay, all of us. See Time’s Person of the Year. And here’s 1982. It took a long time, but I think it was you all along.

Update: I recently emerged into the “real world,” and discovered the cover is a mirror. Very clever, but it raises the question—when will screens be able to display reflective metals, and not just colors?

Labels: Uncategorized

Sunday, December 17th, 2006

LCSH: Dave’s topic

From the Library of Congress Authorities. Click to enlarge or use this permalink.

Hat tip to the Mysterious Stranger, whose identity will be revealed in a future post!

Labels: Uncategorized

Friday, December 15th, 2006

Top LIS Stories of 2006

Check out the Ten Stories that Shaped 2006 from Library and Information Science News. Library 2.0 makes the list, with LibraryThing mentioned by name, as does the TASERing of a student at the UCLA library, the “Library Weblog Explosion,” James Frey, Privacy and Censorship.

Labels: Uncategorized