Archive for April, 2007

Monday, April 30th, 2007

LibraryThing for Libraries launches

We’ve launched the LibraryThing for Libraries demo site. After CIL we pushed everything back a week to work on speed, add fielded imports, and make some interface changes to the tag browser.*

Here’s the demo site: http://www.librarything.com/forlibraries/

So far we have about two dozen libraries and consortia interested enough to send us ISBNs. Over the next few days we’ll be getting back to people with directions on testing the service out.

Sad to say, but we’re still trying to figure out pricing. Here’s my thinking, which ends in aporia.

  • It seems right to tie the price to the number of ISBNs that LibraryThing can potentially enhance. For public libraries, this is about 50-70% of ISBNs. For academic libraries it’s more like 25-50%. So, my thought was to make it $.02 for the first 25,000 ISBNs, and $.01 after that. (The two levels try to get at the shape of interest in a given ISBN; it’s more valuable to enhance Harry Potter than some obscure book.)
  • So, a small city in New England (pop. 75,000) has 84,612 ISBNs. 57,312 (67%) are enhanceable by LibraryThing. That comes to $823/year. That seems like a very good deal.
  • Clearly a consortium needs to pay more than a single library with the same number of ISBNs. After all, the consortium will have multiple copies of the item spread around the various consortium members. But a consortium of fifty libraries won’t actually have fifty copies of every ISBN, and there ought to be some “bulk” savings for them anyway.
  • This led me to charging consortia a multiple of the square root of the number of members. So, for example, a library with 284,742 enhanceable ISBNs would pay $3,097, and an identical consortium with 28 members would pay $3,097 x SQRT(28) = $16,390.
  • Then you have the “branch” problem. A large city signed up for a beta test. They have 270,002 ISBNs, which comes to $2,950. But they have some 30 branches, a population of 600,000, and a library budget of $30 million! This doesn’t work.
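
For anyone who wants to check the arithmetic, here is a minimal sketch of the pricing idea in Python. The function and constant names are made up for illustration, and none of this is a settled price list; it just reproduces the two-tier rate and the square-root multiplier for consortia described above.

```python
import math

# Rates from the bullets above; the names are hypothetical.
FIRST_TIER_SIZE = 25_000    # ISBNs billed at the higher rate
FIRST_TIER_RATE = 0.02      # dollars per ISBN for the first 25,000
SECOND_TIER_RATE = 0.01     # dollars per ISBN after that


def library_price(enhanceable_isbns: int) -> float:
    """Yearly price for a single library, in dollars."""
    first = min(enhanceable_isbns, FIRST_TIER_SIZE)
    rest = max(enhanceable_isbns - FIRST_TIER_SIZE, 0)
    return first * FIRST_TIER_RATE + rest * SECOND_TIER_RATE


def consortium_price(enhanceable_isbns: int, members: int) -> float:
    """Single-library price scaled by the square root of the member count."""
    return library_price(enhanceable_isbns) * math.sqrt(members)


if __name__ == "__main__":
    print(round(library_price(57_312)))          # the small New England city
    print(round(consortium_price(284_742, 28)))  # the 28-member consortium
```

The branch problem shows up immediately: the formula only sees ISBN counts, so a 30-branch city system is priced like a single building.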

So, I think I need to get total collection or circulation figures, and multiply them by the percentage of ISBNs we can enhance.

I wish we could expand our pay what you want program…

(Money photo courtesy Jessica Shannon on Flickr, under Attribution-ShareAlike 2.0)

*Among other things, we normalized ISBNs, moving from storing a 13-character string in every table that needed them, to storing a four-byte integer tied to a table that mapped the integers to ISBN. Normalizing textual data happens all the time here, but normalizing something already so compact and inherently unique was forced on us by the dawning realization that we’re going to be handling dozens or hundreds of millions of bibliographic records. So now LTFL tosses around arbitrary ISBN keys like mad, without ever knowing what ISBN they represent. O brave new world…
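
A rough sketch of that kind of normalization, using Python and an in-memory SQLite database purely for illustration. The table and function names are hypothetical, not LibraryThing's actual schema, and SQLite's integer keys are not fixed at four bytes; the point is only that other tables carry a small surrogate key instead of the ISBN string.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a lookup table and one table that uses it."""
    db = sqlite3.connect(":memory:")
    # One row per distinct ISBN; the integer id is the arbitrary key.
    db.execute(
        "CREATE TABLE isbn_keys (id INTEGER PRIMARY KEY, isbn TEXT UNIQUE NOT NULL)"
    )
    # Other tables store only the integer key, never the ISBN string.
    db.execute(
        "CREATE TABLE holdings ("
        "library TEXT NOT NULL, "
        "isbn_id INTEGER NOT NULL REFERENCES isbn_keys(id))"
    )
    return db


def isbn_key(db: sqlite3.Connection, isbn: str) -> int:
    """Return the integer key for an ISBN, creating it on first sight."""
    db.execute("INSERT OR IGNORE INTO isbn_keys (isbn) VALUES (?)", (isbn,))
    row = db.execute("SELECT id FROM isbn_keys WHERE isbn = ?", (isbn,)).fetchone()
    return row[0]


if __name__ == "__main__":
    db = open_db()
    key = isbn_key(db, "9780747532699")
    db.execute("INSERT INTO holdings (library, isbn_id) VALUES (?, ?)", ("demo", key))
    print(key)
```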

Labels: Uncategorized

Friday, April 27th, 2007

Metacritic links added

I knocked out a quick feature connecting LibraryThing works pages to their corresponding pages on Metacritic. Metacritic is like RottenTomatoes, giving brief excerpts from press reviews and boiling them down into a single number (the “Metascore”). Unlike RottenTomatoes, Metacritic covers more than movies. Here’s an example from Sam Harris’ The End of Faith.

Metacritic’s books coverage does not seem as strong as some other categories (560 ISBNs total), but I think it’s useful and interesting. Perhaps some day LibraryThing will collect review snippets itself, but for now Metacritic should be a useful link.

Metacritic was informed of what we were planning, but no money changed hands. Hey, who can turn down a free link?

Labels: 1

Thursday, April 26th, 2007

Bugs, New York, Radio

Today was a full day—New York, radio, publishers and bug-fixing. In reverse order:

Bug fixing. I finally slew the bug that sent work copies off into la-la land. I also found why book-swap data was screwed up. It turns out Bookmooch’s data feed is now too large for PHP’s 40MB default memory space, and this was short-circuiting other feeds. Wow—way to go Bookmooch. I increased it to 80MB until I can rewrite it to load the data in pieces, and reloaded everything. I also fixed a matching algorithm, so that http://www.librarything.com/title/the_perfect_store goes to The Perfect Store, not The Great Gatsby. I’ll be working the rest of the night, except when I have to put my laptop through the x-ray.

Publishers. I gave a talk to the Association of American Publishers. Twenty minutes is too little time, starting from zero and trying to get to what’s “happening” with social software and books. But I think I got across the central message: (1) I’m crazy*, (2) LibraryThing is orders of magnitude larger and more interesting than its competitors, (3) stop marketing at people and get in the conversation, (4) get involved with LibraryThing.

It certainly would be nice if the publishing world were as friendly to LibraryThing as the library world.

New York. I flew into JFK this morning (6am departure, ouch!). I was there on business, but, since I work all day long, I don’t feel guilty spending the afternoon at The Strand, diligently confirming they do, in fact, have eighteen miles of books. Today’s haul: Richard Westfall’s The Life of Isaac Newton and Adam Cohen’s The Perfect Store: Inside eBay.

Radio. I appeared on Public Radio International’s Radio Open Source. They were doing a show on David Weinberger’s upcoming book Everything is Miscellaneous: The Power of the New Digital Disorder. I’ve blogged about David and his book before. To repeat: It’s excellent. Weinberger, a true Miscellaneous Man**, explores how digitization and mass-collaboration, -filtering and -classification (e.g., tagging) are changing knowledge, and its relation to authority. After an introduction with David, host Christopher Lydon brought in super-librarian Karen Schneider, then me, to chime in on the topic.

I pointed out how tagging worked for tags like chick lit, queer, glbt and lgbt. I also tried to get at a nagging issue for me—does “knowledge” change, or do we just get new perspectives and ways of getting at it? I’m happy to see the realm of debate, uncertainty, personal choice and personal understanding expand—for us to “swim in the complex,” as David writes. But I won’t give up on a small, hard (Pluto-like?) core of truth. More on that later.

OpenSource streams at 7pm tonight. After that, the audio—direct or podcast—will be available here.
*I love explaining to people that LibraryThing has no advertising or funded promotions, and doesn’t push affiliate links, but is profitable. On a more personal note, it was unreal being back among “publishing types.” I never mentioned it, but I used to work at Houghton Mifflin. I felt at home in uncomfortableness, as it were.
** Who else has a PhD in Philosophy and wrote jokes for Woody Allen? He’s more varied than my junk drawer!

Labels: Uncategorized

Saturday, April 21st, 2007

Tanned, rested and ready

I’m flying back from four talks in four days—Computers in Libraries (twice), the Library of Congress, and Digital Odyssey in Toronto.

The Library of Congress talk was videoed, and will be public in a couple weeks.* It was great fun to do. (Who could pass up the chance to discuss the tag vampire smut with some of the world’s top catalogers?) And as a long-time user and admirer of the Library of Congress, it was quite an honor. I pushed them hard on opening up their data, and the shortcomings of the LC subject system, but they were good natured about it. And my anti-OCLC feelings drew no fire. As one of them put it, only half-kidding, “They wouldn’t be anything without us.”**


Derik A. Badman’s cartoon of Roy Tennant (left) and me (right) giving talks. Actually, Roy’s example was about murdered midget gypsy prostitutes. Well that’s three conferences he won’t be invited to!

At CIL I got a lot of opportunity to show off LibraryThing for Libraries, our new push to put LibraryThing data into library catalogs. Response was positive, even fevered.*** Demos went well, showing book recommendations and tags in a large public library. (“Chick lit” and “cyberpunk” are great examples, but I have to size people up quickly to know which one to use.) There was a certain amount of disbelief about its coolest feature–no back-end integration and working with any system. But anti-system-vendor sentiment is so high that this was welcomed. The first round of libraries should be at least a dozen strong, with both academics and small and large publics.

The highlight of all three conferences was the chance to put faces to names, often names of blogs. My Google Reader feed is suddenly full of people I know! (But if I start listing I’ll surely forget someone…) I had a couple good meals, one good argument, a great lunch conversation at the LC and, as a coda, a stroll around Toronto.

(later) I’m off the plane now and in D.C., staying with friends for a few days. There’s a lot left to do for LibraryThing for Libraries, but the big initial push is over, and we can throw time back into building new features for LibraryThing. A number of them will revolve around JavaScript. Altay, our new JS guru, will be rolling out some serious magic.

*Among other things they need to synch it up with the “slides.” But I do my talks live, driving around at breakneck speed. The staffer assigned to coordinate the synch looked positively frightened.
**I owe a blog post revising my post about OCLC and MIT. Apparently it wasn’t OCLC that stopped them, but MIT legal.
***I came back from one talk to find the booth table littered with business cards. I felt like an NBA star at a nightclub.

Labels: 1

Tuesday, April 17th, 2007

5¢/patron, $1/student

For a while now, libraries have been approaching us about whether LibraryThing would sell them bulk memberships—so all their patrons could, potentially, become members. Today at CIL two more people asked. Time to act.

From now on if a public library or a college or university wants to buy memberships for everyone in a community, it’s 5¢/patron, $1/student.

The math is easy. If a town wants to give out free LibraryThing memberships, and they have 20,000 patrons (defined as working library cards), they would pay $1,000. If a college or university wants it, they pay $1 for every student, grad and undergrad; profs. and staff ride for free. The library gets a stack of membership cards, each with a unique code, good for a year’s membership from the date of activation.

Details:

  • Patron cards would have to be given out in person, not over the phone.
  • Student accounts would require email confirmation to a valid school email (like Facebook)
  • Communities may elect to set up a group. Members would get an automatic invite for that group.
  • We will work to make sure LibraryThing links to and collects data from the institution in question. The latter requires an open Z39.50 connection.

If interested, write tim@librarything.com.

Labels: Uncategorized

Sunday, April 15th, 2007

Tim to CIL and the Library of Congress, Abby to Australia

UPDATE: If you’re in DC and want to come to CIL (Librarian? Enjoy vendor-tchotchkes?), I have 50 free tickets. I’m supposed to give them out to my important vendors and clients. That’s you. Email me and we’ll figure out how to do this. I’ll probably leave a stack at the closest Starbucks.

I’m off to Computers in Libraries in Washington, DC. LibraryThing will have a booth there, and I’m giving two talks:

  • Tuesday, 1:30-2:30. “Cutting Edge Leaders.”* One whole hour of me, giving my general talk about what LibraryThing is and what it means, amped-up for savvy CILers.
  • Wednesday. 11:30-12:15. “Catalogs/OPACs for the Future,” with me and Roy Tennant. I’ll probably do LibraryThing for Libraries.

I’ll be showing LibraryThing for Libraries at the talks. Unfortunately, it’s just me, so I’m going to be torn between manning the booth and going to all the great talks. I’m bringing along forty CueCat barcode readers. Free? No. I’ll be giving them out at cost: $5.

On Thursday I’m doing a talk at the Library of Congress. I am completely psyched. It’s not open to the public, but they said I could sneak in a friend or two.

Also on Thursday, Abby will be in Australia at the Innovative Ideas Forum, hosted by the National Library of Australia.**

On Friday*** I’ll be the closing keynote at Digital Odyssey 2007, hosted by the Faculty of Information Studies, University of Toronto. I’m talking about “Social Cataloging and the ‘Fun’ OPAC?” I put them in myself, but I want to remove those quotes.

*Apparently I am one, because it’s just me and I’m not planning to talk about the others.
**Synchronicity. We have tried and failed to find national libraries for the other LibraryThing employees to talk at on the same day. If you represent such a library, please contact us.
***Portland->Boston->DC->Toronto->DC->Portland. Gulp.

Labels: Uncategorized

Saturday, April 14th, 2007

LibraryThing for Libraries: How it works / The five-second rule

The LibraryThing for Libraries widgets have a unique architecture. You install them on your OPAC’s HTML pages, but the OPAC doesn’t “do anything.” All the work takes place in browser JavaScript requests to the LibraryThing for Libraries servers. Only when the patron clicks on a specific book does the library OPAC come into the picture again.

Your creaky OPAC can rest easy. All the database work and the statistical number-crunching that makes something like recommendations or tag browsing possible takes place elsewhere. You get beefy new functionality without a single extra OPAC request. (Of course, we think using a LibraryThing-enhanced catalog will be so fun, and we don’t mean that ironically, that patrons will spend more time browsing it.)

*BUT* before LibraryThing can take the work off your hands, it needs to know what ISBNs you have. So we ask for an export with ISBN data, and accept any format your OPAC makes.* And if a link to a book is to display the same title and author given in your OPAC, it needs to get them. Exporting and uploading them is impracticable. There are dozens of possible formats to parse, and anything that complicates the export process will limit our potential user-base. LibraryThing for Libraries needs to be dirt-simple. It needs to be people-who-don’t-even-know-HTML simple.

So, LibraryThing for Libraries hits your OPAC to collect titles and authors, “screen scraping” the pages. The question is: How fast can it go?

Good question, and one we’ve struggled with. In the search-engine industry, the standard maximum is one request/second. Google, Yahoo, AskJeeves, MSN (who?) and their peers use that as their benchmark, although you can request to speed them up or slow them down using standards like robots.txt. And they’ll do it all day long every day, and obviously without regard for how many others are hitting you too. In March LibraryThing was visited by 71 registered “bots.” The greediest, Google, hit us 11,338,467 times–an average of 4 times/second–and took almost 200GB. As our total bandwidth was 650GB, you can understand why Google sometimes seems a bit, er, codependent.

Anyway, I wrongly believed that most OPACs could handle 1/second. After all, the libraries who’ve contacted us all have systems that cost hundreds of thousands or millions of dollars. And most have unspiderable “sessions,” so LibraryThing wouldn’t be competing with Google and its ilk.

Apparently I was wrong. Until Thursday, the requests were sporadic or round-robin-ed, so the effective time between requests was more than a second. Thursday afternoon we threaded the process, so they could run mostly continuously and concurrently. This morning I heard back that LibraryThing was taking too much from one OPAC, and slowing performance. Yipes! The system in question served a consortium of more than 25 libraries, so one can expect it isn’t the slowest, worst OPAC out there! We yanked the spidering. They took it well, even so. We owe them.

So, the new rule will be one request/five seconds max. And I’ll put in the rule of monitoring how long the document took to come in, and waiting a multiple of that, so any performance issue is adjusted for in real time. The LibraryThing for Libraries interface–not yet publicly available–allows libraries to speed up or slow down the process. “Slow” will stretch the wait to 10 seconds between requests; “fast” will cut it to 2 seconds.
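
Here is a minimal sketch of that rule in Python. The 10, 5 and 2 second floors come from the paragraph above; everything else is an assumption for illustration, including the function names, the "normal" label for the default speed, and the multiplier of 3 (the post only says "a multiple" of the response time).

```python
import time

# Minimum seconds between requests for each speed setting.
# "normal" is a hypothetical name for the five-second default.
SPEED_INTERVALS = {"slow": 10.0, "normal": 5.0, "fast": 2.0}


def next_delay(last_response_seconds: float, speed: str = "normal",
               multiple: float = 3.0) -> float:
    """Wait at least the configured interval, or longer if the OPAC was slow."""
    return max(SPEED_INTERVALS[speed], multiple * last_response_seconds)


def crawl(urls, fetch, speed="normal", multiple=3.0,
          sleep=time.sleep, clock=time.monotonic):
    """Fetch each URL in turn, pausing after every request.

    `fetch` is a caller-supplied function that takes a URL and returns the page.
    """
    pages = []
    for url in urls:
        started = clock()
        pages.append(fetch(url))
        elapsed = clock() - started  # how long the document took to come in
        sleep(next_delay(elapsed, speed, multiple))
    return pages
```

With these numbers, an OPAC that answers in half a second gets the plain five-second wait, while one that takes four seconds to answer is left alone for twelve.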

The new speed will mean longer waits before a library can see LibraryThing for Libraries in action. In our experience, we run about 50% coverage on US publics, so a 250,000-ISBN library will have 125,000 overlapping ISBNs and take a week for us to fetch all titles and authors. With almost three million ISBNs in LibraryThing already, we can show a library what the widgets will look like beforehand, so long as they understand the titles may not match theirs exactly.
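
The "week" figure is easy to check. A small Python sketch, assuming one title/author request per overlapping ISBN and ignoring response time and retries (the function name is made up):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_days(total_isbns: int, coverage: float, seconds_per_request: float) -> float:
    """Days needed to fetch one page per overlapping ISBN at a fixed pace."""
    overlapping = total_isbns * coverage
    return overlapping * seconds_per_request / SECONDS_PER_DAY


if __name__ == "__main__":
    for label, pace in (("fast", 2), ("default", 5), ("slow", 10)):
        print(label, round(crawl_days(250_000, 0.5, pace), 1), "days")
```

At the five-second default that is 625,000 seconds, a little over seven days.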

We thank the dozen libraries who are participating in our initial tests of the system. We think everyone is going to be impressed with the result. We got the tag-browsing widget working last night, and it’s absolutely fantastic. Altay, our JavaScript guru, is outdoing himself. And I celebrated with a big hunk of brie. I can’t wait to finish it up and show it off at CIL and the Library of Congress next week.

*This is possible because ISBNs aren’t just numbers, but numbers with structure. They are either ten digits (and maybe an X) long or thirteen digits starting with 978 or 979.** And the last digit is a checksum–a calculation based on the others. So ISBN 0747532699 is the first British edition of Harry Potter and the Philosopher’s Stone, now selling for upwards of $1,000. But change a digit and you don’t get another book, but an error. The checksum won’t work. If anything bad slips through, running the ISBNs against LibraryThing’s books tosses them out.
**i.e., ([0-9]{9}[0-9X]|(978|979)[0-9]{10}) in regular-expression land, where I live.
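
For the curious, a sketch of the check in Python. It uses the regular expression from the footnote plus the standard ISBN-10 (mod 11) and ISBN-13 (mod 10) checksum rules; the function name is hypothetical, and this is not the code LibraryThing runs.

```python
import re

# The pattern from the footnote above, applied to the whole string.
ISBN_PATTERN = re.compile(r"([0-9]{9}[0-9X]|(978|979)[0-9]{10})")


def is_valid_isbn(text: str) -> bool:
    """True if text has the shape of an ISBN-10 or ISBN-13 and its checksum works."""
    if not ISBN_PATTERN.fullmatch(text):
        return False
    if len(text) == 10:
        # ISBN-10: weights 10 down to 1, X counts as 10, total divisible by 11.
        total = sum((10 - i) * (10 if ch == "X" else int(ch))
                    for i, ch in enumerate(text))
        return total % 11 == 0
    # ISBN-13: weights alternate 1, 3, 1, 3, ..., total divisible by 10.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(text))
    return total % 10 == 0


if __name__ == "__main__":
    print(is_valid_isbn("0747532699"))  # the Harry Potter ISBN from the footnote
    print(is_valid_isbn("0747532698"))  # same ISBN with the last digit changed
```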

Labels: Uncategorized

Friday, April 13th, 2007

Going to CIL, with inflatable

What do you do when Computers in Libraries charges LibraryThing thousands of dollars to exhibit, a cool $1,000 for two days of internet access and THEN tells us we can only bring in “what one person can carry in one trip”?

Well, what large thing can one person carry in one trip?

If you answered “a five-foot long inflatable rhino,” you think like we do. Here’s Abby showing us that it’s arrived. John and Altay (who?) and I are in the background.

Labels: Uncategorized

Thursday, April 12th, 2007

Cooking book pile contest

Our newest book pile contest theme is cooking.* For something new, we’re teaming with AbeBooks. They’re co-sponsoring the contest, and offering a fantastic grand prize: $100 to use on AbeBooks.com. See their announcement of the contest here.

Without further ado…

The rules
1. Take a photo of your books. Be creative and inspired! Remember, the theme is cooking.
2. Post the photo to Flickr. You’ll need a Flickr account to do this (it’s easy and free).
3. Tag your photo on Flickr with “LibraryThingAbeBooks”

The prizes
First prize—$100 to use on AbeBooks.com
Second prize—Free lifetime membership on LibraryThing

The deadline
May 15, 2007

Special note on Flickr: photos from new accounts don’t always show up on the global tag page right away. If your photo isn’t showing up here, you can always post the URL in the comments to this blog post, and we’ll make sure to find it.

Need inspiration? Check out the book pile archive for past winners, or AbeBooks’ new “Books for Cooks” section.

*Maybe it’s that my recent move unearthed all the fancy but long-forgotten kitchen tools, or maybe it’s that I just read Julie and Julia,** but I’m in the mood to cook lately.
**The book that introduced me to the concept of aspics. Never have I been more glad to be a vegetarian…

Labels: book pile

Thursday, April 12th, 2007

Drop Everything and Read

No really! One of my favorite bookstores* reminded me that today is “Drop Everything and Read Day”. Sponsored by Ramona Quimby, of course, and celebrated on Beverly Cleary’s birthday. So go to it!

*Harvard Bookstore. They rock.

Labels: 1

Thursday, April 12th, 2007

WorldCat: Think locally, act globally

OCLC just announced a “pilot” of WorldCat Local. In essence, WorldCat Local is OCLC providing libraries with an OPAC.

That’s the news. Here’s the opinion. Talis’ estimable Richard Wallis writes:

“Yet another clear demonstration that the library world is changing. The traditional boundaries between the ILS/LMS, and library and non-library data services are blurring. Get your circulation from here; your user-interface from there; get your global data from over there; your acquisitions from somewhere else; and blend it with data feeds from here, there and everywhere is becoming more and more a possibility.”

I think this is exactly wrong. OCLC isn’t creating a web service. They’re not contributing to the great data-service conversation. They’re trying to convert a data licensing monopoly into a services monopoly. If the OCLC OPAC plays nice with, say, the Talis Platform, I’ll eat my hat. If it allows outside Z39.50 access I’ll eat two hats.

They will, as the press release states, “break down silos.” They’ll make one big silo and set the rules for access. The pattern is already clear. MIT thought that its bibliographic records were its own, but OCLC shut them down when they tried to act on that. The fact is, libraries with their data in OCLC are subject to OCLC rules. And since OCLC’s business model requires centralizing and restricting access to bibliographic data, the situation will not improve.

As a product, WorldCat Local will probably surpass the OPACs offered by the traditional vendors. It will be cleaner and work better. It may well be cheaper and easier to manage. There are a lot of good things about this. And, lest my revised logo be misunderstood, there are no bad people here. On the contrary, OCLC is full of wonderful people, people who’ve dedicated their lives to some of the highest ideals we can aspire to. But the institution is dependent on a model that, with all the possibilities for sharing available today, must work against these ideals.

Keeping their data hidden, restricted and off the “live” web has hurt libraries more than we can ever know. Fifteen years ago, libraries were where you found out about books. One would have expected that to continue on the web–that searching for a book would turn up libraries alongside bookstores, authors and publishers.

It hasn’t worked out that way. Libraries are all-but-invisible on the web. Search for the “Da Vinci Code” and you won’t get the Library of Congress–the greatest collection of books and book data ever assembled–not even if you click through a hundred pages. You do get WorldCat, seventeen pages in!

The causes are multiple, and discussed before. But a major factor is how libraries deal with book data, and that’s largely a function of OCLC’s business model. Somehow institutions dedicated to the idea that knowledge should be freely available to all have come to the conclusion that knowledge about knowledge (book data) should not, and traditional library mottos like Boston’s “Free to All” and Philadelphia’s Liber Libere Omnibus (“Free books for all!”) have given way to:

“No part of any Data provided in any form by WorldCat may be used, disclosed, reproduced, transferred or transmitted in any form without the prior written consent of OCLC except as expressly permitted hereunder.”

We now return you to our regularly-scheduled blogging.

Labels: library of congress, oclc, open data, worldcat local