Archive for February, 2007

Monday, February 26th, 2007

New feed: Wikipedia citations

I parsed the English Wikipedia looking for ISBNs and came up with a public feed of ISBNs to articles.

Over on the LibraryThing blog, I posted about it. LibraryThing is now showing the results on work pages, but others should feel free to use the feed for their own ends. (Cause trouble. Put Wikipedia in your OPAC!)

Enjoy.

Labels: Uncategorized

Monday, February 26th, 2007

O’Reilly Radar on tagging

The O’Reilly Radar Blog does a nice post on my When tags work and when they don’t: Amazon and LibraryThing.

O’Reilly blogger Brady Forrest notes that much of what I said echoed Joshua Schachter of Del.icio.us:

“You have to understand the selfish user – user #1 has to find the system useful or you won’t get user #2. Systems that only become useful when lots of people are using them usually fail, because there’s no incentive for people to contribute themselves.”

So now would be a good time for me to say I’m not claiming my insights are all my own. Most of this stuff has been in the air for a while. While I’d never seen that piece by Schachter, I’ve seen similar statements by him and others. (And there’s a good short panel taped with him and David Weinberger.) I’d love to talk tags with these guys. No doubt like Wikipedia founder Jimmy Wales, Schachter’s so swamped he’s not even answering Bono’s email (source: net@night).*

A few corrections:

  • AllConsuming is already owned by Amazon. That is, Amazon now owns two of LibraryThing’s competitors!
  • Shelfari allows tagging. Honesty hurts.
  • As good as “CEO” sounds, I’m only the president. LibraryThing is an LLC.

Labels: Uncategorized

Monday, February 26th, 2007

Wikipedia citations, with feed

Update: Changed feed URL.

I’ve added a cool new feature, building on some work by library programmer Lars Aronsson: Wikipedia citations on all work pages. That is, work pages now list all the Wikipedia articles that cite the work. The data is also available in feed form.

Here’s how it goes. At the top of J. F. C. Fuller’s A Military History of the Western World it lists how many citations, with a link:

And, down below, it shows all the articles:

How I did it. Basically, I did a complete run through the Wikipedia dump files (source), parsing out anything that looked like an ISBN and checking whether it really is one. It’s pretty easy. So it sees:

Fuller, J.F.C. A Military History of the Western World. Three Volumes. New York: Da Capo Press, Inc., 1987 and 1988. — v. 1. From the earliest times to the Battle of Lepanto; ISBN 0-306-80304-6: 255, 266, 269, 270, 273 (Trajan, Roman Emperor).

and gets the ISBN. I’ve started in on the harder problem, parsing books without ISBNs, like:

Bowersock, G.W. Roman Arabia, Harvard University Press, 1983.

It’s not actually that hard. But it’s fiddly. And it’s one of those problems where each additional percent of accuracy costs 50% more effort.
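For the curious, here is a minimal sketch of the ISBN step in Python. It is not the code LibraryThing actually runs; the pattern, the function names and the decision to handle only 10-digit ISBNs are all assumptions made for illustration. It pulls out anything that looks like an ISBN and keeps it only if the standard ISBN-10 check digit works out.

```python
import re

# Assumed pattern: the word "ISBN", optional colon/whitespace, then ten
# characters (nine digits plus a digit or X), with optional hyphens or spaces.
ISBN_CANDIDATE = re.compile(r"ISBN[:\s]*((?:[0-9][\s-]?){9}[0-9Xx])")


def is_valid_isbn10(isbn: str) -> bool:
    """Return True if a 10-character string passes the ISBN-10 checksum."""
    if len(isbn) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn):
        if position == 9 and char in "Xx":
            value = 10  # X is only legal as the final check character
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        # Weights run from 10 (first character) down to 1 (check character).
        total += (10 - position) * value
    return total % 11 == 0


def extract_isbns(text: str) -> list[str]:
    """Find ISBN-looking strings in text and keep those with a valid checksum."""
    found = []
    for match in ISBN_CANDIDATE.finditer(text):
        normalized = re.sub(r"[\s-]", "", match.group(1)).upper()
        if is_valid_isbn10(normalized):
            found.append(normalized)
    return found


citation = (
    "Fuller, J.F.C. A Military History of the Western World. "
    "ISBN 0-306-80304-6: 255, 266, 269, 270, 273 (Trajan, Roman Emperor)."
)
print(extract_isbns(citation))
```

The checksum is what separates a real ISBN from a random run of digits, which is why the "looks like an ISBN" pattern can afford to be loose. A citation with no ISBN at all, like the Bowersock one, simply returns nothing, and that is the fiddly part this sketch does not attempt.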

What are the most cited books? The most cited book on Wikipedia is… The Official Pokemon Handbook. Surprised? Don’t be. In fact, eighteen of the top twenty most-cited works are Pokemon books. It boggles the mind. Somebody, or a bunch of somebodies, went ISBN-happy on all the Pokemon entries. Fortunately, the existence of so many citations to Pokemon does not impair the quality of the rest. It’s just… Wikipedia. There’s a decidedly quirky character to many of the other winners, testimony to some serious passions. Number 28, with 177 citations, is Richard Grimmett’s Birds of India, Pakistan, Nepal, Bangladesh, Bhutan, Sri Lanka and the Maldives. I think this effect would be diminished a lot if non-ISBN books were added.

Where did this come from? I owe the idea to Lars Aronsson, who came up with a simple script and ran it against the Wikipedia dumps and posted the results on Web4Lib back in September. I wrote him soon after to see if he was going to provide a public data feed, or if he minded if I did. He did not. His results differed a bit from mine. I’ll be in touch with him to square the differences.

Unfortunately, the Wikipedia data is not updated as often as one might like. The most recent dump is from November of last year. I’ll keep an eye on the download page, and reparse the data when a new dump becomes available.

What’s this about a feed? We’re big fans of openness. And it’s Wikipedia data anyway. So we’ve made a feed of it. You can get it here:

http://www.librarything.com/feeds/WikipediaCitations.xml.gz

UPDATE: I changed the URL and gzipped it. Needless to say, I’m not putting any restrictions on this, but if you do something cool, I’d love to hear about it.
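If you want to poke at the feed, a rough Python sketch for reading a gzipped XML file follows. The post doesn't describe the feed's layout, so the code makes no assumptions about it: it just decompresses the data and tallies whatever element names it finds, which is a reasonable first look at an unfamiliar file. The function name and the tiny sample document are made up for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(gzipped_xml: bytes) -> Counter:
    """Decompress a gzipped XML document and tally its element names."""
    counts = Counter()
    with gzip.open(io.BytesIO(gzipped_xml), "rb") as stream:
        # iterparse streams the document, so a large feed never has to
        # sit in memory as a full tree.
        for _event, element in ET.iterparse(stream, events=("end",)):
            counts[element.tag] += 1
            element.clear()  # release each element once it has been counted
    return counts


# Hypothetical stand-in for the downloaded file; the real feed's element
# names are not known here.
sample = gzip.compress(b"<feed><item/><item/></feed>")
print(count_elements(sample))
```

To run it against the real thing, fetch the bytes from the URL above first (for example with `urllib.request.urlopen(url).read()`) and pass them in.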

As usual, tell me what you think.

*We’ve seriously considered open-sourcing LibraryThing. But given the state of the code, it would be, as Nabokov said of rough drafts, like passing around samples of our sputum. We may open-source pieces of the code: the pieces we’re happiest about.
**LibraryThing is in the odd position of having almost as much bot traffic as we have person traffic. Google loves us. Guys, you love us too much!

Labels: 1

Monday, February 26th, 2007

Introducing the Helpers log

Update: Author links added. See below.

I’ve added a new page, the Helpers log, that tracks the various ways users help LibraryThing and each other: work, author and tag combinations, author pictures and “author nevers.” (John will add author links tomorrow.) The new page will make it easier for eagle-eyed Thingamabrarians to watch over what’s going on with these critical activities, and smite miscreants.

By the way, did you know we are averaging 2,000 work-combination actions per day? Per day, folks! That’s not even works combined, which is higher since a combination will have at least two and as many as twenty. It boggles the mind.

This isn’t a small thing. You guys have up-ended the world of book data. And we’ve only just begun.

Update: Author links added. Unfortunately, we weren’t storing the right data for author links. So it’s only showing ones added since we fixed the system. It also means we don’t know who added links before, not exactly anyhow. Again, apologies.

Labels: 1

Friday, February 23rd, 2007

Not the Ninja!

I’m loath to take the last post, When tags work and when they don’t: Amazon and LibraryThing, off the top. It got a *lot* of attention*, and I owe the commenters a follow-up.

The Shifted Librarian found this YouTube video “put together by Steven Reed’s students at Wilmington High School.”

It’s a fun video, no question. It’s an *amazing* demonstration of what kids can do these days. My high school had the best Super8 program in Massachusetts, and this level of professionalism would have been way past our capabilities. The book-throwing is great. The editing is quick and professional. The kids get an A. They rock.

But I can’t leave it at that. The kids are rock stars, but the message is all wrong—and it’s wrong in a very telling way.

The situation is completely false. I don’t mean the ninja—they’re increasingly common in libraries of all sizes—but the contest and its results.

Type Capital of Russia into Google and you get this:

You don’t even need to GO to a page—the answers are in the page titles themselves. Face it, the Web is *great* for this sort of thing. You’re not going to “defend” books by claiming they’re better for looking up trivial facts. They’re not. Breathe deep and repeat after me: They’re worse.**

The second false idea is that libraries and the web are rivals, two competing ways to get the same thing (which is mostly factoids). This is all too often how popular culture sees libraries, and it’s a disaster. If libraries are just low-tech search engines, they are bad ones. They should be defunded and closed.***

I’m not going to launch into a defense of libraries and books. Of course I love them. I started LibraryThing because I loved them so much. But I don’t love them because I hate computers, or because books are better than computers.**** I don’t see them as rivals. The web has supplanted a few things that books used to do, but not the important ones. And libraries can do things with computers they are only just starting to explore.

People who love books need to fight against these ideas. They’re a trap. They’re wrong, and they’re very dangerous to the things we love.*****

Yeah, I know, lighten up, Tim!

*Alexa is under the impression LibraryThing’s traffic doubled the day the post hit. That’s total nonsense and a great proof of Alexa’s failings. It makes me wonder how much they rely on new link creation, not traffic.
**Where did that guy find the books? Does he have the shelving memorized, or did he consult an OPAC to find Countries of the World?
***And don’t tell me it’s about not everyone having a computer. If so, libraries should be just computer centers.
****For starters, books are worse at email, worse at social networking, and they are hands-down a lousy way to blunder upon shocking new types of pornography.
*****Can anyone help me find a quote? I think it was from a random sci-fi movie or show, taking place in the near future. The quote was something like “Would you have wanted to shut down the internet just to keep the libraries open?” Don’t even try to Google it. (And I recommend not going to the library either.)

Update: This ninja movie has a good message.

Labels: Uncategorized