Wednesday, May 2nd, 2007

Many more Wikipedia citations

You’ll notice many more Wikipedia links from work pages. The total has increased by about 200%, and the coverage by at least that.

This improves on what I did in February, which worked by looking for ISBN patterns in Wikipedia articles. Of course, not all books cited in Wikipedia have ISBNs. And even when there is one, many Wikipedia contributors omit it. (As far as I’m concerned, ISBNs look chintzy in a bibliography anyway.)
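
For the curious, ISBN-pattern matching of the February sort might look roughly like the sketch below. This is only an illustration, not the actual LibraryThing code: the regular expression, the function names and the sample text are all assumptions. It looks for the word "ISBN" followed by a 10- or 13-digit number and keeps only candidates whose check digit is valid.

```python
import re

# Assumed pattern: "ISBN", an optional "-10"/"-13" and colon, then 10 or 13
# digits with optional hyphens or spaces between them.
ISBN_RE = re.compile(
    r"ISBN(?:-1[03])?:?\s*((?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx])"
)


def normalize(raw):
    """Strip hyphens and spaces and upper-case a trailing 'x'."""
    return re.sub(r"[- ]", "", raw).upper()


def is_valid_isbn10(isbn):
    """ISBN-10 check: weights 10 down to 1, total divisible by 11."""
    if not re.fullmatch(r"\d{9}[\dX]", isbn):
        return False
    total = sum(
        (10 - i) * (10 if ch == "X" else int(ch))
        for i, ch in enumerate(isbn)
    )
    return total % 11 == 0


def is_valid_isbn13(isbn):
    """ISBN-13 check: alternating weights 1 and 3, total divisible by 10."""
    if not re.fullmatch(r"\d{13}", isbn):
        return False
    total = sum(
        int(ch) * (1 if i % 2 == 0 else 3)
        for i, ch in enumerate(isbn)
    )
    return total % 10 == 0


def find_isbns(text):
    """Return normalized ISBNs found in text, skipping bad check digits."""
    found = []
    for match in ISBN_RE.finditer(text):
        isbn = normalize(match.group(1))
        if is_valid_isbn10(isbn) or is_valid_isbn13(isbn):
            found.append(isbn)
    return found


if __name__ == "__main__":
    sample = "Kuhn, Thomas (1996). ISBN 0-226-45808-3."  # sample text only
    print(find_isbns(sample))
```

The check-digit test is there to throw out typos and stray numbers that merely look like ISBNs. It does nothing, of course, for citations that carry no ISBN at all.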

I’ve redone it, this time also looking for telltale title/author patterns, and running the matches against LibraryThing’s vast and usefully messy dataset. The logic is somewhat fuzzy and therefore imperfect. But I haven’t noticed any problems.
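
To give a feel for what "fuzzy" title/author matching can mean, here is a minimal sketch. It is not the matching logic actually used here. The tiny `WORKS` table, the work ids, the 0.9 threshold and the sliding-window comparison are all assumptions made for illustration. It requires the author's surname to appear in the text, then looks for a run of words that is close to the title.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stand-in for a works table: work id -> (title, author surname).
WORKS = {
    "work-1": ("The Structure of Scientific Revolutions", "Kuhn"),
    "work-2": ("The Copernican Revolution", "Kuhn"),
}


def words(s):
    """Lower-case the string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", s.lower())


def title_score(title_words, text_words):
    """Best similarity between the title and any same-length run of words."""
    n = len(title_words)
    target = " ".join(title_words)
    best = 0.0
    for i in range(max(1, len(text_words) - n + 1)):
        window = " ".join(text_words[i:i + n])
        best = max(best, SequenceMatcher(None, target, window).ratio())
    return best


def match_works(text, works=WORKS, threshold=0.9):
    """Return ids of works whose author and approximate title both appear."""
    text_words = words(text)
    hits = []
    for work_id, (title, author) in works.items():
        if author.lower() not in text_words:
            continue  # author surname must be mentioned somewhere
        if title_score(words(title), text_words) >= threshold:
            hits.append(work_id)
    return hits


if __name__ == "__main__":
    prose = "Thomas Kuhn's Structure of Scientific Revolutions changed the debate."
    print(match_works(prose))
```

Requiring both an author and a near-match on the title is what keeps a loose threshold from matching every article that happens to contain the word "revolution". A real dataset would need something much faster than comparing every work against every article.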

The number of citations expanded a lot.* Some entries exploded. Take Thomas Kuhn’s The Structure of Scientific Revolutions:

Notably, the new matching caught casual references to books, not just structured ones. For example, the article on Science wars mentions Kuhn’s work in running prose, not in the bibliography or footnotes.

I haven’t updated our free Wikipedia citation feed. That maps articles to ISBNs, but the new data is work-based. If anyone wants to use the new data, let me know and I’ll tackle the problem. Cool as I think it would be, I haven’t seen any libraries adding Wikipedia links to their catalogs yet.

*The fact that it’s a new feed, and the somewhat fluid interactions between ISBN-based and work-based matching, make it tricky to estimate, but it looks like a 200% increase.
