Saturday, June 16th, 2012

The Top Books on Wikipedia

We just re-analyzed all of English Wikipedia, looking for citations to LibraryThing works (using ISBNs, OCLC numbers, titles and authors, etc.). We found over 1.1 million citations to 412,000 LibraryThing works(1).
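Matching citations by ISBN generally means folding ISBN-10 and ISBN-13 forms onto a single key before counting. The sketch below shows only that step. It assumes a hypothetical `isbn_to_work` lookup table from ISBN-13 to work ID. The post does not describe the actual matcher, which also used OCLC numbers, titles and authors, so treat this as an illustration rather than the real implementation.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-character ISBN to its 978-prefixed ISBN-13 form."""
    core = "978" + isbn10[:9]  # drop the old ISBN-10 check digit
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)

def normalize_isbn(raw: str) -> Optional[str]:
    """Strip hyphens/spaces; return an ISBN-13 string, or None if malformed.
    Note: this sketch does not validate the incoming check digit."""
    s = re.sub(r"[\s-]", "", raw).upper()
    if re.fullmatch(r"\d{9}[\dX]", s):
        return isbn10_to_isbn13(s)
    if re.fullmatch(r"\d{13}", s):
        return s
    return None

def count_citations(cited_isbns: Iterable[str],
                    isbn_to_work: Dict[str, str]) -> Counter:
    """Count citations per work ID (isbn_to_work is a hypothetical table).
    Malformed or unmatched ISBNs are skipped."""
    counts: Counter = Counter()
    for raw in cited_isbns:
        key = normalize_isbn(raw)
        work = isbn_to_work.get(key) if key else None
        if work is not None:
            counts[work] += 1
    return counts
```

For example, `count_citations(["0-306-40615-2", "978-0-306-40615-7"], {"9780306406157": "work-1"})` counts both forms as two citations to `work-1`. Converting everything to ISBN-13 is a design choice. A fuller matcher would also check digits and fall back to other identifiers.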

Work pages have a “References” section showing all the links to them. This section has been improved: links now open a lightbox showing all the other pages that link to the page, along with, of course, links to Wikipedia. I’ve also added a Zeitgeist: Wikipedia page, showing the most frequently cited works and entries.

Here are the top 100 works by Wikipedia citations: a catalog of pop songs, decorated German soldiers, politicians, and… fungi!

100 Top-Cited Books on Wikipedia

  1. Guinness World Records: British Hit Singles and Albums by David Roberts (3,889 citations)
  2. Ritterkreuzträger 1939-1945 by Veit Scherzer (2,317 citations)
  3. The Billboard Book of Top 40 Country Hits by Joel Whitburn (1,908 citations)
  4. British parliamentary election results 1832-1885 by Fred W. S. Craig (1,747 citations)
  5. Air Force Combat Units of World War II by Maurer Maurer (1,548 citations)
  6. Dictionary of the Fungi by Paul M Kirk (1,464 citations)
  7. The Text of the New Testament an Introduction to the Critical Editions and to the Theory and Practice of Modern Textual by Kurt Aland (1,276 citations)
  8. Andrews’ Diseases of the Skin: Clinical Dermatology (James, Andrew’s Disease of the Skin) by William D. James MD (1,247 citations)
  9. Jane’s Encyclopedia of Aviation: Revised Edition (Jane’s Encyclopedia of Aviation) by Studio Editions Ltd. (1,186 citations)
  10. Elections in Europe : a data handbook by Dieter Nohlen (1,174 citations)
  11. New Zealand mollusca: Marine, land, and freshwater shells by A. W. B. Powell (1,135 citations)
  12. A Plain Introduction to the Criticism of the New Testament (2 volumes) by Frederick Henry Ambrose Scrivener (1,129 citations)
  13. Joel Whitburn Presents Hot Country Songs 1944 to 2008 by Joel Whitburn (1,114 citations)
  14. Handbook of British Chronology (Royal Historical Society Guides and Handbooks) by E. B. Pryde (1,040 citations)
  15. The Directory of Railway Stations: Details Every Public and Private Passenger Station, Halt, Platform and Stopping Place by R. V. J. Butt (1,020 citations)
  16. Port Vale Personalities: A Biographical Dictionary of Players, Officials and Supporters by Jeff Kent (949 citations)
  17. Wrestling Title Histories by Royal Duncan (944 citations)
  18. Dermatology (2 Volume Set) by Jean L. Bolognia (938 citations)
  19. British Parliamentary Election Results by F. W. S. Craig (911 citations)
  20. Civil War High Commands by John Eicher (879 citations)
  21. The encyclopedia of AFL footballers by Russell Holmesby (877 citations)
  22. The Dinosauria by David B. Weishampel (860 citations)
  23. Conway’s All the World’s Fighting Ships: 1906-1921 (Conway’s All the World’s Fighting Ships, Vol 2) by Randal Gray (817 citations)
  24. Kurzgefasste Liste der griechischen Handschriften des Neuen Testaments by Kurt Aland (803 citations)
  25. The Ship of the Line: The Development of the Battlefleet, 1650-1850 (The Ship of the line) by Brian Lavery (778 citations)
  26. Australian Chart Book 1970-1992 by David Kent (760 citations)
  27. Football League Players’ Records 1888 to 1939 by Michael A Joyce (747 citations)
  28. Conway’s All the World’s Fighting Ships, 1860-1905 by Robert Gardiner (738 citations)
  29. Japan Encyclopedia (Harvard University Press Reference Library) by Louis Frédéric (736 citations)
  30. The Profile Method for Classifying and Evaluating Manuscript Evidence by Frederik Wisse (730 citations)
  31. Snake Species of the World: A Taxonomic and Geographic Reference, Volume 1 by Roy W. McDiarmid (711 citations)
  32. British Warships in the Age of Sail 1793-1817: Design, Construction, Careers and Fates by Rif Winfield (647 citations)
  33. The Great Rock Discography by Martin C. Strong (644 citations)
  34. The New Grove Dictionary of Music and Musicians (20 Volume Set) by Stanley Sadie (636 citations)
  35. The Book of Golden Discs by Joseph Murrells (633 citations)
  36. The science-fantasy publishers: A critical and bibliographic history by Jack L. Chalker (571 citations)
  37. Air Force combat wings : lineage and honors histories, 1947-1977 by Charles A. Ravenstein (570 citations)
  38. The Concise Oxford Chronology of English Literature by Michael Cox (559 citations)
  39. Encyclopedia of Minor League Baseball: The Official Record of Minor League Baseball by Lloyd Johnson (548 citations)
  40. The Oxford Dictionary of Byzantium (3-Volume Set) by Alexander P. Kazhdan (541 citations)
  41. The Birth of the Palestinian Refugee Problem by Benny Morris (530 citations)
  42. CRC Handbook: Chemistry & Physics by David R. Lide (525 citations)
  43. Birmingham City by Tony Matthews (519 citations)
  44. All That Remains: The Palestinian Villages Occupied and Depopulated by Israel in 1948 by Walid Khalidi (517 citations)
  45. The PFA Premier & Football League players’ records, 1946-2005 by Barry J. Hugman (512 citations)
  46. The Oxford Companion to Wine by Jancis Robinson (491 citations)
  47. Mammal species of the world : a taxonomic and geographic reference by Don E. Wilson (478 citations)
  48. DC Comics Year by Year: A Visual Chronicle by Daniel Wallace (478 citations)
  49. The DC Comics Encyclopedia, Updated and Expanded Edition by Michael Teitelbaum (477 citations)
  50. The New Grove Dictionary of Opera : A-D by Stanley Sadie (476 citations)
  51. Fields of Praise by David B. Smith (474 citations)
  52. The Great Indie Discography by Martin C. Strong (467 citations)
  53. Irish Kings & High Kings: Irish Kings and High Kings (Four Courts History Classics) by F. J. Byrne (460 citations)
  54. The Canadian directory of Parliament, 1867-1967 by James K. Johnson (449 citations)
  55. Oxfordshire by Jennifer Sherwood (443 citations)
  56. The Empire Ships: A Record of British-Built and Acquired Merchant Ships During the Second World War by W. H. Mitchell (437 citations)
  57. Chronology of British History by Alan Palmer (423 citations)
  58. The Encyclopedia of Science Fiction and Fantasy Through 1968: A Bibliographic Survey of the Fields of Science Fiction, F by Donald Henry Tuck (414 citations)
  59. Enzyklopädie des deutschen Ligafußballs 7. Vereinslexikon by Hardy Grüne (406 citations)
  60. Oregon Geographic Names by Lewis A. McArthur (406 citations)
  61. Michigan Place Names by Walter Romig (402 citations)
  62. Squadrons of the Royal Air Force and Commonwealth, 1918-88 by James J. Halley (395 citations)
  63. Who’s Who 2008: 160th annual edition (Who’s Who) by A&C Black (392 citations)
  64. Dictionary of Minor Planet Names by Lutz D. Schmadel (392 citations)
  65. The Norton/Grove Concise Encyclopedia of Music by Stanley Sadie (385 citations)
  66. World Encyclopaedia of Aero Engines by Bill Gunston (363 citations)
  67. Cheshire: The Buildings of England (Pevsner Architectural Guides) by Clare Hartwell (357 citations)
  68. The Clements Checklist of Birds of the World by James F. Clements (357 citations)
  69. Collins guide to the sea fishes of New Zealand by Tony Ayling (354 citations)
  70. The Illustrated Encyclopedia of Dinosaurs and Prehistoric Animals by Douglas Palmer (353 citations)
  71. Fitzpatrick’s Dermatology in General Medicine (2 Volume Set) by Irwin M. Freedberg (352 citations)
  72. Ohio Atlas & Gazetteer by DeLorme Publishing (349 citations)
  73. The Complete Encyclopedia of World Aircraft by David Donald (346 citations)
  74. NFL 2001 Record and Fact Book by National Football League (338 citations)
  75. The Men Who Made Gillingham Football Club by Roger Triggs (336 citations)
  76. The Greenhill Napoleonic Wars Data Book by Digby Smith (335 citations)
  77. The Billboard Book of Top 40 Hits by Joel Whitburn (334 citations)
  78. Warships of the Imperial Japanese Navy, 1869-1945 by Hansgeorg Jentschura (329 citations)
  79. The Book of Sydney Suburbs by Gerald Healy (327 citations)
  80. The Complete Directory to Prime Time Network and Cable TV Shows 1946 – Present, Eighth Edition by Tim Brooks (327 citations)
  81. The New Rolling Stone Album Guide: Completely Revised and Updated 4th Edition by Nathan Brackett (327 citations)
  82. Chronicle of Gods and Sovereigns by Chikafusa Kitabatake (327 citations)
  83. Total Television: A Comprehensive Guide to Programming from 1948 to the Present by Alex McNeil (322 citations)
  84. Birds of Venezuela (Princeton Paperbacks) by Steven L. Hilty (322 citations)
  85. The Complete Book of Fighters: An Illustrated Encyclopedia of Every Fighter Aircraft Built and Flown by William Green (320 citations)
  86. Heroic Worlds: A History and Guide to Role Playing Games by Lawrence Schick (317 citations)
  87. Cassell’s Chronology of World History: Dates, Events and Ideas That Made History by Hywel Williams (316 citations)
  88. The New Penguin Opera Guide (Penguin Reference Books) by Amanda Holden (314 citations)
  89. The Oxford Companion to Chess by David Hooper (307 citations)
  90. 328 Outstanding Japanese Photographers by Tokyo Metropolitan Museum of Photography (306 citations)
  91. Conway’s All the World’s Fighting Ships 1922-1946 by Roger Chesneau (303 citations)
  92. U.S. Submarines Through 1945: An Illustrated Design History by Norman Friedman (302 citations)
  93. Blackpool: A Complete Record, 1887-1992 by Roy Calley (298 citations)
  94. The Welsh Academy Encyclopaedia of Wales by John Davies (296 citations)
  95. Historic Spots in California by Mildred Brooke Hoover (295 citations)
  96. The Virgin Encyclopedia of Reggae (Virgin Encyclopedias of Popular Music) by Colin Larkin (295 citations)
  97. Indie Hits: The Complete UK Independent Charts 1980-1989 by Barry Lazell (292 citations)
  98. Encyclopedia of Stoke City 1868-1994 by Tony Matthews (291 citations)
  99. Billboard’s Hot Dance/Disco 1974-2003 by Joel Whitburn (290 citations)
  100. The New Grove Dictionary of Opera: 4 volumes by Stanley Sadie (289 citations)

See the rest on the Zeitgeist: Wikipedia page.

1. This is 120% more than in 2009.

Wednesday, October 10th, 2007

Common Knowledge: Social cataloging arrives

Chris has just released Common Knowledge, the innovative, open-data and insanely addictive “fielded wiki” we’ve been talking about for a month.

Common Knowledge adds fields to every author and work, like:

  • Author: Places of residence, Awards and honors, Agent
  • Work: Important places, Character names, Publisher’s editor, Description

All told, there are fourteen fields. But Common Knowledge is less a set of fields than a structure for adding fields to LibraryThing. Adding more fields is almost trivial, and they can be added to anything existing or planned, from tags and subjects to bookstores and publishers. They can even be added to other Common Knowledge fields, so that, for example, agents and editors can, in the future, sport photos and contact information.* This can lead to, as Chris puts it, “nearly infinite cross-linking of data.”

Common Knowledge works like a wiki. Any member can add information, and any member can edit or revert edits. All fields are global, not personal. Common Knowledge diverges from a standard wiki insofar as each field works like its own independent wiki page, with a separate edit history.
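The per-field history described above can be modeled by treating each (author or work, field name) pair as its own tiny wiki page. This is a minimal sketch under stated assumptions: the names `Revision`, `FieldPage` and `page_for` are hypothetical, values are plain text, and none of this reflects how Common Knowledge is actually stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Revision:
    user: str
    value: str

@dataclass
class FieldPage:
    """Hypothetical model of one field (e.g. 'Important places' on one work),
    keeping its own independent edit history."""
    revisions: List[Revision] = field(default_factory=list)

    @property
    def current(self) -> Optional[str]:
        # The latest revision is the live value; None if never edited.
        return self.revisions[-1].value if self.revisions else None

    def edit(self, user: str, value: str) -> None:
        self.revisions.append(Revision(user, value))

    def revert(self, user: str, revision_index: int) -> None:
        # A revert is recorded as a new revision, so no history is lost.
        old = self.revisions[revision_index]
        self.revisions.append(Revision(user, old.value))

# Keyed by (entity, field name): every field gets a separate history.
pages: Dict[Tuple[str, str], FieldPage] = {}

def page_for(entity: str, field_name: str) -> FieldPage:
    return pages.setdefault((entity, field_name), FieldPage())
```

Recording a revert as a new revision, rather than deleting later edits, mirrors how wikis keep a complete history. That is what makes a per-field history page possible.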

Some examples:

  • Jonathan Strange and Mr. Norrell. I’ve been conservative with characters and places. (See Longitude, worked on by Chris, for the opposite approach.) But I wish I had her editor!
  • The history page for “important places” in Jonathan Strange and Mr. Norrell, showing improvement over time.
  • David Weinberger. Half-filled. He mentions his agent, but I can’t find his major at Bucknell, and the honors section is empty.
  • Hugo Award Winners. This is going to get very cool.
  • The global history page. Mesmerizing.

Right now we’re basically slapping fields on pages, but this structure is built for reuse. The license is also built for reuse. We’re not asking members to help us create a repository of saleable, private data. Whatever you add to Common Knowledge falls under a Creative Commons Attribution license. So long as you include a short notice (e.g., “Powered by the LibraryThing community”), you can do almost anything you want with the data: take it, change it, remix it, give it to others. You can even sell it, if someone will buy it. Regular people, bookstores, libraries, even our competitors, are free to use it. We’ll be adding APIs to get it out there all the more. Go crazy, people.**

Common Knowledge isn’t the answer to everything. Some data, like web links, requires a more structured approach; some, like our “work” titles, works best when it “bubbles up” from user data; and some, like page counts, have yet to be extracted from the MARC and ONIX information we have. But the possibilities are great. Series information? Blurbers? Cover designers? Books about an author? Tag notes? Other classification schemes?*** Bookstore locations? Publicists? Venues? Book fairs? Pets? Pets’ vaccination dates?

Anyway, we’ve done our thinking, but this is the ultimate member-input feature. We’re going to have to figure it out together. Fields will need to be added (and removed?). Rules will be debated, formatting discussed. Although the base is solid, the feature set is still skeletal.****

Go ahead and play. Chris, John and I spent the evening playing with it, and we guarantee it’s addictive. Or talk about it. Leave a note here. I’ve also changed the WikiThing group into a Common Knowledge and WikiThing group. I’ve started a first-reactions topic and another for bug reports.

Why I’m excited. LibraryThing means a lot of things to a lot of people. Some come for the cataloging, some for the social aspect. A lot come for what happens between those two poles. As I see it, Common Knowledge is the perfect LibraryThing feature. I don’t mean it’s good; I mean it’s in tune with what makes LibraryThing work. It’s social, sure, but it’s based in data. It’s not private cataloging and it’s not MySpace-like “friending.”

LibraryThing is sometimes called a “social cataloging” site. When I used this term at the American Library Association, it became an unintentional laugh line. Social cataloging sounded impossible and funny, like feline water-skiing. This, more than anything else, got me fired up about doing this. True “social cataloging”: it was an idea that had to be tried!*****

Details, acknowledgements and caveats. Common Knowledge is deeply unstructured. This is going to give some members hives! Names aren’t in first-middle-last format, but free text. You can enter places however you want. We’ve arranged some careful “hint” text, and fields have a terrific “autocomplete” feature, but we’re not validating data and returning hostile error messages. We’re aiming for accessibility and reach, not perfection. This is Wikipedia, not the Library of Congress. It scares us too, but we’re also excited.

Abby, Casey, Chris and I planned this feature during the Week of Code. We worked through the issues together, and Casey, Chris and I all wrote the initial code. When we broke up, the rest of the coding and the interface design all fell to Chris. Although it was a team effort, this is really his feature. I’m very pleased with what he did with it.

We decided to work on this (and on our standard wiki, WikiThing, which grew out of it) because it was an ideal project for the entire group to tackle. This jumped it ahead of collections. I still think this was a good idea, but there has certainly been some grumbling. We heard you. Collections is next on our list, with nothing new in between.

*So far we have only three data types—radio buttons (gender), long fields (book descriptions and author disambiguations) and short fields (everything else).
**Competitors who use it might want to stop asserting copyright over everything posted to their site. This was legally bogus already, but it certainly would conflict with a Creative Commons license… Incidentally, we haven’t decided whether to go with CC Attribution-ShareAlike or straight CC Attribution (discussion here), but it’s going to be one or the other.
***This particular one may happen very soon.
****And yes, we can discuss the whole radio-buttons-for-gender topic. See here, here. I’m of the opinion that two genders plus maybe “unknown” and “n/a” (for Nyarlathotep?) are the best you can get without consensus-splitting disagreement. You’ll note we aren’t including other potentially-contentious fields, like sexual orientation or religion.
*****In conception, Common Knowledge most closely resembles the Open Library Project, the Internet Archive’s incipient effort to “wikify” the library catalog. Open Library is also a “fielded wiki,” based on Aaron Swartz’s superior Infogami platform. You’ll notice that we’ve mostly steered clear of the “traditional” cataloging fields that Open Library is starting from. We do cataloging differently, and we don’t want to duplicate effort. Anyway, we’re hoping Open Library and others will mash up the two data sets, and more.

Monday, September 10th, 2007

WikiThing: A wiki for LibraryThing

We’ve had the whole team up in Portland, ME, getting to know each other, brainstorming, planning and working on projects. We chose two projects to work on all together. We wanted something that could engage the talents of the whole team.

The first release is WikiThing*, a full-featured wiki for LibraryThing. A wiki is, of course, “a collaborative website which can be directly edited by anyone.” Wikis can be used for lots of things. Wikipedia is an encyclopedia. DiscourseDB tracks published opinion pieces. So what’s WikiThing for?

We’re not sure! But we’re kicking it off with:

  • FAQ. We’ve put our static Frequently Asked Questions pages up on the wiki, where everyone (including us) can edit them. If it works out, we’ll get rid of the static pages, or reduce them to a few questions, and link to WikiThing.
  • Help. We’ve got a few Help pages that aren’t FAQ pages.
  • Bug tracking. This was a tough one. We do not want to move all bug conversations to the wiki. Bug tracking can seem like a simple record, but it is generally a conversation, with questions and answers back and forth. Feature requests are even more so. At the same time, a simple list of bugs, with links to Talk posts, could be a big help for everyone.

What do you want to do with it? Leave a note here or on the Talk: New Features post about WikiThing.

How do I do it? Editing is super easy. Just go to a wiki page and click the “edit” link at the top, or one of the “edit” links by a section.

WikiThing is based on the MediaWiki engine, the same software that runs Wikipedia. So, if you know how to edit Wikipedia, you know how to edit WikiThing. If you don’t, it’s easy to learn. Mostly you just type. If you need to do something fancy, like insert a link, we have a Wiki help page. If you screw up, don’t worry. Someone else will come along and fix it.

What about a “content” wiki? We thought long and hard about having a “content wiki.” A content wiki would have wiki pages for all works, authors and so forth. It would cover often-requested fields, like the year of original publication for a work and series information, and hitherto unrequested ones, like the name of the acquiring/literary editor. Members would be able to edit them and the edits would get picked up and put on work and author pages.

After a lot of thought and experimentation, we decided that MediaWiki wasn’t the right tool for the job.*** We needed a true “fielded wiki.” We looked at options like Aaron Swartz’s Python-based Infogami, which also runs Open Library.****

In the end, we decided to do it ourselves, and it turned out easier than we thought.

We’ve got one more day together, and plan to make the most of it. Whether or not we can finish it up today, we should get it out this week.

*I was overruled on the name. I wanted ThingWiki, in keeping with ThingISBN, ThingTitle and so forth. Casey and Chris** were against it.
**The individual formerly known as “Christopher” (ConceptDawg) shall henceforth be known as “Chris.” Although friends call him Chris, we were calling him Christopher because we also had a Chris (Chris Gann), but Chris Gann is long gone, and Chris—the Christopher Chris—wants his name back! Who’s on first?
***We also decided that tools like Semantic MediaWiki and WikiForms weren’t there yet.
****Since Infogami runs ThingDB—yes, he used the name first—we were thinking of calling our product ThingGami!

