Archive for February, 2009

Saturday, February 28th, 2009

Rocky Mountain News: Final Edition

Another important newspaper dies.

Sure, models change and things are gained too. But things are also lost. Denver is definitely the worse for this. You’ve got to worry it’ll be publishers and libraries in ten years.*


*Both are suffering now (witness the recent HarperCollins layoffs and the Philadelphia closings), but newspapers are in really deep trouble.

Labels: newspapers, print culture

Thursday, February 26th, 2009

Flash-Mob Cataloging

I’ve just created a group dedicated to Flash-Mob Cataloging. Flash-Mob Cataloging is when a horde of LibraryThing members descend on some small library with laptops and CueCat barcode scanners, catalog their books in LibraryThing, eat some pizza, talk some talk and leave them with a gleaming new LibraryThing catalog.

Why do it? There are many small libraries that use LibraryThing as their online catalog: museums, organizations, churches, schools, synagogues, temples, even some embassies! It’s an easy, cheap solution to library automation. (More on organizational LibraryThing accounts here.) And having a flash-mob do the cataloging makes the data entry easy and fun! Emphasis on the fun, trust me.

We’ve done two so far (Rhode Island Audubon Society and St. John’s Church in Beverly, MA), to great success. Both were in New England because, well, that’s where most LibraryThing employees are located. But the concept isn’t limited by location! Anyone can organize one; hence the new Flash-Mob Cataloging group. So come join us and plan your own flash-mob event. We’ll help you get organized, blog it for you so you can get the word out, and we’ll even send you some CueCats, T-shirts, and laptop stickers to give away.

Labels: flash-mob cataloging

Monday, February 23rd, 2009

Flash-mob cataloging: We did it!



We did it! Eighteen flash-mob catalogers descended upon the Audubon Society of Rhode Island and left having cataloged a wonderful 2,500-book library (available here).

I’ve posted my photos here. (UPDATE: link is here.) Jeremy has a nice blog post and some photos. Brian, the “Swiss Army Librarian,” posted his photos here.

For me the highlights were:

  • The diversity of people—LibraryThing nuts, local librarians, Audubon people.
  • The Audubon people were grateful, if a little stunned. Katya, who drove five hours to get there, floored them.
  • The Audubon library had its own bespoke classification system; I’m trying to get hold of it. They translated it to tags, which rebellious LibraryThingers added to as necessary (i.e., no moths, pshaw!)
  • The couple (librarian, programmer) who competed to do the most books. The programmer won. How did he do it? “I pretended I was killing orcs.” On multi-volume sets, echoing Gimli: “It only counts as one!”
  • It was great showing one retired librarian how to catalog books on LibraryThing and hearing him say “That’s it?”
  • The books were different. Our last flash-mob cataloging effort was for an Episcopal church, which had a lot of overlap with my library and interests. The Audubon Society shared only two of those books, and only one with me (The Diversity of Life). My dad’s (partial) library overlapped a lot more.
  • What do we make of the Personality of insects? Carl Sandburg also had a copy. But LCSH does not allow “Personality” to be so subdivided. Species-ists!
  • Most Legacy Libraries share no books with the Audubon library. Darwin and Hemingway do, of course, and so does Walker Percy, who has, I think, the best library of the Legacy Libraries, excepting maybe Jefferson.
  • As Jeremy points out in the notes, the Audubon library shares a book with Ian Fleming: James Bond’s Birds of the West Indies. (Yes, that’s where he got the name.)
  • Again, Katya did all the “hard” cataloging, including two not in WorldCat.
  • Books with rulers. News to me.
  • Taxidermy animals. My son, Liam, should have been there.
  • Mike and I fixed bugs in real time, and pushed collections (again) by mistake. (We pushed a major speed-up for the Audubon library alone; I’ll be looking at extending it to all members.)

Next time we do this, we need to plan for a group-wide dinner or drinks afterward. With no group event, Mike, Jeremy, Katya and I headed to Cafe of India in Harvard Square for dinner and a brief prowl of Harvard Book Store. Mike and I learned a lot, as usual. If librarianship were to be extinguished from the earth, I bet Jeremy and Katya could bring it back, with all the rigor it ever had (although it would be friendlier to tags).

Thanks to everyone who participated. You gave a day’s worth of your time, with only a CueCat and a T-shirt in return, and the knowledge that naturalists throughout Rhode Island will be able to search the Audubon library from home, something many public libraries in New England still don’t allow!

What’s next? With a church and an Audubon society under our belt, I want to do something different, like a historical society. Katya and Jeremy both had good ideas there, something in Maine perhaps? Stay tuned!

Labels: Audubon Society, flash-mob cataloging

Monday, February 23rd, 2009

Classify your heart out

Here it is, the revised list of top level categories. These have been vetted by all of us for a while and it’s time to start building subcategories. We’ve created threads in the Group to discuss the subcategories of each top level. Keep in mind that these need to be comprehensive, but not excessively granular. Take a look at this example of possible subcategories for PETS.

After more of the second levels are fleshed out, we plan to have a new classify-this feature to test out the classification system on books in LibraryThing.

Until then, classify and discuss!

Labels: open shelves classification, osc

Monday, February 23rd, 2009

uClassify contest winner

After some delay, I can announce that the LibraryThing/uClassify contest has been won by Kelly Vista—the only entrant, but a worthy one. (Kelly gets a copy of Programming Collective Intelligence and $100 from Amazon or IndieBound.) She described her “LibraryThing classifier” as follows:

“My goal was to create a classifier that would automatically ‘tag’ any book description based on actual LibraryThing tags. For example, if you paste the book description for ‘Truman’ into UClassify, it should return to you LibraryThing tags that suit the book. This is one step more general than one of [Tim’s] ideas (fiction vs. non-fiction).”

In my testing, it does a pretty good job of hitting the top tags. Pasted descriptions of Harry Potter give “young adult” and “children’s.” John Adams gives “american history” and “biography.” It’s not perfect (Adams is also labelled “young adult”), but the initial results are good and the whole point of uClassify is to enable accelerating accuracy.
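uClassify’s internals aren’t described here, so what follows is only a rough, hypothetical sketch of the general idea behind a tag classifier like Kelly’s, assuming a multinomial naive Bayes approach (which may or may not match what uClassify actually does). Each LibraryThing tag is treated as a class, word frequencies are learned from descriptions of books carrying that tag, and tags for a new description are ranked by log-probability with add-one smoothing. The class name `TagClassifier` and the toy training descriptions are invented for illustration; this is not uClassify’s API.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TagClassifier:
    """Toy multinomial naive Bayes: each tag is a class, trained on the
    descriptions of books that carry that tag (hypothetical sketch)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word -> count
        self.doc_counts = Counter()              # tag -> training descriptions seen
        self.vocab = set()

    def train(self, description, tags):
        words = tokenize(description)
        self.vocab.update(words)
        for tag in tags:
            self.word_counts[tag].update(words)
            self.doc_counts[tag] += 1

    def score(self, description, tag):
        """log P(tag) + sum of log P(word | tag), with add-one smoothing."""
        total_docs = sum(self.doc_counts.values())
        counts = self.word_counts[tag]
        denom = sum(counts.values()) + len(self.vocab)
        s = math.log(self.doc_counts[tag] / total_docs)
        for w in tokenize(description):
            s += math.log((counts[w] + 1) / denom)
        return s

    def suggest(self, description, k=2):
        """Return the k highest-scoring tags for a description."""
        ranked = sorted(self.doc_counts,
                        key=lambda tag: self.score(description, tag),
                        reverse=True)
        return ranked[:k]


clf = TagClassifier()
clf.train("a young wizard at a school of magic", ["young adult", "children's"])
clf.train("a boy wizard battles a dark lord", ["young adult", "fantasy"])
clf.train("the life of a founding father and president", ["biography", "american history"])
clf.train("revolutionary war politics in colonial america", ["american history"])
print(clf.suggest("the president who led america after the revolution"))
```

With four toy descriptions the results are fragile; a real classifier would train on thousands of tagged descriptions, and accuracy should improve as more examples are added.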

uClassify seems to be growing apace. They recently opened up public classifications for external access, so I’ll be looking into automatic text-language classification of LibraryThing reviews.

Labels: kelly vista, uclassify

Monday, February 23rd, 2009

Research libraries clobber OCLC Policy

The Association of Research Libraries released its report on the new, now delayed OCLC Policy, and it’s a doozy—a forceful rejection of both the process and content of the Policy.

The full report makes for enjoyable reading—outside of Dublin, Ohio anyway. The task force members, research-library heavyweights all, fully and finally put to rest the notion that the only people bothered by OCLC’s power grab are open-data crazies and evil commercial companies.

There appears to have been a significant split. The majority felt it “desirable to have a policy that limits large-scale redistribution of records that could be harmful to the collective” and a minority did not. (It’s great to hear that a team of veterans had at least one member willing to reject the whole structure of cooperative-restriction!) But if the majority felt some policy was called for, they were apparently unanimous in condemning OCLC’s unilateral, non-consultative approach and concerned about a host of issues, large and small. Surveying the current Policy, they urge a “fresh start.”

Vague legal language, unclear goals, worrying process, the split between the “nice” FAQs and the actual language of the Policy, issues of clouded ownership and responsibility for bibliographic data, termination provisions, the lack of respect for federal libraries and the legal impossibility of binding them without explicit renegotiation: it’s all here! There’s even a legal opinion, attached to the document, pouring cold water on the idea that the Policy will have any “downstream” effect on parties that haven’t explicitly agreed to it (i.e., LibraryThing members). In all, a good drinking game could be invented: every time the ARL report validates or recapitulates a point made on this blog, and on other opponents’, drink. (If you’re going to Code4Lib this week, I’ll buy the drinks!)

Most striking are the report’s vision of OCLC as a cooperative, and the ways the OCLC policy undermined that trust:

“The collective activity of shared cataloging is a source of deep pride and success in libraries in the U. S. and around the world. OCLC was created as, and is viewed as, a membership organization formed for the purpose of enabling this collective activity…. Members view WorldCat as a collective enterprise, not as a product that they license for use. …”

“The new Policy is clearly intended as a unilateral contract, unilaterally imposed on any entity using records from the WorldCat database, including member libraries…. The member community has seen the introduction of the new Policy as a fundamental change in the nature of the relationship between OCLC and its member libraries. In the eyes of the community, the guidelines expressed a mutual social contract, and the new Policy represents an authoritarian, unilaterally imposed legal restriction.”

Now let’s see what comes of this. OCLC has a needle to thread. The ARL report sets a high bar for consultation and consensus—higher than I think OCLC can reach without rethinking its whole communication model. And the core research-library concerns are serious*. I don’t think they can address them without failing to ensure what I believe to be the Policy’s true intent—establishing a permanent and lucrative data monopoly.

My prediction: Keep an eye on OCLC’s “regional service providers.” Various signs, including what reporters call “highly-placed sources,” confirm that OCLC/regional tension is at an all-time high, with OCLC increasingly rewriting the rules there too, selling directly to libraries in unprecedented ways. I think we can see in these moves a common historical pattern: when the structures that give a powerful institution strength start to weaken, it reaches for a new level of authority not based in the previous structure and therefore not susceptible to weakening. (In this case, OCLC is moving from a robust, often mediated cooperative to an unmediated, contractually-drawn licensure.) Sometimes the effort succeeds; sometimes the attempt crystallizes opposition and hastens and ensures the institution’s decline.


*Even if they picked the members of the Review Board, they may still face trouble from that direction. I doubt that OCLC’s Review Board has what the ARL board apparently had (members who questioned the very idea of restricting access and use!), but all but one of the board members are academic/research librarians and can be expected to understand and appreciate the concerns raised by their ARL colleagues.

Labels: arl, oclc

Friday, February 20th, 2009

What do Ben Franklin and C.S. Lewis have in common?

Answer: They’re both on LibraryThing!

I’m pleased to announce the completion of Benjamin Franklin’s LT catalog. This project wouldn’t have been possible without the gracious permission of the American Philosophical Society and the Library Company of Philadelphia, the publishers of The Library of Benjamin Franklin (Philadelphia: American Philosophical Society, 2006). Not only have they made the book available via Google Books (here), but they also gave us permission to enter the data from it completely, including the wonderful and incredibly useful annotations by Edwin Wolf 2nd and Kevin Hayes, whose hard work and bibliographical sleuthing made the book possible in the first place.

On the LT end, thanks to pdxwoman, who got the project off the ground way back in January 2008, to hopeglidden and benjclark, who cataloged portions of the collection, and to katya0133, who entered a major chunk of the titles. I jumped in in November and worked to add more titles and augment the records by entering the annotations. We got on a roll in January; since the start of the year, Katya and I added 2,009 titles, ~800 of them in the last ten days.

You can browse the catalog here, read Franklin’s reviews, and check out his stats. Not surprisingly, he shares many titles with his other Early American comrades.

No sooner is one finished than another is begun, around here. I’ll be tackling the Virginia Georges next (Washington and Wythe), but BOB81 has taken on the task of heading up the creation of an LT catalog for C. S. Lewis, based on a listing created by the Marion E. Wade Center at Wheaton College. If you’re interested in helping out, sign up here.

[So far Lewis and Franklin only have one work in common, The Spectator. More to come, I’m sure.]

Labels: ben franklin, c.s. lewis, legacies

Friday, February 20th, 2009

Flash-mob cataloging tomorrow in Rhode Island

Join us tomorrow for the second Flash-Mob Cataloging Party, at the Audubon Society of Rhode Island in Smithfield, RI.

See the main post.

I’ll be driving some people from Boston tomorrow morning. If you want a ride—no guarantees—drop me an email (tim@librarything). I check it all the time.

Labels: flash-mob cataloging

Thursday, February 19th, 2009

Seeing parallels

Steve Lawson wrote this wonderful piece for his blog See also…, reprinted here (by permission) in full:

There is a large organization whose main business isn’t producing information, but instead hosting and aggregating information for many thousands of users on the web. Users upload content, and use the service to make that content public worldwide, and, likewise, to find other users’ content. Then one day the large organization decides to change the rules about how that information is shared, giving the organization more rights–to the point where it sounds to some people like the organization is trying to claim ownership of the users’ content, rather than simply hosting it and making it available on the web.

A small but vocal and influential group of users object to the policy change. The organization protests that it isn’t their intent to fundamentally change their relationship with their users and that legal documents tend to sound scarier than they really are. Most customers are either unaware or unconcerned by the change in policy, but the outcry continues until the organization backs down a bit, sticking with the old policy for the time being. The future, though, is up in the air.

Facebook? Or OCLC?

Perfect, just perfect.

Labels: facebook, oclc, open data, steve lawson

Monday, February 16th, 2009

Portland, not the other one!

American City Business Journals has named LibraryThing’s home town, Portland, ME as the 10th-best place to start a small business. Best of all, Portland beat “the other Portland.” (And did you know they were named after us?)

Three cheers for Portland. But at the risk of being ejected from the ranks of Portland, Maine’s tech startup community, I think that—wait, there’s no local startup community to be ejected from! There’s LibraryThing. There’s Foneshow (two guys?) and that’s about it! What businesses are they talking about anyway?

This city has grown on me. It’s scenic, quirky and cheap. My wife and I think we can find both the right school and the right house, and avoid some of the craziness of Boston. But the business climate here leaves a lot to be desired, especially if you aren’t in tourism.

American City Business Journals must be talking about some industry I’m not in, with very different inputs. For a tech startup the labor market is a train wreck: way too small and illiquid. Even if you could hire them, the people are wrong. There aren’t any top-notch universities spitting smart young hackers out into the local community.* And there are too many people who want “quality of life,” which is great if you can get it, but hard-driving companies want hard-driving employees.** As Paul Graham wrote, ambition is a big-city phenomenon. New Yorkers want to get richer. Cambridge people smarter. I still don’t quite understand what Portland people want. Smart, ambitious people tend to leave Maine, and it’s a big problem.***

I’m sorry for the harsh tone of this post, but I generally don’t hide my feelings. Do you run a local small business? A local tech business? Send me a comment and I’ll buy you lunch. As we both know, there are some amazing places to eat around here.


*There are, it’s true, more local tech people than it seems at first. But, like Alexandria, they’re mostly “in” not “of” Portland: Bostonians who moved to Portland and still service Boston-area clients.
**That comment will no doubt draw objections. But nobody who knows how the startup communities in Cambridge or the Valley work can dispute it. Startups work because people make them their lives. And anyway, when startup people aren’t working, they want to hang out with other driven people.
***Back in 2003, a study concluded that “half of the state’s college graduates in 1998 wanted to live and work in Maine, but three of four ultimately left.” Subsidizing Maine graduates who stay in Maine probably helps, but it’s not the answer.

Photo by PhilipC, from Wikimedia Commons (link).

Labels: maine, portland

Sunday, February 15th, 2009

Can your Kindle read to you?

The new Kindle can apparently “read out loud” (that is, speech-synthesize) its books. Paul Aiken, director of the Authors Guild, told the Wall Street Journal they can’t do that:

“They don’t have the right to read a book out loud. … That’s an audio right, which is derivative under copyright law.”

Renowned (and Newbery Medal-winning) author Neil Gaiman begs to differ:

“When you buy a book, you’re also buying the right to read it aloud, have it read to you by anyone, read it to your children on long car trips, record yourself reading it and send that to your girlfriend etc. This is the same kind of thing, only without the ability to do the voices properly, and no-one’s going to confuse it with an audiobook.”

My opinion? Gaiman is right about the way it should work. The Kindle, with its DRM model, undermines what Gaiman got from “buying” a physical book, but it’s certainly strange to imagine people can own a piece of text free and clear, but not be allowed to run a program that reads it aloud.

On the legal grounds, however, I fear Aiken might be right. As a rule authors grant publishers highly specific rights. These limits generally include countries, copies, covers, formats and timeframes. That’s one reason eBooks took so long to take off—a million contracts needed to fly here and there before publishers could sell their books in the new format.

Anticipating future media is hard. My favorite passage in Shelley’s Prometheus Unbound (okay, the only passage I remember from that deeply weird work*), predicts a world of freedom in which

[L]ovely apparitions…
Shall visit us the progeny immortal
Of Painting, Sculpture, and rapt Poesy,
And arts, though unimagined, yet to be.

In the real world, I fear, “arts, though unimagined, yet to be,” require a contract addendum.


*The passage made it into the LibraryThing terms of use. I love my job.

Labels: drm, kindle, neil gaiman, rights

Sunday, February 15th, 2009

Why Wirral? One partial explanation.

A recent article in the Telegraph describes a worrying fall-off in library books and library usage in the UK.

Over the past six years books in public libraries in the UK have fallen 12%, from 116 million to 103.2 million. Library check-outs have fallen faster—16.5%. According to the Telegraph, UK librarians are bracing for another round of declining numbers, coming amid budget shortfalls across the board—and expecting to get their budgets slashed.

Reflecting on these problems, the CEO of the Museums, Libraries and Archives Council (MLA) told the Telegraph:

“[W]e live in an age where books can be bought cheaply from supermarkets or the internet so the reasons to visit a library have changed for many users.”

Wirral as a microcosm. Cuts have started. The Wirral council library system in NW England (LibraryThing Local) is closing 11 of its 24 branches.

They sure don’t deserve it. Taking a look at the Wirral Libraries website, anyone can see they’re doing a lot of things right. The branches look well-organized and inviting. They’ve got a fair number of computers and free Wifi. They have a special outreach program for the house-bound. They even lend toys!*

But they are doing one thing very wrong—namely that Wirral, like most libraries, isn’t really “on” the web.

People are finding things in supermarkets and on the internet because it’s easy to do so. On the internet, one-stop shopping means that a huge panoply of useful and interesting things are available from a single, unified and well-understood interface: from local bars, to local bands, to some 600 pizza and 400 curry joints in the area (Man, I love Britain!). Many of these resources are not only in Google searches, but Google will plot them on a map for your convenience.

What isn’t online are library books! The Wirral Libraries’ catalog, a Talis Prism OPAC, hardly registers in Google, which knows only 7,000 pages, from a library with more than 300,000 items. Worse, virtually every Wirral page in Google is broken. On the right is a representative sample of what Google knows about from the Wirral catalog. Each link has the same title. And each links to an expired session that proclaims:

You can, of course, get to the Wirral Libraries catalog if you know that’s where you want to go—fifth link down, then the top rounded button on the right. That’s not the same thing.

And even if you find a book, you can’t bookmark it for yourself or forward it to a friend; the links will die off in a few minutes. In refusing to allow linking and spidering, the Wirral website sets itself apart from the other websites Wirral residents might use. The rest of the web just works: it’s in your search box, where most internet-aware people do most of their information finding.
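The core problem is that each catalog link carries per-visit session state, so it breaks when the session expires and a search engine can’t usefully store it. A minimal sketch of the distinction, assuming the session travels in a query parameter; the parameter names in `SESSION_PARAMS` and the example URL are hypothetical, not Talis Prism’s actual URL scheme. Stripping the parameter only produces a working permalink if the server can serve a record without a session, which is the real fix a catalog needs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of query parameters that carry per-visit session state.
SESSION_PARAMS = {"sessionid", "sid"}


def is_session_bound(url):
    """True if the URL carries a session parameter (so it will expire)."""
    return any(key.lower() in SESSION_PARAMS
               for key, _ in parse_qsl(urlsplit(url).query))


def stable_link(url):
    """Drop session parameters, keeping the parts that identify the record."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


url = "http://catalog.example.org/record?id=123&sessionid=abc"
print(is_session_bound(url), stable_link(url))
```

A link like the stripped one can be bookmarked, emailed, and crawled; the session-bound original cannot.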

Lastly, where is WorldCat in all this, the “switching mechanism” and “point of concentration” (Karen Calhoun) OCLC provides libraries as an alternative to the “lunacy” (Roy Tennant) of libraries being on the web for themselves? Nowhere. None of the Wirral Libraries are in it, and WorldCat doesn’t list a copy of Harry Potter and the Deathly Hallows closer than 60 miles away (postal code: CH46 6DE?). One may speculate that Wirral wasn’t willing to pay for the service, which in any case gets insignificant traffic.***

Who’s to blame? Wirral Libraries’ misfortunes are no doubt many, and not being part of the web is not the largest. But it’s a part. Wirral citizens aren’t seeing their library appear in their search results. They aren’t as aware of its riches as they might otherwise be. If they were aware, it’s likely they’d use these resources more, and the system would be easier to defend politically.

It won’t do to blame Wirral for this. Library vendors have long handicapped their products in this way, and Wirral Libraries surely bought their Talis Prism system a while ago.** Budgets are short—and getting shorter. Both the web and this recession have hit libraries by surprise.

But refusing to participate in the central information technology of the age has its costs. And the leaders of Libraryland who advocated and continue to advocate for closed solutions, closed data and staying out of search indexes—except as “negotiated” with Google—have contributed to this situation. The respected guides have taken libraries off the great river of information, and left them grounded on the shore. Now someone’s coming for the boat.

I hope the residents of Wirral fight like hell to keep their libraries open. Then they should fight like hell to make their libraries truly open.


*I don’t know how common this is in Britain. I get the sense it’s not too common in the US, but it happens. The Hingham Public Library in Hingham, MA lends practically everything, from toys to paintings on the wall.
**It’s ironic that Wirral’s OPAC was made by Talis, now one of the more progressive and forward-thinking library vendors. I’ll put this in a footnote to avoid “shilling,” but if Wirral can get a new OPAC, I’ll arrange for them to get LibraryThing for Libraries for free until they get back most of their funding. Maybe Talis would kick in an incentive to upgrade their OPAC?
***WorldCat is supposed to be the central website of Libraryland, but third-tier websites like LibraryThing and Dogster—the social network for dog lovers!—are currently beating it.

Labels: indexing, oclc, riverine metaphors, web, wirral, wirral libraries

Tuesday, February 10th, 2009

Flash-Mob Cataloging Party: Rhode Island Audubon Society


It’s time for another cataloging flash-mob*! This time we’re heading to the Audubon Society of Rhode Island to add their small lending collection to LibraryThing.

LibraryThing members can help catalog around 2,000 items at the beautiful Powder Mill Ledges Wildlife Refuge, where I’m told we can take a nice walk for a break if the weather cooperates.

Need a little motivation? Read about our previous flash mob cataloging party in November here.

  • The LibraryThing wiki page for the event.
  • The day: Saturday, February 21st.
  • The time: TBD, probably 10:00 a.m. till 4:00 p.m., but come whenever you’re able.
  • The place: Rhode Island Audubon Society Powder Mill Ledges Wildlife Refuge, 12 Sanderson Road, Smithfield, RI (Google map)
  • Lunch will be provided by the Audubon Society.

RSVP to sonya (at) librarything.com.

*What’s a flash mob?

Labels: Audubon Society, cataloging, flash-mob cataloging, party, Rhode Island, RI

Monday, February 9th, 2009

February Early Reviewer books

The February batch of Early Reviewer books is up! We’ve got 68 books this month, and a grand total of 1760 copies to give out.

(Not enough books to choose from, you say? Check out our new Member Giveaway program as well. Member Giveaways is like Early Reviewers, but isn’t limited to select publishers–any author or member can post books! We launched it last week, and currently there are 285 copies of 70 books being given away. Combined with the February batch of Early Reviewer books, that’s 2045 copies of 138 different books available right now!)

First, make sure to sign up (one sign up for both Early Reviewers and Member Giveaways). If you’ve already signed up, please check your mailing address and make sure it’s correct. Then request away!

The list of available Early Reviewer books is here:
http://www.librarything.com/er/list

The deadline to request a copy from the February batch of Early Reviewers books is Wednesday, February 25th at 6PM EST.

Eligibility: Publishers do things country-by-country. This month’s batch of Early Reviewer books has publishers who can send books to the US, Canada, the UK, Israel, Australia, France and Germany. Make sure to check the flags by each book to see if it can be sent to your country.

Thanks to all the publishers participating this month!

St. Martin’s Press, Candlewick, Henry Holt and Company, Tyndale House Publishers, Crossway, Bethany House, Beacon Press, Springboard Press, Doubleday Canada, Hyperion Books, Firecrest Books, PublicAffairs, Other Press, DiaMedica, HarperCollins, Faber and Faber, Harper, Bond Street Books, DK Publishing, Demos Medical Publishing, Orbit Books, Picador, Grand Central Publishing, W.W. Norton, Springer, The Permanent Press, Hampton Roads Publishing Company, Shambhala, MSI Press, Pomegranate, Atria Books, Curbstone Press, St. Martin’s Minotaur, Andrews McMeel Publishing, Ballantine Books, Orca Book Publishers, Tarcher, B&H Publishing Group

Labels: early reviewers, LTER

Monday, February 9th, 2009

Open Shelves Classification Update

Hello! Well, we have been busy since Tim announced the classify-this feature. The OSC group has been extremely active, with more than 300 posts about the top level categories (not to mention insightful threads popping up to discuss second level categories). Thank you for your feedback! Meanwhile, at the Midwinter meeting of the American Library Association we were able to have a really valuable face-to-face conversation with LibraryThing users.

We have been processing all your feedback and working on version 2.0 of the top level categories. Before we get to that, we wanted to let everyone know that we do read all the posts in the Open Shelves Classification group. Because of the high quantity of posts (and our day jobs) we cannot comment or respond individually as often as we would like.

Some key points after discussion, feedback and analysis:

The number of categories in the top level. As decided last summer, we will have more rather than fewer top level categories. The top levels are not supposed to represent an even distribution of all possible branches of knowledge. Instead, the OSC top levels should represent the largest categories that public libraries will want to use. [Similar to how Library of Congress classification was built to meet the needs of the Library of Congress, while Dewey’s system tried to contain all recorded knowledge.]

Complaints about specific topics in the top level. Remember, there is no value judgment in a topic being placed at the top level or underneath a broader topic. For now, topics like Pets, Gardening, and True Crime are present because of feedback from public librarians that these are heavily requested books that are often pulled out into their own sections. As a guiding principle, the OSC will be statistically tested, so some of our top level categories may change as actual libraries begin to reclassify their collections.

The nature of classification. Any classification system forces us to choose one topic for the book, even though that book may be about more than one topic. This is not a flaw in the OSC categories but in the nature of classification. Libraries will still use multiple subject headings in the catalog to capture all the topical aspects of the work.

Facets. As discussed a few months ago, we currently plan on the top level categories being only topical while other aspects of the work will be represented by facets. For example, format will be captured in a separate facet. [And to clear up any lingering confusion, Comics will be a format facet.] Another facet discussed was audience. This means children’s books will be tagged in the audience facet. We envision that these facets will be optional and libraries can use them if, for example, they want to pull out all the comics and shelve them in a unique section. Alternatively, the facet could be ignored and then graphic novels would be intershelved with other like topics. Here is a picture of what we are envisioning:

Classification versus Signage. The top-level categories have nothing to do with signage. This is particularly true with children’s books, which can be grouped/displayed as the library desires (e.g. picture books, infants, board books, etc.).

We will be posting an updated version of the top levels very soon, so stay tuned!

Labels: open shelves classification, osc

Monday, February 9th, 2009

One million facts

“Now, what I want is, Facts. Teach these boys and girls nothing but Facts. Facts alone are wanted in life. Plant nothing else, and root out everything else. You can only form the minds of reasoning animals upon Facts: nothing else will ever be of any service to them. This is the principle on which I bring up my own children, and this is the principle on which I bring up these children. Stick to Facts, sir!” (first paragraph of Dickens’s Hard Times)

Three cheers for LibraryThing’s diligent members. Our Common Knowledge system has hit 1,000,000 member contributions.

Common Knowledge is an innovative “fielded wiki” for book information—collaborative, piecemeal “cataloging” of information about books and authors. We created it back in October 2007—Chris did most of the coding—and it has exceeded our expectations.

The focus is on things not found anywhere else—not cataloged by librarians or publishers. The system’s biggest strength is probably its series coverage, 26,890 series and counting. More comprehensive than paid series data, it is also often of higher quality. There is surely no library in the world that accounts for the Star Wars series (plural) better than what LibraryThing members have assembled! Common Knowledge also tracks some 8,860 awards, from the Wolfson History Prize to the Nestlé Smarties Book Prize.

Fun, if not quite as full, are lists of 78 books with Lincoln in them, and 23 with Emma Goldman and Puck. Almost 1,700 books take place in New York, 90 on Mars and 49 in Hell. Some 626 authors went to Harvard, three were gas station attendants and four were buried in Uppsala Cathedral. No doubt there are more in every case, but the data is starting to really pile up—a confirmation that Social Cataloging is no joke.

Wherever Common Knowledge goes, it will not be locked up. All Common Knowledge data is free for reuse outside the site, with a handy API as well.

Picking up. The one-millionth entry came early. Edits picked up dramatically when, ten days ago, I introduced a Dead or Alive? page for every member, allowing you to find out how your authors break down on the living/dead scale. They went through the roof when I introduced a similar Male or Female? page. CK also attracted some interest from the initial release of distinct authors—a method for distinguishing between distinct, homonymous authors. (It was a busy weekend.)

The one-millionth Common Knowledge entry was added at 6:47pm (EST) by ladybug1983, who assigned the contemporary romance Taking the Heat as the third book in the series O’Neil Family.

Hey LadyBug, want a t-shirt?

Labels: common knowledge, social cataloging

Sunday, February 8th, 2009

Male or Female?

I’ve added a new meme page for “Male or Female?” (see yours or mine).

The page is similar to Dead or Alive?. It’s based on our Common Knowledge, an editable, fielded wiki for author and work information. So if someone shows up under “Uncertain” you can edit in the right gender.
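The tally such a page displays can be sketched in a few lines. This is a minimal sketch, not LibraryThing’s code: it assumes a member’s authors arrive as a list of names and that Common Knowledge gender values arrive as a dictionary (both hypothetical inputs). Any author without a recorded value is counted as “Uncertain.”

```python
from collections import Counter

def gender_breakdown(authors, ck_gender):
    """Tally a member's authors by a Common Knowledge-style gender field.

    authors   -- author names from a member's catalog (hypothetical input)
    ck_gender -- mapping of author name -> recorded gender (hypothetical input)
    Authors with no recorded value land in the "Uncertain" bucket.
    """
    counts = Counter()
    for author in set(authors):  # count each author once, however many books
        counts[ck_gender.get(author, "Uncertain")] += 1
    return counts

# Hypothetical example data.
shelf = ["Charles Dickens", "Charles Dickens", "Sophie Kinsella", "Steve Martin", "A. N. Author"]
known = {"Charles Dickens": "Male", "Sophie Kinsella": "Female", "Steve Martin": "Male"}
result = gender_breakdown(shelf, known)
print(result["Male"], result["Female"], result["Uncertain"])
```

Adding an entry for “A. N. Author” to the hypothetical `known` mapping moves that author out of “Uncertain,” which parallels editing the field in Common Knowledge.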

This feature is, of course, frosting. The cake was released Saturday: Introducing Distinct Authors. Check that out.

Labels: authors, new feature, new features

Saturday, February 7th, 2009

Distinct authors, phase 1 / Steve Martin is funny again

Short version. I’ve added a mechanism to “split” distinct authors with the same name. You can find it on the right of any author page, under “Author Disambiguation.” The feature is only partially rolled out, without separate pages for distinct authors or other ramifications for the LibraryThing system.

Long version. Since its inception, LibraryThing has been plagued by the “Steve Martin” problem. We all know Steve Martin, the comic and author of Shopgirl. But what about Steve Martin, the author of Britain’s Slave Trade, of Sold! How to Make it Easy for People to Buy from You, or of some book about Newfoundland ships? Why was the original wild-and-crazy guy writing such evidently unfunny books—or who were these other people?

The problem is deep in the data. Libraries have a system for disambiguating authors, called Authority Control, based on coming up with authorized forms of a name and adding dates and other metadata to make them unique, and then applying these forms across the books. Authority control is a good idea—if often problematic to implement—but it falls down in the face of LibraryThing’s data. Libraries don’t coordinate their authority control as much as you’d think, and LibraryThing draws from almost 700 libraries. And even if authority control worked in libraries, 90% of LibraryThing content comes from other sources, mostly Amazon. This data has no concept of authority control. (See Steve Martin at Amazon, for example.)

In solving the problem, I decided to ignore how libraries solved the issue and concentrate on how LibraryThing could do it most easily. Authority control requires librarians to assemble data (e.g., birth and death dates) about name variants before a split is made. (Thus was born librarians’ unfortunate policy of putting out hits on individuals they could not otherwise distinguish.*) Although LibraryThing members have done an amazing job finding birth and death dates, it was still a lot of work. And a full authority-control solution would have members updating each other’s records with the “authorized” forms of the names!

I felt a better way could be found. Instead of establishing unique names and pushing them to records, members could split works arbitrarily, and the authors would come to be known by the name they share and the works that cluster under them. This is actually an old system—calling someone “the author of Ivanhoe” or “the one who wrote the Parthian history.” And, as with other features of LibraryThing cataloging, it accords with how regular people talk about authors. In a real-world situation, like a meeting of Newfoundland comedians, you wouldn’t refer to “Martin, Steve, 1945-” and “Martin, Steve, 1947-” but to “Steve Martin, you know, the one who wrote Shopgirl” and “Steve Martin, the one who wrote that book about that boat.”

How it works. To split an author, find the area on the right labelled “Author Disambiguation.” It will take you to a splitting page; here’s Steve Martin’s. This page allows you to assign all the author’s works to numbers. As you assign the works, LibraryThing assigns separate colors, making it easy to see at a glance how the thing is going.
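The core of the splitting page, works assigned to numbers with each number treated as a separate person known by the shared name plus their works, can be sketched as a simple grouping step. This is an illustration under assumptions, not LibraryThing’s implementation: `split_author` and `describe` are hypothetical helpers, the number assignments below are invented, and the color-coding is left out.

```python
from collections import defaultdict

def split_author(assignments):
    """Group works by the distinct-author number a member assigned to each.

    assignments -- mapping of work title -> author number (1, 2, 3, ...)
    Returns {number: [titles...]} with numbers and titles in sorted order.
    """
    groups = defaultdict(list)
    for work, number in assignments.items():
        groups[number].append(work)
    return {number: sorted(works) for number, works in sorted(groups.items())}

def describe(name, works):
    """Refer to a distinct author the informal way: shared name plus a known work."""
    return f"{name}, the one who wrote {works[0]}"

# Hypothetical assignments for the shared name "Steve Martin".
groups = split_author({
    "Shopgirl": 1,
    "Britain's Slave Trade": 2,
    "Sold! How to Make it Easy for People to Buy from You": 3,
})
for number, works in groups.items():
    print(number, describe("Steve Martin", works))
```

No birth or death dates are needed: the works themselves do the disambiguating, which is the point of the approach described above.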

More to do. This is just a first step. The “distinct authors” feature has to “go” all sorts of places on the site. First up will be separate pages for distinct authors, along with a “disambiguation page” (a la Wikipedia) tying them together. Once that’s done we can move to separating author metadata, such as Common Knowledge, between distinct authors.

Quite frankly, I’m going to do a few more things and then let this sit for a while. My main focus right now—and Chris’—is to see “collections” to the finish line. When I realized I could bang out the first phase of distinct authors in a long evening (it’s after 5am now), I went ahead and did it. But now I need to refocus on collections.

Talk about it. I’ve set up a New features post to discuss the change and its potential ramifications. I suspect that the Combiners! group will get in on the act quickly as well, working out various technical issues. They have a number of threads (here, here and here, at least) in which members have made lists of “identically named authors.” They would be a good starting point.


*The hits are, of course, carried out by OCLC.

Labels: distinct authors, new feature, new features

Friday, February 6th, 2009

Facebook in reality

Labels: humor, social networking

Thursday, February 5th, 2009

Random tags for scholars

Someone asked me to come up with a page of truly random tags for an academic project that needed to assess typical tagging. It might prove interesting to other students and scholars doing projects on LibraryThing.

Here’s the page.

It’s an HTML page, not an XML feed or other such format. Techies will scoff, but I’ve been asked for a lot of data like this, particularly from MLS students. The people who can easily parse XML in programming languages are not generally writing graduate school papers.

Labels: folksonomy, tagging

Tuesday, February 3rd, 2009

Member Giveaways: Early Reviewers for everyone

We’ve just introduced a major new feature: Member Giveaway, a simple but flexible way for authors to get review copies into readers’ hands, and other members to clean out their attics!

Member Giveaway is built on top of our Early Reviewers program, which invites publishers to send LibraryThing members pre-publication copies of upcoming books. It has been a huge success, often giving out more than 1,500 books per month. But Early Reviewers has strict rules on participation, quantity and release dates, to keep up quality and encourage publishers to send out as many copies as they can spare.

Member Giveaway differs from Early Reviewers in several ways:

  • Any LibraryThing member can participate.
  • There are no quantity restrictions. You can post a single book or a hundred.
  • Books do not need to be pre-release or even new.
  • Members are encouraged to review Giveaway books, but not reviewing them cannot hurt you.
  • Giveaway selection is random, not based on a similar-books algorithm. To discourage sockpuppetry, requesting members must have cataloged at least fifty books or be a premium (i.e., paid) member.
  • Early Reviewers has a bird, but Member Giveaways uses squirrels. As you know, squirrels are lovely, sociable animals who share books readily.
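The selection rule in the list above (random choice, restricted to members with fifty or more books cataloged or a premium account) can be sketched as a filter followed by a random draw. This is a minimal sketch under assumptions: the `Requester` type, `pick_recipients`, and the member names are hypothetical, and the real system may weigh other factors not described here.

```python
import random
from dataclasses import dataclass

MIN_BOOKS_CATALOGED = 50  # threshold stated in the post

@dataclass
class Requester:
    name: str
    books_cataloged: int
    premium: bool = False  # premium (paid) members skip the book threshold

def is_eligible(member: Requester) -> bool:
    """Anti-sockpuppet rule: 50+ books cataloged, or a premium account."""
    return member.books_cataloged >= MIN_BOOKS_CATALOGED or member.premium

def pick_recipients(requesters, copies, rng=random):
    """Choose recipients uniformly at random from eligible requesters only."""
    pool = [m for m in requesters if is_eligible(m)]
    # sample() draws without replacement, so nobody receives two copies.
    return rng.sample(pool, min(copies, len(pool)))

# Hypothetical requests for a giveaway of five copies.
requests = [
    Requester("reader_a", 120),
    Requester("reader_b", 10),
    Requester("reader_c", 0, premium=True),
    Requester("reader_d", 49),
]
winners = pick_recipients(requests, copies=5, rng=random.Random(0))
print(sorted(w.name for w in winners))  # ['reader_a', 'reader_c']
```

Only two requesters pass the eligibility check, so both receive a copy even though five were offered.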

Some other fun details:

  • If you’ve signed up for Early Reviewers, you are ready for Member Giveaways. The two programs have the same sign-up.
  • When you post a book you have a lot of options, including length of time it will last and where you’re willing to send it.
  • The sending member is responsible for all shipping. If you request and receive a book, the sending member will get your shipping address.

We made Member Giveaway for authors who couldn’t get their publisher to sign on to Early Reviewers, couldn’t get enough copies together or whose book was already out. (Early Reviewers also does not allow most self-published works, which has angered a few members, but both publishers and members reacted strongly when we included self-published books before.)

Publishers and authors aside we wanted to give regular members a chance to send good books to good homes. We have long pondered whether LibraryThing should enable book-swaps. But our friends at BookMooch do that so well already, and swapping is very hard to get right. But many members still wanted a simple way to get their old books to new homes. So, we set up a system to do that too.

We’ve started Member Giveaways off with seven great books.

Cancer Is a Bitch and Beef were offered by my friend Larry Weissman, literary agent to both authors.

Released this fall, both have already drawn great reviews from LibraryThing members and others. LibraryThing member skrishna wrote of Cancer Is a Bitch: “It’s funny, witty, sarcastic and will have you laughing out loud. Read this book. That’s all I really have left to say.” Of Beef, a microhistory in the tradition of Salt, the Boston Globe wrote that its account of “bovine evolution is riveting stuff.” Eats.com called it an “eloquent, poignant and influential account of man’s historical relationship with the cow.”

The other five books all come from a single member, keigu: Robin D. Gill of Paraverse Press, a publisher that promises bilingual books “at a monolingual price.”

The books consist of Japanese text and English translations of hundreds or thousands of short Japanese poems—haiku and senryu on various topics. The publisher, who is also the author, sent LibraryThing a huge box some time ago, in anticipation of such a program. Abby and I, custodians of the books for so long, never got around to reading them, but we will sorely miss people’s reactions at finding tall stacks of The Woman Without a Hole and Rise Ye, Sea Slugs!

Three cheers for Mike! Member Giveaways was developed by Mike Bannister (LTMike) after I rather blithely tossed out the idea of opening Early Reviewers to everyone on a separate page. It took a while, but it is a beautiful and solid piece of code.

Its completion frees Mike up to concentrate on Facebook full time, while Chris and I (though my programming time is somewhat hobbled by everything else I do) continue work on collections.

Come talk about it here.

Labels: early reviewers, new feature, new features

Tuesday, February 3rd, 2009

Microsoft Songsmith: The only blog post you need to read.

The ascending hilarity around Microsoft Songsmith is a bit far from our usual topics here. I could attempt to connect it to social networks, open data and virtues like experiment, remixability and authenticity, but I think I should just shut up and let you enjoy three videos on the topic.

1. The Microsoft Songsmith promo video. It makes me want to turn myself inside out like a slug in beer. What does it do to you?

Hat-tip: TechCrunch.

2. White Wedding, redone in Microsoft Songsmith.

Hat-tip: Mashable, with many more. “Beat It” is also wonderful.

3. Economic Failure Medley by Microsoft Songsmith. Melodies from stock charts, proving yet again how the web can spin gold from tin.

Hat-tip: Mashable

Labels: mashups, microsoft, microsoft songsmith, remixability, songsmith

Monday, February 2nd, 2009

OSC gets the once-over at ALA in Denver

As most of you know, back in July the Open Shelves Classification was conceived as a free, crowdsourced alternative to the Dewey Decimal System. The group has been very active during initial development, and the top levels are being heatedly debated.

David, Tim and I held an OSC open-discussion at the American Library Association (ALA) conference in Denver. A great group of people participated in a lively debate about the project.

To summarize:

There was some room confusion with the Marriott and, unfortunately, many people left before it was all figured out.

10 people attended: Tim, Laena, David, a mix of public librarians, academic librarians, and one interested non-librarian. The librarians were catalogers, reference librarians, and one library director.

Comments during the meeting included:

The random works feature is not that useful because half of all the works are fiction and fiction is not broken out at the top level.

If a public library may reasonably want to aggregate at a certain level (e.g. fiction or science) then it should exist as a top level. No one aggregates at non-fiction, hence it is not useful.

Working on the second level for fiction should happen sooner rather than later.

Children’s books are a challenge.

Perhaps using an audience facet would help (for example, CH, YA)?

  • Yes, but the topics of some books are hard to determine. Should they be put in fiction? If so, a scope note is needed.
  • Speaking of which, there is no good way of dealing with series written by separate authors, like SpongeBob books.

How should series be handled in a classification?

The Darien Library is reorganizing its collection, particularly children’s books, in interesting ways (here, you can listen to Gretchen Hams tell you all about it).

For OSC to be successful, it must be easy to implement for public libraries.

It must be inexpensive to go from DDC to OSC.

A crosswalk is essential!

  • There needs to be a way to determine how much space is needed ahead of time to move the books around.
  • It must be easy to print labels.
  • Backstage Library Works was a company that moved Duke University Libraries from DDC to LC, so there must be models out there on how to do this.
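A crosswalk of the kind called for above is, at its simplest, a lookup from Dewey class-number prefixes to OSC categories, with the most specific matching prefix winning; counting the results per category gives a rough starting point for the shelf-space question. This is a sketch under stated assumptions: the prefix-to-category entries are illustrative placeholders, not actual OSC mappings, and `ddc_to_osc` and `shelf_counts` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical crosswalk: Dewey class-number prefixes -> OSC top-level categories.
# These entries are illustrative placeholders, not OSC's actual mappings.
CROSSWALK = {
    "641": "Cooking",
    "635": "Gardening",
    "636.7": "Pets",     # a more specific prefix wins over "636"
    "636": "Science",
    "364": "True Crime",
}

def ddc_to_osc(ddc: str, crosswalk=CROSSWALK, default="Unmapped") -> str:
    """Map a Dewey number to an OSC category by longest matching prefix."""
    for end in range(len(ddc), 0, -1):
        category = crosswalk.get(ddc[:end])
        if category is not None:
            return category
    return default

def shelf_counts(ddc_numbers):
    """Count books per OSC category, a rough input for planning shelf space."""
    return Counter(ddc_to_osc(n) for n in ddc_numbers)

collection = ["641.5", "636.75", "636.2", "364.1523", "813.54"]
print(shelf_counts(collection))
```

Books that fall through to “Unmapped” show where the crosswalk still needs entries before any books are moved.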

An audience facet would be a good way to handle reading level as well, either by grade or age.

  • Example: 0-1, 1-3, 9-10, etc.
  • There is a tension between having too many optional facets and universality.

The facets need to transcend stickering, the current practice in most public libraries.

We need a reality check before getting too far down the road with proposed schedules for OSC: will it work in an actual library?

  • We could upload a library’s MARC records into LT and try it there virtually before asking a library to use it.
  • Two potential public libraries were listed as testing grounds.

So far, the top levels testing on LibraryThing has provided the following results:

  • 56,641 acts of classification
  • By 1000+ users
  • On 22,000+ works

What are the biggest tags in LibraryThing, can we use those to determine the levels?

  • They were looked at and evaluated, hence True Crime is a top level.
  • This can’t really be done in an automated way.

What is the product plan for OSC?

  • The data is open source & free.
  • If people want to package services around the data (such as reclassifying books for you), then that is a possibility, but we do not see this developing for at least a year or so.

What does “shelf-ready” mean?

  • A vendor puts on labels, dust jackets, tattle tape, creates catalog records for a public library.
  • Different people at the meeting had differing levels of success with outsourcing their books to be made shelf-ready by vendors.

Is bleed over between categories in OSC a bug or a feature?

  • Memoirs/Autobiographies was seen as a bug.
  • Others such as Pets/ScienceAnimals were not seen that way.

Putting categories in an order may help people’s confusion of where to put things.

  • This is called “flow” in bookstores.
  • E.g. Cooking—Health—Sports or Biography—History—Poly Sci

Confusion arose over facets.

  • You add and delete facets depending on the library’s needs.
  • Huge collection? Use them all. Small and only need Science and Religion? Go ahead, the system is flexible.

The top level testing will stop and the levels will begin to be re-worked this week.

How should Art, Architecture, Design, and Photography be handled?

  • After much discussion, consensus was reached that Art, Architecture, and Design should be separate top-level categories, but that Photography would go under Art.

The first test round has been closed. Visit the Open Shelves Classification group for details.

Meeting like this was great and very helpful in making OSC usable. Another meeting is planned in New York for early April–we’ll keep you posted!

Labels: open shelves classification, osc

Sunday, February 1st, 2009

Right this way for the homophily, sir.

Over on the main blog I posted a long-ish blog post on “homophily” and serendipity. I should have posted it on Thingology instead (the main blog tends to focus on feature development and other, more concrete issues). But people have commented on it and made links, so I can’t move it.

Check out the post here.

Labels: homophily, serendipity

Sunday, February 1st, 2009

The evil 3.26%

The question has arisen of why I advocate against OCLC’s attempt to monopolize library data. Roy Tennant of OCLC, an intelligent, likeable man who, although we disagree on some issues, has done more for libraries than most, accused me of writing and talking about the issue because:

“… your entire business model is built on the fact that you can use catalog records for free that others created and not contribute anything back unless they pay (yes, there is a limited set of data available via an API, but then they need the chops to do something with it).”

Fair enough. Let’s look at the numbers, and the argument.

I did a comprehensive analysis, available here as a text file, with both output and PHP code. If anyone doubts it, send me an email and I’ll let you run the SQL queries yourself.

The numbers. As of 6:17pm Sunday, some 3.5 years after LibraryThing began, our members have added 35,831,904 books from 690 sources:

  • 85.48% came from bookstore data (almost exclusively Amazon).
  • 4.88% were entered manually by members
  • 9.63% were drawn from library sources

Now, where did that 9.63% come from?

These sources were in every case free and open Z39.50 connections our members accessed through us. Very frequently they accessed records of their own academic institution, but in any case, these members accessed these records alongside everyone else—libraries, museums, public agencies of one sort or another and all the students and scholars who use RefWorks, EndNote and other such services. Meanwhile LibraryThing has never been asked to stop accessing a source. On the contrary, libraries frequently ask to include themselves on our list of sources.

Of the 9.63%, by far the largest source is the US Library of Congress, the source of 2,203,182 books, or 6.15% of the total. The Library of Congress is a Federal organization, created for the benefit of the country and falling under the government-wide rule that public work is for the benefit of the public, and cannot be copyrighted or otherwise “owned.” For as long as the technology has existed, the Library of Congress has allowed access to its cataloging data; the OCLC policy change will not affect that.* We are grateful the Library of Congress does this. But insofar as we are taxpayers and support the American notion of public ownership of public resources, I will not apologize for it. (On the contrary, I feel that OCLC should apologize for attempting to restrict and profit from public work.)

3.26%. That leaves 3.48%—more appropriately 3.26%**—the evil sliver upon which our “entire business model is built.” Take a look at the top fifteen here:

  • Koninklijke Bibliotheek: 130,406 books (0.36%)
  • National Library of Scotland: 80,826 books (0.23%)
  • British Library (powered by Talis): 80,205 books (0.22%)
  • Gemeinsamer Bibliotheksverbund (GBV): 77,190 books (0.21%)
  • National Library of Australia: 72,896 books (0.2%)
  • Helsinki Metropolitan Libraries: 70,551 books (0.2%)
  • The Royal Library of Sweden (LIBRIS): 63,430 books (0.18%)
  • Italian National Library Service: 60,643 books (0.17%)
  • Vlaamse Centrale Catalogus: 58,936 books (0.16%)
  • LIBRIS, svenska forskningsbibliotek: 54,339 books (0.15%)
  • ILCSO (Illinois Libraries): 28,517 books (0.08%)
  • Yale University: 26,885 books (0.08%)
  • Det kongelige Bibliotek: 24,564 books (0.07%)
  • University of California: 20,098 books (0.06%)
  • Bibliotek.dk: 19,628 books (0.05%)
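The percentages in this section fit together by simple subtraction. The short check below uses only counts and shares quoted in the post and reproduces the 6.15%, 3.48% and 3.26% figures; it is a re-derivation for the reader, not the PHP analysis linked above.

```python
TOTAL_BOOKS = 35_831_904  # all books added by members, per the post

def share(count: int) -> float:
    """Percentage of all books, rounded to two places as in the post."""
    return round(100 * count / TOTAL_BOOKS, 2)

library_sources = 9.63                          # reported share from library sources
lc = share(2_203_182)                           # Library of Congress
non_lc = round(library_sources - lc, 2)         # library data other than LC
british_library = share(80_205)                 # received via Talis
remaining = round(non_lc - british_library, 2)  # the "evil sliver"

# The three reported shares should total roughly 100% (rounding aside).
total_reported = 85.48 + 4.88 + 9.63
print(lc, non_lc, british_library, remaining, round(total_reported, 2))
# 6.15 3.48 0.22 3.26 99.99
```

Subtracting the Talis-supplied British Library records from the 3.48% is what turns it into 3.26%.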

With 690 possible sources, it’s a long, long tail. We take 2,087 records from the Russian State Library, 1,067 from the Magyar Országos Közös Katalógus, 286 from Princeton, 106 from Koç (in Istanbul), 63 from Hong Kong Baptist, 4 from the Universidad Pública de Navarra, etc.

It should be apparent to anyone looking at the above that the 3.26% is largely about satisfying the needs of foreign LibraryThing members, a small percentage of our membership and hardly central to our “business model.” Equally clear is the government orientation of the list: only one, Yale, is a private institution. The rest are all government agencies. Of course, no records actually came from OCLC itself!

All-in-all, library data from non-federal sources is a negligible component of LibraryThing’s content. LibraryThing is not some big plot to capture library records. That idea is simply not in the figures.

Do we give back? What of the second half of the accusation, that we do “not contribute anything back unless they pay,” and the swipe at APIs?

First, assuming Roy means LibraryThing data generally, it’s absurd to suggest that because LibraryThing draws 3.26% of its data from free, unlicensed sources, our members’ data and services are owned by OCLC or its members. OCLC no more owns members’ tags and reviews on bibliographic metadata than Saudi Aramco owns the furniture I bring home in my car. Who in their right mind would ever accept a list of titles and authors from a library, if that meant ceding ownership over what you think about the book?

LibraryThing and OCLC both have terms. But LibraryThing license terms are unlike OCLC’s in a number of ways. LibraryThing members know what they’re getting, unlike OCLC members, who thought they were sharing with other libraries but find themselves the linchpin of a monopoly. From our inception LibraryThing has reserved a right to sell aggregate or anonymized data. We also sell some reviews, giving members the option to deny them to us. All our member data is non-exclusively licensed, so members can do anything they want with it outside of LibraryThing, and members can leave at any time. Neither is true of OCLC members’ data under the Policy.

Cataloging data. That leaves LibraryThing cataloging data, of which we have three types. We don’t have any legal responsibility to make it free, but we do so anyway.

First, we would be happy to offer downloads of original or modified MARC records! We haven’t done so in order to avoid attracting a suit from OCLC. But perhaps we were mistaken. If OCLC would like us to start releasing our MARC records to others, someone should let us know. We will release them under the same terms they were given to us—freely.

Second, our Common Knowledge cataloging (series, awards, characters, etc.) is free and available to all. We can’t think of a better way to provide it than through an API, but we’re all ears if Roy knows of one. And if OCLC would like to admit it to WorldCat, without subverting its always-free license, they don’t even need our permission. Go on, OCLC, make my day!

Third, there’s ThingISBN, which was directly patterned on OCLC’s xISBN service. Despite Roy’s criticism, the two are identical in format and delivery, so if there’s something wrong with the XML API, OCLC has only itself to blame. Indeed the only difference is cost: ThingISBN is completely free, both as an API and as a feed; xISBN, which is built from member data, is sold back to members.
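As for needing “the chops to do something with” an XML API: reading a response of this kind takes a few lines of Python’s standard library. The response shape below (an `idlist` element wrapping `isbn` elements) is an assumption for illustration, and the ISBN values are placeholders; check a live ThingISBN response before relying on it. No network request is made here.

```python
import xml.etree.ElementTree as ET

# Assumed response shape; the ISBN values are placeholders.
SAMPLE_RESPONSE = """
<idlist>
  <isbn>0441172717</isbn>
  <isbn>0441013597</isbn>
</idlist>
"""

def related_isbns(xml_text: str) -> list[str]:
    """Return the ISBNs listed in an <idlist> response, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [el.text.strip() for el in root.findall("isbn") if el.text]

print(related_isbns(SAMPLE_RESPONSE))  # ['0441172717', '0441013597']
```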


Stop killing the messenger. It’s time for OCLC to recognize they made this mess, not others. They have perpetrated some astounding missteps—from attempting to sneak through a major rewrite of the core member policy in a few days without consultation, to a comic series of rewrites and policy reversals, culminating in withdrawing the policy entirely for discussion. (It now seems clear they did so on the heels of a member revolt, whether general or just of some key libraries.)

It’s also important to see that, before OCLC started threatening companies and non-profits doing interesting but non-competing things with book data—notably LibLime, Open Library and LibraryThing—they had none of the problems they have now. Now, by attempting to control all book data, they’ve spurred the creation of LibLime’s ‡Biblios system, a free, free-data alternative to OCLC and, well, sent me, Aaron Swartz of Open Library and dozens of prominent library bloggers into orbit.

Being caught so flat-footed can’t feel nice. It must be hard feeling like royalty and discovering your subjects think themselves a confederacy. But this is no time for OCLC to start attacking the credibility of its opponents. Surely LibraryThing is an unusual case—a company that has an opinionated, crusading—okay, loud—president. But the thousands of librarians and other individuals who supported our calls, or raised other objections to the OCLC policy are not less well-motivated than OCLC and its employees. They do not love libraries less. They are, rather, concerned that OCLC’s urge to control library metadata threatens longstanding library traditions of sharing, and sets libraries on a path of narrowness and restriction that will surely prove no benefit in this increasingly open, connected world.


*I need to write a blog post on this, but I was recently informed that whatever changes OCLC makes cannot touch federal libraries without explicit authorization. That is, federal law does not recognize clauses like “if you continue to use” or “we can change this at any time.”
** The figure is more accurately 3.26% rather than 3.48%, because we are getting our British Library records through Talis, who have a contract with the British Library.

Labels: oclc

Sunday, February 1st, 2009

The Guardian on homophily

From Ethan Zuckerman’s blog post.

The Guardian (UK) yesterday carried a wonderful column by Oliver Burkeman, “This column will change your life” on a topic dear to our heart—and mentioning LibraryThing to boot.

The topic is “homophily,” the “faintly depressing human tendency to seek out and spend time with those most similar to us.” Homophily informs whom we spend time with and filters our understanding of the wider world. As the author writes, his American friends were sure Obama was going to win:

“[T]hey hadn’t met one person—not one!—who planned on voting Republican. They were right about the outcome, of course. But 58m people voted against Obama; it was just that you didn’t run into them in the coffee shops of Brooklyn.”

Quoting the Harvard sociologist Ethan Zuckerman that “Homophily causes ignorance,” Burkeman adds that it tends to make people more extreme. The internet can increase the effect, allowing dittoheads of various persuasions to “exist almost entirely within a feedback loop shaped by your own preferences.”*

Burkeman closes by recommending the LibraryThing Unsuggester:

“You don’t need technology to do that, but then again, technology needn’t be the enemy: Facebook could easily offer a list of the People You’re Least Likely To Know; imagine what that could do for cross-cultural understanding. And I love the Unsuggester, a feature of the books site LibraryThing.com: enter a book you’ve recently read, and it’ll provide a list of titles least likely to appear alongside it on other people’s bookshelves. Tell it you’re a fan of Kant’s Critique Of Pure Reason, and it’ll suggest you read Confessions Of A Shopaholic by Sophie Kinsella. And maybe you should.”
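The Unsuggester, as described here, lists titles least likely to share a shelf with the one you enter. One simple way to rank such titles is by co-occurrence rate: for each other book, the fraction of its holders who also hold the target. This is a minimal sketch of that idea, not LibraryThing’s actual algorithm; `unsuggest` is a hypothetical helper and “Book A”/“Book B” are placeholder titles.

```python
from collections import Counter

def unsuggest(target, shelves, n=3):
    """Return up to n books that least often share a shelf with `target`.

    shelves -- iterable of sets of titles, one set per member (hypothetical data)
    """
    holders = Counter()   # shelves holding each book
    together = Counter()  # shelves holding each book AND the target
    for shelf in shelves:
        has_target = target in shelf
        for book in shelf:
            holders[book] += 1
            if has_target:
                together[book] += 1
    candidates = [book for book in holders if book != target]
    # Lowest co-occurrence rate first; ties go to more widely held books,
    # then alphabetical order for a stable result.
    candidates.sort(key=lambda b: (together[b] / holders[b], -holders[b], b))
    return candidates[:n]

shelves = [
    {"Critique of Pure Reason", "Book A"},
    {"Critique of Pure Reason", "Book A", "Book B"},
    {"Confessions of a Shopaholic", "Book B"},
    {"Confessions of a Shopaholic"},
]
print(unsuggest("Critique of Pure Reason", shelves))
# ['Confessions of a Shopaholic', 'Book B', 'Book A']
```

Breaking ties toward widely held books favors titles that are popular, just not with readers of the target, rather than obscure books nobody owns.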

The topic is interesting to me from a number of different angles. First, as a social network that works largely through shared reading, LibraryThing gets the upside of homophily and is subject to the downside too.

Second, with Zuckerman, I’m fascinated by the notion of serendipity, of “surprising someone helpfully.” As I’ve argued to library audiences in the past, both Amazon-style collaborative filtering and contemporary library catalogs are bad at serendipity—worse, in some ways, than browsing physical shelves can be. As Zuckerman notes, the somewhat mechanical process of subject assignment can break through the “flocking together” tendency of collaborative filtering. But I bet there are better ways too. Is a true “serendipity algorithm” possible?

Third, my own experience is characterized by some rather vexed homophily issues. Zuckerman mentions “02138” at one point, no doubt baffling some internet listeners. It is, of course, the zipcode of Harvard and much of west Cambridge, where I grew up and spent most of my life. A popular t-shirt (I own one) proclaims “02138: The World’s Most Opinionated Zip Code,”** but there can be no mistaking that opinions largely go one way. Growing up in Cambridge, and attending a certain private school, taught me that respect for diversity was at the center of human virtues—something I still agree with—but that everyone had houses filled with books***, Volvo was the nation’s most popular automaker, that large families and stay-at-home mothers were suspect, that religion was for mental defectives, that Mondale was going to win in 1984, and so forth. In a very real way Cambridge taught me how to think—and I’ve spent the rest of my life thinking through what to keep and what to chuck.

For more on this topic, check out:


*David Weinberger has a very good reply somewhere—in Everything is Miscellaneous?—where David argues (as I recall) that this is an unrealistic notion. Conversations happen because of shared ground. I shall avoid thumbnailing any more because I shall surely get it wrong.
**See Flickr user Nabeel_H for the motto on a window, allegedly quoting the NYT. 02138 is now also the title of a magazine for Harvard alumni (see it). As a lifelong resident of 02138, but not a Harvard alumnus, I am considerably irritated that four years’ residence in that second-rate sausage factory gives people the right to claim my zipcode.
***Certain books, mind you. I am a great connoisseur of Cambridge bookshelves.

Labels: amazon, ethan zuckerman, homophily, social networking