Archive for February, 2009

Monday, February 9th, 2009

Open Shelves Classification Update

Hello! Well, we have been busy since Tim announced the classify-this feature. The OSC group has been extremely active, with more than 300 posts about the top level categories (not to mention insightful threads popping up to discuss second level categories). Thank you for your feedback! Meanwhile, at the Midwinter meeting of the American Library Association we were able to have a really valuable face-to-face conversation with LibraryThing users.

We have been processing all your feedback and working on version 2.0 of the top level categories. Before we get to that, we wanted to let everyone know that we do read all the posts in the Open Shelves Classification group. Because of the high quantity of posts (and our day jobs) we cannot comment or respond individually as often as we would like.

Some key points after discussion, feedback and analysis:

The number of categories in the top level. As decided last summer, we will have more rather than fewer top level categories. The top levels are not supposed to represent an even distribution of all possible branches of knowledge. Instead, the OSC top levels should represent the largest categories that public libraries will want to use. [Similar to how Library of Congress classification was built to meet the needs of the Library of Congress, while Dewey’s system tried to contain all recorded knowledge.]

Complaints about specific topics in the top level. Remember, there is no value judgment in a topic being placed at the top level or underneath a broader topic. For now, topics like Pets, Gardening, and True Crime are present because of feedback from public librarians that these are heavily requested books that are often pulled out into their own sections. As a guiding principle, the OSC will be statistically tested, so some of our top level categories may change as actual libraries begin to reclassify their collections.

The nature of classification. Any classification system forces us to choose one topic for the book, even though that book may be about more than one topic. This is not a flaw in the OSC categories but in the nature of classification. Libraries will still use multiple subject headings in the catalog to capture all the topical aspects of the work.

Facets. As talked about a few months ago, we currently plan on the top level categories being only topical while other aspects of the work will be represented by facets. For example, format will be captured in a separate facet. [And to clear up any lingering confusion, Comics will be a format facet.] Another facet talked about was audience. This means children’s books will be tagged in the audience facet. We envision that these facets will be optional and libraries can use them if, for example, they want to pull out all the comics and shelve them in a unique section. Alternatively, the facet could be ignored and then graphic novels would be intershelved with other like topics. Here is a picture of what we are envisioning:

Classification versus Signage. The top level categories have nothing to do with signage. This is particularly true with children’s books, which can be grouped/displayed as the library desires (e.g. picture books, infants, board books, etc.).

We will be posting an updated version of the top levels very soon, so stay tuned!

Labels: Open Shelves Classification, OSC

Friday, February 6th, 2009

Facebook in reality

Labels: humor, social networking

Thursday, February 5th, 2009

Random tags for scholars

Someone asked me to come up with a page of truly random tags for an academic project that needed to assess typical tagging. It might prove interesting to other students and scholars doing projects on LibraryThing.

Here’s the page.

It’s an HTML page, not an XML feed or other such format. Techies will scoff, but I’ve been asked for a lot of data like this, particularly from MLS students. The people who can easily parse XML in programming languages are not generally writing graduate school papers.

Labels: folksonomy, tags

Tuesday, February 3rd, 2009

Microsoft Songsmith: The only blog post you need to read.

The ascending hilarity around Microsoft Songsmith is a bit far from our usual topics here. I could attempt to connect it to social networks, open data and virtues like experiment, remixability and authenticity, but I think I should just shut up and let you enjoy three videos on the topic.

1. The Microsoft Songsmith promo video. It makes me want to turn myself inside out like a slug in beer. What does it do to you?

Hat-tip: TechCrunch.

2. White Wedding, redone in Microsoft Songsmith.

Hat-tip: Mashable, with many more. “Beat-it” is also wonderful.

3. Economic Failure Medley by Microsoft Songsmith. Melodies from stock charts, proving yet again how the web can spin gold from tin.

Hat-tip: Mashable

Labels: mashups, microsoft, microsoft songsmith, remixability, songsmith

Monday, February 2nd, 2009

OSC gets the once-over at ALA in Denver

As most of you know, back in July the Open Shelves Classification was conceived as a free, crowdsourced alternative to the Dewey Decimal System. The Group has been very active during initial development, and the top levels are being heatedly debated.

David, Tim and I held an OSC open-discussion at the American Library Association (ALA) conference in Denver. A great group of people participated in a lively debate about the project.

To summarize:

There was some room confusion with the Marriott and, unfortunately, many people left before it was all figured out.

10 people attended: Tim, Laena, David, a mix of public librarians, academic librarians, and one interested non-librarian. The librarians were catalogers, reference librarians, and one library director.

Comments during the meeting included:

The random works feature is not that useful because half of all the works are fiction and fiction is not broken out at the top level.

If a public library may reasonably want to aggregate at a certain level (e.g. fiction or science) then it should exist as a top level. No one aggregates at non-fiction, hence it is not useful.

Working on the second level for fiction should happen sooner rather than later.

Children’s books are a challenge.

Perhaps using an audience facet would help (for example, CH, YA)?

  • Yes, but the topics of some books are hard to determine. Should they be put in fiction? If so, a scope note is needed.
  • Speaking of which, there is no good way of dealing with series when written by separate authors, like Spongebob Books.

How should series be handled in a classification?

The Darien library is reorganizing their collection, particularly children’s books, in interesting ways (here, you can listen to Gretchen Hams tell you all about it).

For OSC to be successful, it must be easy to implement for public libraries.

It must be inexpensive to go from DDC to OSC.

A crosswalk is essential!

  • There needs to be a way to determine how much space is needed ahead of time to move the books around.
  • It must be easy to print labels.
  • Backstage Library Works was a company that moved Duke University Libraries from DDC to LC, so there must be models out there on how to do this.
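A crosswalk and a space estimate could be prototyped along these lines. This is only a rough sketch: the DDC-to-OSC pairs below are illustrative guesses, not an official or agreed mapping, and the function names are hypothetical. A real crosswalk would need much finer granularity than three-digit DDC classes, since a class like 636 covers far more than pets.

```python
from collections import Counter

# Illustrative DDC-class -> OSC-category pairs. These are assumptions made
# for the sake of the example, NOT an official OSC crosswalk.
CROSSWALK = {
    "364": "True Crime",
    "635": "Gardening",
    "636": "Pets",
    "641": "Cooking",
}


def to_osc(ddc_number):
    """Map a DDC call number to an OSC category by its three-digit class."""
    return CROSSWALK.get(ddc_number[:3], "Unmapped")


def items_per_category(ddc_numbers):
    """Count items per OSC category, a first input for a shelf-space estimate."""
    return Counter(to_osc(number) for number in ddc_numbers)


# Hypothetical call numbers standing in for a library's holdings.
counts = items_per_category(["636.7", "636.8", "635.9", "364.1523", "999"])
for category, count in sorted(counts.items()):
    print(category, count)
```

Counting items per target category before anything moves gives a rough idea of how much shelf space each new section needs, and the "Unmapped" bucket shows how much of the collection the crosswalk does not yet cover.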

An audience facet would be a good way to handle reading level as well, either by grade or age.

  • Example: 0-1, 1-3, 9-10, etc.
  • There is a tension between having too many optional facets and universality.

The facets need to transcend stickering, the current practice in most public libraries.

We need a reality check before getting too far down the road with proposed schedules for OSC: will it work in an actual library?

  • We could upload a library’s MARC records into LT and try it there virtually before asking a library to use it.
  • Two potential public libraries were listed as testing grounds.

So far, the top levels testing on LibraryThing has provided the following results:

  • 56,641 acts of classification
  • By 1,000+ users
  • On 22,000+ works

What are the biggest tags in LibraryThing, can we use those to determine the levels?

  • They were looked at and evaluated, hence True Crime is a top level.
  • This can’t really be done in an automated way.

What is the product plan for OSC?

  • The data is open source & free.
  • If people want to package services around the data (such as reclassifying books for you), then that is a possibility, but we do not see this developing for at least a year or so.

What does “shelf-ready” mean?

  • A vendor puts on labels, dust jackets, and tattle tape, and creates catalog records for a public library.
  • Different people at the meeting had differing levels of success with outsourcing their books to be made shelf-ready by vendors.

Is bleed over between categories in OSC a bug or a feature?

  • Memoirs/Autobiographies was seen as a bug.
  • Others such as Pets/ScienceAnimals were not seen that way.

Putting categories in an order may reduce people’s confusion about where to put things.

  • This is called “flow” in bookstores.
  • E.g. Cooking, then Health, then Sports; or Biography, then History, then Poli Sci

Confusion arose over facets.

  • You add and delete depending on the library’s needs.
  • Huge collection? Use them all. Small and only need Science and Religion? Go ahead, the system is flexible.
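One way to picture optional facets is as extra fields that a library may or may not include in its shelving order. The sketch below is only an illustration of that idea: the class, field names, and titles are hypothetical, and nothing here reflects an actual OSC data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Work:
    title: str
    topic: str                            # top level topical category
    format_facet: Optional[str] = None    # optional facet, e.g. "Comics"
    audience_facet: Optional[str] = None  # optional facet, e.g. "CH", "YA"


def shelf_key(work, use_facets=()):
    """Build a sort key: the chosen facets first, then topic, then title."""
    facet_values = tuple(getattr(work, name) or "" for name in use_facets)
    return facet_values + (work.topic, work.title)


# Hypothetical works, invented for the example.
works = [
    Work("Hypothetical Graphic History", "History", format_facet="Comics"),
    Work("Hypothetical Prose History", "History"),
    Work("Hypothetical Garden Guide", "Gardening"),
]

# Facets ignored: the comic is intershelved with other History books.
intershelved = sorted(works, key=shelf_key)

# Format facet used: the comic is pulled out into its own section.
pulled_out = sorted(works, key=lambda w: shelf_key(w, use_facets=("format_facet",)))

print([w.title for w in intershelved])
print([w.title for w in pulled_out])
```

The same records serve both libraries. Only the list of facets passed to the sort changes, which is the sense in which facets can be added or dropped depending on the library's needs.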

The top level testing will stop and the levels will begin to be re-worked this week.

How should Art, Architecture, Design, and Photography be handled?

  • After much discussion, the consensus was reached that Art, Architecture, and Design should be separate top level categories, but that Photography would go under Art.

The first test round has been closed. Visit the Open Shelves Classification group for details.

Meeting like this was great and very helpful in making OSC usable. Another meeting is planned in New York for early April–we’ll keep you posted!

Labels: Open Shelves Classification, OSC

Sunday, February 1st, 2009

Right this way for the homophily, sir.

Over on the main blog I posted a long-ish blog post on “homophily” and serendipity. I should have posted it on Thingology instead (the main blog tends to focus on feature development and other, more concrete issues). But people have commented on it and made links, so I can’t move it.

Check out the post here.

Labels: homophily, serendipity

Sunday, February 1st, 2009

The evil 3.26%

The question has arisen of why I advocate against OCLC’s attempt to monopolize library data. Roy Tennant of OCLC, an intelligent, likeable man who, although we disagree on some issues, has done more for libraries than most, accused me of writing and talking about the issue because:

“… your entire business model is built on the fact that you can use catalog records for free that others created and not contribute anything back unless they pay (yes, there is a limited set of data available via an API, but then they need the chops to do something with it).”

Fair enough. Let’s look at the numbers, and the argument.

I did a comprehensive analysis, available here as a text file, with both output and PHP code. If anyone doubts it, send me an email and I’ll let you run the SQL queries yourself.

The numbers. As of 6:17pm Sunday, some 3.5 years after LibraryThing began, our members have added 35,831,904 books from 690 sources:

  • 85.48% came from bookstore data (almost exclusively Amazon).
  • 4.88% were entered manually by members.
  • 9.63% were drawn from library sources.

Now, where did that 9.63% come from?

These sources were in every case free and open Z39.50 connections our members accessed through us. Very frequently they accessed records of their own academic institution, but in any case, these members accessed these records alongside everyone else—libraries, museums, public agencies of one sort or another and all the students and scholars who use RefWorks, EndNote and other such services. Meanwhile LibraryThing has never been asked to stop accessing a source. On the contrary, libraries frequently ask to include themselves on our list of sources.

Of the 9.63%, by far the largest source is the US Library of Congress, the source of 2,203,182 books, or 6.15% of the total. The Library of Congress is a Federal organization, created for the benefit of the country and falling under the government-wide rule that public work is for the benefit of the public, and cannot be copyrighted or otherwise “owned.” For as long as the technology has been there, the Library of Congress has allowed access to its cataloging data; the OCLC policy change will not affect that.* We are grateful the Library of Congress does this. But insofar as we are taxpayers and support the American notion of public ownership of public resources, I will not apologize for it. (On the contrary, I feel that OCLC should apologize for attempting to restrict and profit from public work.)

3.26%. That leaves 3.48%—more appropriately 3.26%**—the evil sliver upon which our “entire business model is built.” Take a look at the top fifteen here:

  • Koninklijke Bibliotheek: 130,406 books (0.36%)
  • National Library of Scotland: 80,826 books (0.23%)
  • British Library (powered by Talis): 80,205 books (0.22%)
  • Gemeinsamer Bibliotheksverbund (GBV): 77,190 books (0.21%)
  • National Library of Australia: 72,896 books (0.2%)
  • Helsinki Metropolitan Libraries: 70,551 books (0.2%)
  • The Royal Library of Sweden (LIBRIS): 63,430 books (0.18%)
  • Italian National Library Service: 60,643 books (0.17%)
  • Vlaamse Centrale Catalogus: 58,936 books (0.16%)
  • LIBRIS, svenska forskningsbibliotek: 54,339 books (0.15%)
  • ILCSO (Illinois Libraries): 28,517 books (0.08%)
  • Yale University: 26,885 books (0.08%)
  • Det kongelige Bibliotek: 24,564 books (0.07%)
  • University of California: 20,098 books (0.06%)
  • Bibliotek.dk: 19,628 books (0.05%)

With 690 possible sources, it’s a long, long tail. We take 2,087 records from the Russian State Library, 1,067 from the Magyar Országos Közös Katalógus, 286 from Princeton, 106 from Koç (in Izmir), 63 from Hong Kong Baptist, 4 from the Universidad Pública de Navarra, etc.
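For anyone who wants to recheck the arithmetic, the shares can be recomputed from the counts quoted in this post. This is a small sketch, not the PHP and SQL analysis mentioned above; the function and variable names are made up for the example, and the 9.63% library share is taken as given rather than recomputed.

```python
TOTAL_BOOKS = 35_831_904  # total books added, as quoted in this post


def share(count, total=TOTAL_BOOKS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


library_share = 9.63          # all library sources, taken from the post
loc_share = share(2_203_182)  # US Library of Congress
bl_share = share(80_205)      # British Library (powered by Talis)

# Library sources other than the Library of Congress.
non_federal = round(library_share - loc_share, 2)

# The same figure with the British Library records (via Talis) set aside.
non_federal_excl_bl = round(non_federal - bl_share, 2)

print(loc_share, bl_share, non_federal, non_federal_excl_bl)
```

Rounding at each step mirrors how the percentages are quoted here, so the results line up with the two-decimal figures in the text.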

It should be apparent to anyone looking at the above that the 3.26% is largely about satisfying the needs of foreign LibraryThing members, a small percentage of our membership and hardly central to our “business model.” Equally clear is the government orientation of the list: only one, Yale, is a private institution. The rest are all government agencies. Of course, no records actually came from OCLC itself!

All-in-all, library data from non-federal sources is a negligible component of LibraryThing’s content. LibraryThing is not some big plot to capture library records. That idea is simply not in the figures.

Do we give back? What of the second half of the accusation, that we do “not contribute anything back unless they pay,” and the bit against APIs?

First, assuming Roy means LibraryThing data generally, it’s absurd to suggest that because LibraryThing draws 3.26% of its data from free, unlicensed sources, our members’ data and services are owned by OCLC or its members. OCLC no more owns members’ tags and reviews on bibliographic metadata than Saudi Aramco owns the furniture I bring home in my car. Who in their right mind would ever accept a list of titles and authors from a library, if that meant ceding ownership over what you think about the book?

LibraryThing and OCLC both have terms. But LibraryThing’s license terms are unlike OCLC’s in a number of ways. LibraryThing members know what they’re getting, unlike OCLC members, who thought they were sharing with other libraries, but find themselves the lynchpin of a monopoly. From our inception LibraryThing has reserved a right to sell aggregate or anonymized data. We also sell some reviews, giving members the option to deny them to us. All our member data is non-exclusively licensed, so members can do anything they want with it outside of LibraryThing, and members can leave at any time. Neither is true of OCLC members’ data under the Policy.

Cataloging data. That leaves LibraryThing cataloging data, of which we have three types. We don’t have any legal responsibility to make it free, but we do so anyway.

First, we would be happy to offer downloads of original or modified MARC records! We haven’t done so in order to avoid attracting a suit from OCLC. But perhaps we were mistaken. If OCLC would like us to start releasing our MARC records to others, someone should let us know. We will release them under the same terms they were given to us—freely.

Second, our Common Knowledge cataloging (series, awards, characters, etc.) is free and available to all. We can’t think of a better way to provide it other than through an API, but we’re all ears if Roy knows of a better way. And if OCLC would like to admit it to WorldCat, without subverting its always-free license, they don’t even need our permission. Go on, OCLC, make my day!

Third, there’s ThingISBN, which was directly patterned on OCLC’s xISBN service. Despite Roy’s criticism, they are identical in format and delivery, so if there’s something wrong with its XML APIs, OCLC has only itself to blame. Indeed the only difference is cost: ThingISBN is completely free, both as an API and as a feed; xISBN, which member data creates, is sold back to members.


Stop killing the messenger. It’s time for OCLC to recognize they made this mess, not others. They have perpetrated some astounding missteps, from attempting to sneak through a major rewrite of the core member policy in a few days without consultation, to a comic series of rewrites and policy reversals, culminating in withdrawing the policy entirely for discussion. (It now seems clear they did so on the heels of a member revolt, whether general or just of some key libraries.)

It’s also important to see that, before OCLC started threatening companies and non-profits doing interesting but non-competing things with book data—notably LibLime, Open Library and LibraryThing—they had none of the problems they have now. Now, by attempting to control all book data, they’ve spurred the creation of LibLime’s ‡Biblios system, a free, free-data alternative to OCLC and, well, sent me, Aaron Swartz of Open Library and dozens of prominent library bloggers into orbit.

Being caught so flat-footed can’t feel nice. It must be hard feeling like royalty and discovering your subjects think themselves a confederacy. But this is no time for OCLC to start attacking the credibility of its opponents. Surely LibraryThing is an unusual case—a company that has an opinionated, crusading—okay, loud—president. But the thousands of librarians and other individuals who supported our calls, or raised other objections to the OCLC policy are not less well-motivated than OCLC and its employees. They do not love libraries less. They are, rather, concerned that OCLC’s urge to control library metadata threatens longstanding library traditions of sharing, and sets libraries on a path of narrowness and restriction that will surely prove no benefit in this increasingly open, connected world.


*I need to write a blog post on this, but I was recently informed that whatever changes OCLC makes cannot touch federal libraries without explicit authorization. That is, federal law does not recognize clauses like “if you continue to use” or “we can change this at any time.”
** It should more accurately be 3.26%, because we are getting our British Library records through Talis, who have a contract with the British Library.

Labels: oclc