Archive for the ‘oclc’ Category

Tuesday, March 15th, 2011

VIAF, OCLC and open data

Yesterday I released a service called “LC AuthoritiesThing.” The service solved a problem many have had with the LC Authorities website. Although a fine searchable resource, LC Authorities does not have stable URLs. Links die after a short period and are tied to sessions in a way that prevents sharing URLs during that period. LC AuthoritiesThing provides a window into the LC Authorities site which allows hard, reliable links. Various catalogers have thanked us for making the service, as it will allow them to refer to authority records more easily.

As an update to the post I took notice of VIAF, the Virtual Authority File, recommended to me as a substitute by a cataloger on Twitter. I assumed (apparently wrongly) that VIAF would at some point supersede LC Authorities. And I wrote that VIAF wasn’t a good substitute because it is an OCLC project, and encumbered by licensing restrictions.

Since then, I have received a variety of communications telling me that I am wrong. Although its data is hosted by OCLC and its services were developed and are served by OCLC, VIAF is not an OCLC project, and the project has no access terms. Thomas Hickey from OCLC even wrote on this blog that full dumps are also available, although they must be approved somehow by project leaders.

This is welcome news. LibraryThing will be submitting a request for a full VIAF dump, and we’ll see where that goes. We will also look into automated harvesting of the website, or at least the LC portion of the data.

So far, so good. But the situation is illustrative. Select people within the library community may believe that VIAF is free. But every public indication is that it is not free.

These indications include:

  1. OCLC copyright notices on every single VIAF.org page, and all VIAF-related pages on OCLC.org.
  2. Links to the OCLC Terms and Conditions from multiple VIAF.org pages, including the Privacy page.
  3. A robots.txt file that prohibits automated access to result pages.
  4. The “About VIAF” project page prominently states “Use of our prototypes is subject to OCLC’s terms and conditions. By continuing past this point, you agree to abide by these terms.”
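To make the third point concrete, here is a minimal sketch of how a well-behaved harvester reads a robots.txt file before fetching anything. The rules and URLs below are invented for illustration; they are not the actual contents of VIAF.org's robots.txt, and the agent name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, written for illustration only.
# These are NOT the real rules served by VIAF.org.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /search
"""


def may_fetch(rules: str, url: str, agent: str = "ExampleHarvester") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # A result page under /search is off limits to every crawler here.
    print(may_fetch(EXAMPLE_RULES, "https://example.org/search?query=twain"))
    # A page outside the disallowed prefix is allowed.
    print(may_fetch(EXAMPLE_RULES, "https://example.org/about"))
```

A crawler that honors rules like these never sees the result pages, which is why a robots.txt exclusion reads as a statement about access and not just a technical detail.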

As all catalogers surely know, the OCLC Terms and Conditions are lengthy and explicit. Among other things they prohibit commercial use, automated use, storage of data, and use of the data for cataloging (!). They state that OCLC has sole and arbitrary discretion to discontinue access to anyone for any reason. They state that exceptions to the terms require permission in writing from OCLC.

Meanwhile, apart from a blog comment from Thom Hickey, I can find no assertions that OCLC terms don’t apply to VIAF, no mention of dumps or of a process to get them.

VIAF is to be commended for its openness and lack of terms. This is a great move forward for open bibliographic data. But it needs to make greater efforts to make others aware of this state of affairs, and define the level and character of openness. (It’s still unclear to me whether VIAF asserts any ownership, or whether it is all in the public domain.) And VIAF should make efforts to remove multiple statements asserting that OCLC terms apply to VIAF data.

Labels: cataloging, oclc

Thursday, May 21st, 2009

Non est potestas: OCLC Policy withdrawn

Non est potestas super terram quae comparetur ei / There is no power on earth that compares with it (frontispiece to Hobbes’ Leviathan)

The OCLC Review Board finally put OCLC’s Policy to bed. In a short speech to the OCLC Members Council, board chair Jennifer Younger affirmed “that a policy is needed, but not this policy.” After the drubbing they got from the ICOLC and ARL—neither of which took into consideration OCLC’s recent push into the software market—you can be sure OCLC will take its board’s advice.

While the result was the right one, and I’m sure the members are good, conscientious librarians, I’m not going to echo others’ praise about their decision. The writing was on the wall. If they had pushed forward, OCLC would have met even more hostility than it already engendered. The speech itself, like their “push poll” survey, shows where the OCLC Review Board’s sympathies lie. I don’t think you could read the ICOLC or ARL reports against it and conclude OCLC “gets” it. It was, as one of my email correspondents put it, “all about protecting WorldCat, identifying ‘threats,’ and ‘appropriate use by members.’”

I think one of Dr. Younger’s phrases nicely encapsulates the flaw in OCLC’s approach, namely “we must revisit the Social Contract between OCLC and its members.” I’d like to go into the phrase a little deeper, not without some fun-poking.

The Social Contract. The phrase “social contract” is an interesting one. The idea also appeared in the ARL report*—and may well have been used before. As everyone knows, it’s a key concept in political theory (hence the Hobbes frontispiece above) as the thing that, some believe, makes government power legitimate.

Why, therefore, does a cooperative need to express itself in terms of state-formation, rather than a voluntary cooperative? Why does OCLC want to cast itself as a government?

Sitting in OCLC’s Brobdingnagian Dublin, OH headquarters, with thousands of OCLC workers shuttling about like so many ministry secretaries, and an interior hall bedecked with flags like the United Nations, it must be easy to think of OCLC as a sort of “government of libraries.”**

Architectural criticism aside, OCLC’s answer might be that, unlike cooperatives, a government gets to enforce its will more broadly. If you run afoul of a cooperative, the mutual consent that bound you to the cooperative is withdrawn, and you leave or they kick you out. But if you break the rules of a state, they put you in prison, whether you consent or not. Indeed, while philosophers have sometimes proposed a formal ceremony of consent, states act on non-consenting members all the time. Anarchists go to prison too (indeed, they tend to go to prison for being anarchists). This model, therefore, fits in better with OCLC’s plan to bind former members’ and non-members’ use of library data. As a cooperative, OCLC is a purely member institution. With a “social contract,” OCLC gets to dictate more like a state.***

If the idea of OCLC-as-state is accepted, however, there’s a gap between the ARL report’s idea of a “mutual social contract”—a social contract between citizen equals—and Younger’s description of the “Social Contract between OCLC and its members.” The latter is a nice description of the more antique view of a contract between citizens and their sovereign. Even if libraries accept a government over them, I suspect they would be more comfortable with the more modern view of a contract between equals.

Nullum timeret? As libraries consider the matter, one factor can be put to rest: Fear.

Writing for the Next Generation Catalog for Libraries list, Karen Coyle speculated that libraries were responding to OCLC through organizations like ARL and ICOLC, out of a “combination of ‘strength in numbers’ and ‘safety in numbers.'”

“I’ve seen a remarkable tendency of libraries to not want to confront OCLC. Remember that the proposed policy had penalties for mis-use of records, and severe ones at that (loss of rights to use OCLC records altogether). There is an intimidation factor involved. Agreed, as a member organization the members should not be afraid, but I think they are.”

That fear should be much diminished. Members spoke up, and OCLC backed down again and again (by my count they postponed, revised, revised, revised, delayed for revision and now shelved). Libraries—and, quite importantly, lots of people who merely love libraries—rose up and forced OCLC to back down.

The frontispiece to Hobbes’ Leviathan, shown above, quotes the Vulgate of the first clause of Job 41:33, non est super terram potestas quae conparetur ei (“There is no power on earth that compares to him.”). No doubt Hobbes or anyway his cover-designer felt this a good description of the legitimate sovereign.

But for our purposes, the second clause is particularly apt, qui factus est ut nullum timeret, “There is no power on earth that compares to him, who was made to fear nothing.” Job thought that whales were fearless.

Well, OCLC, you’ve met the architeuthis. And he is all of us.

*“In the eyes of the community, the guidelines expressed a mutual social contract, and the new Policy represents an authoritarian, unilaterally imposed legal restriction.”
**The trees growing inside are also a nice touch. The Assyrian throne room just had a carving of a tree, and the White Tree of Gondor was outside. 😉
***Similarly, nobody minds if you belong to two cooperatives. But states tend to be jealous about allegiance. As ICOLC wrote, libraries are involved in a “complex set of relationships,” of which, “OCLC is one vital component among many that libraries will use.” That’s not really a “social contract” idea.

Photo by Flickr user OZinOH.

Labels: oclc

Tuesday, May 12th, 2009

OCLC Policy, Good night


The International Coalition of Library Consortia, a very loose but extremely large group of library consortia, just released a Statement on the Proposed OCLC Policy for Use and Transfer of WorldCat Records.

For a while there it looked like OCLC was going to succeed in locking down the world’s library data, converting a wonderful sharing and coordination tool into an unbreakable data monopoly. But, together with OCLC’s recent, revealing decision to enter the library systems market, the ICOLC statement effectively ends that possibility. OCLC isn’t getting its new Policy, or anything like it.* Good night, OCLC Policy.

The details are worth a look. The ICOLC’s statement was short, signing onto the “substantial and broad” concerns highlighted by the Association of Research Libraries. It goes on to add three concerns, two of which address the risk to innovation—a topic the ARL report barely touched on:

  1. The proposed policy appears to freeze OCLC’s role in the library community based on historical and current relationships. We share the concern, voiced by many, that the policy hinders rather than encourages innovation, and we urge the Review Board to carefully examine this issue. It is unclear that the policy has been constructed with a focus on an evolving role of OCLC in enhancing the missions of an international library community with diverse and complex interests.
  2. The scope of the proposed policy goes well beyond any concerns about inappropriate commercial exploitation of WorldCat records. It applies as well to non-commercial uses. ICOLC member consortia are member-created, member-driven innovation agents. Our initiatives are generally non-commercial and undertaken with member approval based on member needs. Any OCLC record use policy should account for the rich and diverse innovation that takes place through many consortia.
  3. The proposed policy is legally murky. There is no mechanism for negotiation of terms and conditions nor is it clear what constitutes acceptance by member libraries. A new policy must address these problems.

As significant as the content was the list of signatories. Lyrasis, the former Palinet and Solinet, includes over 2,500 members. With “regional base and national scope” (their words), and about to merge with Nelinet, bringing their members to 4,500, Lyrasis is a major player. They’re no longer just a “regional service provider” for OCLC, and can be expected to collaborate or compete with OCLC as its members’ interests lead. They were joined by many of the big regional and state networks out there—MINITEX, NERL, the Florida Center for Library Automation, the Washington Research Library Consortium, the Michigan Library Consortium, WiLS, four Canadian consortia and both the Swedish and Finnish national libraries. Some of the signatories ought to have been sympathetic. Orbis Cascade, a source of much original cataloging, is also an important OCLC partner in developing consortial software. In Ohio, OCLC’s home state, OhioLINK, OHIONET and INFOhio all signed. Other members will add their names to the list as they affirm it.

The Next Step. It’s time now for the library world to step back and consider what, if anything, they want to do about restricting library data in a fast-moving, digital world. Some, including some who’ve deplored OCLC’s process and the policy, want restrictions on how library data is distributed and used. Once monopoly and rapid, coerced adoption are off the table, that’s a debate worth having, and one with arguments on both sides.

From my perspective, restrictions on the use and transfer of cataloging data—which is not usually copyrightable and is most frequently created by bodies responsible to the public good—are legally dubious and ethically stingy.

Instead, libraries should embrace “radical openness,” a commitment to sharing what they know freely, something that looks less radical in light of the library’s historic dedication to the free exchange of information. Selling other people’s library records isn’t a real threat, but, if it were, the answer would be more openness, not less. When you sell tickets, you get scalpers. But nobody makes money selling passes to Central Park. (A few people make money walking dogs around it. Most just enjoy the free grass and sunshine.) And in a world that’s looking less and less friendly to the long-term success of libraries, an unwavering commitment to sharing and openness may well be libraries’ saving grace.

So, three cheers to ICOLC for speaking up on this issue. Now, librarians and library programmers, let’s get back to work. Let’s earn our freedom.


Artwork: “Flickr is Freedom.” Creative Commons, Attribution, by Timtak.

*I note with some interest that Edward Corrado, whose OCLC posts have been very perceptive, isn’t quite as excited about this as I am. Where he wishes the statement was “worded a little stronger” I take great solace in phrases like “The proposed policy appears to freeze OCLC’s role in the library community based on historical and current relationships.” I’m hoping some others weigh in. The library world is, I think, somewhat exhausted by the whole OCLC Policy affair, and now that the organizations are weighing in strongly and negatively, the bloggers and newslist-ers who raised the initial questions—and were excoriated for it—may no longer be as necessary.

Labels: oclc

Friday, April 24th, 2009

OCLC news reactions

This post follows on The OCLC End Game, posted early this morning.

Library Journal’s Josh Hadro did an excellent follow-up article. Besides citing this blog post, Hadro got responses from Carl Grant, president of Ex Libris, on OCLC’s tenuous non-profit status—I’ll have another post about that soon—and a number of bloggers. Iris Jastram/Pegasus Librarian’s thoughts deserve quotation:

“I’m pleased that this is yet another competitor against the current lumbering giants in the ILS market, and I like the idea that (if I understand correctly) this will add a hosted option to the ILS market. … On the other hand, this means that that pesky new policy on the transfer and use of OCLC records really wasn’t just about protecting a bunch of member-produced data after all. There were bigger plans afoot, and these plans involved leaning even farther toward the vendor model rather than the service model. And if OCLC is a vendor rather than a service, that new policy feels even more like a land-grab rather than an effort to protect member investments.”

Ms. Jastram’s misgivings are comforting to me, at least, as her previous thoughts on the OCLC Policy were more mixed. Ultimately, the fate of OCLC’s Policy will be decided by the people in the middle—the fair-minded people, not the ones who equate OCLC with the Matrix, The Empire or the All Your Bases villain.*

The Smithsonian Libraries on the OCLC Policy. I missed this, but on April 2 the official blog of the Smithsonian Institution Libraries weighed in on the OCLC Policy and the ACRL/ARL response (PDF), “support[ing] the recommendations” and emphasizing a number of points. Among these were:

  • “The policy should recognize and affirm traditional library values of cooperative cataloging and shared bibliographic information without any claim of ownership of the bibliographic records.”
  • “OCLC’s new policy should recognize, and not be in conflict with, existing legal obligations or requirements that may apply to some OCLC member libraries (such as federal libraries).”

It’s great to see a federal library making such a public statement. Having been passed by in the OCLC Policy discussion—Federal librarians have told me they were amazed OCLC thought it could unilaterally change licensing terms with government entities—and not included on the ARL/ACRL board either, at least one is lending its voice to the criticism. Hooray for them. James Smithson, who left his estate for the “increase and diffusion of knowledge”—and to a country he had never even visited!—would, I think, be proud.

*Chris Bourg left a comment to the effect that the AYBABTU reference was purely humorous, and she does not consider OCLC a villain, even if she thinks I’ve got a good argument. Now, can anyone think of a way to tape Jay Jordan saying “You have no chance to succeed make your time”? I’m thinking we could sky write it over OCLC headquarters in Dublin, OH and secretly film OCLC employees puzzling it out. Ideally, though, he’d need to wear the bionic monocle.


I will never run out of interesting Flickr chess images. This one’s by Shyald, from a series.

Labels: oclc, worldcat, worldcat local

Friday, April 24th, 2009

The OCLC End Game

Two years ago I predicted what OCLC, the library-data organization, was after with its WorldCat Local pilot program—“They’re trying to convert a data licensing monopoly into a services monopoly.” To illustrate, I changed the OCLC logo to the Death Star.

I was hardly alone in this speculation. But this concern was soon overtaken as OCLC brought forth its Revised Policy for Use and Transfer of WorldCat® Records. The Policy, which turned a de facto data monopoly into a legally enforceable one, became a focus of intense debate in the library world. On the one side just about every library blogger with a keyboard, and eventually a review board at the ACRL/ARL, raised questions about the idea of anyone “owning” records meant for sharing and most frequently produced by government entities. On the other side, OCLC’s defenders (in truth, mostly employees), talked of OCLC’s “curation” of community content, of “protecting members’ investment,” of the “best interest of libraries,” “OCLC’s public purposes” and of WorldCat.com’s role as an essential “switching mechanism” to local catalogs (references: 1, 2, 3).

Yesterday, OCLC unveiled the end game that brings everything together. As reported by Marshall Breeding in Library Journal:

“This new project, which OCLC calls “the first Web-scale, cooperative library management service,” will ultimately bring into WorldCat Local the full complement of functions traditionally performed by a locally installed integrated library system (ILS).”

The new service will be “free” to (paying) WorldCat First Search customers.

The move to “web scale” (OCLC-speak for “web”) catalogs was an inevitable one, and is a good one. It’s silly to have every library in the country running their own racks of servers. The economics of server architecture, equipment and systems administration make a single, hosted solution economically superior. It makes particular sense for OCLC. With a large percentage of world libraries’ data sitting in servers for copy-cataloging purposes, a locally branded and faceted web-app. catalog was the next logical step.

The move casts new light on its Policy defenses. OCLC isn’t “curating” library records; it’s leveraging them to enter a new market. It wasn’t “protecting members’ investment,” it was investing members’ money, intended to support OCLC’s core mission, to build a new service. WorldCat isn’t a “switching mechanism” to local catalogs. It will replace them.

I’d love to follow them. I’d love to make a large-scale hosted library catalog. I think LibraryThing could do a lot better. OCLC is full of smart people, but it develops slowly and has shown a singular inability to produce social features that anyone would want to use. I think Talis, AquaBrowser, LibLime and Equinox could do better too. And I think, if library programmers got together, they could make a truly open, community-run service—something others, like LibraryThing, could provide plug-ins for.

We’d all love to try, but we aren’t allowed. According to the Policy, you can’t build the sort of truly “web scale” database that would make such a project economically viable. Anything that replicates the “function, purpose and/or size” of WorldCat is not “Reasonable Use.” Any library participating in such a venture would lose its right to OCLC-derived records, something that would literally shutter most public and all academic libraries in the country. When it comes to large-scale online catalogs, there can be no competing with OCLC.

Let me be clear: I have no problem with OCLC developing software. They do good work. I for one think WorldCat/WorldCat Local is a better product than most server-based OPACs.

But, now more than ever, OCLC must end its attempts to restrict and monopolize library data. It was ugly and unfair for OCLC to claim ownership over what is largely public data. It is obscene to leverage that data monopoly into a software monopoly.


Chess images from Flickr users malias and furryscaly. Chess outside makes me think of the Deus’ song Slow. What is it with Europeans and outdoor chess sets anyway?

Labels: oclc, worldcat, worldcat local

Monday, February 23rd, 2009

Research libraries clobber OCLC Policy

The Association of Research Libraries released its report on the new, now delayed OCLC Policy, and it’s a doozy—a forceful rejection of both the process and content of the Policy.

The full report makes for enjoyable reading—outside of Dublin, Ohio anyway. The task force members, research-library heavyweights all, fully and finally put to rest the notion that the only people bothered by OCLC’s power grab are open-data crazies and evil commercial companies.

There appears to have been a significant split. The majority felt it “desirable to have a policy that limits large-scale redistribution of records that could be harmful to the collective” and a minority did not. (It’s great to hear that a team of veterans had at least one member willing to reject the whole structure of cooperative-restriction!) But if the majority felt some policy was called for, they were apparently unanimous in condemning OCLC’s unilateral, non-consultative approach and concerned by a host of issues, large and small. Surveying the current Policy they urge a “fresh start.”

Vague legal language, unclear goals, worrying process, the split between the “nice” FAQs and the actual language of the Policy, issues of clouded ownership and responsibility for bibliographic data, termination provisions, the lack of respect for federal libraries and the legal impossibility of binding them without explicit renegotiation—it’s all here! There’s even a legal opinion, attached to the document, pouring cold water on the idea that the Policy will have any “downstream” effect on parties that haven’t explicitly agreed to it (i.e., LibraryThing members). In all, a good drinking game could be invented—every time the ARL report validates or recapitulates a point made on this blog, and on other opponents’, drink. (If you’re going to Code4Lib this week, I’ll buy the drinks!)

Most striking are the report’s vision of OCLC as a cooperative, and the ways the OCLC policy undermined that trust:

“The collective activity of shared cataloging is a source of deep pride and success in libraries in the U. S. and around the world. OCLC was created as, and is viewed as, a membership organization formed for the purpose of enabling this collective activity…. Members view WorldCat as a collective enterprise, not as a product that they license for use. …”

“The new Policy is clearly intended as a unilateral contract, unilaterally imposed on any entity using records from the WorldCat database, including member libraries…. The member community has seen the introduction of the new Policy as a fundamental change in the nature of the relationship between OCLC and its member libraries. In the eyes of the community, the guidelines expressed a mutual social contract, and the new Policy represents an authoritarian, unilaterally imposed legal restriction.”

Now let’s see what comes of this. OCLC has a needle to thread. The ARL report sets a high bar for consultation and consensus—higher than I think OCLC can reach without rethinking its whole communication model. And the core research-library concerns are serious*. I don’t think they can address them without failing to ensure what I believe to be the Policy’s true intent—establishing a permanent and lucrative data monopoly.

My prediction: Keep an eye on OCLC’s “regional service providers.” Various signs, including what reporters call “highly-placed sources,” confirm that OCLC/regional tension is at an all-time high, with OCLC increasingly rewriting the rules there too—selling directly to libraries in unprecedented ways. I think we can see in these moves a common historical pattern: when the structures that give a powerful institution strength start to weaken, it reaches for a new level of authority not based in the previous structure and therefore not susceptible to weakening. (In this case, OCLC is moving from a robust, often mediated cooperative to an unmediated, contractually-drawn licensure.) Sometimes the effort succeeds; sometimes the attempt crystallizes opposition and hastens and ensures the institution’s decline.


*Even if they picked the members of the Review Board, they may still face trouble from that direction. I doubt that OCLC’s Review Board has what the ARL board apparently had—members who apparently questioned the very idea of restricting access and use!—but all but one of the board members are academic/research librarians and can be expected to understand and appreciate the concerns raised by their ARL colleagues.

Labels: arl, oclc

Thursday, February 19th, 2009

Seeing parallels

Steve Lawson wrote this wonderful piece for his blog See also…, reprinted here (by permission) in full:

There is a large organization whose main business isn’t producing information, but instead hosting and aggregating information for many thousands of users on the web. Users upload content, and use the service to make that content public worldwide, and, likewise, to find other users’ content. Then one day the large organization decides to change the rules about how that information is shared, giving the organization more rights–to the point where it sounds to some people like the organization is trying to claim ownership of the users’ content, rather than simply hosting it and making it available on the web.

A small but vocal and influential group of users object to the policy change. The organization protests that it isn’t their intent to fundamentally change their relationship with their users and that legal documents tend to sound scarier than they really are. Most customers are either unaware or unconcerned by the change in policy, but the outcry continues until the organization backs down a bit, sticking with the old policy for the time being. The future, though, is up in the air.

Facebook? Or OCLC?

Perfect, just perfect.

Labels: facebook, oclc, open data, steve lawson

Sunday, February 15th, 2009

Why Wirral? One partial explanation.

A recent article in the Telegraph describes a worrying fall-off in library books and library usage in the UK.

Over the past six years books in public libraries in the UK have fallen 12%, from 116 million to 103.2 million. Library check-outs have fallen faster—16.5%. According to the Telegraph, UK librarians are bracing for another round of declining numbers, coming amid budget shortfalls across the board—and expecting to get their budgets slashed.

Reflecting on these problems, the CEO of the Museums, Libraries and Archives Council (MLA) told the Telegraph:

“[W]e live in an age where books can be bought cheaply from supermarkets or the internet so the reasons to visit a library have changed for many users.”

Wirral as a microcosm. Cuts have started. The Wirral council system in NW England (LibraryThing Local), is closing 11 of 24 branches.

They sure don’t deserve it. Taking a look at the Wirral Libraries website, anyone can see they’re doing a lot of things right. The branches look well-organized and inviting. They’ve got a fair number of computers and free Wifi. They have a special outreach program for the house-bound. They even lend toys!*

But they are doing one thing very wrong—namely that Wirral, like most libraries, isn’t really “on” the web.

People are finding things in supermarkets and on the internet because it’s easy to do so. On the internet, one-stop shopping means that a huge panoply of useful and interesting things is available from a single, unified and well-understood interface—from local bars, to local bands, to some 600 pizza and 400 curry joints in the area (Man, I love Britain!). Many of these resources are not only in Google searches, but Google will plot them on a map for your convenience.

What isn’t online are library books! The Wirral Libraries’ catalog, a Talis Prism OPAC, hardly registers in Google, which knows only 7,000 pages, from a library with more than 300,000 items. Worse, virtually every Wirral page in Google is broken. On the right is a representative sample of what Google knows about from the Wirral catalog. Each link has the same title. And each links to an expired session that proclaims:

You can, of course, get to the Wirral Libraries catalog if you know that’s where you want to go—fifth link down, then the top rounded button on the right. That’s not the same thing.

And even if you find a book, you can’t bookmark it for yourself or forward it to a friend–the links will die off in a few minutes. In refusing to allow links and spider, the Wirral website sets itself apart from the other websites Wirral residents might use. The rest of the web just works—it’s in your search box, where most internet-aware people do most of their information finding.
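The difference between a session-bound link and a shareable one is easy to sketch. The example below is an illustration only: the parameter names, the host and the record id are all invented, and it assumes a catalog that can serve a record without its session token, which a session-bound OPAC may well refuse to do.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names. Real catalogs name their
# session tokens differently; adjust the set for the system at hand.
SESSION_PARAMS = {"session", "sid", "sessionid"}


def stable_link(url: str) -> str:
    """Drop session-bound query parameters so a link can be shared or indexed.

    The fragment is discarded as well, since it never reaches the server.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    # The session token is dropped; the record id survives.
    print(stable_link("https://catalog.example.org/item?session=ABC123&id=42"))
```

A link that depends only on the record id is one a search engine can index and a patron can forward; a link that depends on the session dies with the session.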

Lastly, where is WorldCat in all this, the “switching mechanism” and “point of concentration” (Karen Calhoun) OCLC provides libraries as an alternative to the “lunacy” (Roy Tennant) of libraries being on the web for themselves? Nowhere. None of the Wirral Libraries are in it, and WorldCat doesn’t list a copy of Harry Potter and the Deathly Hallows closer than 60 miles away (postal code: CH46 6DE?). One may speculate that Wirral wasn’t willing to pay for the service, which anyway gets quite insignificant traffic.***

Who’s to blame? Wirral Libraries’ misfortunes are no doubt many, and not being part of the web is not the largest. But it’s a part. Wirral citizens aren’t seeing their library appear in their search results. They aren’t as aware of its riches as they might otherwise be. If they were aware, it’s likely they’d use these resources more, and the system would be easier to defend politically.

It won’t do to blame Wirral for this. Library vendors have long handicapped their products in this way, and Wirral Libraries surely bought their Talis Prism system a while ago.** Budgets are short—and getting shorter. Both the web and this recession have hit libraries by surprise.

But refusing to participate in the central information technology of the age has its costs. And the leaders of Libraryland who advocated and continue to advocate for closed solutions, closed data and staying out of search indexes—except as “negotiated” with Google—have contributed to this situation. The respected guides have taken libraries off the great river of information, and left them grounded on the shore. Now someone’s coming for the boat.

I hope the residents of Wirral fight like hell to keep their libraries open. Then they should fight like hell to make their libraries truly open.


*I don’t know how common this is in Britain. I get the sense it’s not too common in the US, but it happens. The Hingham Public Library in Hingham, MA lends practically everything, from toys to paintings on the wall.
**It’s ironic that Wirral’s OPAC was made by Talis, now one of the more progressive and forward-thinking library vendors. I’ll put this in a footnote to avoid “shilling,” but if Wirral can get a new OPAC, I’ll arrange for them to get LibraryThing for Libraries for free until they get back most of their funding. Maybe Talis would kick in an incentive to upgrade their OPAC?
***WorldCat is supposed to be the central website of Libraryland, but third-tier websites like LibraryThing and Dogster—the social network for dog lovers!—are currently beating it.

Labels: indexing, oclc, riverine metaphors, web, wirral, wirral libraries

Sunday, February 1st, 2009

The evil 3.26%

The question has arisen of why I advocate against OCLC’s attempt to monopolize library data. Roy Tennant of OCLC, an intelligent, likeable man who, although we disagree on some issues, has done more for libraries than most, accused me of writing and talking about the issue because:

“… your entire business model is built on the fact that you can use catalog records for free that others created and not contribute anything back unless they pay (yes, there is a limited set of data available via an API, but then they need the chops to do something with it).”

Fair enough. Let’s look at the numbers, and the argument.

I did a comprehensive analysis, available here as a text file, with both output and PHP code. If anyone doubts it, send me an email and I’ll let you run the SQL queries yourself.

The numbers. As of 6:17pm Sunday, some 3.5 years after LibraryThing began, our members have added 35,831,904 books from 690 sources:

  • 85.48% came from bookstore data (almost exclusively Amazon).
  • 4.88% were entered manually by members
  • 9.63% were drawn from library sources

Now, where did that 9.63% come from?

These sources were in every case free and open Z39.50 connections our members accessed through us. Very frequently they accessed records of their own academic institution, but in any case, these members accessed these records alongside everyone else—libraries, museums, public agencies of one sort or another and all the students and scholars who use RefWorks, EndNote and other such services. Meanwhile LibraryThing has never been asked to stop accessing a source. On the contrary, libraries frequently ask to include themselves on our list of sources.

Of the 9.63%, by far the largest source is the US Library of Congress, the source of 2,203,182 books, or 6.15% of the total. The Library of Congress is a Federal organization, created for the benefit of the country and falling under the government-wide rule that public work is for the benefit of the public, and cannot be copyrighted or otherwise “owned.” For as long as the technology has been there, the Library of Congress has allowed access to its cataloging data; the OCLC policy change will not affect that.* We are grateful the Library of Congress does this. But insofar as we are taxpayers and support the American notion of public ownership of public resources, I will not apologize for it. (On the contrary, I feel that OCLC should apologize for attempting to restrict and profit from public work.)

3.26%. That leaves 3.48%—more appropriately 3.26%**—the evil sliver upon which our “entire business model is built.” Take a look at the top fifteen here:

  • Koninklijke Bibliotheek : 130,406 books (0.36%)
  • National Library of Scotland : 80,826 books (0.23%)
  • British Library (powered by Talis) : 80,205 books (0.22%)
  • Gemeinsamer Bibliotheksverbund (GBV) : 77,190 books (0.21%)
  • National Library of Australia : 72,896 books (0.2%)
  • Helsinki Metropolitan Libraries : 70,551 books (0.2%)
  • The Royal Library of Sweden (LIBRIS) : 63,430 books (0.18%)
  • Italian National Library Service : 60,643 books (0.17%)
  • Vlaamse Centrale Catalogus : 58,936 books (0.16%)
  • LIBRIS, svenska forskningsbibliotek : 54,339 books (0.15%)
  • ILCSO (Illinois Libraries) : 28,517 books (0.08%)
  • Yale University : 26,885 books (0.08%)
  • Det kongelige Bibliotek : 24,564 books (0.07%)
  • University of California : 20,098 books (0.06%)
  • Bibliotek.dk : 19,628 books (0.05%)

With 690 possible sources, it’s a long, long tail. We take 2,087 from the Russian State Library, 1,067 records from the Magyar Országos Közös Katalógus, 286 from Princeton, 106 from Koç (in Istanbul), 63 from Hong Kong Baptist, 4 from the Universidad Pública de Navarra, etc.

It should be apparent to anyone looking at the above that the 3.26% is largely about satisfying the needs of foreign LibraryThing members, a small percentage of our membership and hardly central to our “business model.” Equally clear is the government orientation of the list: only one, Yale, is a private institution. The rest are all government agencies. Of course, no records actually came from OCLC itself!

All-in-all, library data from non-federal sources is a negligible component of LibraryThing’s content. LibraryThing is not some big plot to capture library records. That idea is simply not in the figures.
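
For anyone who wants to check the arithmetic without running the SQL, here is a minimal sketch. It is in Python rather than the PHP of the analysis file, the helper name `share` is made up for illustration, and the only inputs are the counts and percentages quoted in this post.

```python
# Figures quoted in the post; nothing here is queried from a live database.
TOTAL_BOOKS = 35_831_904
LIBRARY_SHARE = 9.63            # percent of books drawn from library sources
LOC_BOOKS = 2_203_182           # Library of Congress
BRITISH_LIBRARY_BOOKS = 80_205  # British Library (via Talis)


def share(count, total=TOTAL_BOOKS):
    """Return count as a percentage of total, rounded to two places."""
    return round(count / total * 100, 2)


# Library of Congress share of all books.
loc_share = share(LOC_BOOKS)

# Library-sourced books that did not come from the Library of Congress.
non_federal = round(LIBRARY_SHARE - loc_share, 2)

# The same figure with the British Library records set aside.
without_bl = round(non_federal - share(BRITISH_LIBRARY_BOOKS), 2)

print(loc_share, non_federal, without_bl)
```

Run as written, this should reproduce the 6.15%, 3.48% and 3.26% figures used above. Rounding at each step follows the way the post quotes the numbers, so an unrounded calculation could differ in the last digit.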

Do we give back? What of the second half of the accusation, that we “not contribute anything back unless they pay,” and the bit against APIs?

First, assuming Roy means LibraryThing data generally, it’s absurd to suggest that because LibraryThing draws 3.26% of its data from free, unlicensed sources, our members’ data and services are owned by OCLC or its members. OCLC no more owns members’ tags and reviews on bibliographic metadata than Saudi Aramco owns the furniture I bring home in my car. Who in their right mind would ever accept a list of titles and authors from a library, if that meant ceding ownership over what you think about the book?

LibraryThing and OCLC both have terms. But LibraryThing license terms are unlike OCLC’s in a number of ways. LibraryThing members know what they’re getting, unlike OCLC members, who thought they were sharing with other libraries, but find themselves the lynchpin of a monopoly. From our inception LibraryThing has reserved a right to sell aggregate or anonymized data. We also sell some reviews, giving members the option to deny them to us. All our member data is non-exclusively licensed, so members can do anything they want with it outside of LibraryThing, and members can leave at any time. Neither is true of OCLC members’ data under the Policy.

Cataloging data. That leaves LibraryThing cataloging data, of which we have three types. We don’t have any legal responsibility to make it free, but we do so anyway.

First, we would be happy to offer downloads of original or modified MARC records! We haven’t done so in order to avoid attracting a suit from OCLC. But perhaps we were mistaken. If OCLC would like us to start releasing our MARC records to others, someone should let us know. We will release them under the same terms they were given to us—freely.

Second, our Common Knowledge cataloging (series, awards, characters, etc.) is free and available to all. We can’t think of a better way to provide it other than through an API, but we’re all ears if Roy knows of a better way. And if OCLC would like to admit it to WorldCat, without subverting its always-free license, they don’t even need our permission. Go on, OCLC, make my day!

Thirdly, there’s ThingISBN, which was directly patterned on OCLC’s xISBN service. Despite Roy’s criticism, they are identical in format and delivery so if there’s something wrong with its XML APIs, OCLC has only itself to blame. Indeed the only difference is cost: ThingISBN is completely free, both as an API and as a feed; xISBN, which member data creates, is sold back to members.


Stop killing the messenger. It’s time for OCLC to recognize they made this mess, not others. They have perpetrated some astounding missteps, from attempting to sneak through a major rewrite of the core member policy in a few days without consultation, to a comic series of rewrites and policy reversals, culminating in withdrawing the policy entirely for discussion. (It now seems clear they did so on the heels of a member revolt, whether general or just of some key libraries.)

It’s also important to see that, before OCLC started threatening companies and non-profits doing interesting but non-competing things with book data—notably LibLime, Open Library and LibraryThing—they had none of the problems they have now. Now, by attempting to control all book data, they’ve spurred the creation of LibLime’s ‡Biblios system, a free, free-data alternative to OCLC and, well, sent me, Aaron Swartz of Open Library and dozens of prominent library bloggers into orbit.

Being caught so flat-footed can’t feel nice. It must be hard feeling like royalty and discovering your subjects think themselves a confederacy. But this is no time for OCLC to start attacking the credibility of its opponents. Surely LibraryThing is an unusual case—a company that has an opinionated, crusading—okay, loud—president. But the thousands of librarians and other individuals who supported our calls, or raised other objections to the OCLC policy are not less well-motivated than OCLC and its employees. They do not love libraries less. They are, rather, concerned that OCLC’s urge to control library metadata threatens longstanding library traditions of sharing, and sets libraries on a path of narrowness and restriction that will surely prove no benefit in this increasingly open, connected world.


*I need to write a blog post on this, but I was recently informed that whatever changes OCLC makes cannot touch federal libraries without explicit authorization. That is, federal law does not recognize clauses like “if you continue to use” or “we can change this at any time.”
** It should more accurately be 3.26%, because we are getting our British Library records through Talis, who have a contract with the British Library.

Labels: oclc

Thursday, January 22nd, 2009

The Guardian asks “Why you can’t find a library book in your search engine?”

The OCLC data-grab has hit the “real” media—an article in the Guardian. The article asks the simple question, “Why you can’t find a library book in your search engine?”

It’s an obvious question. The answer isn’t quite as simple as they put it. Libraries would be in Google if their library catalogs could be spidered. But they’d still be hampered by OCLC in various ways. Anyway the coverage of OCLC, Open Library, LCSH.info and LibraryThing is spot-on. And the subtle nationalist angle (an American site!) can’t hurt.*

Three cheers for the Guardian. Next up, the New York Times? We can hope.

*Did you know OCLC invaded Iraq?

Labels: guardian, oclc

Friday, January 16th, 2009

Library social media wins one

Update: We can’t make it to today’s Nylink/NYPL event. Get your tshirts at ALA Midwinter or by asking for one.

Big news. As you may have heard, OCLC has reversed itself and delayed its new Policy due to take effect in February. They will be setting up a “Review Board of Shared Data Creation and Stewardship”*, with broad member consultation promised. At best, they’ve heard the message and may end up embracing truly free and open library data. (A man can dream!) At worst their strategic retreat gives free-and-open data proponents time to articulate and broaden their case.

For people like me who have been plugging away at this for months and feeling increasingly depressed about what seemed the library world’s inevitable slide into data monopoly, it was a big, big win. The LibraryThing team went out to Silly’s. That’s a party.

Social media won. Content aside, however, it was a big win for library “social media,” particularly the “biblioblogosphere.”* OCLC’s new Policy was rushed through so quickly that it effectively bypassed traditional library-world tools, like professional conferences. Press coverage too was minimal, late and mostly dependent on the blogosphere. Even the hastily-convened ARL/ASERL panel hadn’t spoken yet when OCLC felt the need to reverse course. The blogosphere was running ten- or twenty-to-one against the Policy.

Other social media also played their part, from the trendy, excitable Twitter to the cliquish Facebook to that forgotten workhorse of professional communication, the Listserv. Even AUTOCAT, which many of the Library 2.0 types I hang out with consider past hope, showed little support for the policy and much criticism. And over them all, the Code4Lib wiki was pressed into action tracking and aggregating what everyone was saying, allowing arguments to build on each other and making it crystal clear to everyone that they were not alone.

Of course, we don’t know why OCLC changed course. There’s a rumor going around that an important library director or two said they wouldn’t abide by it. It’s also possible that ARL/ASERL is going to come out solidly against it, and OCLC saw it coming. But even if the ultimate decision rested with some powerful people, they must have drawn on the blogosphere for information and support. Maybe the payoff from all those library-sponsored professional development courses won’t come from helping patrons get on the MySpace bus, but from getting the library world off a train to nowhere.

So, open-data people. You’re not alone. You have power. The library world is listening. What do you have to say?

Labels: facebook, oclc, social media, twitter

Tuesday, January 13th, 2009

OCLC protest teeshirts are here!

UPDATE: OCLC just announced creation of a “Review Board of Shared Data Creation and Stewardship” to review the policy with members. And that was one nipple. What could two nipples do?

We made a teeshirt out of the best of our parody OCLC logos, protesting the new OCLC Policy. I think it works pretty well.*

We’ll be handing them out at this Friday’s OCLC Policy discussion at the New York Public Library (sponsored by Nylink) and at the American Library Association Midwinter conference in Denver. So far we’ve made exactly one. We’ll probably make 100. They cost $5 to make, so a $5 donation is appreciated (but not required). If there’s demand, we’ll sell and ship them for cost.

Sonya strikes her “defiant” pose:

*Except for being nipply on me. I didn’t want to post it, but Mike had the best argument, “With all the things that LibraryThing members have done for you, don’t you owe them a nipple?”

Labels: nipply, oclc

Sunday, January 11th, 2009

Why libraries must reject the OCLC Policy (part 1)

I have been thinking about the new, proposed OCLC Policy, scheduled to take effect in mid-February. I was driven to act after a recent AUTOCAT posting, in which a librarian suggested libraries not expose their collections to the web, except for “original cataloging,” for fear of the new OCLC Policy. How terrible would that be?

I’m not sure the specific fear is justified, but fear certainly is. As the Policy states, violations of the OCLC Policy “automatically” terminate a library’s right to use any OCLC records. And OCLC gets to say what constitutes a violation.

It got me thinking about compiling all the arguments against the Policy. I want to start with the process and legal ones, which have gotten very short shrift. OCLC spokespeople are persuasive personalities, and OCLC’s “Frequently Asked Questions” allay fears, but the Policy itself is a scary piece of legal writing and, as it explicitly asserts, the only writing that matters.


1. The Policy fundamentally changes the character of OCLC, a “member” institution, with no formal member approval and with little member input.

WorldCat is why OCLC was created, is OCLC’s largest revenue source, the basis of most of their other services and the most common way OCLC interacts with its members. The Policy transforms WorldCat in many respects, but most of all in how OCLC relates legally to its members, which shifts from a cooperative to a sort of licensure.

OCLC is supposed to be a member organization. But what member organization would fundamentally alter its core business and transform their relationship with members without putting the issue squarely before them? Yet OCLC has done just that.

2. The Policy is a legal document. No other statements matter.

The policy is a legal document, not a statement of intent or aspirations. It explicitly states (§E7) that it is “the final, complete and exclusive statement of the agreement of the parties with respect to the subject matter hereof.” That means that the “intent” of the Policy as voiced by OCLC spokespeople, and the seemingly gentler “Frequently Asked Questions,” have no legal standing. If it’s not in the Policy, it’s not part of the agreement.

Licenses are legal documents. You don’t sign legal documents based on casual pleasantries. If a landlord says you can move out at any time, but the lease says you have to give notice and pay rent until a new tenant is found, trust the lease or make the landlord change it.

3. The Policy is illegitimately retroactive.

The Policy limits the use and transfer of all records, not just new ones. The diligent catalogers of forty years ago who thought that OCLC was a humble cooperative helping libraries copy catalog had no idea that they were laying the foundations of a data monopoly.

Retroactive licenses are legally dubious and morally obnoxious. If OCLC wants to impose a new license, it should not do so on legacy data.

4. The Policy is perpetual and will create a perpetual monopoly.

Most licenses lay out what does and does not “survive” termination. Not here. There is no out from the Policy whatsoever. You can leave OCLC and sit on your records for twenty years and they still effectively own them, and they can still strip your library of them at any time. The policy lasts forever, on every record it touches and no matter who touches it.

The perpetual nature of the agreement means that, once this policy goes into effect, it’s all over. The vast majority of the world’s library data is owned and restricted. What US library could even think of exempting themselves of every “OCLC-derived” record? The “network effect” is just too great.

Unless OCLC changes its mind or dies, there will be no second chance.

5. OCLC can change the Policy at any time, in any way.

As the Policy states, “OCLC may issue a modified version of this Policy or a substitute for this Policy at any time.” There is no check whatsoever on what this new policy can require or prohibit. Given the lack of member input that characterized its introduction, OCLC members may confidently expect to have no role in any future changes or “substitutions” either.

A perpetual license that can be changed at any time is a lot of power to give any institution. Does OCLC deserve that sort of power?

6. If you violate the policy your library automatically loses the right to any “OCLC-derived” records you have.

(§E1) “The rights to Use and Transfer WorldCat Records afforded by this Policy shall automatically terminate upon any breach of the terms of this Policy.”

Imagine losing all the OCLC-derived records in your library catalog. Imagine turning all your automated systems off until every bibliographic and authority record that passed through OCLC at any point was identified and removed from your library, and new “untainted” ones found or created from scratch. What library in the United States could keep its doors open if it lost the right to use “OCLC-derived” records?

It sounds dire, but according to the Policy, if you violate the Policy in “any” way, OCLC can shut down your library.

7. OCLC has sole discretion to declare a library in violation and strip it of its records.

Not only can OCLC shut down your library, but you have no recourse to stop them. As the Policy states “[§E6] OCLC has the sole discretion to determine whether any Use and/or Transfer of WorldCat Records complies with this Policy.”

If someone handed a government agency the power to kill libraries, and do so with no appeal or legal recourse, librarians would be in the streets in protest. Why does OCLC get a pass?

Call to action

Librarians and interested parties have only a month before the OCLC Policy goes into effect. It is time to put up or shut up.

UPDATE: Note that it’s Friday, January 16. See the page.

*I am dying to be there, but I simply can’t make it. One way or another, however, we’ll try to get our word out.

Labels: oclc

Sunday, January 11th, 2009

This is like a Heinlein Novel!

I recently enjoyed a recorded talk by Christine Peterson, co-founder of the Foresight Nanotech Institute, on open-source security and politics.

The basic point is to get alpha geeks to think about what they can contribute to basically political questions: to better, less invasive physical national security, but also to stand up and fight against absurdities like polling machines running proprietary software, when everyone in software knows open source would provide a much better check on potential hacks.

She has a section that mirrors how I feel about the new OCLC Policy and librarians’ and library technologists’ responsibility to get engaged and do heroic things, to keep libraries “free to all” and not the cornerstone of a perpetual monopoly:

So you may be sitting there going, “God, the Constitution! Franklin, Jefferson… this is like a Heinlein novel! She’s trying to convince me I’m in a Heinlein novel, where there’s heroic action to take.”

Well, guess what, you are! You really are. This is a critical time. And there’s going to be huge decisions and a lot of work to be done. And you’re the best ones to do it. I hate to tell you that. I know you have other things to do.

Labels: inspiration, oclc

Wednesday, December 31st, 2008

666

Another OCLC logo parody. The person who did it wishes to remain anonymous, and for good reason!


Labels: oclc

Tuesday, December 16th, 2008

New OCLC logos

Some genius over at the Technology Planning Committee of the SHARE Library Consortium in Washington put together a parody of the OCLC logo, incorporating Darth Vader. I’d like to think they were in part inspired by my transformation of the old OCLC logo into that of the Death Star.

Which got me thinking. The muted response to OCLC’s new Policy is enormously frustrating. The Policy is a major shift, taken with minimal member input, which effectively transforms an expensive transfer service into a permanent data monopoly. It runs against age-old library values, and in the face of everything else going on in the information world.

There’s only so many posts I can write digging into the legal language. So, maybe the time has come for humor. How about some new OCLC logos I put together?






Wouldn’t they look good on t-shirts at ALA Midwinter?

Well, that was a fun couple of hours! I just wish I could get the OCLC font just right.

Labels: oclc, policy, worldcat

Wednesday, December 10th, 2008

The New OCLC Policy and Federal Libraries

This blog post attempts to show that the new OCLC Policy (blogged here) effectively annuls a longstanding principle of US law, that work performed by government officials and employees is forever in the public domain.

In a library context, this has always meant that Federal libraries are not only free but compelled to share their information with the public that pays for it.

Many continue to hold that this is still true. As one AUTOCAT poster wrote:

“I find it hard to believe OCLC would attempt to assert an intellectual property right over things such as LC cataloging, which by statute is in the public domain.”

Unfortunately, this conception confuses two areas of law. By crafting the Policy as a license, which is perpetual, retroactive and viral, OCLC can effect a sort of ownership: US citizens still own it, but they don’t have a right to get it (except, if they qualify, with an OCLC license around it).

Thus, OCLC transforms an expensive service–access to a repository of data that, even OCLC employees admit, would fit on an iPod, with room for 5,000 songs!–into effective ownership. This state of affairs obtains even when all the cataloging and editing was done by other Federal agencies and employees. It is only broken when the library in question itself did the original cataloging. As we shall see, that doesn’t help much.

Three Federal Libraries. The OCLC affiliate for Federal libraries, FEDLINK, maintains a list of its members–libraries like the Library of Congress, NASA, Justice, the Smithsonian, the National Library of Medicine, the Supreme Court, etc.

From this list I plucked three that have public catalogs–the Department of Defense, Commerce, and Labor–and carefully examined the first ten MARC records for three common English words. I checked these against the 001, 035 and 994 fields recommended in the Policy FAQ, “How can I determine if a record was derived from WorldCat?”* The results are depressing.

Of the Department of Defense‘s ten books on “Freedom,” zero will be free after the Policy takes effect. None were originally cataloged by the Department of Defense, and all had 035 fields showing they were at one point “derived” from OCLC. In every case, the original cataloger was the Library of Congress, and many were edited by the Department of Defense. But that doesn’t count. They aren’t DoD original cataloging and they bear the mark of OCLC. As far as the Policy is concerned, that’s the end of the story.

Of the Department of Labor’s ten “Copyright” books, zero again are free. All ten were cataloged and edited by Federal employees (mostly the LC and the Congressional Information Service). But none were cataloged by the Department of Labor, and all have fatal 035 fields.

The situation at the Department of Commerce was slightly better. Here I searched for “Openness” and got only eight results. Five are clear-cut OCLC records. Two might be free: they lack 001 and 035 fields, although OCLC appears in the 040. I think, however, that they aren’t currently held by the library, and, in an overlooked provision, the OCLC Policy prohibits transfer of records when a library doesn’t hold the book. But one is free, cataloged by the University of Alabama and lacking any trace of OCLC transfer.

Don’t think the OCLC Policy affects Federal libraries? Think again.

Sign the Petition (if a librarian, also see this one).


Data. Here's what I found. Prove me wrong.

Department of Defense: first ten records with title starting "Freedom."

  • Freedom by Orlando Patterson (035 has ocm; cataloged by LC, edited by Department of Defense)
  • Freedom by William Safire (035 has ocm; cataloged by LC, edited by Department of Defense)
  • The Destruction of slavery (035 has ocm; cataloged by LC, edited by Department of Defense)
  • Freedom : a history (035 has ocm; cataloged by LC, edited by Department of Defense)
  • Freedom and foreign policy (035 has ocm; cataloged by LC; OCLC edits)
  • Freedom and information (035 has ocm; cataloged by LC, edits by Baker & Taylor, Connecticut State Library and Department of Defense)
  • Freedom and the Law (035 has ocm; cataloged by LC, edited by Department of Defense)
  • Freedom at Issue (035 has ocm; cataloged by LC and about a dozen other institutions, not including OCLC)
  • Freedom at Midnight (035 has ocm; cataloged by LC, edited by Brown, OCLC and Department of Defense)
  • Freedom betrayed (035 has ocm; cataloged by LC, edited by Department of Defense)


Department of Labor: first ten records with title starting "Copyright."

  • Intellectual property and trade (035 has ocm; cataloged by US International Commission, edited by Government Printing Office and the Congressional Information Service)
  • Berne Convention Implementation Act of 1987 (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • Berne Convention Implementation Act of 1988 (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • Record rental amendment extension (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • Satellite Home Viewer Copyright Act of 1988 (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • Berne Convention (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • General oversight on patent and trademark issues (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • Copyright issues presented by digital audio tape (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • Legal issues that arise when color is added to films originally produced, sold, and distributed in black and white (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)
  • The Berne Convention (035 has OCoLC; cataloged by Government Printing Office, Congressional Information Service)


United States Department of Commerce: first eight records starting "Openness" (only 8 records total)

  • Globaphobia: confronting fears about open trade (001 includes ocm; cataloged by LC and Colgate)
  • Regulatory reform and international market openness (035 includes ocm; cataloged by Stony Brook)
  • Financial policies and the world capital market : the problem of Latin American countries (001 contains ocm; cataloged by DLC)
  • +A vision for the world economy : openness, diversity, and cohesion (040 includes OCL; cataloged by LC, with edits by National Agricultural Library)
  • Regulatory reform in the global economy (035 includes OCM; Cataloged by University of Georgia)
  • +Globalization and progressive economic policy (040 includes OCL; cataloged by Library of Congress, edited by British Library)
  • Regulatory reform in Spain (cataloged by University of Alabama)
  • Challenges to globalization (001 contains ocn; cataloged by University of Texas)
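
The check behind the notes in these lists can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the procedure from the Policy FAQ: a record is represented as a plain dict mapping a MARC tag to a list of field values (a hypothetical structure, not any particular MARC library), the strings looked for are simply the ones noted above ("ocm", "ocn", "OCoLC"), and counting any 994 field as a marker is my own assumption. The function name `oclc_markers` is invented for the example.

```python
# Strings that show up in the 001 and 035 fields of the records listed above.
OCLC_PREFIXES = ("ocm", "ocn", "ocolc")


def oclc_markers(record):
    """Return the tags in which this record shows a trace of OCLC.

    `record` is a hypothetical dict such as {"001": ["ocm12345678"]}.
    """
    found = []
    for tag in ("001", "035"):
        values = record.get(tag, [])
        # Case-insensitive match, since the lists show both "ocm" and "OCM".
        if any(prefix in value.lower()
               for value in values
               for prefix in OCLC_PREFIXES):
            found.append(tag)
    # Assumption: the mere presence of a 994 field counts as a marker.
    if record.get("994"):
        found.append("994")
    return found


example = {"035": ["(OCoLC)ocm00000000"], "040": ["DLC"]}
print(oclc_markers(example))
```

A record that mentions OCLC only in the 040 field, like the two marked with a plus sign above, comes back with an empty list here, which matches the "might be free" reading in the post rather than settling it.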

*The FAQs are not, however, determinative of anything. The Policy makes this clear:

“This Policy is the final, complete and exclusive statement of the agreement of the parties with respect to the subject matter hereof.”

Similarly problematic is the claim that OCLC will not be asking libraries to shut down Z39.50 connections. The Policy makes it clear that libraries cannot “Transfer” records to companies or for “Unreasonable use” (i.e., building up a free database of library records). Since companies and entities like the Open Library aren’t going to agree to the Policy, how exactly can a library avoid violating its contractual agreement if it doesn’t shut down Z39.50 connections?

Labels: copyright, department of commerce, department of defense, department of labor, federal libraries, freedom, library of congress, oclc, openness

Sunday, December 7th, 2008

The Elusive Moose and OCLC

Over the next few weeks I’m going to try to approach this issue in a number of different ways. Here’s a first try.

Thought experiment. I walk into the Portland, ME public library and look up The Elusive Moose. Who owns the database record, with the title and subjects and so forth?

Who owns it? Joan Gannij wrote the book, and Clare Beaton illustrated it. Barefoot Books of Cambridge, MA published it.

To qualify for the Library of Congress’s Cataloging in Publication program, Barefoot Books filled out forms and submitted the basic data. The catalogers at the Library of Congress used that and some sample chapters to make the basic record, providing the publisher with the core cataloging information it printed in the book.

Then people at three other companies improved the record–Ingram Library Services, Baker & Taylor (twice), and Yankee Book Peddler. After that catalogers at two public libraries worked on it–the Vancouver Public Library and the Southfield Public Library. Finally the Anchorage, Alaska School District added the finishing touches. No doubt they know from moose! (LibraryThing, located in Portland, ME knows about moose too!)

Whose record is it? The authors? Publisher? The companies? The public libraries? The school library? How about me, or nobody? Aren’t libraries supposed to be about free access to information?

The Answer. Right now, it’s unclear. Probably no one owns it. The Library of Congress did the most work, and, by law, their work is free to all. And anyway, the record is composed of facts, which can’t be copyrighted.

Come February, however, it will acquire a new owner, an organization known to few Americans and accountable to fewer, the Online Computer Library Center (OCLC) of Dublin, Ohio. The contribution came late–after the Library of Congress had created the base record they uploaded it to OCLC so other libraries could have access to it. And their contribution was minimal–warehousing 1k of data and sending it over the (free) internet. And for that work they were very well paid, both directly and for the services they offer on top.

OCLC’s new license purports to offer carrots to libraries. But it’s mostly carrots from their own gardens. And it comes at a steep legal price, transforming the legal relationship between librarians and their labor, and making everyone else come begging to Dublin for information about books. OCLC will be asserting a perpetual, retroactive and explicitly viral license over the records–as good as ownership. The OCLC policy will cover many if not most library records in the world, even at the LC and other national libraries, and is designed to spread to derivative works.* All use will be on OCLC terms–which, of course, like any such license, they can change at any time. The terms shut down the Open Library, a giant open-data cataloging project sponsored by the non-profit Internet Archive. And they shut down all commercial use of records–including LibraryThing’s, unless we go through their new owner.

Petitions. If this bothers you as much as it does me, check out the Stop the OCLC Powergrab Petition, put up by Aaron Swartz, Tech Lead at Open Library. Aaron also wrote an excellent blog post about the issue.

If you’re a librarian, check out Elaine Sanchez’s Petition for OCLC to Collaboratively Re-write Policy for Use and Transfer of WorldCat Records.

BTW: Don’t worry too much about LibraryThing. One way or another we’ll get through this. More and more I’m confident either the Policy will change and OCLC will embrace and lead a future of openness and collaboration, or opposition to it will create what OCLC is trying to prevent—a free and open repository of high-quality bibliographic data.

*There are millions of “OCLC-derived” records at the LC. I think I’m going to write my next post trying to figure out what the Policy means for the LC and other federally-funded libraries.

Labels: moose, oclc

Thursday, November 20th, 2008

OCLC Policy Re-re-released, now in unfriendly PDF

After OCLC released its new records policy, pulled it back and re-released it, I put together a much-appreciated simple “diff,” using MediaWiki’s history feature. It was easy to do, once someone found a cached copy of the original, since both were HTML documents.

Now OCLC has released a third version of the policy, this time in PDF. The new version is harder to manipulate. (Hasn’t anyone at OCLC read Jakob Nielsen?) Adobe PDF and Preview mangle and rearrange the text when cut-and-pasted.

Is there any kind soul out there who wants to whip the text into shape and post it on the wiki page so we can do yet another diff?

Update: After wrangling with the text—sent by various people—it looks like this is going to be very hard to do. The text differs in all sorts of minor formatting ways that throw off the diff. Besides, OCLC will probably just release another version next week—no doubt in JPEG, or carved, like the Behistun Inscription, where only the gods can read it.
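For anyone who wants to try anyway, here is a rough sketch of one way to strip out formatting noise before diffing. It uses Python's standard difflib and re modules. The normalize and changed_sentences helpers and the sample text are made up for illustration, and nothing here is tested against the actual policy PDF, so treat it as a starting point rather than a fix.

```python
import difflib
import re


def normalize(text):
    """Reduce text to a list of sentences, ignoring layout differences."""
    # Rejoin words split by a hyphen at a line break ("re-\nvised" -> "revised").
    # Caveat: this also merges genuinely hyphenated words that break at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat every run of whitespace, including line breaks, as a single space.
    text = re.sub(r"\s+", " ", text).strip()
    # One sentence per item, so the diff compares sentence by sentence.
    return re.split(r"(?<=[.!?]) ", text)


def changed_sentences(old_text, new_text):
    """Return only the removed ('- ') and added ('+ ') sentences."""
    diff = difflib.ndiff(normalize(old_text), normalize(new_text))
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Made-up sample text, not the wording of any OCLC document.
old = "Records may be shared.\nMembers agree to the   terms."
new = "Records may be shared. Members agree to the re-\nvised terms."
for line in changed_sentences(old, new):
    print(line)
```

The idea is simply that line wrapping, double spaces and end-of-line hyphens disappear before the comparison, so whatever remains is more likely to be a real change in wording.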

Labels: oclc, worldcat

Wednesday, November 5th, 2008

OCLC Policy Re-released; Wiki shows changes.

After releasing their new records policy and pulling it back almost immediately, OCLC has released a revised version.

I took the original version and the revised version and put them into the LibraryThing wiki. MediaWiki software has an excellent “compare” function.

By clicking the link below you can see the new policy and the changes that have happened:

http://www.librarything.com/wiki/index.php?title=OCLC_Policy_Changes&diff=11748&oldid=11747

Obviously, I am posting both policies as an aid to understanding and commentary. OCLC retains the copyright and if they want me to take down the comparison, I will be only too glad.

Labels: oclc

Sunday, November 2nd, 2008

OCLC Policy Change

Here it is: http://www.oclc.org/worldcat/catalog/policy/.

No comment, as of now. Frankly, I haven’t even read it. February is a long time away. Long enough to discover we’re okay, make a deal or copy OCLC from scratch using nothing but periwinkle ink and passionate book lovers’ time.

Update: Depressing analysis: Terry’s Worklog. Wow.

Update #2: The non-legal page remains up, but the legalese page was taken down very early this morning:

“We are reconsidering some aspects of the policy. More information will be available in the near future.”

Damn. I wish I had remembered to copy and paste. Does anyone have the original text? (For example, in your browser cache? I browse cache-less, unfortunately.)

Update #3: See Inkdroid pointing out the “viral” nature of the policy. Over a few years the libraries that now get their data from the Library of Congress, bypassing OCLC, will find uninfected records increasingly scarce. They’ll be forced to join OCLC—or do all their own original cataloging.

Update #4: A librarian-blogger managed to take a snapshot before OCLC took it down, here.

Update #4: Does anyone get Publishers Lunch Plus? Apparently it has an article called “WorldCatFight.” I don’t know the terms on forwarding that, but if it’s legal, can someone send me a copy?

It would certainly be good if publishers got into this. In my fantasy, publishers “pull a reverse-OCLC” and require unlicensed distribution of records derived from their data. Publishers want their data out there, not restricted, and since OCLC records often start at publishers, this would shut down OCLC’s data-monopoly plans.

Update #5: The terms kill off the Open Library project completely. Not only does it involve viral terms—terms that OL could never accept—but OCLC libraries are prohibited from participating in anything that “substantially replicates the function, purpose, and/or size of WorldCat, for example for the purpose of providing cataloging services to libraries or other organizations.”

I think that means it kills Talis too.

Update #6: Edward Corrado has an excellent summary of some of the issues.

Update #7: Jonathan Rochkind wrote a good explanation of the difference between an open source viral license—designed to keep things open—and an OCLC viral license—designed to keep them closed. He also suggests a remedy—give OCLC a virus instead, by adding an Open Data license to everything your library catalogs!

Labels: oclc

Saturday, June 14th, 2008

OCLC’s non-profit status

The New York Times ran an interesting story on non-profits that act like businesses. Apparently a number of states are taking a hard look at charities that “give nothing away,” or have amassed vast wealth. A lot of day-care centers are worried, as is Harvard, where the endowment tops the GDP of more than 100 countries.*

Of course, my mind went to OCLC, the Dublin, Ohio-based global library-data organization.

OCLC’s core business involves maintaining a central database of cataloging records, largely created by others, which member libraries pay to access. That OCLC was a great invention can hardly be denied. Personally, I think it has become a relic and a danger to the future of libraries. Agree with me on this or not, there’s no question it is highly profitable—driving a steady stream of acquisitions—and in its fee structure calls into question the core idea of the non-profit.

So, why hasn’t someone taken away OCLC’s non-profit status?

I Googled it up, and discovered that someone DID! In 1984 Ohio state courts stripped OCLC of its charitable status on those very grounds:

“(A)lthough OCLC’s service may greatly enhance the ability of libraries to better serve the public, OCLC essentially offers a product to charitable institutions, for a fee exceeding its cost, and, as the board concluded, is not itself a charitable organization.”

So, what happened?

It seems the Ohio legislature passed some sort of private bill removing Ohio organizations involved in “library technology development” (and starting with the letter “O”?) from the court’s requirements. Well, I guess that’ll do it.

UPDATE: I’m working up a presentation on why OCLC’s (also unfree) Dewey Decimal System needs to be killed-off, and what distributed, open classification could replace it. I’m all ears for anti-Dewey examples. And if any bright young cataloger with no love of Dewey wants to talk to me about heading up the effort, I’d love to hear from you.


*$35 billion, doing a quick check against Wikipedia. Of course, GDP is wiggly as heck.

Labels: Dewey Decimal Classification, oclc, tax exemption

Friday, February 15th, 2008

ThingISBN adds LCCNs, OCLC numbers

ThingISBN, our popular ISBN-based API, supports and returns data for two more identifiers: LCCN and OCLC.

At core, ThingISBN—blogged before here and here—takes an ISBN and returns a simple XML list of other ISBNs, corresponding to other “editions” of the work, e.g.,

http://www.librarything.com/api/thingISBN/0590353403

Now, if you add &allids=1 to the ISBN, the XML will include relevant LCCN and OCLC numbers, e.g.,

http://www.librarything.com/api/thingISBN/0590353403&allids=1

You can also feed ThingISBN either of the new identifiers in place of an ISBN, e.g.,

http://www.librarything.com/api/thingISBN/lccn97039059
http://www.librarything.com/api/thingISBN/ocm37975719

If you feed it an LCCN or an OCLC number you don’t need to add “&allids=1” to get back these identifiers.
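A small sketch of how a caller might assemble these URLs. The thingisbn_url helper and its kind and all_ids parameters are hypothetical names for illustration. The "lccn" and "ocm" prefixes and the "&allids=1" suffix are copied from the example URLs above; whether other prefixes are accepted is not something this sketch assumes.

```python
BASE = "http://www.librarything.com/api/thingISBN/"

# Prefixes taken from the example URLs in this post.
PREFIXES = {"isbn": "", "lccn": "lccn", "oclc": "ocm"}


def thingisbn_url(identifier, kind="isbn", all_ids=False):
    """Build a ThingISBN request URL (hypothetical helper, no network access)."""
    if kind not in PREFIXES:
        raise ValueError("kind must be one of: " + ", ".join(sorted(PREFIXES)))
    url = BASE + PREFIXES[kind] + identifier
    # Per the post, LCCN and OCLC lookups return all identifiers without the flag.
    if all_ids and kind == "isbn":
        url += "&allids=1"
    return url


print(thingisbn_url("0590353403"))
print(thingisbn_url("0590353403", all_ids=True))
print(thingisbn_url("97039059", kind="lccn"))
print(thingisbn_url("37975719", kind="oclc"))
```

The function only builds strings, so you can plug it into whatever HTTP client you already use.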

What’s next?

  • I haven’t added LCCNs and OCLC numbers to the ThingISBN feed, yet.
  • Although there are some details to be worked out, this advance looks forward to adding support for LCCNs and OCLC numbers to LibraryThing for Libraries.

Tell us what’s going on. I know that ThingISBN gets a lot of use, some of it even in accordance with its Terms of Use. If you’re using ThingISBN, I’d love to hear how on a new wiki page I’ve created, Projects Currently Using ThingISBN.

Caveat. ThingISBN is free for non-commercial use. Commercial use requires our say-so. Read more here.

In the news! Coincidentally, LCCNs are in the news this week. Yesterday, the Library of Congress announced an “LCCN Permalink,” a smart bid to convert a vital but underused set of permanent, unique IDs, the LCCN (Library of Congress Control Number), into the regnant permanent, unique ID, the URL. See Catalogablog for the announcement.

Labels: apis, lccn, lccns, oclc, oclc numbers, thingisbn

Tuesday, January 8th, 2008

While you were sleeping, ThingISBN got better.

LibraryThing does a lot of cool things nobody else does. And, as we grow, we do them better and better.

I’ve got a very good example for today: the ThingISBN service. It was good when it was launched more than a year ago, becoming LibraryThing’s first API, and it’s been getting better ever since. (And where its competitor became a paid service, ThingISBN is still free for non-commercial use.)

The ThingISBN service provides something called “edition disambiguation.” Give it an ISBN and it will shoot back a list of “related” ISBNs—other editions, other media, and translations. Edition disambiguation is valuable stuff. Retailers use it to aggregate reviews and other data across editions, and to sell you something when the book you searched for is no longer available. Libraries use it to make sure a patron leaves with a copy of a book, even if the edition the patron searched for is checked out.
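To make the library case concrete, here is a minimal sketch. It assumes you already have the list of related ISBNs for the edition a patron asked for, and a set of ISBNs currently on the shelf. The available_alternative function and the sample values are illustrative placeholders, not part of any LibraryThing code.

```python
def available_alternative(wanted_isbn, related_isbns, on_shelf):
    """Return the wanted edition if it is on the shelf, otherwise the first
    related edition that is, otherwise None."""
    for isbn in [wanted_isbn, *related_isbns]:
        if isbn in on_shelf:
            return isbn
    return None


# Placeholder data: the patron wants one edition, but only another is in.
wanted = "0802150845"
related = ["0802143008", "0394179900"]
shelf = {"0394179900"}
print(available_alternative(wanted, related, shelf))
```

A retailer could use the same loop with "in stock" in place of "on the shelf."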

You can get ThingISBN in two ways:

  • As a REST-based API. Just change the ISBN in this URL as needed.
  • As a complete feed (thingISBN.xml.gz in /feeds). We ask that people not hit the API more than 1,000 times per day. Instead, pick up the full feed.
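If you do use the API, one polite approach is to count your own requests. The sketch below is a simple client-side daily counter. The DailyBudget class is a hypothetical helper, the 1,000 default comes from the limit mentioned above, and nothing here reflects how LibraryThing itself measures usage.

```python
import datetime


class DailyBudget:
    """Client-side counter that refuses calls past a per-day limit."""

    def __init__(self, limit=1000, today=datetime.date.today):
        self.limit = limit
        self._today = today  # injectable clock, handy for testing
        self._day = today()
        self._used = 0

    def allow(self):
        """Return True and count the call if today's budget is not used up."""
        day = self._today()
        if day != self._day:
            # A new calendar day has started: reset the counter.
            self._day, self._used = day, 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


budget = DailyBudget(limit=3)
print([budget.allow() for _ in range(4)])
```

Call allow() before each request and fall back to the full feed when it returns False.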

What’s cool here? LibraryThing isn’t the only supplier of this data. The other supplier, OCLC, the Dublin, Ohio-based library data organization, compiles its data through clever automated analysis of OCLC’s billion-plus records. Their data and algorithms do a great job. Unfortunately, they charge for the service, called xISBN.

LibraryThing does it differently, relying instead on members, who add, combine and separate editions by the thousands every day. For doing this, LibraryThing members get better connections with other users. That is, you gain connections and enhanced recommendations by connecting your edition with others. The result is a detailed set of correspondences between editions, assembled by thousands and improving every day.

You’ve got to admit it’s getting better. If you improve every day, you can get pretty good, and that’s what’s happened to ThingISBN. OCLC still beats LibraryThing in quantity, but LibraryThing is closer, and, it seems to me, has a clear advantage for paperbacks.

I want to revisit some of the examples I gave when ThingISBN debuted:

  • OCLC’s canonical example is Frank Herbert’s Dune. I don’t have the exact counts, but LibraryThing originally trailed OCLC. (I know because I used it as an example in a number of talks.) As of now, however, LibraryThing has passed OCLC, with 89 ISBNs to OCLC’s 80.
  • Peter Green, Alexander of Macedon. When ThingISBN started, both LibraryThing and OCLC knew the recent hardback, and one other edition. That is, LibraryThing knew the paperback and OCLC knew the 1974 first edition. Since then, LibraryThing has discovered the first edition, giving it three ISBNs; OCLC still doesn’t know about the paperback.
  • Lee Strobel, The Case for a Creator. OCLC knew of two editions, LibraryThing eight. OCLC now knows three, LibraryThing eleven. It’s about paperbacks, obviously.
  • Emily Bronte, Wuthering Heights. Originally LibraryThing had 92 ISBNs, OCLC a commanding 326 ISBNs. OCLC is still in the lead, with 424 ISBNs, but LibraryThing has more than tripled its count, to 285.

Now, I’m quite sure that, overall, OCLC’s xISBN service still beats LibraryThing in coverage. LibraryThing only covers 2.7 million ISBNs. OCLC must cover more.

But LibraryThing is gaining. It’s getting better faster.

And while OCLC continues to sink resources, including staff, into the project, which is now a paid service for all but minimal use as part of its Peace-is-War-ish Openly division, I can tell you honestly that I haven’t touched ThingISBN in six months. I haven’t made it better, even a little. Members made it better.

Now as then, that’s pretty revolutionary stuff.

See you next January, OCLC.

Labels: apis, frbr, oclc, thingisbn, work disambiguation, xisbn

Friday, May 18th, 2007

Why I joined OCLC …

… is the title of a short Library Journal piece by Roy Tennant. In it, Roy, a popular and much respected library speaker and author, explains his decision to leave the California Digital Library and take a job at OCLC.

Roy’s decision drew some flak among anti-OCLC librarians and related pundits who view OCLC as—in Steve Oberg’s phrase—the “Microsoft of the library world.”

I’m in that camp, as Roy knows well. After we presented in the same session at Computers in Libraries, Roy and I went out to dinner with another prominent librarian. I subjected them both to a long, Greek-food-fueled rant about open data and the problems with OCLC and its approach to the web. I had my shot. A day or two later, he announced he was moving to OCLC. Apparently I didn’t convince him! 🙂

OCLC needs people like Roy—passionate librarians with a vision for the future. If OCLC is to change, people like Roy are going to be the ones to do it. I have great hopes for him there.

But they’re not going to do it alone. People on the outside are going to change OCLC too. They’re going to keep the pressure on. I applaud that Roy took the time to explain his move, and his vision for OCLC and the future of libraries. He was eloquent and persuasive. But I’m also glad he felt he had to.

Labels: oclc, roy tennant

Thursday, April 12th, 2007

WorldCat: Think locally, act globally

OCLC just announced a “pilot” of WorldCat Local. In essence, WorldCat Local is OCLC providing libraries with an OPAC.

That’s the news. Here’s the opinion. Talis’ estimable Richard Wallis writes:

“Yet another clear demonstration that the library world is changing. The traditional boundaries between the ILS/LMS, and library and non-library data services are blurring. Get your circulation from here; your user-interface from there; get your global data from over there; your acquisitions from somewhere else; and blend it with data feeds from here, there and everywhere is becoming more and more a possibility.”

I think this is exactly wrong. OCLC isn’t creating a web service. They’re not contributing to the great data-service conversation. They’re trying to convert a data licensing monopoly into a services monopoly. If the OCLC OPAC plays nice with, say, the Talis Platform, I’ll eat my hat. If it allows outside Z39.50 access I’ll eat two hats.

They will, as the press release states, “break down silos.” They’ll make one big silo and set the rules for access. The pattern is already clear. MIT thought that its bibliographic records were its own, but OCLC shut them down when they tried to act on that. The fact is, libraries with their data in OCLC are subject to OCLC rules. And since OCLC’s business model requires centralizing and restricting access to bibliographic data, the situation will not improve.

As a product, WorldCat Local will probably surpass the OPACs offered by the traditional vendors. It will be cleaner and work better. It may well be cheaper and easier to manage. There are a lot of good things about this. And—lest my revised logo be misunderstood—there are no bad people here. On the contrary, OCLC is full of wonderful people—people who’ve dedicated their lives to some of the highest ideals we can aspire to. But the institution is dependent on a model that, with all the possibilities for sharing available today, must work against these ideals.

Keeping their data hidden, restricted and off the “live” web has hurt libraries more than we can ever know. Fifteen years ago, libraries were where you found out about books. One would have expected that to continue on the web–that searching for a book would turn up libraries alongside bookstores, authors and publishers.

It hasn’t worked out that way. Libraries are all-but-invisible on the web. Search for the “Da Vinci Code” and you won’t get the Library of Congress–the greatest collection of books and book data ever assembled–not even if you click through a hundred pages. You do get WorldCat, seventeen pages in!

The causes are multiple, and discussed before. But a major factor is how libraries deal with book data, and that’s largely a function of OCLC’s business model. Somehow institutions dedicated to the idea that knowledge should be freely available to all have come to the conclusion that knowledge about knowledge—book data—should not, and traditional library mottos like Boston’s “Free to All” and Philadelphia’s Liber Libere Omnibus (“Free books for all!”) have given way to:

“No part of any Data provided in any form by WorldCat may be used, disclosed, reproduced, transferred or transmitted in any form without the prior written consent of OCLC except as expressly permitted hereunder.”

We now return you to our regularly-scheduled blogging.

Labels: library of congress, oclc, open data, worldcat local

Thursday, March 15th, 2007

thingISBN data in one file

thingISBN is a simple API for discovering related editions. Give it an ISBN and it returns a list of other ISBNs—different formats, translations, etc. We offer the API free for non-commercial use. Today we’re releasing thingISBN in one giant feed, under the same conditions.*

thingISBN is based on LibraryThing’s first-of-its-kind “work” system, by which regular people—LibraryThing members, mostly—combine and separate editions. Members run over 2,000 work-combination actions per day. Although some do it for pure altruism, combining editions helps LibraryThing users by improving the quality of their connections.

LibraryThing’s results compare very favorably with its competition, OCLC’s xISBN service (also free for non-commercial use). xISBN’s coverage is better, but where LibraryThing is built on the collective judgment of humans, xISBN is just a computer algorithm. As the fella says, xISBN is “based on a world which is built on rules and because of that, [it] will never be as strong or as fast as [thingISBN] can be.”**

APIs, while nifty, can be a pain. Both thingISBN and xISBN have a 1,000-per-day limit. So, starting today, thingISBN is also available in feed format—one giant XML file with all the data from over two million unique ISBNs.

Here’s a sample file with just 1000 ISBNs:
http://www.librarything.com/feeds/thingISBN_small.xml

As you can see, the format is not ISBN-to-ISBNs. This would involve too much repetition—the full XML file is already 96MB! Instead, it goes work by work, listing the ISBNs inside them:

<work workcode="183">
<isbn>0802150845</isbn>
<isbn>0802143008</isbn>
<isbn>2020006014</isbn>
<isbn>0745300359</isbn>
<isbn>0394179900</isbn>
<isbn>9867574397</isbn>
<isbn uncertain="true">999107371X</isbn>
</work>

This format should go into a database well, e.g.,

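-- One row per (work, ISBN) pair from the feed.
CREATE TABLE isbn_to_work (
  itw_workcode mediumint(8) unsigned NOT NULL,   -- the workcode attribute of <work>
  itw_isbn char(13) NOT NULL,                    -- the ISBN text
  itw_uncertain tinyint(4) NOT NULL default '0', -- 1 when uncertain="true"
  PRIMARY KEY (itw_workcode,itw_isbn)
);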
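As a rough illustration of the loading step, here is a sketch in Python. It swaps in SQLite (from the standard library) for the MySQL table above so that it runs anywhere, and it streams the XML with iterparse so the whole file never has to sit in memory. The load_feed name and the thingISBNfeed wrapper element in the sample are assumptions; only the work and isbn elements come from the format shown above.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Tiny stand-in for the feed. The root element name is a guess.
SAMPLE = b"""<thingISBNfeed>
<work workcode="183">
<isbn>0802150845</isbn>
<isbn>0802143008</isbn>
<isbn uncertain="true">999107371X</isbn>
</work>
</thingISBNfeed>"""


def load_feed(xml_file, conn):
    """Stream <work> elements from a binary file object into isbn_to_work."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS isbn_to_work ("
        " itw_workcode INTEGER NOT NULL,"
        " itw_isbn TEXT NOT NULL,"
        " itw_uncertain INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (itw_workcode, itw_isbn))"
    )
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag != "work":
            continue
        workcode = int(elem.get("workcode"))
        rows = [
            (workcode, isbn.text.strip(), int(isbn.get("uncertain") == "true"))
            for isbn in elem.findall("isbn")
        ]
        conn.executemany("INSERT OR IGNORE INTO isbn_to_work VALUES (?, ?, ?)", rows)
        elem.clear()  # drop the finished work's children to keep memory flat
    conn.commit()


conn = sqlite3.connect(":memory:")
load_feed(io.BytesIO(SAMPLE), conn)
print(conn.execute("SELECT * FROM isbn_to_work ORDER BY itw_isbn").fetchall())
```

For the real feed you would pass a file opened with gzip.open("thingISBN.xml.gz", "rb") instead of the in-memory sample.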

As you can see, some ISBNs are listed as “uncertain.” This happens when an ISBN crosses works. In a perfect world, these works would be combined, but LibraryThing doesn’t do it automatically. There are a couple ways that can go wrong. For example, “great books” sets often sport a single ISBN across volumes. It wouldn’t do to combine “Pride and Prejudice” with “Moby Dick” just because their publisher wouldn’t pony up for two ISBNs.

So, you can use the “uncertains” if you are willing to accept more errors. Otherwise, ignore them.
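Here is one way that choice might look in code: a sketch that turns the work-by-work format into an ISBN-to-related-ISBNs lookup, with a switch for the uncertain entries. The related_isbns function and its include_uncertain flag are made-up names, and the second work in the sample (workcode 900001, ISBN 0000000000) is invented purely to show an ISBN crossing works.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample data: work 183 is abridged from the example above; work 900001 is invented.
SAMPLE = """<feed>
<work workcode="183">
<isbn>0802150845</isbn>
<isbn>0802143008</isbn>
<isbn uncertain="true">999107371X</isbn>
</work>
<work workcode="900001">
<isbn>0000000000</isbn>
<isbn uncertain="true">999107371X</isbn>
</work>
</feed>"""


def related_isbns(xml_text, include_uncertain=False):
    """Map each ISBN to the other ISBNs listed in the same work(s)."""
    lookup = defaultdict(set)
    for work in ET.fromstring(xml_text).iter("work"):
        isbns = {
            isbn.text.strip()
            for isbn in work.findall("isbn")
            if include_uncertain or isbn.get("uncertain") != "true"
        }
        for isbn in isbns:
            lookup[isbn] |= isbns - {isbn}
    return dict(lookup)


strict = related_isbns(SAMPLE)
loose = related_isbns(SAMPLE, include_uncertain=True)
print(sorted(strict["0802150845"]))
print(sorted(loose["999107371X"]))
```

Note that the loose mode links the uncertain ISBN to both works without merging the works themselves, which mirrors the decision not to combine them automatically.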

The feed itself is in http://www.librarything.com/feeds/ and is called “thingISBN.xml.gz”. It is 16MB compressed.

We’d love to hear what people are doing with the data.

*Commercial use requires our permission. See http://www.librarything.com/api.php.
**Okay, the comparison is inexact, but OCLC does have a “Matrix” feel to it.

Labels: apis, frbr, oclc, thingisbn, works, xisbn