Archive for the ‘google book search’ Category

Friday, December 17th, 2010

Romeo and Juliet, with—Get your mind out the gutter!

Today Google released its Books Ngram Viewer, a remarkable statistical snapshot of the books in Google. The New York Times did a nice piece on it.

So I went to work on it. My guess was that, like much else with Google Books, the data was ratty. I didn’t have to look far. At first glance this chart appears to show that “fuck” had a remarkable early history, being more popular in 1725 than even today!

Don’t get too excited. A quick search on the phrase in books between 1700 and 1800 treed the cause:

Yes, Google can’t tell an f from an ſ, the “s without a bar,” more properly known as a long, descending or medial s. To the disappointment of many, Shakespeare wrote “suck’d.” The effect pops up all over. Here’s a graph of “crimſon” vs. “crimson.” If nothing else, we can now follow the demise of the ſ with precision.
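Because printers used the long s almost everywhere except at the end of a word, you can roughly predict which words the f/ſ confusion will garble. Here is a minimal JavaScript sketch, assuming that simplified rule (real typesetting conventions had exceptions that varied by period); the function names are made up for illustration.

```javascript
// Simplified rule (an assumption): a lowercase s followed by another letter
// was set as a long s; a word-final s stayed round.
function toLongS(word) {
  return word.replace(/s(?=[a-z])/g, "ſ");
}

// What an OCR engine that reads every long s as an f would produce.
function ocrMisreading(word) {
  return toLongS(word).replace(/ſ/g, "f");
}
```

For example, `ocrMisreading("crimson")` gives `"crimfon"`, the garbled form worth charting next to “crimson” and “crimſon.”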

There’s no question this is a cool tool. But given Google’s grand ambitions and how common s is in English, it’s a pretty startling lapse.

Labels: google, google book search, humor

Wednesday, July 30th, 2008

Google goes after the Library of Congress for “mature content”

UPDATE: They relented. Woo-hoo!

LibraryThing shows Google Adsense ads on a small number of templates. The ads appear only if you’re not a member at all—paid or unpaid. They don’t make much money, but we’ve never had a problem with them.

Today I got a form letter from Google, alerting me that Google had detected “adult or mature content” on LibraryThing. They gave one example, the LibraryThing.fr page for the Library of Congress Subject Heading (LCSH) “Erotic stories.” No doubt some algorithm caught a few keywords, like “sex” or the common porn-word “Lolita” (it’s a book, guys).

Needless to say, they run ads against most of these books on Google Book Search. Our competitors, who all rely on Google Adsense for all their revenue, run ads against the same books, apparently without incident (although, I suppose, one can hope!). I must therefore conclude that the problem is the Library of Congress Subject Headings, and that it’s a good thing the Sandy Berman-inspired LCSH “Strap-on Sex” hasn’t made it into LibraryThing yet!

A follow-up email triggered another form-letter, including the helpful suggestion to remove content like:

“image or video content containing lewd or provocative poses, strategically covered nudity, see-through or sheer clothing, and close-ups of breasts, butts, or crotches.”

I have accordingly been consulting with Casey on how to remove all the butt-shots from the Yale University MARC records.

I have three days to comply or be terminated. So, what do I do? Clearly I’m not getting anywhere with their response system. And LibraryThing has something like 100 million pages. Should I start running pages against keyword lists before showing Google Ads?

That sounds like a big pain, I’ll tell you—and not worth it.
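For a sense of what that would involve, here is a deliberately naive JavaScript sketch. The keyword list and the `shouldShowAds` function are hypothetical; Google doesn’t publish what its classifier actually looks for.

```javascript
// Hypothetical blocklist; NOT Google's actual criteria.
const MATURE_KEYWORDS = ["erotic", "sex", "lolita"];

// Returns false if any blocklisted word appears as a whole word on the page.
function shouldShowAds(pageText, keywords = MATURE_KEYWORDS) {
  const words = new Set(pageText.toLowerCase().match(/[a-z]+/g) || []);
  return !keywords.some((keyword) => words.has(keyword));
}
```

Note that it would pull ads from a page about Nabokov’s *Lolita*, which is exactly the kind of false positive that started this whole mess.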

Labels: ads, google, google book search

Friday, June 6th, 2008

Covers from Google: Too good to be true?


I created this cover (well, except for van Gogh’s contribution). You may use it!

A few months ago, when the Google Book Search API came out, I was among the first to notice that GBS covers could be used to deck out library catalogs (OPACs), potentially bypassing other cover providers like Amazon and Syndetics. I subsequently promoted the idea loudly on a Talis podcast, where a Google representative ducked licensing questions, giving what seemed like tacit approval.

It seemed so great–free covers for all. Unfortunately, it now seems that it was too good to be true. At a minimum, the whole thing is thrown into confusion.

After some delay, Google has now posted–for the first time–a “Terms of Use” for the Google Book Search API (http://code.google.com/apis/books/terms.html). If you’re planning to use GBS data, you should be sure to read it.

The back story is an interesting one. Soon after I wrote and spoke about the covers opportunity, a major cover supplier contacted me. They were miffed at me, and at Google. Apparently a large percentage of the Google covers were, in fact, licensed to Google by them. They never intended this to be a “back door” to their covers, undermining their core business. It was up to Google to protect their content appropriately, something they did not do. For starters, the GBS API appears to have gone live without any Terms of Service beyond the site-wide ones. The new Terms of Service is, I gather, the fruit of this situation.

Now, I am not a lawyer and I am not a reporter. I don’t know who, if anyone, messed up. Nor do I fully understand what the new Terms of Service requires or allows. Although I am told they put the kibosh on using GBS as a replacement for other cover providers, I can’t find a straightforward prohibition on using GBS for covers, primarily or secondarily. But it starts out with the statement that:

“The Google Book Search API is not intended to be a substitute or replacement of products or services of any third party content provider.”

And there are other concerning clauses. There is a vague bullet about not posting content that infringes any other party’s “proprietary rights.” And there are clauses that should give pause to many on the library-tech listservs: ones about not reordering results, not crawling, not caching, and so forth.

My interest in free data is well known. I think the days of selling covers (something publishers give out for free) are passing away. But if this happens, it must be done fairly. Those who provide proprietary data should be able to protect it, at least as far as the law allows them to. (Since no data supplier can “copyright” their cover images, any restrictions must be based in licenses.*) Those of us who argue for free data** must respect this. That’s the difference between “free as in freedom” and “free as in ‘fell off a truck.’”

Meanwhile, as one of the most vocal proponents of using GBS for covers, and having had no idea the covers weren’t Google’s to do with as they pleased, I have been asked to make librarians aware that “some of this content is licensed and they need to be respectful of infringement issues.”

So, that’s the word. Now if I only understood it.


*And, I gather, there is some doubt about “posted” licenses on publicly-available websites, as opposed to licenses that require explicit agreement. By the way, did you know that, by reading this, you’ve agreed to dance on the table like a damn fool next time you hear the Gypsy Kings? Do not disregard this license. We’ll know.
**At least those who believe in the right of contract or property.

Labels: book covers, google book search

Sunday, April 13th, 2008

“Library 2.0 Gang” discusses Google Book Search API

Here’s a quick heads-up for those interested in the Google Book Search API. Talis’ new “Library 2.0 Gang,” of which I will be an occasional member, covered the topic.

Importantly, they managed to get someone from Google, Frances Haugen, in on the call. Ms. Haugen was diplomatically non-committal about the terms of service, but telegraphed benign latitude.

I ended up talking too much (what’s new), but I did surface what I think are the two most interesting things about the GBS API for libraries: using it to add free covers to the OPAC, and the rise of JavaScript-based OPAC enhancements. I covered the former here. The latter is also a take-away from LibraryThing for Libraries.
Check it out here.

Labels: google book search, library 2.0 gang, talis

Saturday, March 15th, 2008

Free covers for your library, from Google

On Wednesday we added integration with Google Book Search, and talked about it on the main blog. We did it together with a number of cool libraries.

My thoughts are still percolating, but I wanted to throw out a piece of my ham-handed JavaScript code. The code gives your library covers, something libraries usually pay for.

This basic version grabs cover images from Google. You feed it an ISBN and it gets the cover. It doesn’t link to them. Would they mind? Maybe.

<div id="gbsthumbnail"></div>

<script type="text/javascript">

/* GBS Cover Script by Tim Spalding/LibraryThing */

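function addTheCover(booksInfo)
{
// booksInfo maps each requested bibkey (e.g. "ISBN:0670880728") to a result object
for (var i in booksInfo)
{
var book = booksInfo[i];
// Not every book has a cover; only show one if Google returned a thumbnail
if (book.thumbnail_url != undefined)
{
document.getElementById('gbsthumbnail').innerHTML =
'<img src="' + book.thumbnail_url + '"/>';
}
}
}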

</script>

<script src="http://books.google.com/books?jscmd=viewapi&bibkeys=ISBN:0670880728&callback=addTheCover"></script>
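The script tag above hard-codes one ISBN. Here is a small JavaScript sketch for building that URL from whatever ISBNs a catalog page has on hand. `buildViewApiUrl` and `loadCovers` are my own hypothetical helpers, not part of Google’s API, and passing several comma-separated bibkeys in one request is an assumption worth checking against Google’s documentation.

```javascript
// Hypothetical helper: builds the viewapi script URL for one or more ISBNs.
// Assumption: the endpoint accepts several comma-separated bibkeys per request.
function buildViewApiUrl(isbns, callbackName) {
  const bibkeys = isbns
    .map((isbn) => "ISBN:" + isbn.replace(/[^0-9Xx]/g, "")) // strip hyphens/spaces
    .join(",");
  return "http://books.google.com/books?jscmd=viewapi&bibkeys=" + bibkeys +
    "&callback=" + callbackName;
}

// In a browser, inject the request as a <script> tag; Google's response then
// calls the named callback function (JSONP style).
function loadCovers(isbns, callbackName) {
  const script = document.createElement("script");
  script.src = buildViewApiUrl(isbns, callbackName);
  document.body.appendChild(script);
}
```

`buildViewApiUrl(["0-670-88072-8"], "addTheCover")` reproduces the URL in the script tag above. If you ask for several books at once, keep in mind that `addTheCover` writes into a single `gbsthumbnail` div, so each cover would overwrite the last; a multi-book page needs one element per bibkey.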

Here’s a version that links to them, but only if they have a full version. Surely they wouldn’t mind this.

<div id="gbsthumbnail"></div>
<div id="gbslink"></div>

<script type="text/javascript">

/* GBS Cover Script by Tim Spalding/LibraryThing */

function addTheCover(booksInfo)
{
// Link labels, indexed by the quality level computed below
var gbsnameA = new Array("No information", "Book info", "Partial view", "Full view");

for (var i in booksInfo)
{
var book = booksInfo[i];

// Map Google's preview field to a number: 1 = no view, 2 = partial, 3 = full
var quality = 0;
if(book.preview == "noview") { quality = 1; }
if(book.preview == "partial") { quality = 2; }
if(book.preview == "full") { quality = 3; }

if (book.thumbnail_url != undefined)
{
document.getElementById('gbsthumbnail').innerHTML =
'<img src="' + book.thumbnail_url + '">';
}
// Link only when Google offers a full view
if (quality == 3)
{
document.getElementById('gbslink').innerHTML =
"<a href='" + book.preview_url + "'>" + "Google Books: " + gbsnameA[quality] + "</a>";
}
}
}

</script>

<script src="http://books.google.com/books?jscmd=viewapi&bibkeys=ISBN:0670880728&callback=addTheCover"></script>
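If you want to tweak the linking policy, it helps to pull the decision out of the callback into a plain function you can test without a page. A JavaScript sketch follows; `googleBooksLink` and its `minPreview` parameter are hypothetical, and the default reproduces the “full view only” rule above.

```javascript
// Preview values the callback above checks for, from least to most open.
const PREVIEW_LEVELS = ["noview", "partial", "full"];
const PREVIEW_NAMES = { noview: "Book info", partial: "Partial view", full: "Full view" };

// Returns link HTML for one viewapi result, or null when no link should be shown.
function googleBooksLink(book, minPreview = "full") {
  const level = PREVIEW_LEVELS.indexOf(book.preview);
  if (level < 0 || level < PREVIEW_LEVELS.indexOf(minPreview) || !book.preview_url) {
    return null;
  }
  return "<a href='" + book.preview_url + "'>Google Books: " +
    PREVIEW_NAMES[book.preview] + "</a>";
}
```

Passing `"partial"` as the second argument would also link partial-view books, if you decide Google wouldn’t mind that either.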

So, book covers for the price of an occasional link to Google. Sounds like a good deal to me!

If this saves your library money, consider getting LibraryThing for Libraries. We’re clever all over.

Labels: code, gbs, google book search, javascript

Saturday, September 22nd, 2007

Magical Thinking at Harvard

A Babylonian Demon Bowl (Kelsey Museum)

“Know the secret name of something and you control it,” is an extremely ancient idea, stretching as far back as the Sumerians, and running through subsequent Mesopotamian, Egyptian and Greco-Roman magic. The secrecy of the name was critical to its power, and to the mystique of those who knew it. One suspects it also helped their hourly rates.

Its modern equivalent is the “unique identifier.” Information is available as never before, but its sheer quantity limits discovery. Unique identifiers cut through the clutter. And they can be powerful. Let the wrong person know your Social Security Number and you’ll be in a world of hurt as great as a malevolent spirit caught by a name under a Babylonian demon bowl.

In the legal world the equivalent is the West American Digest System, which numbers court cases for lawyers. Although the cases are invariably in the public domain, the numbers that identify them are not. And controlling “the only recognized legal taxonomy” gives its creator, West Publishing, a valuable monopoly.

In the book world, it’s the ISBN. Know a book’s title and you can find yourself adrift in a sea of editions. Discover its ISBN and you’ve got it for sure. Type the ISBN into BookFinder or Abebooks.com and you’ll find a panoply of new and used sellers.
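Part of what makes an ISBN so dependable is that its last character is a check digit computed from the others, so a single mistyped digit is caught. Here is a minimal JavaScript sketch of the standard rules: for ISBN-10, weights 10 down to 1 with a total divisible by 11 (“X” means 10, last place only); for ISBN-13, alternating weights 1 and 3 with a total divisible by 10. The function name is just for illustration.

```javascript
// Validates an ISBN-10 or ISBN-13 by its check digit; hyphens and spaces are ignored.
function isValidIsbn(raw) {
  const isbn = raw.replace(/[-\s]/g, "").toUpperCase();
  if (/^\d{9}[\dX]$/.test(isbn)) {
    // ISBN-10: weights 10..1; "X" stands for 10 and may appear only last.
    let sum = 0;
    for (let i = 0; i < 10; i++) {
      const digit = isbn[i] === "X" ? 10 : Number(isbn[i]);
      sum += digit * (10 - i);
    }
    return sum % 11 === 0;
  }
  if (/^\d{13}$/.test(isbn)) {
    // ISBN-13: weights alternate 1, 3, 1, 3, ...
    let sum = 0;
    for (let i = 0; i < 13; i++) {
      sum += Number(isbn[i]) * (i % 2 === 0 ? 1 : 3);
    }
    return sum % 10 === 0;
  }
  return false;
}
```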

Although assigned by private firms, ISBNs will never go the way of the West American Digest System. But their power explains why the Harvard Coop* has taken to ejecting customers who attempt to write down ISBNs. As reported in the Crimson, this is exactly what happened to one Harvard student, Jarret A. Zafra. In another (?) incident, reported by the Herald, the Coop called the police on three more ISBN-scribblers.** When asked about the policy, Coop administration told the Crimson that it “considers that information the Coop’s intellectual property.”

The IP claim is hogwash. ISBNs are facts. Under US law facts can’t be copyrighted. The Coop is probably within its rights to expel whomever it wants, but that won’t stop people from trying. The three students above were volunteers for a site called CrimsonReading.org, which is compiling a complete list of all books used at Harvard. When a Harvard student types in an ISBN, CrimsonReading connects them to new and used booksellers. Affiliate revenues go to charity. By calling on volunteers and getting Harvard professors involved, CrimsonReading is getting around the Coop’s magical secrecy. Three cheers to them for doing it.
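A site like this has to match whatever a student types against its list, and the same book can be written as a ten-digit or a thirteen-digit ISBN. One way to handle that, sketched here in JavaScript as an assumption about how such a site might work (`toIsbn13` is hypothetical, not CrimsonReading’s code), is to reduce every entry to an ISBN-13: prefix 978, drop the old check digit, and compute the new one. It assumes the input has already passed a check-digit test.

```javascript
// Hypothetical helper: reduce a typed ISBN to one canonical ISBN-13 key.
// Assumes the input was already validated; returns null if it isn't ISBN-shaped.
function toIsbn13(raw) {
  const isbn = raw.replace(/[-\s]/g, "").toUpperCase();
  if (/^\d{13}$/.test(isbn)) {
    return isbn; // already an ISBN-13
  }
  if (!/^\d{9}[\dX]$/.test(isbn)) {
    return null;
  }
  const core = "978" + isbn.slice(0, 9); // drop the ISBN-10 check digit
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    sum += Number(core[i]) * (i % 2 === 0 ? 1 : 3);
  }
  return core + String((10 - (sum % 10)) % 10); // new ISBN-13 check digit
}
```

With this, `toIsbn13("0-670-88072-8")` and `toIsbn13("9780670880720")` both land on the same key.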

We need more projects like CrimsonReading. Much the same idea was behind my Google Book Search Search bookmarklet, which asked volunteers to collect Google Book Search IDs. In this case, the unique identifier was new and more secret. By giving its scans unique—and effectively secret—numbers, Google is creating a whole new bibliographic identification scheme. And where ISBNs cover only about thirty years of books, Google’s IDs are designed to cover every book printed, including millions in the public domain.

Control the name and you control the thing. It’s what WestLaw is doing. It’s what the Coop is trying to do.

Is it what Google is doing? I’m not sure. And I don’t see any signs of this happening on its own yet. For example, sellers on used book sites are not using Google Book IDs to nail down editions. But the danger is there.

Secret and proprietary numbering systems pose a serious challenge to the benign potential of the internet. When secrecy or obscurity is used against this potential, people need to act up and break the spell.


*Always pronounced “coop,” not “coöp.” Full disclosure: My parents belong to the Coop, which is a true “cooperative” in organization. This means they share in the annual dividend according to how much they spend there. So I’m working against them!
**I grew up near Harvard Square, and the Coop was one of my haunts. (It’s a general-purpose bookstore as well.) Quite a few of my friends were expelled from the Coop for shoplifting. If CrimsonReading really wants to get the job done, it should enroll the private-school street urchins of the Square in the ISBN game.

Labels: google book search, harvard coop, isbns, open data, westlaw