Archive for the ‘google book search’ Category

Friday, December 17th, 2010

Romeo and Juliet, with—Get your mind out the gutter!

Today Google released its Books Ngram Viewer, a remarkable statistical snapshot of the books in Google. The New York Times did a nice piece on it.

So I went to work on it. My guess was that, like much else with Google Books, the data was ratty. I didn’t have to look far. At first glance this chart appears to show that “fuck” had a remarkable early history, being more popular in 1725 than even today! (link)

Don’t get too excited. A quick search on the phrase in books between 1700 and 1800 treed the cause:

Yes, Google can’t tell the difference between an f and an ſ, the “f without a bar” more properly known as a long, descending or medial s. To the disappointment of many, Shakespeare wrote “suck’d.” The effect pops up all over. Here’s a graph of “crimſon” vs. “crimson.” If nothing else, we can now follow the demise of the ſ with precision.
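To make the mix-up concrete: the long s is its own Unicode character (U+017F), so text that was recognized correctly can be folded back to a plain s before counting words. Here is a rough sketch in JavaScript. The names `foldLongS` and `countWord` are my own inventions, and none of this reflects how Google's OCR or the Ngram Viewer actually works. It does nothing for an ſ that was already misread as an f.

```javascript
// Hypothetical helper: replace every long s (U+017F) with a plain "s".
function foldLongS(text) {
  return text.replace(/\u017F/g, "s");
}

// Hypothetical helper: count whole-word matches after folding the long s.
// Assumes `word` is given in lowercase.
function countWord(text, word) {
  var words = foldLongS(text).toLowerCase().split(/[^a-z']+/);
  var count = 0;
  for (var i = 0; i < words.length; i++) {
    if (words[i] === word) {
      count++;
    }
  }
  return count;
}
```

With this, "crimſon" and "crimson" land in the same bucket instead of being graphed as two different words.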

There’s no question this is a cool tool. But given Google’s grand ambitions and how common s is in English, it’s a pretty startling lapse.

Labels: google, google book search, humor

Wednesday, July 30th, 2008

Google goes after the Library of Congress for “mature content”

UPDATE: They relented. Woo-hoo!

LibraryThing shows Google Adsense ads on a small number of templates. The ads appear only if you’re not a member at all—paid or unpaid. They don’t make much money, but we’ve never had a problem with them.

Today I got a form letter from Google, alerting me that Google had detected “adult or mature content” on LibraryThing. They gave one example, the LibraryThing.fr page for the Library of Congress Subject Heading (LCSH) “Erotic stories.” No doubt some algorithm caught a few keywords, like “sex” or the common porn-word “Lolita” (it’s a book, guys).

Needless to say, they run ads against most of these books on Google Book Search. Our competitors, who all rely on Google Adsense for all their revenue, run ads against the same books, apparently without incident (although, I suppose, one can hope!). I must therefore conclude that the problem is the Library of Congress Subject Headings, and that it’s a good thing the Sandy Berman-inspired LCSH “Strap-on Sex” hasn’t made it into LibraryThing yet!

A follow-up email triggered another form-letter, including the helpful suggestion to remove content like:

“image or video content containing lewd or provocative poses, strategically covered nudity, see-through or sheer clothing, and close-ups of breasts, butts, or crotches.”

I have accordingly been consulting with Casey on how to remove all the butt-shots from the Yale University MARC records.

I have three days to comply or be terminated. So, what do I do? Clearly I’m not getting anywhere with their response system. And LibraryThing has something like 100 million pages. Should I start running pages against keyword lists before showing Google Ads?

That sounds like a big pain, I’ll tell you—and not worth it.
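For what it's worth, the check itself would be the easy part. Here is a minimal sketch in JavaScript, assuming a hand-made keyword list. The function name `looksMature` and the list are made up for illustration, and nothing here reflects how Google's own detection works.

```javascript
// Hypothetical keyword list; a real one would be much longer.
var matureKeywords = ["sex", "lolita"];

// Returns true if any whole word in the page text is on the keyword list.
function looksMature(pageText, keywords) {
  var words = pageText.toLowerCase().split(/[^a-z]+/);
  for (var i = 0; i < words.length; i++) {
    if (keywords.indexOf(words[i]) !== -1) {
      return true;
    }
  }
  return false;
}
```

Matching on whole words at least spares "Sussex," but a page for Nabokov's novel still trips the filter. That is the same false positive all over again, and the reason a keyword list alone doesn't solve much.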

Labels: ads, google, google book search

Friday, June 6th, 2008

Covers from Google: Too good to be true?


I created this cover (well except for van Gogh’s contribution). You may use it!

A few months ago, when the Google Book Search API came out, I was among the first to notice that GBS covers could be used to deck out library catalogs (OPACs) with covers, potentially bypassing other providers, like Amazon and Syndetics. I subsequently promoted the idea loudly on a Talis podcast, where a Google representative ducked licensing questions, giving what seemed like tacit approval.

It seemed so great–free covers for all. Unfortunately, it now seems that it was too good to be true. At a minimum, the whole thing is thrown into confusion.

After some delay, Google has now posted–for the first time–a “Terms of Use” for the Google Book Search API (http://code.google.com/apis/books/terms.html). If you’re planning to use GBS data, you should be sure to read it.

The back story is an interesting one. Soon after I wrote and spoke about the covers opportunity, a major cover supplier contacted me. They were miffed at me, and at Google. Apparently a large percentage of the Google covers were, in fact, licensed to Google by them. They never intended this to be a “back door” to their covers, undermining their core business. It was up to Google to protect their content appropriately, something they did not do. For starters, the GBS API appears to have gone live without any Terms of Service beyond the site-wide ones. The new Terms of Service is, I gather, the fruit of this situation.

Now, I am not a lawyer and I am not a reporter. I don’t know who, if anyone, messed up. Nor do I fully understand what the new Terms of Service requires or allows. Although I am told they put the kibosh on using GBS as a replacement for other cover providers, I can’t find a straightforward prohibition on using GBS for covers, primarily or secondarily. But it starts out with the statement that:

“The Google Book Search API is not intended to be a substitute or replacement of products or services of any third party content provider.”

And there are other concerning clauses. There is a vague bullet about not posting content that infringes any other parties’ “proprietary rights.” And there are clauses that should give pause to many on the library-tech listservs–about not reordering results, not crawling, not caching, and so forth.

My interest in free data is well known. I think the days of selling covers, something publishers give out for free, are passing away. But if this happens, it must be done fairly. Those who provide proprietary data should be able to protect it, at least as far as the law allows them to. (Since no data supplier can “copyright” their cover images, any restrictions must be based in licenses.*) Those of us who argue for free data** must respect this. That’s the difference between “free as in freedom” and “free as in ‘fell off a truck.’”

Meanwhile, being among the most vocal proponents of using GBS for covers, and having had no idea the covers weren’t Google’s to do with what they pleased, I have been asked to sensitize librarians that “some of this content is licensed and they need to be respectful of infringement issues.”

So, that’s the word. Now if I only understood it.


*And, I gather, there is some doubt about “posted” licenses on publicly-available websites, as opposed to licenses that require explicit agreement. By the way, did you know that, by reading this, you’ve agreed to dance on the table like a damn fool next time you hear the Gypsy Kings? Do not disregard this license. We’ll know.
**At least those who believe in the right of contract or property.

Labels: book covers, google book search

Sunday, April 13th, 2008

“Library 2.0 Gang” discusses Google Book Search API

Here’s a quick heads-up for those interested in the Google Book Search API. Talis’ new “Library 2.0 Gang,” of which I will be an occasional member, covered the topic.

Importantly, they managed to get someone from Google, Frances Haugen, in on the call. Ms. Haugen was diplomatically non-committal about the terms of service, but telegraphed benign latitude.

I ended up talking too much (what’s new), but I did surface the most interesting things about the GBS API for libraries: using their API to add free covers to the OPAC, and the rise of JavaScript-based OPAC enhancements. I covered the former here. The latter is also a take-away from LibraryThing for Libraries.
Check it out here.

Labels: google book search, library 2.0 gang, talis

Saturday, March 15th, 2008

Free covers for your library, from Google

On Wednesday we added integration with Google Book Search, and talked about it on the main blog. We did it together with a number of cool libraries.

My thoughts are still percolating, but I wanted to throw out a piece of my ham-handed JavaScript code. The code gives your library covers, something libraries usually pay for.

This basic version grabs cover images from Google. You feed it an ISBN and it gets the cover. It doesn’t link to them. Would they mind? Maybe.

<div id="gbsthumbnail"></div>

<script type="text/javascript">

/* GBS Cover Script by Tim Spalding/LibraryThing */

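// Callback for the Google Book Search request below.
// booksInfo holds one entry per bibkey sent in the request.
function addTheCover(booksInfo)
{
  for (var i in booksInfo)
  {
    var book = booksInfo[i];
    // Only show a cover when Google returned a thumbnail.
    if (book.thumbnail_url != undefined)
    {
      document.getElementById('gbsthumbnail').innerHTML =
        '<img src="' + book.thumbnail_url + '"/>';
    }
  }
}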

</script>

<script src="http://books.google.com/books?jscmd=viewapi&bibkeys=ISBN:0670880728&callback=addTheCover"></script>
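The script tag above hard-codes one ISBN. In a real catalog you would build that URL for each record. Here is a rough sketch of how that might look; `buildCoverUrl` and `loadCover` are hypothetical names of my own, and the only thing taken from Google is the URL pattern already shown above.

```javascript
// Hypothetical helper: build the Google Book Search URL for one ISBN.
// The URL pattern is the one used in the script tag above.
function buildCoverUrl(isbn, callbackName) {
  // Keep only digits and the X check character, dropping hyphens and spaces.
  var cleanIsbn = String(isbn).replace(/[^0-9Xx]/g, "");
  return "http://books.google.com/books?jscmd=viewapi" +
    "&bibkeys=ISBN:" + cleanIsbn +
    "&callback=" + callbackName;
}

// Hypothetical helper: request the cover by appending a script tag.
// Assumes it runs in a browser where addTheCover is already defined.
function loadCover(isbn) {
  var script = document.createElement("script");
  script.src = buildCoverUrl(isbn, "addTheCover");
  document.body.appendChild(script);
}
```

Calling `loadCover("0-670-88072-8")` should produce the same request as the hard-coded tag, with the hyphens stripped first.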

Here’s a version that links to them, but only if they have a full version. Surely they wouldn’t mind this.

<div id="gbsthumbnail"></div>
<div id="gbslink"></div>

<script type="text/javascript">

/* GBS Cover Script by Tim Spalding/LibraryThing */

function addTheCover(booksInfo)
{
  // Link labels, indexed by the quality score computed below.
  var gbsnameA = new Array("No information", "Book info", "Partial view", "Full view");

  for (var i in booksInfo)
  {
    var book = booksInfo[i];

    // Score the preview level: 0 = unknown, 3 = full view.
    var quality = 0;
    if(book.preview == "noview") { quality = 1; }
    if(book.preview == "partial") { quality = 2; }
    if(book.preview == "full") { quality = 3; }

    if (book.thumbnail_url != undefined)
    {
      document.getElementById('gbsthumbnail').innerHTML =
        '<img src="' + book.thumbnail_url + '">';
    }
    // Link to Google only when the full book is viewable.
    if (quality >= 3)
    {
      document.getElementById('gbslink').innerHTML =
        "<a href='" + book.preview_url + "'>" + "Google Books: " + gbsnameA[quality] + "</a>";
    }
  }
}

</script>

<script src="http://books.google.com/books?jscmd=viewapi&bibkeys=ISBN:0670880728&callback=addTheCover"></script>

So, book covers for the price of an occasional link to Google. Sounds like a good deal to me!

If this saves your library money, consider getting LibraryThing for Libraries. We’re clever all over.

Labels: code, gbs, google book search, javascript