Archive for the ‘google’ Category

Friday, December 17th, 2010

Romeo and Juliet, with—Get your mind out the gutter!

Today Google released its Books Ngram Viewer, a remarkable statistical snapshot of the books in Google. The New York Times did a nice piece on it.

So I went to work on it. My guess was that, like much else with Google Books, the data was ratty. I didn’t have to look far. At first glance this chart appears to show that “fuck” had a remarkable early history, being more popular in 1725 than even today! (link)

Don’t get too excited. A quick search on the phrase in books between 1700 and 1800 treed the cause:

Yes, Google can’t tell the difference between an f and an ſ, the “f without a bar” more properly known as a long, descending or medial s. To the disappointment of many, Shakespeare wrote “suck’d.” The effect pops up all over. Here’s a graph of “crimſon” vs. “crimson.” If nothing else we can now follow the demise of the ſ with precision.
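
For the curious, here is a rough Python sketch of the cleanup that would keep the two letters apart. It is an illustration only, not anything Google runs: both function names are made up, and it assumes the text already holds a real ſ (U+017F) rather than a misread f.

```python
LONG_S = "\u017f"  # ſ, LATIN SMALL LETTER LONG S


def modernize(word: str) -> str:
    """Replace every long s with an ordinary s."""
    return word.replace(LONG_S, "s")


def possible_misreads(word: str) -> set[str]:
    """Return the spellings you would get if any long s in `word`
    were mistaken for an f (hypothetical helper)."""
    variants = {""}
    for ch in word:
        # A long s may survive or be misread as f; other letters pass through.
        options = (LONG_S, "f") if ch == LONG_S else (ch,)
        variants = {prefix + opt for prefix in variants for opt in options}
    variants.discard(word)  # the unchanged word is not a misread
    return variants


print(modernize("crimſon"))          # crimson
print(possible_misreads("crimſon"))  # {'crimfon'}
```

Counting “crimſon” and “crimfon” together with “crimson” would be one way to smooth out a graph like the one above.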

There’s no question this is a cool tool. But given Google’s grand ambitions and how common s is in English, it’s a pretty startling lapse.

Labels: google, google book search, humor

Thursday, September 17th, 2009

The Amazon policy change, and how we’re responding.

“Amazon Cardboard Boxes” by Flickr member Akira Ohgaki (Attribution 2.0 Generic)

Summary: Amazon is requiring us to remove links to other booksellers on work pages. We’re creating a new “Get it Now” page, with links to other booksellers, especially local bookstores and libraries, and a host of new features. Talk about it here.

The challenge. We’re days away from releasing a series of changes to our book pages, both forced and intentional. Amazon is requiring all websites, as a condition of getting any data from them, to have the primary page link to Amazon alone. Links to other booksellers are prohibited. Secondary pages—pages you go to from the primary page—can have non-Amazon links.

Everyone at LibraryThing disagrees with this decision. LibraryThing is not a social cataloging and social networking site for Amazon customers but for book lovers. Most of us are Amazon customers on Tuesday, and buy from a local bookstore or get from a library on Wednesday and Thursday! We recognize Amazon’s value, but we certainly value options.

Importantly, the decision is probably not even good for Amazon. Together with a new request-monitoring system, banning iPhone applications that use Amazon data, and much of their work on the Kindle, Amazon is retreating from its historic commitment to simplicity, flexibility and openness. They won through openness. Their data is all over the web, and with it millions of links to Amazon. They won’t benefit from a retreat here.

But agree or not, we have to follow their terms. We thought long and hard about giving up Amazon data entirely, converting to library data only, in concert with a commercial provider, like Bowker or Ingram, and with help from publishers and members. Unlike our competitors, who are exclusively based on Amazon and who don’t “catalog” so much as keep track of which Amazon items you have, that option is available to us. But we’d lose a lot, particularly book covers. Ultimately, we’ve decided the disadvantages outweigh the benefits.

The Response. Most of all, we think we’ve found a way to give Amazon what they require, and continue to provide members with options: We’re going to cut back our primary-page links to Amazon alone, and give people the best, most diverse secondary pages we can make. We are allowed to link to other booksellers, like IndieBound and Barnes and Noble, on secondary pages, and we’re going to do it far better than we ever have. We’re going to take something away, but also make something better: something that goes way past what we did before, in features and in diversity of options.

The upcoming “Get it Now” page will go far beyond our current “Buy, borrow, swap” links, with a live new and used price-comparison engine, as well as sections for ebooks, audiobooks and swap sites. The page will be edition-aware, and draw on feeds or live data (so the links work). Many members have wanted live pricing data for the books they already own and these features can be used for that purpose too. We’ll also be doing some stuff with libraries nobody else has done, or can do.

Key to the upcoming Get it Now page is a “Local” module, drawing on LibraryThing Local, showing all the libraries and bookstores near you. Where possible, this list will incorporate holdings data and links to buy—the sort of information you never get from a Google search on a book. If not, we’ll give you their telephone numbers and show you where they are on a map. We’ll make the page customizable, and let members add sources to it.

We think the new page will make a lot of members happy. For one thing, LibraryThing has never been about buying books, so having all these links on a separate page won’t be a great loss. And if the new format doesn’t make members happy, we’ll listen, and together we can plan to take LibraryThing on a truly independent course.

Post your comment here, or come talk about this on Site Talk.

Labels: amazon, apis, google, open data

Wednesday, July 30th, 2008

Google goes after the Library of Congress for “mature content”

UPDATE: They relented. Woo-hoo!

LibraryThing shows Google Adsense ads on a small number of templates. The ads appear only if you’re not a member at all—paid or unpaid. They don’t make much money, but we’ve never had a problem with them.

Today I got a form letter from Google, alerting me that Google had detected “adult or mature content” on LibraryThing. They gave one example, the LibraryThing.fr page for the Library of Congress Subject Heading (LCSH) “Erotic stories.” No doubt some algorithm caught a few keywords, like “sex” or the common porn-word “Lolita” (it’s a book, guys).

Needless to say, they run ads against most of these books on Google Book Search. Our competitors, who all rely on Google Adsense for all their revenue, run ads against the same books, apparently without incident (although, I suppose, one can hope!). I must therefore conclude that the problem is the Library of Congress Subject Headings, and that it’s a good thing the Sandy Berman-inspired LCSH “Strap-on Sex” hasn’t made it into LibraryThing yet!

A follow-up email triggered another form-letter, including the helpful suggestion to remove content like:

“image or video content containing lewd or provocative poses, strategically covered nudity, see-through or sheer clothing, and close-ups of breasts, butts, or crotches.”

I have accordingly been consulting with Casey on how to remove all the butt-shots from the Yale University MARC records.

I have three days to comply or be terminated. So, what do I do? Clearly I’m not getting anywhere with their response system. And LibraryThing has something like 100 million pages. Should I start running pages against keyword lists before showing Google Ads?

That sounds like a big pain, I’ll tell you—and not worth it.
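
For the record, here is roughly what such a keyword screen would look like, as a minimal Python sketch. The word list and function names are invented for illustration; we have no idea what Google’s algorithm actually checks. It also shows the problem: a plain word match can’t tell a novel from a porn term.

```python
import re

# Hypothetical blocklist; the real one, if any, is unknown to us.
FLAGGED_WORDS = {"sex", "lolita", "erotic"}


def flagged_terms(page_text: str) -> set[str]:
    """Return the flagged words that appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return words & FLAGGED_WORDS


def may_show_ads(page_text: str) -> bool:
    """Allow ads only when no flagged word appears on the page."""
    return not flagged_terms(page_text)


print(may_show_ads("Lolita by Vladimir Nabokov"))  # False
print(may_show_ads("Through the Looking-Glass"))   # True
```

Run that over every page and Nabokov loses his ads along with everything else that merely mentions a flagged word.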

Labels: ads, google, google book search

Thursday, March 13th, 2008

Google Books in LibraryThing

The official Google Blog and the Inside Book Search Blog just announced the new Google Book Search API, with LibraryThing as one of the first implementors. (The others are libraries; I’ll be posting about what they’ve done over on Thingology.)

In sum, LibraryThing now links to Google Books for book scans—full or partial—and book information.

Google Book Search links can be seen in two places:

  • In your catalog. Choose “edit styles” to add the column. The column reflects only the exact edition you have.
  • On work pages. The “Buy, borrow, swap or view” box on the right now includes a Google Books section. Clicking on it opens up a “lightbox” showing all the editions LibraryThing can identify on Google Book Search.

Despite the screenshot, of Carroll’s Through the looking glass and what Alice found there, relatively few works have “full” scans. “Partial view” and “book information” pages are more common. But the former generally include the cover and table of contents, and the whole text can be searched. The latter can also be useful for cataloging purposes. Members with extensive collections from before 1923 (the copyright cutoff) will get relatively more out of the feature.

Leave comments here, or come discuss the feature on Talk.

Limitations. The GBS API is a big step forward, but there are some technical limitations. Google data loads after the rest of the page, and may not be instant. Because the data loads in your web browser, with no data “passing through” LibraryThing servers, we can’t sort or search by it, and all-library searching is impossible. You can get something like this if you create a Google Books account, which is, of course, the whole point.

LCCN and OCLC. To get the best results, we needed to add full access to two library standards, namely Library of Congress Control Numbers (LCCN) and OCLC Numbers. We did so, reparsing the original MARC records where necessary. You can see these columns in your catalog now—choose “edit styles” as above. The two columns are not yet editable, but will be so in a day or two.

The Back Story. The rest of the first batch are libraries, including a number of “friends”–Deschutes Public Library, the Waterford Institute of Technology, the University of Huddersfield and Plymouth State/Scriblio. Google wanted help finding potentials and if there’s one thing I have it’s a Rolodex of smoking-hot library programmers! Once I’ve taken in all the neat things they did, I’ll be posting over on Thingology.

Some libraries have chosen to feature Google Book Search links only when Google has the full scan. This makes sense to me. Linking to no scan or a partial scan, when the library has the item on its shelves, seems weird to me.

LibraryThing and its members can also take credit for moving the API along in another way. Your help with the Google Book Search Search bookmarklet forced the issue of GBS data. The message to Google was clear: our members wanted to use GBS with LibraryThing, and if Google wouldn’t provide the information, members would get it themselves. After some to-and-fro with Google, we voluntarily disabled the service. But I think it moved the openness ball a few feet, and that’s something for members to be proud of.

Labels: gbs, google, google book search

Monday, September 17th, 2007

Google Book Search … on LibraryThing

Introducing something new we’re calling “Google Book Search Search.”

Google Book Search Search is a bookmarklet that searches Google Book Search for the titles in your LibraryThing library. It works not unlike the famous SETI@Home project. You set it up and it searches Google Book Search slowly in the background.* You can watch, do something in another window or go out for coffee.

When it’s done you can link to and search all the books in your library that Google has scanned. You’ll find a “search this book” link on work pages, and a Google Book Search field to add to the list view in your catalog.

But this isn’t just a selfish thing. There’s a lot of searching to do, and you can help. If you choose, you can pitch in and help with others’ books. All of the data gathered is free and available to everyone. A lot of people want a reliable index of what Google has, not least libraries.

What do I do?

Google Book Search Search is a “bookmarklet.” You save it to your “favorites” or “bookmarks.” Then you go to Google Book Search and you click it. You can see what pops up on the right.*** Press start and it will start collecting information.

Here it is: Google Book Search Search

We’ve tested it on FF and Safari on the Mac, and FF and IE7 and IE5.5 on the PC. We haven’t tested it on PC IE6 yet. I have no idea about Opera.

Why a bookmarklet?

We’ve wanted to do this for a long time. But to link to a book on Google reliably you need its Google ID. For some reason Google doesn’t publish these, making it impossible to tell what they have and what they don’t, and impossible for sites like LibraryThing to send them the traffic they want. Secretive and self-defeating? Seems like it to me.

Efforts have been made to collect Google IDs before. The well-known Lib 2.0 blogger John Blyberg tried, as have others. We tried too. The trick is that Google Book Search—like the rest of Google—has a system in place to stop machine queries.**

Making a bookmarklet distributes the work. And because it takes place within a browser, it tends not to trigger machine-collection warnings.
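
The idea can be sketched in a few lines. This is Python rather than the bookmarklet’s actual JavaScript, and every name in it is hypothetical, including the `lookup` callback that stands in for one search and the five-second default pause. The point is only the shape: each helper takes a small share of the titles and works through it slowly.

```python
import time
from typing import Callable, Iterable, Optional


def share_out(titles: list[str], helpers: int) -> list[list[str]]:
    """Deal titles round-robin so each helper gets a similar share."""
    return [titles[i::helpers] for i in range(helpers)]


def search_slowly(
    titles: Iterable[str],
    lookup: Callable[[str], Optional[str]],
    pause_seconds: float = 5.0,  # assumed pace, not the real setting
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Look up each title in turn, pausing between queries.

    `lookup` is a stand-in for one search; it returns an ID or None.
    """
    found: dict[str, str] = {}
    for title in titles:
        book_id = lookup(title)
        if book_id is not None:
            found[title] = book_id
        sleep(pause_seconds)  # keep the pace gentle
    return found


# Demo with made-up data and the pause switched off.
fake_index = {"Emma": "id-1"}
batches = share_out(["Emma", "Ulysses", "Dracula"], helpers=2)
print(batches)  # [['Emma', 'Dracula'], ['Ulysses']]
print(search_slowly(batches[0], fake_index.get, sleep=lambda s: None))  # {'Emma': 'id-1'}
```

Passing `sleep` in as a parameter is just a convenience for trying the sketch without waiting; a real run would leave the default in place.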

Ultimately, however, Google can put a stop to this. The bookmarklet has a signature. And Google can send us a note, and we’ll disable the bookmarklets. Just as Google respects the robots.txt file, we’ll respect such a request.

Why not use “My Library”?

Last week Google introduced an interesting “My Library” feature, allowing people with Google accounts to list some of their books. A few tech bloggers saw an attack on LibraryThing.

LibraryThing members were quick to dismiss it. It wasn’t so much the lack of any social features, or of cataloging features as basic as sorting your books. It wasn’t even the privacy issues, although these gave many pause. It was the coverage.

Google just doesn’t have the sort of books that regular people have. Most of their books come from a handful of academic libraries, and academic libraries don’t have the same editions regular people have. Then there are the books publishers have explicitly removed from Google Book Search. Success rates of below 50% were common. Of these a high percentage are only “limited preview” or “no preview.”

The Google-kills-LibraryThing meme has another dimension. We WANT people to use Google Book Search. It’s a great tool. Being able to search your own books is useful, and LibraryThing members should be able to do it. Call us naive, but we aren’t going to be able to “pretend Google isn’t there.” And we aren’t convinced that Google is going to create the sort of robust cataloging and social networking features that LibraryThing has.

Our bookmarklet works by transcending ISBNs, using what LibraryThing knows about titles, authors and dates to fetch other editions of a work. In limited tests I’ve found it picks up around 90% of LibraryThing titles.
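
To give a feel for what “transcending ISBNs” means, here is a minimal Python sketch. It is not LibraryThing’s actual matching code: the record fields, the “Surname, Forename” author format and the normalization rules are all assumptions chosen to keep the example short, and it leaves out dates entirely.

```python
import re
from collections import defaultdict


def work_key(title: str, author: str) -> tuple[str, str]:
    """Build a rough edition-independent key from title and author."""
    # Lowercase, drop punctuation, and ignore a leading article.
    clean = re.sub(r"[^a-z0-9 ]", "", title.lower())
    clean = re.sub(r"^(the|a|an) ", "", clean)
    clean = " ".join(clean.split())
    # Assumes authors are written "Surname, Forename".
    surname = author.split(",")[0].strip().lower()
    return clean, surname


def group_editions(records: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Group ISBNs that appear to belong to the same work."""
    works = defaultdict(list)
    for rec in records:
        works[work_key(rec["title"], rec["author"])].append(rec["isbn"])
    return dict(works)


# Placeholder ISBNs, not real ones.
records = [
    {"title": "The Hobbit", "author": "Tolkien, J. R. R.", "isbn": "isbn-a"},
    {"title": "the hobbit.", "author": "Tolkien, J.R.R.", "isbn": "isbn-b"},
    {"title": "Emma", "author": "Austen, Jane", "isbn": "isbn-c"},
]
print(group_editions(records))
```

Once editions are grouped this way, a hit on any one ISBN in a group can stand in for the others.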

Information wants to be free

Our commitment to open data is long-standing. We’ve railed against OCLC for its desire to lock up book metadata.

But we’re not railing here. We think it’s perfectly fine for Google to control access to the scans it’s made. All we want to do is link to them, to send them traffic. It’s not clear to us that Google is trying to control access to its ID numbers.

You can see and edit the data here. Full XML downloads of the data are also available there.


*Come to think of it, it works like Google.
**The system is overzealous. It often refuses to show me Google Blog Search pages in Firefox because I look at LibraryThing’s blog coverage too much.
***It’s quite amazing what a bookmarklet can do. We could have never done it if Altay hadn’t shown us the way in this sort of JavaScript. The script itself is, however, pretty amateurish, a novice attempt at what Altay did expertly.

As we put on the bookmarklet: “Google and Google Book Search are registered trademarks of Google. LibraryThing is not affiliated in any way with Google or the many libraries that have so generously provided Google with their books and bibliographic metadata, although we share a love of books, a desire to make information as freely available as possible, and similar opinions about evil.”

Labels: features, google, google book search, new feature, new features