Thursday, August 9th, 2007

Hot intellectual property!

LibraryThing has a small but dedicated cadre of author-picture adders. The most active, alibrarian, has uploaded more than 3,000 of them.* Today’s subject, DromJohn, has entered fewer—183 at last count—but almost all of them required permission. That is, he wrote to the author, agent or publisher and got permission to post the images on LibraryThing. I am awed by this.

DromJohn wrote to the McIlhenny Company, the people who make Tabasco. They’ve also published a few books under their company name, so they have an author page. Wouldn’t it be cool to have the Tabasco logo on that page? Here’s their reply:

Please be advised that McIlhenny Company hereby grants you permission to use the forthcoming “Brand Products” logo on the LibraryThing author page … for six months from the date of this email.

Any changes in your intended use of our Intellectual Property must be submitted to us for prior approval.

We will follow up with you at the end of the six month period to see if the logo is still being used.

Further, we note that “Tobasco” is misspelled on the author page. Please make the necessary revisions to the pages in which it is misspelled. It should read “TABASCO(r)” with an “A” and it should be in all caps with a superscript registered symbol.

I will review the page in a few days to ensure the necessary revisions have been made.

Should you have any questions, please do not hesitate to contact me.

Thank you and have a great day.

[NAME OMITTED]
Trademark and Licensing
McIlhenny Company

A couple of points:

  • The owner of one of their (not-so-popular) cookbooks is volunteering to promote them. This should be a call for celebration.
  • The “tobasco” spelling error came about because some loyal customer of theirs couldn’t find the book they published in any source, but was so insistent on including it on their virtual shelf that they cataloged it by hand. These are problems you want!
  • Neither Amazon nor the Library of Congress nor any other source I can find put the registration marks in. Personally, I’m glad.
  • A close view of the front cover of the Tabasco Cookbook shows no registration mark either. So, LibraryThing is supposed to add it when the company doesn’t?
  • Good luck getting the Tabasco tag pages on Del.icio.us and Flickr to use the ® symbol.
  • “TABASCO®”? HASN’T ANYONE TOLD THEM THAT ALL CAPS IS SHOUTING?

I’ll bet you that, on today’s web, half the time you fire off an asinine letter like this to someone with a blog you get a post like this, and another 10, 100 or 1,000 people out there who think you’re clueless.

Oh, I forgot: TABASCO®, the TABASCO® diamond logo, and the TABASCO® bottle design
are registered trademarks exclusively of McIlhenny Co., Avery Island, LA 70513.

Labels: Uncategorized

Wednesday, August 8th, 2007

Elton John wants to shut down the internet

Hat tip to the (recently news-worthy) Fake Steve for juxtaposing that blog title with this photo. Juxtaposing someone’s fuddy-duddy opinion with a photo of them in a Donald Duck suit is an unfair, but totally effective, way to cut the opinion down.

Does anyone have a photo of Michael Gorman in a duck suit?

Labels: elton john, fake steve, michael gorman

Tuesday, August 7th, 2007

LibraryThing and AquaBrowser My Discoveries


AquaBrowser, which makes one of the few really interesting online library catalogs, has teamed up with us to offer LibraryThing tags and recommendations within AquaBrowser.

The product is called My Discoveries. Basically, it gives AquaBrowser a series of desirable social features, like tagging, list-making, ratings and reviews, and not in some half-assed way either. LibraryThing comes in as a way to kick off the tag data (a 21-million-tags kick) and to add recommendations to it. My Discoveries customers who choose to go with LibraryThing data will be able to see both LibraryThing’s efforts and their own patrons’.

Putting tags and recommendations in AquaBrowser is a natural step. LibraryThing for Libraries is showing what LibraryThing can do to a library catalog and, more generally, the importance of having large amounts of data to help “social” features reach their full potential. But some sort of LibraryThing-AquaBrowser project has been written in the stars for a while now. Writing up this blog post I did some blog searching around LibraryThing and AquaBrowser. Apparently we should have hooked up long ago; the idea is positively rampant on the biblio-blogosphere. As NeoArch puts it:

“What would happen if we put traditional cataloging data, LibraryThing, and a highly visual OPAC in a blender?* Probably something special. It’s just my opinion, but I think if both types of data could be incorporated and added to an OPAC with a powerful interactive visual interface, like AquaBrowser, we would see a fopac [folksonomic OPAC] that every patron could fall in love with.”

We finally met up at ALA in Washington, DC. The core team is whip-smart, and as a relatively small company they have a development culture not unlike our own.** High on my list of virtues, they have a larger sense of what they’re doing. The co-founder and the Marketing director put it in a book, Risen: Why Libraries are Here to Stay. I don’t agree with all of it, but the basic point is dead-right, that innovative and user-centered technology from libraries can avert everyone’s worst-case scenario, the “fading out” of the library. We think projects like this might play some small role here—and that would be something. Also, I’m dying to take a “business trip” to their offices in Amsterdam.***

Here’s their press release.

Lastly, we should be sure to say that LibraryThing for Libraries is still very much in play. LTFL is designed for all library catalogs, not just one. We have a number of planned improvements, and a frankly absurd number of customers waiting to try it out. (We’re hiring someone to take it on full time in August.) But working directly with AquaBrowser is going to give their customers what’s good about LTFL with perfect back-end integration and much more baked into the software from the start.

We’d be only too glad to partner with or work more closely with other vendors. This is clearly the future, and everybody’s going to get there eventually.


*We definitely need a LibraryThing edition of Will it blend?
**I suspect they do test, however. AquaBrowser is headquartered in Amsterdam. It’s something of a happy coincidence that yesterday was LibraryThing’s big push into Dutch-language books. The effort was a coincidence, but two of their top people have generously offered to scout out some potential sources.
***I’ve been there four or five times on the way to Turkey—KLM has great lay-overs. And my brother, best friend and I stopped there on the way to my bachelor party in, um, Lithuania (desperately random on the part of my brother). But I’ve basically only done the Rijksmuseum, the Anne Frank House and walked around till I was lost. Now that we’re tying in to all this Dutch data, and we have work to do with AquaBrowser, a longer visit is surely necessary! Now, what accounting category does hash fall under—”office supplies”?

Labels: Uncategorized

Thursday, August 2nd, 2007

The future of libraries

I have seen the future of libraries: It is to spend the future discussing the future of libraries.

Labels: Uncategorized

Thursday, July 26th, 2007

Internet Archive wants book-loving systems engineer

In the spirit of fraternal concern, I post that the Internet Archive is looking for a systems engineer with PHP experience for their book-scanning project. (Also they promised to send us their discards. We need one too.)

The help-wanted has some excellent provisions:

  • Love and respect for books; pride and care in your work
  • Not afraid of terabytes

If I had the skills, I’d be tempted to take it. The Internet Archive is a great institution. The people are great, and they have the best office space ever. LibraryThing’s second-story apartment steps from the Portland waterside pales in comparison. They have this adorable jewel-box in San Francisco’s Presidio, with the Golden Gate Bridge right outside the window.

Labels: internet archive, jobs

Wednesday, July 25th, 2007

Is LibraryThing making you fat?

New England Journal of Medicine article–with fancy animation–explains how social networks cluster… by weight. (Hat tip: David Weinberger)

Labels: Uncategorized

Wednesday, July 25th, 2007

Free copies of Everything is Miscellaneous

I’ve flogged David Weinberger’s Everything is Miscellaneous before, in blog posts and in my Library of Congress talk. I think it’s something close to the intellectual justification for LibraryThing.

Anyway, I flogged the ARC for so long that, when it came out, LibraryThing bought a small box of hardcovers direct from the publisher–to give out at conferences, to thank people for inviting me to talk, and so forth. I still have half a box. So I’m going to open it up to the whole LibraryThing community.

We’re going to give out ten copies. We like contests*—we have a Harry Potter book photo and another review contest going—so we’re going to make a contest of it.

I’ve created a thread Contest: What does tagging do to knowledge?

  • If you want the book, come there and say a word or two about tagging.
  • It doesn’t need to be a big deal. A few sentences with some examples would be fine.
  • You can talk about tagging on LibraryThing or other sites. You can do personal tagging, global tag pages, the new tagmash feature, David’s talks, my talk, Clay Shirky’s talk, your talk, or whatever.
  • You can say something positive, something negative or just ask an interesting question.
  • You can post as many messages as you want, but you don’t get more chances, duh.

I’m not going to pick winners. I’m just going to randomly pick ten members who left comments. But you can’t just say “I want a book.”


*No purchase necessary. Void where prohibited. Also void where discouraged, unseemly or tacky. We pay to mail it. You are responsible for taxes. My taxes.

Labels: Uncategorized

Tuesday, July 24th, 2007

Tagmash: Book tagging grows up

Tagmash: alcohol, history gets over the fact that almost nobody tags things history of alcohol

Short version: I’ve just gone live with a new feature called “tagmash,” pages for the intersections of tags. This is a fairly obvious thing to do, but it isn’t trivial in context. In getting past words or short phrases, tagmash closes some of the gap between tagging and professional subject classifications.

For example, there is no good tag for “France during WWII.” Most people just don’t tag that verbosely. Tagmash allows for a page combining the two: France, wwii. If you want to skip the novels, you can do france, wwii, -fiction. The results are remarkably good.

Tagmash pages are created when a user asks for the combination, but unlike a “search” they persist, and show up elsewhere. For example, the tagmash for France, Germany shows France, wwii as a partial overlap, alongside others. Related tagmashes now also show up on select tag and library subject pages, as a third system for browsing the limitless world of books.
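The persistence described above can be pictured with a minimal sketch. It assumes a tagmash is nothing more than a set of tag names, and that a "partial overlap" is any other stored mash sharing at least one tag. `TagmashRegistry` and its methods are hypothetical names for illustration, not LibraryThing's actual code.

```python
class TagmashRegistry:
    """Hypothetical store of tagmashes that users have already requested."""

    def __init__(self):
        self._mashes = set()  # each mash is a frozenset of lowercased tag names

    def request(self, *tags):
        """Create (or reuse) the mash for this tag combination and return it."""
        mash = frozenset(t.strip().lower() for t in tags)
        self._mashes.add(mash)  # unlike a search, the mash persists
        return mash

    def partial_overlaps(self, *tags):
        """Return stored mashes that share a tag with this one but are not identical."""
        target = frozenset(t.strip().lower() for t in tags)
        return sorted(sorted(m) for m in self._mashes if m & target and m != target)


registry = TagmashRegistry()
registry.request("France", "wwii")
registry.request("France", "Germany")
registry.request("horses", "young adult")
print(registry.partial_overlaps("France", "Germany"))  # [['france', 'wwii']]
```

Because mashes are stored as sets, the order in which a user types the tags makes no difference to what gets stored or matched.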

Booooring? Go ahead and play a bit:

That’s the short version. But stop here and you’ll never know what Zombie Listmania is!

Long version. LibraryThing has shown some of the things that book tags are good for, such as plain language, genre fiction, capturing identity and perspective, academic schools, staying current and changing over time. (Details and examples in footnote.*)

It also demonstrates some of the weaknesses, including:

  1. Idiots
  2. Bad actors (spammers, racists, anarchists)
  3. “Personal” tags clouding the tagosphere with junk (eg., “at the beach house”)
  4. The lack of a “controlled” vocabulary results in ambiguous terms (eg., classics, leather, magic)
  5. Tags lack the detail and focus available to a hierarchical subject system like the Library of Congress Subject Headings (LCSH), eg.,
    Great Britain — History — Elizabeth, 1558-1603 — Fiction
    , or
    Jews — Italy — Bologna — Conversion to Christianity — History — 19th century**

As I’ve argued elsewhere and in my Library of Congress talk, problems 1, 2 and 3 are mitigated by having LOTS of tags. Idiocy, malice and personal junk fall out statistically. A tag here or there can’t be trusted, but a large body of tags in agreement is different.
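One way to picture "falling out statistically" is a simple agreement threshold: a tag counts for a book only when enough different members applied it. The sketch below assumes taggings arrive as (user, book, tag) triples; the function name, the sample data and the `min_users=3` cutoff are all invented for illustration, not a description of how LibraryThing actually filters.

```python
from collections import defaultdict


def trusted_tags(taggings, min_users=3):
    """Keep only (book, tag) pairs applied by at least `min_users` distinct users."""
    users_by_pair = defaultdict(set)
    for user, book, tag in taggings:
        users_by_pair[(book, tag.lower())].add(user)
    return {pair for pair, users in users_by_pair.items() if len(users) >= min_users}


# Invented sample data: three members agree, two go their own way.
taggings = [
    ("ann", "Book A", "wwii"),
    ("bob", "Book A", "WWII"),
    ("cat", "Book A", "wwii"),
    ("dan", "Book A", "at the beach house"),  # personal tag
    ("eve", "Book A", "cooking"),             # mistake or mischief
]
print(trusted_tags(taggings))  # {('Book A', 'wwii')}
```

The lone personal tag and the lone wrong tag never reach the threshold, which is the point: the more tags there are, the easier it is to tell agreement from noise.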

Problems 4 and 5 are harder to tackle. Flickr has shown the way with one solution, statistical clustering. The screen shot below shows this–clusters of images related to the tag “bow.”

Some day–when I become a better programmer?–I’m going to try this on LibraryThing data. It will help with ambiguity: the secondary tags on the various meanings of “leather” are surely wildly divergent! But I suspect it separates better than it clarifies. Flickr supposes that tags fall into discrete clusters, but subjects interact with books in extremely complex ways. On a more basic level, I am suspicious of the too-quick resort to algorithms against user data.*** After all, if computers are so good at figuring out meaning, why were users necessary in the first place? It smacks of technological revanchism.

So, where Flickr’s clusters are automated, tagmash is a semi-automated process. LibraryThing does the statistics, but users decide what the meaningful clusters are. Some mashes are interesting and useful. Some aren’t. By and large, uninteresting clusters won’t last.****

This certainly helps with ambiguity. Take the problematic tag leather, which divides easily into tagmashes like:

Now let’s take the “focusing” power of hierarchy. As mentioned above, there is no good way to get at “france during wwii.” The tag Vichy covers some of the ground, but not enough. Tagmash provides an answer.

The book list is good, and a simple union gets around an imposed hierarchy. Looking at the related LCSHs, for example, one is left in doubt whether France is part of World War II, or World War II part of France—or what:

Of course, both trees are equally artificial. David Weinberger writes how, in the real world, a leaf can be on many branches. But it’s equally true that what’s trunk and what’s branch are largely about where you start–dirt or pinecone. Either way, branching happens. The order of the branches isn’t necessarily important.

Even as it borrows some of the virtues of subject classification, tagmash keeps the strengths of tagging. Subject systems are pre-built things. Now and then they get larger, but it takes deliberation and effort. What gets “blessed” is often surprising. I would have never predicted the unusually staid LCSH would have embraced:

But tagging has no limits. Think of the tagmash “erotica” and “zombies” and there it is. (Tagmash: erotica, zombies). Want to know what chick lit takes place in Greece? (Tagmash: chick lit, greece.) Young adult books involving horses? (Tagmash: horses, young adult.) Poems from or about San Francisco? (Tagmash: poetry, san francisco). Slavery in Brazil? (Tagmash: brasil, slavery.) Non-fiction books about Narnia? (Tagmash: narnia, -fiction.) The options are endless.

Of course, tagmash only narrows the gap. It doesn’t eliminate it. Tagmash: poetry, San Francisco still can’t distinguish between poetry about and poetry from San Francisco–it involves whatever is tagged “San Francisco” and that’s probably a mixed bag.***** Well-planned and carefully executed subject systems have strengths that no ad hoc, regular-person system can match.

Lastly—let there be no doubt—tagmash needs a very large quantity of tags to work. For tagmash after tagmash, the data is simply insufficient.

You’ve made it to Zombie Listmania! There are some obvious directions this can go:

  • The syntax can improve, for example to allow alternates (eg., humor, cats/dogs)
  • The syntax can include non-tag factors, such as formal subject headings (Tag: zombies, LCSH: love stories), languages, dates, authors and so forth.
  • The syntax can include weights (eg., Zombies 50%, vampires 50%, love stories 90%). Abby and I experimented with just such a system, creating algorithmic proxies for BISAC (bookstore) headings. It isn’t that hard to do.
  • Complex mashes could acquire titles and other metadata.
  • Users could follow a tagmash, and be alerted whenever new material enters the list.
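The first idea in the list, alternates, is easy to sketch. The code below assumes the proposed "humor, cats/dogs" syntax means "humor AND (cats OR dogs)"; that reading, and the helper names `parse_mash` and `matches`, are assumptions for illustration, since the feature does not exist yet.

```python
def parse_mash(query):
    """Split 'humor, cats/dogs' into groups of alternates: [{'humor'}, {'cats', 'dogs'}]."""
    groups = []
    for term in query.split(","):
        alternates = {alt.strip().lower() for alt in term.split("/") if alt.strip()}
        if alternates:
            groups.append(alternates)
    return groups


def matches(book_tags, groups):
    """A book matches when it carries at least one tag from every group."""
    tags = {t.lower() for t in book_tags}
    return all(tags & group for group in groups)


groups = parse_mash("humor, cats/dogs")
print(matches({"humor", "dogs"}, groups))    # True
print(matches({"humor", "horses"}, groups))  # False
print(matches({"cats", "dogs"}, groups))     # False
```

A real version would have to cope with tags that themselves contain a slash, which this sketch ignores.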

Amazon calls its static, or dead, lists “Listmania.” All these tend to create a “Zombie Listmania,” lists of books that “won’t stay dead.” Instead, they change over time, as the underlying social and non-social data change. There’s no reason you couldn’t create “Zombie” versions of formal subject headings—a series of tags and other markers which approximated the content of a professionally-assigned subject heading.

Pretty cool idea, I think. We’ll see what we can do about it.

Details.

  • Tagmashes can be made from any tagmash or tag page. Just search for a tag, or for two or more tags with a comma between them. The URLs follow the same pattern: /tag/ plus a tag or tags separated by commas.
  • The weighting of tags is wiggly. We’re trying to get at both raw numbers of tags on an item and the relative salience (number divided by total number of tags), and then cross this data tag-by-tag. There is no obvious answer. In an ideal world, some tags would be about salience (eg., humor) and others would be thresholds (eg., fiction)–that is, when you’re looking for humor, fiction you want the funniest fiction, not the most fictional humor.
  • You can enter the tags in any order, but it will reformat your URL in alphabetical order, with the minuses at the end, such that “wwii, france” is the same as “france, wwii.”
  • A single minus (-fiction) “discriminates” against items tagged “fiction.” A double minus (--fiction) disqualifies all books with the fiction tag.
  • Tagmashes don’t get built until someone builds them. The first time can take a while to generate. There is currently no system to expire older or underused tagmashes.
  • UPDATE: I’m seeing a lot of part/whole tagmashes. These rarely work. When you search for “Einstein, science” or “Manet, art” you’re not doing much more than putting a statistical cramp on the smaller of the two tags—a few Manet books won’t have an art tag, and that will be the end of them. Tagmashes work with different things, not a thing and its category.
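The ordering and minus rules in the details above can be sketched in a few lines. Everything here is a guess at one possible shape, not the real implementation: the function names are invented, the score simply multiplies raw count by salience for each tag, and the 0.5 penalty for a single minus is an arbitrary stand-in for whatever "discriminates" means in practice.

```python
def canonical_mash(query):
    """Return terms in alphabetical order, with minus terms moved to the end."""
    terms = [t.strip().lower() for t in query.split(",") if t.strip()]
    plain = sorted(t for t in terms if not t.startswith("-"))
    minus = sorted(t for t in terms if t.startswith("-"))
    return ",".join(plain + minus)


def score(tag_counts, query, penalty=0.5):
    """Score one book; `tag_counts` maps tag -> number of times it was applied.

    Returns None when the book is disqualified or lacks a required tag.
    """
    total = sum(tag_counts.values())
    result = 1.0
    for term in canonical_mash(query).split(","):
        if term.startswith("--"):
            if tag_counts.get(term[2:], 0) > 0:
                return None          # double minus: disqualify outright
        elif term.startswith("-"):
            if tag_counts.get(term[1:], 0) > 0:
                result *= penalty    # single minus: count against the book
        else:
            count = tag_counts.get(term, 0)
            if count == 0:
                return None          # required tag is missing
            result *= count * (count / total)  # raw count times salience
    return result


print(canonical_mash("wwii, France, -fiction"))  # france,wwii,-fiction
print(canonical_mash("wwii, france") == canonical_mash("france, wwii"))  # True

# Invented tag counts for two imaginary books.
novel = {"france": 4, "wwii": 4, "fiction": 2}
history = {"france": 4, "wwii": 4, "history": 2}
print(score(history, "france, wwii, -fiction") > score(novel, "france, wwii, -fiction"))  # True
print(score(novel, "france, wwii, --fiction"))  # None
```

With a single minus the novel still appears, just lower down; with a double minus it drops out entirely.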

Footnotes!

*What’s good about tagging:

  • Tags use everyday terms (the tag cooking vs. the subject cookery)
  • Tags are great for genre fiction that subject systems can’t keep up with as fast or as well as their readers (chick lit, cyberpunk, paranormal romance)
  • Tags often encode subtleties that “controlled vocabulary” irons out (lgbt, glbt, queer, gay, homosexuality)
  • Tags capture identity and perspective that subject systems can’t or won’t (queer, glbt, lgbt, christian living)
  • Tags are good for schools of thought (intelligent design, austrian economics)
  • Tags respond quickly to change (hurricane katrina)
  • Tags “keep happening” in a way that systems like LCSH do not, getting added to books where LCSH misses the “first wave” of anything new (memetics, sociobiology)

**I’ve left out one problem, not covered at the LC: how “democratic” weighting can put Angela’s Ashes at the top of the books for the Ireland tag. I want to write a blog post on the topic sometime. I think there are ways around it, and algorithmic solutions that nobody has really tried.

Aside: Much LIS anti-tagging polemic focuses on the most trivial of problems—spelling mistakes and “incorrect” tags. The former underestimates technology, the latter insults our intelligence. LibraryThing has dealt with the spelling problem, and has seen very few “wrong” tags. In fact, there are some serious problems with tagging. But you have to understand tags before you can see the problems, and many refuse to get past the idea that people will spell “white” wrong, or tag white horses as black.
***This is half formed. I have a problem with the reflexive “turn” from people-centered data to algorithms. I see this pattern again and again in software. Something transformative happens–something human. But it’s imperfect, so programmers conclude that programs will fix humans. In a way, it’s a reassertion of importance. More often, humans fix humans. To adapt David Weinberger, the answer to user-generated data is MORE user-generated data.
****Probably there’s got to be some system to expire unused clusters.
*****UPDATE: After turning the feature loose I watched what new tagmashes would be created. One was children, cooking. Should I call the police?

Labels: new feature, tagging, tagmash

Saturday, July 21st, 2007

My Library of Congress talk

The Library of Congress has just posted a talk I did there back in April, part of the Digital Future and You series.

I cover the basics of LibraryThing and some of what LibraryThing “means” to libraries, including a long section on tagging. It has a short section—a sermon, really—on open data, in anticipation of the launch of Open Library, and another on the upcoming Everything is Miscellaneous.*

To my regret, it ends abruptly. They didn’t include the 20+ minute Q&A**, which went a lot deeper on some of the interesting issues (particularly tagging), and with the nation’s top library talent!

Being asked to talk in front of the LC was a great honor. There aren’t many institutions I hold in higher regard. And it was fun. I got to be myself—PowerPoint-less, off-the-cuff and passionate–and was greeted warmly and given the benefit of the doubt when I pushed the limits. Also, I got to have lunch with some of their top people. It was a blast.


*The subtext of that section is that I just had a lunch conversation about open data, and heard more about the whys, wherefores and finances involved.
**Apparently they felt that they needed permission from everyone who appears on tape, and that the questions were not well miked.

Labels: Uncategorized

Saturday, July 21st, 2007

Facebook and the blink tag

Altay’s attempt to insert the CSS version of the old <blink> tag into our upcoming Facebook application produced this excellent reply from Facebook:

He was in fact kidding. Or so he says.

Labels: facebook