Archive for February, 2006

Sunday, February 26th, 2006

“Users with your books” is better

Update: The server migration appears to have gone off without a hitch. Anyway, LT’s had none of the usual random, corrupting crashes since the changeover. I am a little behind on email, but will be (amazing to say) out of touch today.

The “Users with your books” box on user profiles shows how many books you—or any user—share with other LibraryThing users. Unfortunately, it counts all books equally—Harry Potter as much as something rare. And there was no dampening of big libraries, so everyone had the largest library, ellenandjim, near the top.

LibraryThing used to have a page that munged your “shared books” in various ways. The algorithm was, however, very inefficient, so I had to drop it somewhere around 1 million books. I’ve brought it back. It’s better than ever and it won’t clog the server (another side-benefit of the new “works” system).

You can see the new feature by clicking the “weighted” link in the “Users with your books” box on your profile. It takes account of both book obscurity and library size. It really works for me, anyway, sifting to the top a number of users I’d never seen, but who share some of my favorite stuff. Try it out and tell me what you think.
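
A rough sketch of how a weighting like this might work, in Python. It is not LibraryThing's actual code: the function name `weighted_matches`, the log-based obscurity weight and the square-root damping of library size are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def weighted_matches(libraries, me):
    """Rank other users by shared books, weighted by obscurity and library size.

    libraries: dict of user name -> set of book (work) ids.
    The formula is an illustrative guess, not LibraryThing's.
    """
    n_users = len(libraries)
    # Number of libraries holding each book; widely held books count for less.
    holders = Counter(book for books in libraries.values() for book in books)
    mine = libraries[me]
    scores = {}
    for user, books in libraries.items():
        if user == me:
            continue
        shared = mine & books
        if not shared:
            continue
        # A book everyone owns contributes log(1) = 0; a rare one contributes more.
        obscurity = sum(math.log(n_users / holders[book]) for book in shared)
        # Divide by sqrt(library size) so huge libraries do not win by bulk alone.
        scores[user] = obscurity / math.sqrt(len(books))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With this kind of scoring, a small library that shares one rare book outranks a giant library that shares only a bestseller, which is the behavior the “weighted” link is described as having.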

PS: The 2:30am EST downtime is still on.
PPS: Feel free to chime in on this topic. My first task in the next week or so is to work on bugs and infelicities. After that, should I work on a “groups” system or a “forum”? A groups system would, among other things, allow a group of friends, a club or other association to easily search a bunch of libraries. There would also be group profiles and so forth. A “forum” feature would bring interactive, multi-person discussion to LibraryThing. It would be very closely tied to the work, author and tag system, not just being “another place” to discuss books. (It would, of course, have a place to discuss bugs too.)

Labels: 1

Sunday, February 26th, 2006

Downtime 2:30am EST / 11:30pm PST

LibraryThing will go down at 2:30am EST / 11:30pm PST / 7:30 GMT for a major swap. The “big new server” I got some months ago has proved very fast, but also glitchy. My database guy thinks it was some interaction between the OS, FreeBSD, and MySQL. He’s made a Linux server (exact same hardware) that seems never to crash under similar stress. That sure would be swell. Everyone cross your fingers!

Oh, and don’t worry about your data. Obviously the change-over will be backed up six ways from seven.

Sunday, February 26th, 2006

2.5 Million tags!

LibraryThing just hit 1.8 million books and 2.5 million tags. Since we’re going to hit 2 million books soon, I’ll talk about the tags today.

As the tags accumulate, they are also generating a lot more value. Tags are mostly useful personally and statistically. Tags are often played up baselessly, as if a few scattered and general tags are of any use to anyone. For statistical purposes you need a LOT of tags, so frequency patterns can emerge and anomalous entries fade into the background. And tags are primarily interesting in concert, not by themselves. Because tags are non-hierarchical and often short, they lack the “context” of something like the Library of Congress subject headings. Other tags can provide that context.

That’s why the “tag similarity” algorithm takes many tags into account, favoring recommendations that match on more than one. Take the messy example of a mid-level book, T. E. Lawrence’s Seven Pillars of Wisdom. What the heck is that? It’s all over the map: literature, WWI, Middle East, Ottoman Empire, Arabia, history, autobiography, memoir, etc. The recommendations try hitting many of these tags at once, with books like Fromkin’s A Peace to End All Peace (WWI, Middle East, history, Ottoman Empire, etc.) and Robert Graves’s Goodbye to All That (literature, memoir, WWI). It’s not perfect (Edward Said’s memoir!), but it’s a hell of a lot better than any single tag could produce.
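
The idea of favoring multi-tag matches can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `similar_by_tags`, the `min_shared` cutoff and the overlap score are invented here, and the real algorithm is not described in enough detail to reproduce.

```python
from collections import Counter


def similar_by_tags(target, tag_counts, min_shared=2):
    """Rank books by how many tags they share with `target`.

    tag_counts: dict of title -> Counter of tag -> number of users applying it.
    Books matching on fewer than `min_shared` tags are dropped, so a single
    general tag such as "history" is never enough on its own.
    """
    target_tags = tag_counts[target]
    results = []
    for title, tags in tag_counts.items():
        if title == target:
            continue
        shared = set(target_tags) & set(tags)
        if len(shared) < min_shared:
            continue
        # Overlap strength: for each shared tag, the smaller of the two counts.
        overlap = sum(min(target_tags[tag], tags[tag]) for tag in shared)
        results.append((title, len(shared), overlap))
    # More shared tags first, then stronger overlap.
    results.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return results
```

Fed made-up counts for the books mentioned above, a book tagged only “history” is filtered out, while books hitting several of the target's tags at once rise to the top.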

And, most importantly, every book and tag makes the statistics better.

Lastly, I wonder how LibraryThing’s 2.5 million compares. I’m sure Flickr and Delicious have many times that number. But what else is out there? Amazon has encouraged product tagging for about three months, and they have thousands of times the traffic. I wonder how well that’s going?

Saturday, February 25th, 2006

‘Twas the night before LibraryThing

I wanted to take a second to highlight an interesting use of LibraryThing. LibraryThing user _Celeste_ collects editions of Clement Clarke Moore’s “A Visit From St. Nicholas,” better known as “The Night Before Christmas.” Putting her collection online helps her, and the friends and family members who scout for her, keep track and avoid duplicates. In addition to putting her collection online, _Celeste_ has also added her own covers. (Needless to say, most of her copies are not available on Amazon.) Arrayed together, they are a pretty cool sight, and a monument to one collector’s dedication. It would be great if more collectors put their collections and covers on LibraryThing. Old covers have a low profile on the internet because nobody has much of a financial stake in them, and there isn’t anywhere central to “put” them. LibraryThing can be that place.

Check out The Night Before Christmas for most of the editions. Scroll WAAAY down to see her covers. Not all the editions have been combined into the master “work” (nor should they be), so also check out her dedicated tag and the books in her catalog. Great stuff.

The library came to my attention when Celeste reported problems with her 100 copies sending “shared book” stats through the roof. I’ve revisited how these are calculated. Profiles now list both the number of works and the number of books, if different. I’ve also brought back the “Shared books” box for all users’ profiles, not just yours.

Thursday, February 23rd, 2006

LibraryThing leaps forward: Everyone a librarian

UPDATE: I changed the way I load covers. Is anyone still getting a “stack” error?

In the last three days I’ve added a slew of new features, and a new structure to support future improvements:

“Works” : user-controlled book combination and separation

LibraryThing would be pretty useless if connections, ratings, comments and recommendations considered every edition of something like Pride and Prejudice a separate thing, as other online cataloging applications do. In the past LibraryThing made a guess, like Amazon does, and failed about as often. Library technologists are starting to do somewhat better, but there was no perfect answer. It was time to try something new.

Three days ago, I announced a trial project to let users determine what books belonged together, the first time anything like this has been attempted. Using simple check boxes, users could go through a favorite author’s works, combining and separating editions as necessary.

The response has been startling to say the least: In three days, users have combined 17,000 times, mashing together 42,000 works! Users have spent hours at the task, and debated the nuances in a blog post that now sports 182 comments. Although only a few of these Christmas elves are actual librarians, most are experts on the authors they labor over. As one wrote on the blog, Isaac Asimov’s Nightfall, the short story collection, is distinct from Nightfall the novel and from Nightfall One. Do libraries know that? Does Amazon?

As with tagging, reviews, ratings, uploading scanned covers and other user content, LibraryThing users are taking book information into their own hands. And they’re doing it because it’s fun and they see the benefit right away: filling out the catalog with covers and cataloging data they wouldn’t otherwise have, and connecting them with like-minded readers, even if the person who’s read every Asimov book they have did it in Finnish.

Book pages revised

Book information pages (the card-catalog, people and pencil icons) have been enhanced in a variety of ways.

  • Social pages show all the work’s covers, by popularity, including user-contributed ones. (Note: Some Windows IE machines fail to load all the images. If the book has very many covers, this can show an error. I am working to solve this.)
  • Each edition links to Amazon, Abebooks, Alibris and other online merchants.
  • Editions also link to the OCLC’s “Find in a Library” project.
  • Users can now swap covers easily, and for the first time snap up covers uploaded by other users.

New Recommendation Options

The new “works” system has opened things wide up for improvements. One of the first is an enhanced “recommendations” engine, now based on the “full work,” not just some of its editions.

The new engine shows both “People who own X also own Y” and “Similarly tagged” books. The “raw” option shows “People who own X also own Y” without any weighting applied (J. K. Rowling rules!). The “exclude author” option is useful when a book triggers ten or fifteen suggestions by the same author, as happens with authors like Agatha Christie or Stephen King.
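
For the curious, here is a hedged Python sketch of how the “raw,” weighted and “exclude author” options could differ. The function name `also_own`, the data shapes and the square-root popularity discount are assumptions for illustration, not the engine's real formula.

```python
import math
from collections import Counter


def also_own(work, libraries, authors, raw=False, exclude_author=False):
    """Sketch of "people who own X also own Y" with raw and weighted modes.

    libraries: dict of user -> set of work ids.
    authors: dict of work id -> author name.
    """
    # Count how often each other work sits in a library alongside `work`.
    together = Counter(
        other
        for works in libraries.values() if work in works
        for other in works if other != work
    )
    # Overall popularity of every work, used only by the weighted mode.
    popularity = Counter(w for works in libraries.values() for w in works)
    scores = {}
    for other, count in together.items():
        if exclude_author and authors.get(other) == authors.get(work):
            continue
        # Assumed weighting: divide by sqrt(popularity) to dampen bestsellers.
        scores[other] = count if raw else count / math.sqrt(popularity[other])
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))
```

In raw mode a book that sits in nearly every library wins almost any comparison; the discount in the weighted mode lets a less common but more telling co-owned book come out ahead.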

Search

LibraryThing now allows library-wide title and author searches based on the new work system.

Deweys and LC Call Numbers for everyone!

Until now, members who found books through the Amazon search (most of us), had no access to Library of Congress Call Numbers and Dewey Decimals, which only came through library searches. The new “work” system leverages everyone’s cataloging, bringing true library data to 175,000 works, including most popular ones. These numbers now appear in your catalog, in green to distinguish them from your own numbers.
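
One plausible way to picture this sharing of cataloging data, sketched in Python. The record layout, the `inherit_call_numbers` name and the “most common value wins” rule are assumptions; the `inherited` flag stands in for the green numbers mentioned above.

```python
from collections import Counter


def inherit_call_numbers(editions, field="dewey"):
    """Fill a missing field on each edition from other editions of the same work.

    editions: list of dicts with a "work" key and an optional `field` key.
    Returns new dicts carrying an "inherited" flag.
    """
    votes = {}
    for edition in editions:
        value = edition.get(field)
        if value:
            votes.setdefault(edition["work"], Counter())[value] += 1
    filled = []
    for edition in editions:
        if edition.get(field):
            # The member's own number is kept as-is.
            filled.append({**edition, "inherited": False})
            continue
        counter = votes.get(edition["work"])
        # Assumption: take the most common value among sibling editions.
        best = counter.most_common(1)[0][0] if counter else None
        filled.append({**edition, field: best, "inherited": best is not None})
    return filled
```

A book found through Amazon picks up a number only when some other edition of the same work came from a library search; a work with no library-sourced edition stays blank.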

In addition to Deweys and LC Call Numbers, card catalogue pages now allow you to browse works’ MARC records, a Matrix-like stream of data librarians are said to be able to decipher, and even enjoy looking at.

Forward

The new features have gotten a workout since starting testing three days ago, and users have been very helpful with bug reports and suggestions. Some systems are still transitioning, and some problems remain, but issues are being knocked down one-by-one. In my defense, developing LibraryThing is usually like working on a train while it’s in motion. The recent changes were like turning a train into a monorail while it was in motion…

Some anticipated improvements include:

  • Fixing all occurrences of that MSIE6 “stack overflow” bug, and the pesky double-frames issue
  • A definitive statement on when translations should be combined (I’m working on it!)
  • Work disambiguation—there, I said it—available through the search system
  • Improved author disambiguation
  • Deweys and LC Call Numbers for all books, not just books in the system
  • Browsable LC Subject headings for all and sundry
  • A space-based laser to smite people who write in library books

Comments, suggestions, criticisms, complaints and bug reports always welcome.

Monday, February 20th, 2006

Beta/unofficial features launch

Update: I’m going to give this another day of “beta.” I updated the About Works and Books page. I’m going to work on making the “combine” feature less cluttered, and the “separate” feature more responsive.

A suite of new features has been launched in beta. The core feature is a change in the relationship between books, adding a robust concept of “works” to tie different editions together. This behind-the-scenes change has allowed a cascade of other, seemingly unrelated changes.

I plan to watch the site today, and receive feedback in the blog comments, on the Google group and by email. I’ll make changes and launch the features officially tomorrow. If you plan to blog the changes, I’d prefer if you did it tomorrow, when things are more stable. The same goes for praise and blame: to the extent you can, hold it, and if you are so minded, give me solid, specific feedback.

The new features include:

  • Book combine/separate, available from author pages and the card-catalog page of a book. I took a swing at a few of my favorite authors (even getting into Rowling and Dan Brown), but the process is only beginning. I have also yet to fully explain the LibraryThing “way” of works.
  • Substantially-revised book-info pages (e.g., social, card-catalog and edit pages)
  • Book-info pages include Amazon and user-supplied cover images.
  • Book-info pages now link to a number of booksellers as well as the OCLC “Find in a Library” service.
  • The change-cover feature has been improved. You can now snag covers from the 23,000 that users have already uploaded.
  • Social pages now offer enhanced book recommendations in various flavors, such as “weighted,” “raw” and “exclude author” (useful when every recommended book is by the same person).
  • All-LibraryThing title and author searching, available from the search tab.
  • Card-catalog pages now show LCCNs and Deweys for most books from Amazon—and more will get them as the work system fills out. At present, these are not available on the catalog view, but they will be, along with clickable LC Subjects.
  • MARC records for many books, often many of them.

There are some known bugs, and tweaks and changes to last a week or two. Your feedback is, as ever, invaluable.

Friday, February 17th, 2006

Upgrade update

My promised upgrade is still in the works, and growing in scope by the minute. It will include:

  • A work structure, with work disambiguation
  • Better-looking book pages (social info, book info, editing), with multiple covers and the ability to tap into others’ manual covers
  • Deeper, tweakable book recommendations
  • Edition-by-edition purchase links, going to an expanded selection of booksellers, with OCLC “Find in a Library” too
  • LCCNs and Deweys for most books

You will NOT see it today (Friday). It’s not quite ready, and, as everyone knows, Friday is the worst day to announce anything. Thanks for your support.

Update: The widgets are fixed. Sorry about that. 2:07pm EST.

Tuesday, February 14th, 2006

Tag expansion delay / 2am downtime

UPDATE: (1) Work-logic is still coming. In that connection, you will see incorrect “shared” numbers if you look at your catalog early this morning. It’s still calculating the “initial” guesses. I had to redo it to correct for forgetting to hint that titles differing only with respect to capitalization were probably the same book… (2) I’ve expanded the tag field. If I don’t notice a performance impact, it will stay that way.

The current tag field is capped at 255 characters. A post yesterday begged for an expanded tag field, and I said I’d look into it. I didn’t get to it last night, but I will tonight.

I’m going to take LibraryThing down at 2am EST (11pm PST, 7am GMT—see I got the GMT right this time!). This will give me time to do the tag thing and bring some works-based stuff online. I’ve had to rewrite the “people who own X also own Y” system. In the process, I’ve added the ability to see the recommendations raw (Harry Potter wins!), weighted (the current system), omitting books by the author of the suggesting book, and flagging the books you already own. With luck, I’ll bring that live tomorrow.

Monday, February 13th, 2006

Work disambiguation and the “Ship of Theseus”

This blog post is long, and involves both showy mythological allusions and inside-baseball discussions of database structures. In brief, you’ll be seeing some new features, but you may also catch some glitches as I bring them live. Thank you for your support.

In philosophy, the “Ship of Theseus” is a “replacement paradox.” The story is that the Greek hero Theseus (you know, minotaurs and balls of string) rebuilt his ship during the voyage back from Crete—perhaps even while it was moving—such that the ship arrived at Athens with no piece of wood that had left from Crete.* The question is: Was it the same ship?

Anyway, LibraryThing is a true “Ship of Theseus.” I’m rebuilding it as it moves. This week I’ll be putting in a new keel—a whole new structure for thinking about books and works.

The former system was essentially composed of discrete books. If two books had very similar authors and titles (e.g., two editions of Romeo and Juliet), the system guessed that they were the same “work.” These guesses were pretty good, particularly considering they had to be made on the fly, but not good enough. And there was no way to change them. Notably, the whole system operated without a separate “works” database. This was clever and economical, but also limited.
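
A guess of this kind can be sketched with Python's standard `difflib`. This is an illustration, not the old system's code: the 0.9 threshold, the normalization and the function names are assumptions.

```python
from difflib import SequenceMatcher


def _normalize(text):
    # Ignore case and runs of whitespace before comparing.
    return " ".join(text.casefold().split())


def _similarity(a, b):
    # ratio() returns a float between 0.0 and 1.0; 1.0 means identical.
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def probably_same_work(book_a, book_b, threshold=0.9):
    """Guess whether two book records are editions of one work.

    Each record is a dict with "author" and "title" strings.
    """
    return (
        _similarity(book_a["author"], book_b["author"]) >= threshold
        and _similarity(book_a["title"], book_b["title"]) >= threshold
    )
```

The weakness is easy to see: a title padded with edition or marketing text, or an author written “Shakespeare, William,” falls below the threshold, which is one reason a guess made on the fly cannot be good enough by itself.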

The new system introduces a robust concept of “work.” On the database side this means a special “works” database, where each work has a title (the most common title of books belonging to the work). It is the way whereby most LibraryThing books can acquire LCCNs, Deweys and other cataloging information. It will allow users to discuss books (for example, on a forum) without worrying that they are only talking to people who have the same edition they do. Techies will like that it opens the door to an external API, relying on Library of Congress data, not Amazon data, which is forbidden. And, most importantly, it will allow ordinary people to participate in the sacred act of cataloging, combining and splitting books from works as they see fit. This has never been done before. It’s Wikipedia for book cataloging.
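
The “most common title” rule is simple enough to sketch. The snippet below is a minimal Python illustration under assumptions (the `build_works_table` name and the row shape are invented), not the actual database code.

```python
from collections import Counter, defaultdict


def build_works_table(books):
    """Build a work id -> title table from (work id, book title) rows.

    Each work takes the title that appears most often among its books.
    """
    titles = defaultdict(Counter)
    for work_id, title in books:
        titles[work_id][title] += 1
    return {
        work_id: counts.most_common(1)[0][0]
        for work_id, counts in titles.items()
    }
```

Because the title is derived from the books, it shifts on its own as users add editions or move them between works.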

Anyway, all this is coming this week. The trick is, the system is so complex and involves so much “calculation” that I can’t bring the server down, make the changes and bring it up again without unacceptable downtime. Testing it on my own Mac takes forever and won’t give it the stress test it needs (LibraryThing can average 3,000 “queries” per minute!) So, I’m going to be rebuilding the ship while it moves.

In fact, the new system is already mostly in place, but invisible. It’s going to become more and more visible as the week progresses. Once everything is changed and I’m satisfied it works, I will add the last element, exposing work disambiguation to the masses. Then I’ll take down the old system.

So bear with me as I make these changes. The switch-over is highly planned; I even have stuff on paper. I’m a real programmer now! But the presence of two different systems will lead to inconsistencies in presentation and other hiccups. If you notice that your official book counts disagree by one, let it slide. If something breaks, wait ten seconds and try again. If book recommendations go briefly insane, well, serendipity is a good thing!

Advice corner. I still haven’t quite figured out the user-interface on work disambiguation. I think it will mostly take place on the author pages. Users will click checkboxes by books and then click “combine books” to combine them. I’m not certain if “work splitting” will also happen on author pages. Certainly work pages will let you see all the editions of a work, allowing you to remove one or more editions as not belonging to the work at hand. Your suggestions would be appreciated.
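
Whatever the interface ends up looking like, the two operations underneath are small. Here is a Python sketch under assumptions: the `WorkIndex` class, its method names and the use of plain integers as work ids are all hypothetical.

```python
class WorkIndex:
    """Tracks which work each edition belongs to."""

    def __init__(self):
        self.work_of = {}   # edition id -> work id
        self.next_id = 1

    def add_edition(self, edition):
        # Every new edition starts out as a work of its own.
        self.work_of[edition] = self.next_id
        self.next_id += 1

    def combine(self, editions):
        """Merge the works of all checked editions into one work."""
        targets = {self.work_of[edition] for edition in editions}
        keep = min(targets)
        for edition, work in list(self.work_of.items()):
            if work in targets:
                self.work_of[edition] = keep
        return keep

    def separate(self, edition):
        """Move one edition out into a fresh work of its own."""
        self.work_of[edition] = self.next_id
        self.next_id += 1
        return self.work_of[edition]

    def editions_of(self, work):
        return sorted(e for e, w in self.work_of.items() if w == work)
```

Note that `combine` merges whole works, not just the checked editions, so checking one edition from each of two works joins everything in both.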

Lastly, I want to favor library titles for books. Amazon too frequently puts edition and marketing info into their titles. (This isn’t their fault; they’re not running a cataloging app.) And using library data will allow LibraryThing to offer an external API. The only trick is, libraries don’t capitalize the way most people think is “right.” It’s “Lord of the rings” not “Lord of the Rings.” I think people will go ape if work pages, recommendations and other such start using library-format titles. On the other end, it’s hard to write a perfect capitalization algorithm, and library purists may resent the use of the vulgar form. What to do?
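
To show why a perfect capitalization algorithm is hard, here is a naive Python heuristic. The small-word list and the `headline_case` name are assumptions, and the code is deliberately simplistic.

```python
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "in",
               "nor", "of", "on", "or", "the", "to"}


def headline_case(library_title):
    """Turn a library-style title into a rough headline-style title."""
    words = library_title.split()
    out = []
    for i, word in enumerate(words):
        lower = word.lower()
        # Small words stay lowercase unless they are first or last.
        if 0 < i < len(words) - 1 and lower in SMALL_WORDS:
            out.append(lower)
        else:
            # Uppercase only the first character; leave the rest untouched.
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)
```

It handles “Lord of the rings” well enough, but a title beginning with a word like “iPod” comes out as “IPod,” and it knows nothing about subtitles, proper nouns inside lowercase runs, or other languages.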

* He also stranded his wife on a desert island, but the only philosophical issues there are ethical. The ancient story is actually a little different. According to Plutarch (Theseus 22-23), the Athenians of later days exhibited the ship that the ancient hero Theseus had sailed back from his adventures in Crete. Over time, the Athenians had replaced its planking bit-by-bit, until no part of the ship was original. Personally, I think the modern paradox should be changed again. Theseus’ voyage was pretty much a straight shot, and, in the story, he gives no time to even changing his sails (although doing so would have averted his father’s death), let alone rebuilding the boat from the inside out. The whole thing would make a lot more sense as Odysseus’ ship, or Jason’s. The latter has the advantage of allowing Medea to fix the ship through magical means, even while it moved. Of course, Jason ditched Medea too. What is it with these guys?

Saturday, February 11th, 2006

New feature: Improved tag editing

Entering tags when you use the Add Books screen has always bothered me. I wanted a one-step process, but that meant entering tags at the same time as you entered search terms. If you didn’t enter tags then, you had to go through the “pencil,” which took you away from the page, or wait until you had a whole lot of books cataloged and then use the catalog’s tag-editing features.

To fix this I’ve added the same on-page edit feature that the catalog has. Edits change the tags in an “Ajax-y” way, without refreshing the whole page.

Here are the demonstration graphics:
