Friday, September 9th, 2011

Sample LibraryThing data files

In addition to the 300-odd libraries using LibraryThing data through LibraryThing for Libraries, a number of major bookstores, publishers and other bookish sites use LibraryThing data, including Angus and Robertson (Australia), Whitcoulls (New Zealand), Abebooks, BookMooch and Random House. They use data such as tags and reviews, and pull it from us as feeds so they can serve it faster than one-by-one API requests would allow.

We’ve just posted the “small” versions of these files so you can look at them. If you’re interested in using LibraryThing data (which is not expensive and can, in some circumstances, be traded for other interesting data), these files are the place to start.

The files are here:
http://www.librarything.com/feeds/samples/

Here’s a description of the files, and how they work together. If you’re interested in playing with the full files, send us an email (tim@librarything.com) and we’ll get you access to them!

Frequency. We aim to refresh the files every two weeks, by the 1st and the 15th.

Small versions. Each feed has a “_small” version. Open it in Firefox and you’ll see the structure clearly.

works_to_isbn_small.xml This is a catalog of our “work” ids and of all the ISBNs that fall under them. The ISBNs are ordered by “count,” that is, by how many users have the ISBN.

Occasionally, an ISBN is marked as “uncertain”:

<isbn count="2" uncertain="true">0807282073</isbn>

This happens when an ISBN is “split” across two works, often because of user error. Most feed users should skip uncertain ISBNs entirely; they’re dubious and unlikely to matter.
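
A minimal Python sketch of reading this file and skipping uncertain ISBNs. Only the `<isbn>` element is shown above, so the `<works>` root and the `<work workcode="...">` wrapper are assumptions borrowed from worktotags_small.xml, and the sample values are made up; check them against the actual small file.

```python
import xml.etree.ElementTree as ET

# Made-up sample. Only the <isbn> element is shown in the post; the
# <works>/<work workcode="..."> wrappers are assumptions.
SAMPLE = """<works>
<work workcode="1234">
<isbn count="57">0306406152</isbn>
<isbn count="2" uncertain="true">0807282073</isbn>
</work>
</works>"""


def isbns_by_work(xml_text, keep_uncertain=False):
    """Map each workcode to its ISBNs, most-held first."""
    root = ET.fromstring(xml_text)
    result = {}
    for work in root.iter("work"):
        pairs = []
        for el in work.findall("isbn"):
            if el.get("uncertain") == "true" and not keep_uncertain:
                continue  # ISBN is split across works; usually user error
            pairs.append((int(el.get("count", "0")), el.text.strip()))
        # The feed is already ordered by count; re-sort defensively.
        # sort() is stable, so ties keep their original feed order.
        pairs.sort(key=lambda pair: pair[0], reverse=True)
        result[work.get("workcode")] = [isbn for _, isbn in pairs]
    return result


print(isbns_by_work(SAMPLE))  # {'1234': ['0306406152']}
```

Passing `keep_uncertain=True` keeps the dubious ISBNs, in case you want to inspect them rather than drop them.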

LT always gives ISBNs in their shortest form. That is, a 13-digit 978- ISBN will be expressed as a 10-digit ISBN. A 979- ISBN has no 10-digit equivalent, so it stays at 13 digits.
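
If your own catalog stores ISBN-13s, you’ll need to convert 978- ISBNs to the 10-digit form before matching them against the feed. Here’s a Python sketch; `to_lt_form` is a hypothetical helper name, and it applies the standard ISBN-10 check-digit rule (weights 10 down to 2, mod 11, with a value of 10 written as X). It doesn’t validate the input’s own check digit.

```python
def to_lt_form(isbn):
    """Return an ISBN in its shortest form, matching how the feeds express it.

    978- ISBN-13s are converted to ISBN-10s; 979- ISBN-13s (and ISBN-10s)
    are returned unchanged apart from stripping hyphens and spaces.
    """
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13 and digits.startswith("978"):
        core = digits[3:12]  # drop the 978 prefix and the ISBN-13 check digit
        # ISBN-10 check digit: weight the nine digits 10 down to 2, then mod 11.
        total = sum(weight * int(d) for weight, d in zip(range(10, 1, -1), core))
        check = (11 - total % 11) % 11
        return core + ("X" if check == 10 else str(check))
    return digits


print(to_lt_form("978-0-8072-8207-6"))  # 0807282073
print(to_lt_form("979-10-90636-07-1"))  # 9791090636071
```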

worktotags_small.xml This is a catalog of works and the tags applied to them. Take this example:

<work workcode="2">
<tag id="302312" count="14">college football</tag>
<tag id="931452" count="1">nebraska football</tag>
<tag id="1599" count="1">history</tag>
<tag id="624418" count="1">civil law</tag>
<tag id="3373" count="1">europe</tag>
<tag id="1042723" count="1" aliasedtoid="18587">Reader</tag>
</work>

This means six tags have been applied to this work. One, “college football,” was used 14 times; the rest are all singletons. Each tag has an id, and each distinct spelling gets its own id, so “Fiction,” “fiction” and “FICTION” have separate tag ids.

Notice the last tag, “Reader.” It has the attribute aliasedtoid="18587". This means that although the user applied “Reader,” the tag has been aliased to a more common tag, in this case id 18587, which is, in fact, “reader” (lower case).

Tag aliasing on LibraryThing extends past case. Members “combine” tags, such that “wwii,” “world war 2,” “ww2” and so forth are lumped together. In this file, all would be “aliasedto” one id.

I recommend that feed users show only the final, aliased form, but use both the final and intermediate forms in searching.
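
Here’s a Python sketch of that advice: it collapses one work’s tags onto their final ids for display and keeps every id, applied and aliased, for search. The function name and return shape are illustrative choices, not part of the feed; the display text for a final id such as 18587 would come from taginfo_small.xml.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The work from the example above, with the closing ">" restored.
SAMPLE = """<work workcode="2">
<tag id="302312" count="14">college football</tag>
<tag id="931452" count="1">nebraska football</tag>
<tag id="1599" count="1">history</tag>
<tag id="624418" count="1">civil law</tag>
<tag id="3373" count="1">europe</tag>
<tag id="1042723" count="1" aliasedtoid="18587">Reader</tag>
</work>"""


def collapse_tags(work_xml):
    """Return (display_counts, search_ids) for a single <work> element.

    display_counts: final (aliased) tag id -> summed count, for display.
    search_ids: every id seen, as applied and as aliased, for searching.
    """
    work = ET.fromstring(work_xml)
    display_counts = Counter()
    search_ids = set()
    for tag in work.findall("tag"):
        applied_id = tag.get("id")
        # Tags without aliasedtoid are already in their final form.
        final_id = tag.get("aliasedtoid", applied_id)
        display_counts[final_id] += int(tag.get("count", "0"))
        search_ids.update((applied_id, final_id))
    return display_counts, search_ids


counts, ids = collapse_tags(SAMPLE)
print(counts.most_common(1))  # [('302312', 14)]
```

Summing counts per final id means that when several variants (say “wwii” and “ww2”) are aliased to one tag, they show up once, with their combined count.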

taginfo_small.xml This is a catalog of all tag ids. It lists each id, together with the id and text of the tag to which it has been aliased, if any. It also lists the total count for that tag and whether or not the tag has been “approved” for LibraryThing’s library-data project.

For example:

<tag id="1">
<text>Fantasy</text>
<aliasedto id="5280">fantasy</aliasedto>
<totalcount>569707</totalcount>
<approved>true</approved>
</tag>

The tag has the id 1, and its text is “Fantasy.” It is aliased to id 5280, which is “fantasy.” “fantasy” and all its aliases are used 569,707 times on LibraryThing. The tag is approved for use in libraries.
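
A sketch of loading taginfo_small.xml into a lookup table, in Python. The `<tags>` root element and the `TagInfo`/`parse_taginfo` names are assumptions for illustration, and tags that aren’t aliased are assumed simply to lack an `<aliasedto>` element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# The example above, wrapped in a <tags> root element (the wrapper is an assumption).
SAMPLE = """<tags>
<tag id="1">
<text>Fantasy</text>
<aliasedto id="5280">fantasy</aliasedto>
<totalcount>569707</totalcount>
<approved>true</approved>
</tag>
</tags>"""


@dataclass
class TagInfo:
    tag_id: str
    text: str
    aliased_to_id: Optional[str]    # None when the tag is not aliased
    aliased_to_text: Optional[str]
    total_count: int                # usage of the tag together with its aliases
    approved: bool

    @property
    def display_text(self) -> str:
        """The final, aliased form if there is one, else the tag's own text."""
        return self.aliased_to_text or self.text


def parse_taginfo(xml_text):
    """Map tag id -> TagInfo for every <tag> in the file."""
    tags = {}
    for el in ET.fromstring(xml_text).iter("tag"):
        alias = el.find("aliasedto")
        info = TagInfo(
            tag_id=el.get("id"),
            text=el.findtext("text", default=""),
            aliased_to_id=alias.get("id") if alias is not None else None,
            aliased_to_text=alias.text if alias is not None else None,
            total_count=int(el.findtext("totalcount", default="0")),
            approved=el.findtext("approved", default="false").strip() == "true",
        )
        tags[info.tag_id] = info
    return tags


fantasy = parse_taginfo(SAMPLE)["1"]
print(fantasy.display_text, fantasy.total_count, fantasy.approved)  # fantasy 569707 True
```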

We recommend most feed users display only “approved” tags. Approval was designed for our LibraryThing for Libraries product, and indicates the tag is probably useful and safe for display. It can be useful to use non-approved tags for search, for example to catch variants like “ww2” and “world war two.”
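
In code, the display/search split might look like this Python sketch. The `(text, approved)` input shape is hypothetical (you’d build it from taginfo_small.xml), case-folding the search terms is an extra choice of this sketch, and the approval flags in the example are invented for illustration.

```python
def split_for_display_and_search(tags):
    """Split one work's tags into a display list and a search set.

    tags: iterable of (text, approved) pairs, e.g. built from taginfo_small.xml.
    Only approved tags are displayed; every tag, approved or not, is indexed
    for search (case-folded, an extra choice of this sketch).
    """
    display = []
    search_terms = set()
    for text, approved in tags:
        search_terms.add(text.casefold())
        if approved and text not in display:
            display.append(text)
    return display, search_terms


# Hypothetical approval flags, for illustration only.
display, search_terms = split_for_display_and_search(
    [("world war two", True), ("ww2", False), ("wwii", False)]
)
print(display)  # ['world war two']
```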

reviews_small.xml This should be fairly straightforward. The file contains all *approved* reviews that can be used outside of LibraryThing.com. I don’t have an exact count, but it should be approaching 300,000; we’re approving a backlog of reviews and should hit 300,000 in the next few weeks.

userid: The LibraryThing userid

restricted: There are two possible values, “libraries only” and “unrestricted.” They determine which reviews you can use.

stars: 0.5 to 5; some reviews have no star rating. There are no zero-star ratings.
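
A Python sketch of filtering reviews by these fields. The dict shape stands in for however you parse the XML, whose layout isn’t shown here, and the sketch assumes “libraries only” reviews may be shown only by libraries. The userids are made up.

```python
from typing import Any, Dict, Iterable, List

# Values of the "restricted" field, as listed above.
UNRESTRICTED = "unrestricted"
LIBRARIES_ONLY = "libraries only"


def usable_reviews(reviews: Iterable[Dict[str, Any]], is_library: bool) -> List[Dict[str, Any]]:
    """Keep the reviews this feed user may show.

    Assumes "libraries only" reviews are usable only by libraries.
    Each review is a dict with "userid", "restricted" and "stars"
    (stars may be None, since some reviews have no rating).
    """
    allowed = {UNRESTRICTED, LIBRARIES_ONLY} if is_library else {UNRESTRICTED}
    kept = []
    for review in reviews:
        if review.get("restricted") not in allowed:
            continue
        stars = review.get("stars")
        # Ratings run from 0.5 to 5; anything else (such as 0) is treated as bad data.
        if stars is not None and not 0.5 <= stars <= 5:
            continue
        kept.append(review)
    return kept


# Made-up reviews for illustration.
sample = [
    {"userid": "reader_a", "restricted": "unrestricted", "stars": 4.5},
    {"userid": "reader_b", "restricted": "libraries only", "stars": None},
]
print([r["userid"] for r in usable_reviews(sample, is_library=False)])  # ['reader_a']
```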

recommendations_small.xml This is a very straightforward work-to-recommended works format. 

worktoratings_small.xml This is a very straightforward work-to-rating stars format.


4 Comments:

  1. Fred Bacon says:

    Tim, is it possible to get individual users’ ratings for works? I’d want userid, work number and star rating. I saw an interesting paper last winter on a reputation system for ratings that I’d like to play with. Having real data to test with would be interesting.

  2. Tim says:

    Fred, email me. Let’s talk.