Thursday, February 5th, 2015

Subjects and the Ship of Theseus

I thought I might take a break to post an amusing photo of something I wrote out today:

subjecttables

The photo is a first draft of a database schema for a revamp of how LibraryThing will do library subjects. All told, it has 26 tables. Gulp.

About eight of the tables do what a good cataloging system would do:

  • Distinguishes the various subject systems (LCSH, Medical Subjects, etc.)
  • Preserves the semantic richness of subject cataloging, including the stuff that never makes it into library systems.
  • Breaks subjects into their facets (e.g., “Man-woman relationships — Fiction”) has two subject facets

Most of the tables, however, satisfy LibraryThing’s unusual core commitments: to let users do their own thing, like their own little library, but also to let them benefit from and participate in the data and contributions of others.(1) So it:

  • Links to subjects from various “levels,” including book-level, edition-level, ISBN-level and work-level.
  • Allows members to use their own data, or “inherit” subjects from other levels.
  • Allows for members to “play librarian,” improving good data and suppressing bad data.(2)
  • Allows for real-time, fully reversible aliasing of subjects and subject facets.

The last is perhaps the hardest. Nine years ago (!) I compared LibraryThing to the “Ship of Theseus,” a ship which is “preserved” although its components are continually changed. The same goes for much of its data, although “shifting sands” might be a better analogy. Accounting for this makes for some interesting database structures, and interesting programming. Not every system at LibraryThing does this perfectly. But I hope this structure will help us do that better for subjects.(3)

Weird as all this is, I think it’s the way things are going. At present most libraries maintain their own data, which, while generally copied from another library, is fundamentally siloed. Like an evolving species, library records descend from each other; they aren’t dynamically linked. The data inside the records are siloed as well, trapped in a non-relational model. The profession that invented metadata, and indeed invented sharing metadata, is, at least as far as its catalogs go, far behind.

Eventually that will end. It may end in a “Library Goodreads,” every library sharing the same data, with global changes possible, but reserved for special catalogers. But my bet is on a more LibraryThing-like future, where library systems will both respect local cataloging choices and, if they like, benefit instantly from improvements made elsewhere in the system.

When that future arrives, we got the schema!


1. I’m betting another ten tables are added before the system is complete.
2. The system doesn’t presume whether changes will be made unilaterally, or voted on. Voting, like much else, existings in a separate system, even if it ends up looking like part of the subject system.
3. This is a long-term project. Our first steps are much more modest–the tables have an order-of-use, not shown. First off we’re going to duplicate the current system, but with appropriate character sets and segmentation by thesaurus and language.

Labels: cataloging, subjects

3 Comments:

  1. elenchus says:

    Work-level! Edition-level! Book-level!

    (gasps for breath)

    Warms the cockles of my heart. And I enjoy learning how that schema links into LT’s schema generally, though I’m not (at the moment) a big user / peruser of subjects.

  2. timepiece says:

    I don’t have anything personally to contribute, but I thought I’d point you to another community which has put a lot of though into tagging, if you find alternate viewpoint of any use: Archive of Our Own, which is one of the 2 biggest fanfiction archives.

    Here’s their FAQ on tagging.

  3. amarie says:

    Have you heard of Linked Open Data? It’s ten levels of semantic awesomeness and answers that problem of silo metadata. I want to implement it everywhere (along with almost every librarian exposed to the concept), but my reach is small so I’ll just keep taking workshops and raving about linked data.