Codes and crosswalks (purplesearch)

We use data to look up Library of Congress Classification (LCC) codes, Dewey codes, and the smaller, local Dutch BCL (Basisclassificatie) codes, mostly to present readable English forms of whatever codes appear in records.

The data that backs this is exposed in XML form. It took me some work to collect, and I figured I might save someone else the trouble. Note that BCL has relatively few codes to begin with, while the Dewey and LCC data here is not as specific as the actual systems get -- it contains only the more general classification (which seems to be the part that you don't have to buy to use).

More interesting than the codes themselves, though, is that we've been keeping track of co-occurring codes, with the idea that we can create crosswalk maps between codes, for various potential benefits in searching and browsing.
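To make the co-occurrence idea concrete, here is a minimal sketch of how such counts could be gathered. It is not the code purplesearch runs; the function names, the min_count threshold, and the sample records are all made up for illustration, and it assumes each record is simply a list of prefixed codes.

```python
from collections import Counter
from itertools import combinations

def count_cooccurrences(records):
    """Count how often each pair of codes appears together in one record."""
    pair_counts = Counter()
    for codes in records:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(codes)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def related_codes(pair_counts, code, min_count=2):
    """Return (other_code, count) pairs seen with `code`, most frequent first.

    min_count is a hypothetical filter that drops one-off coincidences.
    """
    related = Counter()
    for (a, b), n in pair_counts.items():
        if n < min_count:
            continue
        if a == code:
            related[b] = n
        elif b == code:
            related[a] = n
    return related.most_common()

# Made-up records, purely for illustration.
records = [
    ["dewey:149", "lcc:B831", "bcl:08.25"],
    ["dewey:149", "lcc:B831"],
    ["dewey:149", "lcc:B833"],
]
counts = count_cooccurrences(records)
print(related_codes(counts, "dewey:149"))
```

Raising or lowering min_count trades noise against coverage, which is roughly what "somewhat filtered" would mean in a setup like this.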

A whole bunch of data has been collected over some time now, but for most people the raw data is less interesting than the somewhat filtered code lookup function, exposed at a URL like http://purplesearch.ub.rug.nl/expose.similarcodes?code=dewey:149 (Note: the dewey:, lcc: or bcl: prefix is optional, since with the current types of code the lookup can deduce the type from the code text itself, but it seemed like a good idea to be able to specify it.)
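As a rough sketch of how a client might build such a URL: the helper names below are hypothetical, and guess_scheme is only my guess at the kind of shape-based deduction described above (letters then digits for LCC, two digits, a dot and two digits for BCL, three digits for Dewey), not the rule the service actually applies. Fetching and parsing the response is left out, since the response format is not described here.

```python
import re
from urllib.parse import urlencode

BASE_URL = "http://purplesearch.ub.rug.nl/expose.similarcodes"

def guess_scheme(code):
    """Guess the classification from the shape of the code (assumed heuristic)."""
    code = code.strip()
    if re.fullmatch(r"[A-Za-z]{1,3}(\d.*)?", code):
        return "lcc"    # e.g. B831: class letters, then a number
    if re.fullmatch(r"\d{2}\.\d{2}", code):
        return "bcl"    # e.g. 08.25: two digits, dot, two digits
    if re.fullmatch(r"\d{3}(\.\d+)?", code):
        return "dewey"  # e.g. 149 or 149.7
    return None

def similar_codes_url(code, scheme=None):
    """Build the lookup URL; the scheme prefix is optional."""
    value = f"{scheme}:{code}" if scheme else code
    # safe=":" keeps the prefix separator readable instead of %3A.
    return BASE_URL + "?" + urlencode({"code": value}, safe=":")

print(similar_codes_url("149", "dewey"))
print(similar_codes_url("B831", guess_scheme("B831")))
```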

The relations are specific to the code you give. This may sound obvious, but consider that in most cases there are a lot of more specific codes, and not just in an after-the-dot way.
For example, B831, B833 and whatnot are all in the overall subclass range of Modern Philosophy (B790-B5739) but are more specific than that range, and have their own associations (even when our data detailing LCC doesn't let us know what B831, B833, and such actually mean). (We could decide to gather all relations from the specific subrange, which is sometimes more meaningful, but probably also occasionally overly fuzzy as a range can be fairly general.)

For the more basic lookup data, see:

You may want to add some clever code, and in the case of LCC you may have to, since 'look up the most specific range that this specific code is in' can be useful. Note that doing this on this data, and doing it correctly, takes a fair number of lines of code.
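A minimal sketch of that 'most specific range' lookup for LCC, under some loud assumptions: the RANGES table is illustrative rather than taken from the exposed XML, the function names are made up, and a code is reduced to its class letters plus the first number, so cutters and anything after them are ignored. The real data will need more care than this.

```python
import re

# Illustrative entries only; real ranges would be loaded from the lookup data.
RANGES = [
    ("B", 1.0, 5802.0, "Philosophy (General)"),
    ("B", 790.0, 5739.0, "Modern philosophy"),
    ("BF", 1.0, 990.0, "Psychology"),
]

def parse_lcc(code):
    """Split an LCC code into (class letters, number); None if it doesn't fit."""
    m = re.match(r"\s*([A-Za-z]{1,3})\s*(\d+(?:\.\d+)?)", code)
    if m is None:
        return None
    return m.group(1).upper(), float(m.group(2))

def most_specific_range(code, ranges=RANGES):
    """Return the narrowest range containing the code, or None."""
    parsed = parse_lcc(code)
    if parsed is None:
        return None
    letters, number = parsed
    # Class letters must match exactly: BF321 is not inside any B range.
    matches = [r for r in ranges if r[0] == letters and r[1] <= number <= r[2]]
    if not matches:
        return None
    # "Most specific" is taken to mean the smallest numeric span.
    return min(matches, key=lambda r: r[2] - r[1])

print(most_specific_range("B831"))
print(most_specific_range("BF321"))
```

Picking the smallest span is a simple stand-in for walking a proper range hierarchy; it is enough as long as ranges nest cleanly.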

Bart Alewijnse – Tue, 2009-09-08 14:10