Customizing your ArchivesSpace EAD importers and exporters.

Recently, I had the privilege of hosting the ArchivesSpace developer house at Code4Lib 2015 in Portland, OR. It was great to meet real ArchivesSpace community members and talk about how they're using the application and what they'd like to see.

Probably the most common request we hear relates to EAD import and export, which was a big topic of conversation in Portland. While we are working to improve the quality of the EAD input and output in the application, lots of institutions want customized EAD handling for their own use cases. Instead of writing customized XSLT for pre- and post-processing (which a lot of folks do), it's pretty easy to tweak your ArchivesSpace EAD importer and exporter, which is what we did at the Code4Lib Dev House.

Taking an example from the real world, some institutions put additional container information in the EAD container @label attribute. One use case I heard about puts the instance_type and the container barcode in the label, separated by a space. So, the XML looks like this:

    <container label="text 878788" type="box">12344</container>

with the instance_type = “text” and the barcode = “878788”.

So, for importing, we'll need to customize the EAD converter. You can have a look at the original ArchivesSpace EAD converter on GitHub, which lives in the ArchivesSpace backend. This converter is written in a DSL that uses a SAX-style stream parser to process XML. In this DSL, the “with” keyword you see in the configure method registers a handler that runs when the parser reaches a specific tag. For example, “with 'container' do … end” is how we convert EAD container nodes.
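
To make the idea of a “with” handler concrete, here is a toy sketch of the pattern in plain Ruby. It is not the ArchivesSpace implementation: ToyConverter, start_element, and labels are names invented for this example, and the parser events are fired by hand rather than by a real SAX parser.

```ruby
# Toy stand-in for the converter DSL. Every name here except `with` and
# `att` is invented for this sketch.
class ToyConverter
  # Handlers registered by `with`, keyed by tag name.
  def self.handlers
    @handlers ||= {}
  end

  # Register a block to run whenever the parser meets `tag`.
  def self.with(tag, &block)
    handlers[tag] = block
  end

  attr_reader :labels

  def initialize
    @labels = []
    @attributes = {}
  end

  # Read an attribute of the node currently being handled.
  def att(name)
    @attributes[name]
  end

  # A stream parser would call this once per opening tag;
  # in this sketch we call it by hand.
  def start_element(tag, attributes)
    handler = self.class.handlers[tag]
    return unless handler

    @attributes = attributes
    instance_exec(&handler)
  end
end

ToyConverter.with 'container' do
  labels << att('label')
end

converter = ToyConverter.new
converter.start_element('container', { 'label' => 'text 878788', 'type' => 'box' })
converter.start_element('unittitle', {}) # no handler registered, so ignored
p converter.labels
```

The handler block runs in the context of the converter instance, which is why it can call att directly, much like the handlers in the real converter do.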

In our example use case, we need to tweak this “with 'container'” handler so that when the parser sees a container XML node, it uses the @label to set our instance_type and barcode data. We can do this by creating a new converter class that inherits from the standard EADConverter class. Have a look at this code, where you can see we define our converter (called “YaleEADConverter”) and include a few necessary setup methods (import_types, instance_for, and profile), which are probably self-explanatory. In the configure block, we call the superclass's configure and then set up our own container handling in the “with 'container'” block. In this DSL, the att method accesses the XML node's attribute, so we can get the @label data by calling att('label'). We then process it and set the pertinent instance and container properties as we see fit. Since we're inheriting from the standard EADConverter, in the end we've only had to add 5 lines of code:

    with 'container' do
      # ... some stuff happens here that's exactly the same as the default ...

      if att('label') && att('label').include?(" ")
        inst_type, barcode = att('label').split(" ")
        cont["barcode_1"] = barcode
        inst["instance_type"] = inst_type
      end

      # ... more default stuff happens ...
    end


We will now have a new option in our Import Jobs menu that allows us to convert EADs and process the container labels in our very own special way.
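
If you want to try the label handling on its own, outside of ArchivesSpace, here is a minimal sketch. The split_container_label helper is hypothetical (it is not part of ArchivesSpace), and it returns a plain hash instead of setting properties on the real inst and cont records.

```ruby
# Hypothetical helper, not part of ArchivesSpace: it isolates the label
# handling from the custom 'container' handler.
def split_container_label(label)
  # Labels without a space carry no barcode, so leave them alone.
  return nil unless label && label.include?(" ")

  inst_type, barcode = label.split(" ")
  { "instance_type" => inst_type, "barcode_1" => barcode }
end

p split_container_label("text 878788")
p split_container_label("Box")
```

Note that only the first two space-separated pieces of the label are kept, so this approach assumes the instance type itself contains no spaces.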

However, what if we also want our EAD export to work the same way? The EAD exporter also lives in the backend. The exporter is written in a declarative fashion using the Nokogiri::XML::Builder DSL. The part we need to tweak is the serialize_container method, which handles the instances in the Resources and Archival Objects. Looking at the method, we can see the bulk of the process iterates three times to work through the three sets of fields in a container and adds a container node each time. The container XML gets added when a method named “container” is called on the Nokogiri::XML::Builder object (named “xml”), with the attributes (defined in a hash called “atts”) passed as a parameter.
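
To illustrate how calling a method like “container” on a builder object can turn into an XML element, here is a toy builder in plain Ruby. It is a stand-in written for this post, not Nokogiri itself: ToyBuilder and to_xml are invented names, and the sketch does no escaping or nesting bookkeeping.

```ruby
# Toy builder, NOT Nokogiri: it only shows how an unknown method name such
# as `container` can become an XML element. No escaping is done.
class ToyBuilder
  def initialize
    @out = +""
  end

  def to_xml
    @out
  end

  # Emit a text node inside the current element.
  def text(value)
    @out << value.to_s
  end

  # Any other method call becomes an element named after the method,
  # with the hash argument written out as attributes.
  def method_missing(name, attributes = {})
    attribute_text = attributes.map { |key, value| %( #{key}="#{value}") }.join
    @out << "<#{name}#{attribute_text}>"
    yield if block_given?
    @out << "</#{name}>"
    self
  end
end

xml = ToyBuilder.new
atts = { :type => "box", :label => "text 878788" }
xml.container(atts) { xml.text "12344" }
puts xml.to_xml
# prints: <container type="box" label="text 878788">12344</container>
```

Changing what ends up in the atts hash is therefore all it takes to change the attributes on the exported container element.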

So, all we need to do is open up this EADSerializer class and redefine this method. And that's what we do with backend/model/yale_ead_exporter.rb. As you can see, this file defines the class EADSerializer < ASpaceExport::Serializer, then monkey patches the serialize_container method. The only part that changes is where atts[:label] is defined: it now appends the container's barcode if there is one. Again, we really only have to add a few lines of custom code, which ends up looking like:

    def serialize_container(inst, xml, fragments)
      # ... some default export stuff ...
      if inst["container"]["barcode_1"]
        # Append the barcode to the translated instance type.
        atts[:label] = I18n.t("enumerations.instance_instance_type.#{inst['instance_type']}", :default => inst['instance_type']) + " " + inst["container"]["barcode_1"]
      else
        atts[:label] = I18n.t("enumerations.instance_instance_type.#{inst['instance_type']}", :default => inst['instance_type'])
      end
      # ... some more default export stuff ...
    end
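
The label logic can also be tried on its own. This is a minimal sketch under a couple of assumptions: container_label is a hypothetical helper (the real exporter sets atts[:label] inline), and the translations hash stands in for the I18n lookup.

```ruby
# Hypothetical helper: the real exporter sets atts[:label] inline and looks
# the instance type up through I18n. Here `translations` is a plain hash
# standing in for that lookup.
def container_label(inst, translations = {})
  type = inst["instance_type"]
  # Fall back to the raw value when no translation exists, like :default does.
  label = translations.fetch(type, type)
  barcode = inst["container"]["barcode_1"]
  barcode ? "#{label} #{barcode}" : label
end

boxed = { "instance_type" => "text", "container" => { "barcode_1" => "878788" } }
puts container_label(boxed)                       # prints: text 878788
puts container_label(boxed, { "text" => "Text" }) # prints: Text 878788
```

One thing to keep in mind: if the translated instance type contains a space, a label exported this way will not split cleanly on a later import that divides the label at the first space.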



That's pretty much it. To package this up, all you have to do is replicate the plugin directory structure (with your directory name as the name of your plugin) and add your monkey patch to the backend/model directory. Then place your code in the ArchivesSpace plugins directory and add the name of your plugin to the plugins setting in the config/config.rb file.

Or, you can just put the converter and exporter .rb files in the “local” plugin that ArchivesSpace ships with.

In either case, you'll need to restart the application.

If you want to know more about making plugins, check out the documentation and this excellent blog post by Mark Triggs. Also be sure to check out the example code on GitHub.

Hope this shows how easy it is to change your EAD importer and exporter. Let us know if you have any questions or, better yet, if you've created your own plugin so we can add it to our list.