2020-01-09 Meeting Notes
Date and Time
Thursday 01/09/20, 3pm Eastern
Zoom URL
https://lyrasis.zoom.us/j/897871318
Participants
@Kevin Schlottmann
@James Griffin (Note taker)
@Bria Lynn Parker
@Christine Di Bella
@Daniel Michelson
@Jared Campbell
@Wiedeman, Gregory
@Dallas Pillen
Goals
Discussion topics
Time | Item | Presenter | Notes |
---|---|---|---|
5 min | Ice Breaker Question: Favorite kind of weather / season | @Kevin Schlottmann | |
15 min | Standing item: review metadata tickets | @Kevin Schlottmann | Specific tickets flagged for us by @Christine Di Bella: ANW-974: MARC Export -- 044 field usage (Closed-Complete); ANW-969: EAD export no longer includes <langusage> (Closed-Complete). See below for Kevin’s discussion prompts / draft replies. |
10 min | Ideas from TAC discussion | | There is a desire to see us engage with RiC. Suggestions included a blog post about RiC for the AS community, and inviting EGAD to present to TAC. We should make explicit what AS versions the import/export mappings are keyed to. Speed of AS updates vs review might be an issue. Thoughts? |
5 min | Deliverables from previous meeting | | James will provide updates on the GitHub repositories. Kevin will show the draft instructions for import/export review: Import/export mapping review process. Kevin created a ticket for the 245f/g issue: ANW-1002: MARC 245$g should import into a bulk date field (Closed-Complete). Christine created a dedicated test server for the Metadata Standards team, available at http://metadata.lyrtech.org/staff/ (admin/admin), for testing against the current version using/creating data that is reliably and predictably there. |
10 min | DACS tooltips review - updates on scope and process | @Daniel Michelson | Update on the spreadsheet |
10 min | MARCXML / EAD import review | @Kevin Schlottmann @Wiedeman, Gregory | Updates on the Google Sheet: https://docs.google.com/spreadsheets/d/1jU6MYF7UI7a-UKdd5XhYCV6W1UyrMMCzYDFlgb8iNW8/edit#gid=1527709562 |
5 min | Anything else? | | |
Action items
Notes
Introductions
Metadata Tickets
Proposed reply for ANW-974: MARC Export -- 044 field usage (Closed-Complete)
“As Rachel notes correctly, the 044 is incorrect, since country of publication is determined at the repository level and in AS there can only be one. Per MARC rules, therefore, the country code should be (and is, by AS) encoded only in the 008, positions 15-17. Light testing indicates that the country codes are being exported correctly, including the US code ‘xxu’ as discussed in ANW-673.
The metadata subgroup of TAC agrees with the reporter and recommends that the 044 field be removed from the MARC exporter.”
Other comments:
The 008 AS->MARC mapping needs to be updated (I made a note in our Google sheet)
In the sandbox (2.7.0), I created a test repo with the country set to Germany. The MARC export writes the code “de” to both the 008 and the 044. The code is correct. The placement in the 008 is correct.
MARC 008 definition: https://www.loc.gov/marc/bibliographic/bd008.html
MARC 044 definition: https://www.loc.gov/marc/bibliographic/bd044.html
MARC country code list: http://www.itsmarc.com/crs/mergedProjects/councod/councod/contents.htm
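The check described above (country code in 008 positions 15-17, plus any 044 values) can be sketched in Python. This is a minimal illustration, not the procedure used in the sandbox test: the function name `country_codes` and the sample record are assumptions made up for the example, and only the MARCXML namespace and the 008/15-17 position come from the MARC documentation linked above.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def country_codes(marcxml):
    """Return the 008/15-17 place code and any 044 $a values from one MARCXML record."""
    root = ET.fromstring(marcxml)

    # 008 is a fixed-length control field; positions 15-17 hold the place code.
    code_008 = None
    field_008 = root.find(".//marc:controlfield[@tag='008']", MARC_NS)
    if field_008 is not None and field_008.text and len(field_008.text) >= 18:
        code_008 = field_008.text[15:18].strip()

    # 044 $a repeats the country code; the ticket recommends dropping this field.
    codes_044 = [
        subfield.text
        for subfield in root.findall(
            ".//marc:datafield[@tag='044']/marc:subfield[@code='a']", MARC_NS
        )
    ]
    return {"008": code_008, "044": codes_044}


# Hypothetical record for illustration only (not real ArchivesSpace output).
FIELD_008 = "200109" + "i" + "1900" + "1950" + "xxu" + " " * 17 + "eng" + " d"
SAMPLE_RECORD = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    f'<controlfield tag="008">{FIELD_008}</controlfield>'
    '<datafield tag="044" ind1=" " ind2=" "><subfield code="a">xxu</subfield></datafield>'
    "</record>"
)

result = country_codes(SAMPLE_RECORD)
print(result)
```

Running the same function against an export after the 044 is removed should return an empty list for the `044` key while leaving the `008` value unchanged.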
Looking at the spec, current AS->EAD2002 behavior matches the implementation request (encoded language of description is NOT exported to EAD2002; only the free-text language of description is exported). I also note that langusage is optional in EAD2002, and not required by DACS(!).
All agreed to this proposed solution; it will be posted to the JIRA issue on 01/10/20
Question: What is the process for these types of Issues for Dev. Pri.?
For an Issue like this one, once it has been submitted and delivered to this Sub-Team, there is enough information to know that this MARC field should be suppressed
Dev. Pri. shouldn’t need any additional information, and this would be routed as “ready for implementation”
Prioritization should be minor for this issue; we want to ensure that the Co-Chair for Dev. Pri. is okay with this
Maybe it does make sense for this to be placed into the queue for Dev. Pri. - the ticket itself does not need priority if that is the case
Proposed Reply for ANW-969: EAD export no longer includes <langusage> (Closed-Complete)
We should discuss, but if this were open for a do-over, I would propose that the EAD2002 export behavior of language of description mimic that of language of materials. In short: if a free-text note exists, export that; otherwise export the translation of the encoded language as a text string with the codes in attributes, with some boilerplate prepended. E.g.:
<langusage> Finding aid written in: <language langcode="[Language (code value)]" scriptcode="[Script (code value)]">[Language (translation value)]</language></langusage>
However, the spec appears to have been drafted pretty carefully, so perhaps we should reach out to Corey Nimer and see if this was intentional.
Spec (see PDF linked in ticket): https://archivesspace.atlassian.net/browse/ANW-697
EAD: https://www.loc.gov/ead/tglib/elements/langusage.html
DACS: https://www2.archivists.org/standards/DACS/part_I/chapter_4/5_languages_and_scripts_of_the_material
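The proposed export behavior (free-text note first, otherwise the encoded language with boilerplate) can be sketched as follows. This is a Python illustration of the logic only, not the ArchivesSpace exporter, which is written in Ruby; the function name `build_langusage` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_langusage(note, language_code, language_name, script_code=None):
    """Build a <langusage> element following the proposal in these notes.

    All parameter names are hypothetical; they stand in for the free-text
    language of description note and the encoded language/script values.
    """
    langusage = ET.Element("langusage")

    # Case 1: a free-text note exists, so export it as-is.
    if note:
        langusage.text = note
        return ET.tostring(langusage, encoding="unicode")

    # Case 2: no note, so prepend boilerplate and wrap the translated
    # language name in <language> with the codes as attributes.
    langusage.text = "Finding aid written in: "
    attributes = {"langcode": language_code}
    if script_code:
        attributes["scriptcode"] = script_code
    language = ET.SubElement(langusage, "language", attributes)
    language.text = language_name
    return ET.tostring(langusage, encoding="unicode")


with_note = build_langusage("Description is in English.", "eng", "English", "Latn")
without_note = build_langusage(None, "eng", "English", "Latn")
print(with_note)
print(without_note)
```

The second call produces mixed content (text followed by a `<language>` child), which is the structure EAD2002 allows inside `<langusage>`.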
Feedback from the Sub-Team:
This is a weird history, as people have not encoded language; they have chosen to use mixed content
Language of description is not a DACS element; only language of material is
We should still attempt to export this information in a structured way
We’re encoding EADs with mixed content given that the toolkit encoded it as such
EAD3 was the intended version for enhanced support for extended language encoding
EAD2002 is far more restrictive in structure
Next steps: Reach out to Corey to determine if there was any intent here, and if there was, determine what the next step would be for the Sub-Team
Another Ticket: ANW-805: EAD validation errors for dao audience attribute (Closed-Complete)
This was raised by Dev. Pri., but the process for forwarding this to the Metadata Standards Sub-Team was not in place
The issue was how the “published” status should work; there appeared to be a lot of discussion of this in the ticket comments
Specifically, the question is how the “published” status of the Digital Object in ArchivesSpace interacts with how this is encoded within the EAD
This will be added to the agenda for the next scheduled meeting
Ideas from December TAC Meeting
Desire to see us engage with RiC
Blog posts, particularly focusing upon our efforts to engage with the EGAD Steering Group (https://www.ica.org/en/egad-steering-committee-0)
Invite members of the Working Group in order to discuss their work with the TAC
How might we engage with Records in Contexts in other ways?
Daniel Pitti is the current Chair
There is also another individual who serves as the representative for the US
No members of the Sub-Team have any close relationships with members of the Steering Group
It would make more sense to contact Daniel directly in order to engage with them
Daniel Michelson volunteered to contact Daniel Pitti
This invitation would be for the next joint TAC meeting, and we should check with Maggie first
Making explicit which ASpace versions apply to the import/export mappings
We could just work with version 2.7 until we have a completed mapping, and then work with 2.7.4 for the next update cycle as a separate iteration
Daniel Michelson: If things change in the upgrade, wouldn’t that be a good way of knowing what they are?
Christine: From a development perspective, we would know which issues have been worked on
Daniel: Changes to the exporter in the new version would presumably already have been tested, and we wouldn’t need to perform a separate test ourselves
Christine: There shouldn’t be additional testing required, but the mapping itself will still need to be updated. Should developers be feeding any of this information into this Sub-Team?
Greg: We need to think about what we are doing; our pace is slow and this is very labor-intensive work
Updating the mappings for “unittitle” elements, as an example, required several variations of “unittitle” elements to be provided for the test import
It is actually easier to understand the mappings by looking directly at the code for the exporter
It might be possible to have things generated from comments from the code
Ruby YARD: This is in place for ArchivesSpace, but the rest of the documentation is moving away from being so attached to the codebase
The downside to this is that we are limiting the audience who might be able to contribute to this
We might still try this, but we need to be certain that this is even possible
Christine: ArchivesSpace did try to extract this information from the code itself, but it could not be successfully implemented. It might still be possible.
Greg: For reference, see backend/app/converters/ead_converter.rb on the master branch of archivesspace/archivesspace
Greg had tested the conversion of “unittitle” elements against this converter
Daniel: For some types of data, there are relatively straightforward cases for testing imports
Is there any way to determine which of these cases might be more straightforward?
Request to have Greg write up a description for what they found when testing the importation of “unittitle” elements
Perhaps invite Laney to the next meeting in order to determine whether or not automating documentation generation from the code base is possible
It should be noted that it is more efficient to update the code comments during updates to the code base if this is indeed possible
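To make the idea of generating mapping documentation from code comments more concrete, here is a minimal Python sketch. It assumes a made-up comment convention (`# @mapping source -> target`) and an invented Ruby snippet; neither exists in the ArchivesSpace codebase as far as these notes record, so this only illustrates the general approach that would need to be confirmed as feasible.

```python
import re
from typing import List, Tuple

# Hypothetical convention: "# @mapping <source element> -> <target field>"
MAPPING_COMMENT = re.compile(
    r"^\s*#\s*@mapping\s+(?P<source>\S+)\s*->\s*(?P<target>.+?)\s*$"
)


def extract_mappings(ruby_source: str) -> List[Tuple[str, str]]:
    """Collect (source, target) pairs from annotated comment lines."""
    mappings = []
    for line in ruby_source.splitlines():
        match = MAPPING_COMMENT.match(line)
        if match:
            mappings.append((match.group("source"), match.group("target")))
    return mappings


def to_table(mappings: List[Tuple[str, str]]) -> str:
    """Render the pairs as a pipe table, one row per mapping."""
    lines = ["Source | Target |", "---|---|"]
    lines += [f"{source} | {target} |" for source, target in mappings]
    return "\n".join(lines)


# Invented Ruby-like source for illustration; not taken from ead_converter.rb.
SAMPLE_SOURCE = """
# @mapping unittitle -> title
def import_unittitle(node)
end

# An ordinary comment that should be ignored
# @mapping unitid -> component identifier
def import_unitid(node)
end
"""

mappings = extract_mappings(SAMPLE_SOURCE)
print(to_table(mappings))
```

A convention like this keeps the mapping statement next to the code that implements it, which is the efficiency noted above, at the cost of requiring contributors to edit the codebase to change the documentation.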
Documenting our Review Process
Kevin drafted a Confluence page (Import/export mapping review process)
These directions should prove to be quite useful for providing guidance to future Sub-Teams
Perhaps referencing the code in certain cases might be more straightforward for discussing the import process and mappings
Bulk Import Issue (ANW-1002: MARC 245$g should import into a bulk date field, Closed-Complete)
Daniel: This should go directly to the Dev. Pri. meeting given that this has already been evaluated by this group
Christine created a dedicated test server at http://metadata.lyrtech.org/staff
DACS Tooltips Spreadsheet Update
Most of the progress has been quite straightforward
The priority of this is significantly less than the Importer/Exporter testing
It might be more efficient to set aside a month of dedicated focus on this in order to finalize it
MARC-XML and EAD Import/Export Sheet Updates
Bugs are extremely difficult to test
With a massive MARC import, Columbia did not find that many of the more obscure fields were used
The MARC 342 tag is probably a very extreme edge case
Paring down the imports covered by this process might make it much easier
We should explore defining a core group of minimum supported fields
Prioritizing testing by determining which elements in MARC and EAD are used
There have been usage surveys to try and find points at which we could reduce the complexity offered by MARC where use cases aren’t relevant
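One way to prioritize testing by actual usage is to count how many records in a MARCXML file use each tag. The following Python sketch assumes a MARCXML `<collection>` as input; the function name `count_tag_usage` and the sample data are hypothetical and are not drawn from the Columbia import mentioned above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MARC_NS = "{http://www.loc.gov/MARC21/slim}"
FIELD_ELEMENTS = (f"{MARC_NS}controlfield", f"{MARC_NS}datafield")


def count_tag_usage(marcxml):
    """Count the number of records in which each MARC tag appears at least once."""
    root = ET.fromstring(marcxml)
    usage = Counter()
    for record in root.iter(f"{MARC_NS}record"):
        # A set ensures a repeated field counts once per record.
        tags = {field.get("tag") for field in record if field.tag in FIELD_ELEMENTS}
        usage.update(tags)
    return usage


# Hypothetical two-record collection for illustration only.
SAMPLE_COLLECTION = """
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Papers</subfield></datafield>
    <datafield tag="520" ind1=" " ind2=" "><subfield code="a">Summary</subfield></datafield>
    <datafield tag="520" ind1=" " ind2=" "><subfield code="a">More</subfield></datafield>
  </record>
  <record>
    <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Records</subfield></datafield>
    <datafield tag="342" ind1=" " ind2=" "><subfield code="a">Geo</subfield></datafield>
  </record>
</collection>
"""

usage = count_tag_usage(SAMPLE_COLLECTION)
for tag, record_count in usage.most_common():
    print(tag, record_count)
```

Tags that appear in few or no records would be candidates to leave out of a core group of minimum supported fields.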
Meeting adjourned at 16:00 EST