EAD export problem with & in notes with <p> tags

Description

This issue was reported by Jodi Allison-Bunnell on behalf of Orbis Cascade and Oregon Health & Science University:

Here's the text of the request:

This morning I encountered an error similar to one I've had before. When exporting the EAD from AS, some of the notes were not formatted correctly. Particularly (as you can see in the attached file) the Scope and Contents note and the Conditions Governing Use note didn't get the <p> tags they should have around the content of those notes. I've mentioned this before – though I forget which note was the culprit then. Now that it appears to be happening to other notes, I'd like to check in on this problem again.

....

After a bunch of back and forth, it appears that if a note has “HC&A” in it, it exports without the <p> tags, even though the ampersand is adjusted to read “&amp;”. If I change “HC&A” to “HC&amp;A”, everything exports just fine. Our Conditions Governing Use standard note just happens to be the one that almost always has HC&A written in it, but the recent collection I mentioned also had it in the Scope note, causing the additional problem.

Environment

None

Activity

Manny Rodriguez
June 15, 2018, 1:02 AM

I traced this issue down: the <p> tags are missing when the note contains an ampersand directly next to a letter or word, like "A&A" or "Bob&Bob".

There's this comment in the code:
https://github.com/archivesspace/archivesspace/commit/d13eb7b503fd2cd79f087f3c9aee5fb61dace805

backend/app/exporters/serializers/ead.rb:55:

  # first lets see if there are any &
  # note if there's a &somewordwithnospace , the error is EntityRef and wont
  # be fixed here...

Seems like this has been looked at before and the specific case of "&" adjacent to other letters isn't handled.
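
To make the gap concrete, here is a minimal sketch. It assumes the existing workaround only rewrites an ampersand that is followed by a space, which is what the comment suggests. The helper name escape_spaced_ampersands is hypothetical and is not taken from ead.rb.

```ruby
# Hypothetical helper: escape only an ampersand that is followed by a space.
# This illustrates the limitation the comment describes; it is an assumption,
# not a copy of the exporter's code.
def escape_spaced_ampersands(text)
  text.gsub("& ", "&amp; ")
end

puts escape_spaced_ampersands("A & B")  # the spaced ampersand gets escaped
puts escape_spaced_ampersands("HC&A")   # the adjacent ampersand is left untouched
```

Under that assumption, "HC&A" passes through unescaped, which would match the behaviour in the report.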

Googling this, I found this helpful StackOverflow post that explains why this might be the case:
https://stackoverflow.com/questions/23422316/xml-validation-error-entityref-expecting

What I got from that is that '&' is a special character in XML: it marks the start of an entity reference, so the parser expects a name and a closing ';' after it.
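
A quick way to see this is to ask an XML parser whether a fragment is well-formed. The sketch below uses Ruby's REXML purely for illustration; the exporter may use a different parser and report a different message. The helper name well_formed? is hypothetical.

```ruby
require "rexml/document"

# Hypothetical helper: true when the fragment parses as well-formed XML.
def well_formed?(fragment)
  REXML::Document.new(fragment)
  true
rescue REXML::ParseException
  false
end

puts well_formed?("<p>HC&A</p>")      # bare ampersand next to a letter
puts well_formed?("<p>A & B</p>")     # bare ampersand followed by a space
puts well_formed?("<p>HC&amp;A</p>")  # ampersand written as an entity reference
```

Only the last fragment should parse, since it is the only one where the ampersand is written as a complete entity reference.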

In ASpace notes, can it be assumed that any "&" has no special meaning besides punctuation, so we can escape the whole thing?

Christine Di Bella
June 15, 2018, 4:48 PM
Edited

@manny - I think one of the things that happens in ArchivesSpace is that, either through new data entry or import of legacy data, some people have done the escaping themselves. So, for example, some records will have [ugh - JIRA is helpfully converting my character entity references; it should be & amp; with no space between the & and a] instead of &. This would be true for other character entities as well. So & could be just punctuation, or it could indicate the character entity reference for a particular special character. So we need to accommodate both situations, and somehow make sure that an & without a space next to another character isn't automatically assumed to be a character entity reference but also isn't assumed to not be one when it is.

Manny Rodriguez
June 19, 2018, 9:08 PM

Thanks. I think I've got this dialed in the way we want: we can properly escape cases like "A&B" as well as "A & B", but leave pre-escaped HTML entities alone.

PR Submitted: https://github.com/archivesspace/archivesspace/pull/1278
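
The behaviour described in this comment could be sketched roughly as follows. This is not the code from the pull request; the regular expression and the names BARE_AMPERSAND and escape_bare_ampersands are assumptions made for illustration.

```ruby
# Matches "&" unless it already begins a named (&amp;), decimal (&#38;),
# or hexadecimal (&#x26;) character reference.
BARE_AMPERSAND = /&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/

# Hypothetical helper: escape bare ampersands, leave existing references alone.
def escape_bare_ampersands(text)
  text.gsub(BARE_AMPERSAND, "&amp;")
end

puts escape_bare_ampersands("HC&A")      # adjacent ampersand is escaped
puts escape_bare_ampersands("A & B")     # spaced ampersand is escaped
puts escape_bare_ampersands("HC&amp;A")  # pre-escaped text is left unchanged
```

One caveat with a pattern like this: text such as "&Bob;" looks like an entity reference and would be left alone, even if the author meant a literal ampersand.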

Done

Assignee

Manny Rodriguez

Reporter

Manny Rodriguez

Priority

Major