Avoid the creation and automate the clean up of trailing whitespace in data entry

Description

Whether through human error in data entry or through copying and pasting from existing inventories, these extra spaces happen, and it would be GREAT to have an automated way to clean up the data. Attached is a screenshot of a finding aid that a student cut and pasted from a legacy HTML inventory. Note the weird box that is created because of that empty space. When doing data cleanup in EAD/XML files, the equivalent script would be:

Find " </unittitle>"
Replace "</unittitle>"

I don't want to have to do this for all 700 collections we have!

This issue is most acute in the title fields. Ideally, if the text string ends with a space, the system would autocorrect it once by deleting that space, but allow users to manually override the correction in case, for some reason, they want to.
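
For anyone who needs to run the find-and-replace above across many exported EAD files, here is a minimal Python sketch. It assumes the files are UTF-8 encoded and sit directly inside one directory; the function names `strip_unittitle_whitespace` and `clean_directory` are hypothetical and are not part of ArchivesSpace. It is worth trying on a backup copy first.

```python
import re
from pathlib import Path

# Matches any run of whitespace immediately before a closing </unittitle> tag.
TRAILING_WS = re.compile(r"\s+(</unittitle>)")


def strip_unittitle_whitespace(xml_text: str) -> str:
    """Remove whitespace that sits directly before </unittitle>."""
    return TRAILING_WS.sub(r"\1", xml_text)


def clean_directory(ead_dir: str) -> int:
    """Rewrite each .xml file directly inside ead_dir; return how many changed."""
    changed = 0
    for path in Path(ead_dir).glob("*.xml"):
        original = path.read_text(encoding="utf-8")
        cleaned = strip_unittitle_whitespace(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

Calling `clean_directory("exports")` (a hypothetical folder name) would process every `.xml` file in that folder and leave files without trailing whitespace untouched.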

Environment

None

Attachments

2

Activity

Joshua Shaw, April 17, 2019 at 1:42 PM

I tried copying and pasting from that HTML and still didn't see any oddball boxes. But since the metadata for that HTML indicates it was produced with Word 97, I'm betting it was either a weird Word character that the default font in AS didn't want to render or the result of some character encoding mismatch.

Lydia Tang, April 17, 2019 at 1:21 PM

Thanks for checking into this! Actually, I checked my finding aid and I think the boxes are gone now. I forget whether that was through the pains of student workers or through incremental ArchivesSpace updates. Here is a sample HTML inventory we were using to cut and paste from:

Thanks for working on this!

Joshua Shaw, April 17, 2019 at 1:10 PM

I created a resource and AO with trailing whitespace and the new background job successfully removed the whitespace.

However, I'm not sure how to reproduce the original issue, so I'm not sure I'm fully testing the solution. I did not see the boxes shown in the PNG attached to the issue, even when whitespace was present.

It almost looks like the original issue was a copy/paste of some odd character that is not rendered by the default font (maybe from Word or via Word) rather than a whitespace issue.
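
One way to check the "odd character" theory is to list every character in a pasted title that is not printable ASCII. The sketch below is only illustrative: `suspicious_characters` is a hypothetical helper, and the range it treats as ordinary (U+0020 through U+007E) is an assumption that would flag legitimate accented letters as well.

```python
import unicodedata
from typing import List, Tuple


def suspicious_characters(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each character
    outside the printable ASCII range (space through tilde)."""
    found = []
    for index, char in enumerate(text):
        if not (" " <= char <= "~"):
            # Control characters have no Unicode name, so supply a default.
            name = unicodedata.name(char, "UNNAMED")
            found.append((index, f"U+{ord(char):04X}", name))
    return found
```

Running it on a title copied from the legacy inventory would show whether the trailing "space" is a plain U+0020 or something else, such as a no-break space.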

Laney McGlohon, April 12, 2019 at 5:08 PM

We have added a background job called "Trim Whitespace" which removes leading and trailing whitespace from titles for resources, accessions, digital objects, archival objects, and digital object components.
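
To illustrate the behavior described, here is a rough Python sketch of trimming leading and trailing whitespace from record titles. It is not the ArchivesSpace implementation; the record shape (a dict with a `title` key) and the names `trim_title` and `trim_titles` are assumptions made for the example.

```python
from typing import Iterable


def trim_title(record: dict) -> bool:
    """Strip leading and trailing whitespace from record["title"].

    Returns True if the title was changed, False otherwise.
    """
    title = record.get("title")
    if isinstance(title, str) and title != title.strip():
        record["title"] = title.strip()
        return True
    return False


def trim_titles(records: Iterable[dict]) -> int:
    """Trim every record's title in place and return how many were changed."""
    return sum(trim_title(record) for record in records)
```

Returning a count of changed records makes it easy to report what a run actually did, and records with a missing or non-string title are skipped rather than raising an error.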

Patrick Galligan, February 22, 2018 at 4:30 PM (edited)

Seems like it wouldn't be too hard to find trailing spaces at the end of unittitles in the staff user interface. I think the bigger question is whether this is something the larger community desires.

Done

Details

Created January 4, 2018 at 8:09 PM
Updated May 30, 2019 at 1:04 PM
Resolved May 30, 2019 at 1:04 PM