Archival description is a critical part of any special collection program. Researchers rely on description of our archival collections to find materials relevant to their work. Archivists rely on description to manage the materials. As office technology changes, archival description changes along with it.
Around 2000, RBML archivists began creating digital description for new collections, first directly in HTML and then via an XML encoding standard called EAD. The resulting finding aids were web pages, which could be easily updated when new materials arrived or when we learned more about a collection. In 2003, the paper finding aids that had long served as the primary description of the RBML’s vast archival holdings were scanned as PDFs and posted online. This was a huge leap forward in providing access to our collections; I’ve heard from my colleagues that reading room traffic spiked shortly thereafter and has only grown since. However, the retrospective conversion of the old scanned PDF description to EAD was a slow, painstaking process that was never a priority.
The drawbacks of these PDFs are myriad. They can’t be edited, so the mix of typewritten and hand-annotated information was soon out of date, referring for example to locations that no longer held collection material. While the PDFs are OCRed, the text searching is imperfect, and there is no way for users to search across collections. The web-based finding aids integrate seamlessly with our Aeon requesting system, but the PDFs require users to fill out request forms for each box they wish to see. (This also makes our usage data messier, since the regular user understandably only inputs the bare minimum necessary to get the box.) The scanned PDFs also don’t provide the ability to pull out metadata for other purposes, such as stub records for digitization workflows or linking to external systems like SNAC. With structured finding aid data, by contrast, we can not only pull stub records for digitization but also update those records in bulk with links to the digital surrogates once digitization is complete. We can do more granular querying and bulk revision of finding aid data across collections, including tracking and lifting expired restrictions on materials and identifying materials for digitization or other preservation tasks. Non-PDF finding aids are also more accessible, allowing screen readers and other assistive technology to be used.
When we started working from home in March 2020, it was clear that remediating these PDF finding aids by entering the data into our archival content management system, ArchivesSpace, was going to be a great and important work-from-home project. On March 17, I queried our data set and found PDF finding aids for 1,114 collections. So my archival colleagues and I got to work. Some finding aids were easily entered into ArchivesSpace, in a few minutes or a few hours; others took days or weeks to work through. Often, we found that we could remediate 95% of the content, but needed to check a box or a range of boxes during our limited onsite time. As of the end of 2020, there were 515 PDF finding aids remaining. This means that in a little less than a year of sustained effort, we were able to address 599 of the 1,114 old finding aids, a little more than half of those that had been created over the past 50 years.
A full set of the PDFs has been added to RBML Office Files, as a record of how description and technologies have changed.
There is still significant remediation work to do, especially for some very large and complex collections, such as the Random House records. Unfortunately, we also have other hidden, unprocessed, or underdescribed collections. But we’ve made the best of the terrible COVID situation by addressing a long-standing need, and the result will make our collections easier to maintain and serve, which is our ultimate purpose in the archives.
– Kevin Schlottmann
Interim Director and Head of Archives Processing