Columbia University Libraries is once again joining the international community in the annual celebration of World Digital Preservation Day. The theme this year is ‘Breaking Down Barriers,’ framed as “an opportunity to demonstrate how digital preservation supports digital connections, unlocks potential and creates lasting value. And because digital preservation is so crucial in supporting these opportunities, it also points to a critical need to make digital preservation and community activities understandable, relatable and accessible to all”.
The Digital Collections page on the Libraries’ website, along with our expanding Internet Archive collections, provides a window into the portion of our preserved digital collections that we have been able to publish online for use by researchers, faculty and students at Columbia and worldwide.
Highlights from 2021
Over the past year, the Libraries added a significant amount of digital content to our preservation holdings. Examples include:
- Audio and Moving Image (AMI) content: 256,000 files (91.2 TB)
- Muslim World Manuscripts: 79,000 files (8.7 TB)
- Han Nom Vietnamese Texts project: 270,000 files (4.7 TB)
- Online exhibitions and legacy site curation: 67,500 files (0.5 TB)
This is in addition to the more than 8 million files that have already been digitally preserved. Our digital collections will expand even more rapidly as we continue to digitize content from our rare and special collections and begin to receive increasing amounts of born-digital content from external individuals and organizations, as well as from the Columbia University community itself.
Transition to Cloud / Hybrid Storage
The growth of our digital collections and the increasing number of large-scale projects have demonstrated the need for more cost-effective and scalable solutions that meet our preservation storage capacity and performance requirements. In 2021, after several years of planning and analysis, we began migrating our on-premises, server-based long-term preservation storage repository to a hybrid on-premises/cloud storage configuration. Initially, we will continue to maintain local storage in the Columbia Data Center while also storing content with two cloud storage providers, Amazon Web Services (AWS) and Google Cloud Platform (GCP). This will allow us to keep multiple copies of our content with these separate providers, in storage locations across different geographic and geological regions.
It is important to note, too, that our preservation strategy currently relies on trusted external partners, such as HathiTrust and the Internet Archive, for preserving and providing access to some types of digitized textual content, as well as to the websites that we harvest as part of our Web Resources Collecting Program.
Columbia University Libraries’ Digital Preservation Program emerged when Library leadership recognized that our growing body of digitized and born-digital scholarly and cultural content needed to be carefully managed, preserved, migrated and made accessible for the future if we were to meet our responsibilities as a 21st-century research library.
Over the past decade, we have moved closer to implementing systems, strategies and workflows that meet the national and international standards and best practices for “trustworthy digital repositories” (see our Trustworthiness of CUL Digital Repository documentation). There is still much to do, however. Over the next year, we will continue to work toward addressing the relevant requirements of ISO 16363 and CoreTrustSeal for Trustworthy Data Repositories.
Columbia University Libraries continues to work conscientiously to preserve important cultural heritage digital content in the face of file format obsolescence, software obsolescence and digital impermanence.