Declassified government documents held in the Libraries’ collections are now significantly easier to search and discover

Columbia University Libraries and the History Lab are pleased to announce an important milestone in a multi-year project to significantly improve the discoverability of declassified government documents held in the Libraries’ collection via the Freedom of Information Archive (FOIArchive). Integrating the archive’s U.S.-based collections into the Columbia Libraries’ catalog improves the search and delivery of government documents by making these items discoverable alongside other government information acquired from commercial providers. A new facet in the Libraries’ catalog, “U.S. Government Information,” has been developed to highlight FOIArchive documents alongside Columbia’s robust, long-established collection of government documents.

The National Security Agency’s massive new data center illustrates the growing challenge libraries will face in collecting and making accessible government information, even if only a small percentage of it will ever be declassified.

The FOIArchive now includes more than 3.5 million documents from seven different collections, many of which are now preserved and searchable through Columbia University Libraries’ catalog, CLIO. This special collection is co-curated by Matthew Connelly, a professor of international and global history at Columbia and Principal Investigator of History Lab, a National Science Foundation-funded project to apply data science to the problem of preserving the public record and accelerating its release.

“Libraries are really at the center of everything universities try to achieve, whether in terms of our teaching, our research, or our service to the community,” Connelly said. “So making the FOIArchive a part of the Columbia Libraries collection is just a dream come true for me.”

Declassified government documents are increasingly released in large, unorganized batches, making them difficult to sort through and understand. Using machine learning techniques such as topic modeling and named entity recognition, in consultation with library experts in classification and collection development, the FOIArchive has recovered more than 4.6 million records from several different collections.

“We now have the ability to search across a variety of primary source collections in a way that was previously impossible. Users can now discover Congressional reports, declassified memos, and unique or rare archival government records all in one place,” noted Kristina Vela Bisbee, Columbia’s Journalism and Government Information Librarian.

In 2018, History Lab and Columbia University Libraries announced a grant from Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin. The grant enabled History Lab to partner with Columbia Libraries to continue building the Freedom of Information Archive, which is already the world’s largest database of declassified documents. Over the past two years, Arcadia support allowed Columbia to continue growing the archive, preserve it permanently, and keep it freely accessible for the entire world. The Libraries helped set it on a sustainable path by creating workflows to continue adding the full archive to the Libraries’ dark preservation storage and to continue promoting discovery and access through the ongoing integration of FOIArchive U.S. collections metadata into the library catalog. That integration complies with the Libraries’ new collection development policy for FOIArchive U.S. government collections, which was also created under this project.

Arcadia support also helped revamp the History Lab’s Application Programming Interface (API) to provide researchers with direct access to the underlying database, which will soon include hundreds of thousands of new documents recently made available by the Central Intelligence Agency. 

“This effort highlights the value of working directly with faculty and researchers to understand exactly how they want to use these data,” said Ann Thornton, Vice Provost and University Librarian. “Columbia’s libraries have been a U.S. Federal government depository since 1882, and the significant improvement of the discoverability of these collections will have lasting impact for research and inquiry.”