Category Archives: Preservation

Recreating a lost Yiddish database: The LCAAJ Project

The Language and Culture Archive of Ashkenazic Jewry (LCAAJ) is an extraordinary resource for research in Yiddish studies.  It consists of field interviews with Yiddish-speaking informants, recorded between 1959 and 1972 by Columbia University's Department of Linguistics, which donated the Archive to Columbia University Libraries (CUL) in 1995.

The Archive presents an interesting preservation challenge, since the original researchers created not only the audiotapes and large quantities of paper documents, but also computer data that has not survived the test of time.

The interviews were collected from people who originally lived in 603 different locations in Central and Eastern Europe, to create a sample that reflected the distribution of the Yiddish-speaking population on the eve of World War II.  The informants answered questions on a wide variety of topics concerning Yiddish language and culture during interviews lasting anywhere from 2.5 to 16 hours.  In all, the project produced 5,755 hours of audiotaped sessions with the native speakers and ca. 100,000 pages of questionnaires.  The documents are covered with hand-written linguistic field notes that were taken during the interviews in a mix of English, Yiddish, and a linguistic notation system developed for the project that uses only characters that the computers of the day could handle.  No verbatim transcriptions of the interviews were ever made.

Examples of questionnaire pages, with linguistic field notes in English, Yiddish and a special linguistic notation system developed for the project.

In the late 1960s and early 1970s, about half of the data collected by the project was coded onto punch cards and read onto computer tapes in order to generate lists that would make it easier to map linguistic features.  The resulting maps were later published in the multi-volume Language and Culture Atlas of Ashkenazic Jewry.

Current scholars want to manipulate the data for further study, but the original punch cards and computer tapes vanished decades ago.  No one thought of preserving them.  Had anyone done so, the cards and tapes would have presented an interesting challenge for digital archaeologists.  Instead, all we have left are printouts of the data on the green-and-white striped pin-fed paper that evokes memories for people of a certain age.

Example of a printout from the original computer database on green-and-white striped pin-fed paper.

CUL obtained a grant from the National Endowment for the Humanities in 2015 to start recreating the database.  We scanned each printout page to create TIFF images, then ran the images through OCR (optical character recognition) and mark-up to generate new machine-readable tables, using ABBYY FineReader OCR software.  The pages were first zoned and analyzed to identify the tables of data on each page, and the software was then "trained" for a few hours on the text of each series of printouts to improve accuracy.  After full machine reading and some cleanup, all of the pages were exported as MS Excel spreadsheets and put through additional cleanup processes.  Scholars can now search and manipulate the data once again.

The handwritten notes that served as the input to the computer database contain additional information that was never coded in.  We have also digitized those as page images.  Our new site allows scholars to move between the tables and the questionnaire pages to make sure they have all the information relevant to their research. (See the user’s guide http://guides.library.columbia.edu/lcaaj.)

Luckily, the original audiotapes were preserved.

CUL digitized the tapes some years ago in a multi-year effort with generous support from NEH, private foundations, the New York State Conservation/Preservation Program, and EYDES (Evidence of Yiddish Documented in European Societies, a project of the German Förderverein für Jiddische Sprache und Kultur).  The audio files are available online on the EYDES site (www.eydes.de).

One of our next aims is to raise money to link the audio files and the digital data.  Columbia’s LCAAJ site will continue to evolve and add more information and more functionality to keep this re-created database relevant for new researchers.

Links cited in this post:

  • Language and Culture Atlas of Ashkenazic Jewry https://clio.columbia.edu/catalog/1231536
  • LCAAJ in Columbia Digital Library Collections https://dlc.library.columbia.edu/lcaaj
  • LCAAJ User’s Guide http://guides.library.columbia.edu/lcaaj
  • Evidence of Yiddish Documented in European Societies www.eydes.de

A Short Look at the Long History of Conservation at Columbia

In keeping with our “long view” of the Columbia University Libraries and their collections, we should consider the steadily evolving history of the Conservation Program at Columbia.

The first professional conservator was hired at Columbia in the 1980s, but we know that the University has maintained a book bindery since 1912, when Columbia's library was in its first home on the Morningside campus, the Low Memorial Library building. Conservation staff members still use some large pieces of equipment that came from that first workshop in Low.

But a conservation department is not a bindery. How are we different? Mastering the techniques of bookbinding is only a starting point. We also need to understand the chemical makeup of the materials themselves (paper, leather, pigments, adhesives) so that we can predict how both new and ancient substances will change as they age, and use this knowledge to prevent deterioration. Aside from treating books, prints and manuscripts, we must know how to safely handle and care for the many other formats, such as paintings, costumes, and artifacts, that find their way into libraries and archives.

While bookbinders of earlier centuries learned a regional or local set of skills and practiced them repeatedly, library conservators work with collections spanning many centuries and cultures, and so must know as much as they can about the practices the binders in all these times and places may have employed. Because we study materials and craft techniques, we have a unique perspective on the books and documents that students and scholars encounter in our libraries. By sharing what we know, we become, in a sense, the “reference librarians” for the physical aspects of the objects in our collections.

Perhaps the biggest distinction between the 1912 bindery in the old library building and the Conservation Program in Butler is that our workplace is no longer confined to one room, but extends to every part of the Libraries where our collections are studied, handled and stored. From our desktops in the lab, we monitor the environmental conditions in fifty separate locations around the campus and communicate with engineers when adjustments are needed. Together with staff throughout the Libraries, we look for ways to make handling delicate materials safer and easier, whether in the process of returning ordinary books to the circulation department or while displaying a Babylonian clay tablet to a visiting class in the rare book room. And we are delighted that our work is essential to the important ways the Libraries bring their collections into public view: through exhibitions at Columbia and elsewhere; through digitization projects that require us to stabilize fragile materials; and in classrooms where we use the collections to teach about bookbinding and manuscript production. We look forward to future posts in which we can describe our favorite projects as they arise!


Is Your Google Book Incomplete? We May Be Able To Help.

As many people know, Google has digitized hundreds of thousands of books from libraries around the world, including Columbia University Libraries, and has created Google Books, a wonderful resource for readers and researchers.  Columbia and many other libraries have since contributed their Google digital versions to HathiTrust to ensure that the e-books are preserved into the future.

It’s also well known that some Google books have problems – for instance, because Google didn’t open out folded pages when the books were digitized, those pages are not visible to readers.  Recently HathiTrust and its member libraries have developed a process to fix some of those problems.

Let's look at The Royal Land Com'y of Virginia, published in 1877 and digitized by Google in 2009 from a copy owned by Columbia University Libraries.  Until a few weeks ago, anyone trying to read it on Google or HathiTrust would have found unreadable folded plates, including the one that follows page 72.

Someone reading the book on HathiTrust discovered the folded plates and reported them by using the Feedback button at the bottom of the page display.

HathiTrust staff then notified Columbia, because it is our copy that Google digitized.  We received messages of the form “the plate following page 72 of this title is folded and cannot be read”.  That alerted us to the need for new digital images of the foldouts.

When we looked at the volume, we discovered that the foldouts were torn.  Conservation treated the damage, and then our Imaging Lab digitized the unfolded plates.

We sent the images to Google, and they inserted the new images in place of the faulty ones.  They then loaded the new version into HathiTrust to replace the incomplete copy there.  Today the corrected e-book is available to everyone through Google and HathiTrust, and preserved for anyone to use in the future.

Now that everyone can search and view millions of books online in a matter of seconds, libraries are investing time and effort in collaborating with HathiTrust and Google to solve problems like these.  Behind the digital images that appear to be an easy click away, teams of library professionals are dedicated to digitizing physical books and improving the e-book experience.

A Brief History of the Preservation Reformatting Department

Columbia's Preservation Reformatting Department (PRD) began as a reprographic services unit back in the 1930s. During the 1970s and 1980s, the department gradually became a reprography unit with an emphasis on the preservation of brittle and deteriorating materials.

While the Preservation Division was taking shape, the world was just beginning to understand the slow-moving disaster headed our way: the acids within wood pulp paper, which would eventually consume our books and documents. Studies done as early as the 1930s had found that an overwhelming percentage of research collections were printed on acidic paper which, under less than pristine conditions, would become embrittled, threatening the destruction of more than a century of scholarly works.


Columbia Daily Spectator, Dec. 8, 1941

In an attempt to cope with this looming catastrophe, the National Endowment for the Humanities (NEH) issued a number of grants to research libraries throughout the U.S., including CUL. These funds, along with emerging best practices established by the Research Libraries Group (RLG), gave us the means to turn nascent reformatting projects into large-scale reformatting programs, which endure to this day, albeit in a much evolved form.

As of 2016, PRD has transformed itself in many ways, reflecting the revolutionary technological changes happening both outside and within the library doors. We continue to prioritize materials in demand but have expanded our capacity. You may be surprised to hear that we still send out shipments for microfilming of brittle, circulating collections, primarily because of copyright restrictions. We also still create preservation photocopies of materials for which we really need physical copies on the shelves, such as music scores and reference materials.

In addition, we have a fully developed program for creating e-books from public domain materials, and PRD staff are responsible for every step of this process, as they have been for many years with microfilming and photocopying. The staff collate items, search for existing copies, create copy-cataloged records for the new formats, send materials to and receive them from reformatting vendors, and handle all quality control (QC), image processing, and uploading and organizing on the Internet Archive.


A pamphlet from the Missionary Research Library (Burke Library)

A future blog post will explore this process and some of the customizations and enhancements that PRD has come up with over the years.

Finally, we are also responsible for the front end of patron services and for numerous special projects, such as the Columbia Spectator digitization project and Burke’s Missionary Research Library digitization, images of which are included in this post.