
Recreating a lost Yiddish database: The LCAAJ Project

The Language and Culture Archive of Ashkenazic Jewry (LCAAJ) is an extraordinary resource for research in Yiddish studies.  It consists of field interviews with Yiddish-speaking informants, recorded between 1959 and 1972 by Columbia University’s Department of Linguistics, which donated the Archive to Columbia University Libraries in 1995.

The Archive presents an interesting preservation challenge, since the original researchers created not only the audiotapes and large quantities of paper documents, but also computer data that has not survived the test of time.

The interviews were collected from people who originally lived in 603 different locations in Central and Eastern Europe, to create a sample that reflected the distribution of the Yiddish-speaking population on the eve of World War II.  The informants answered questions on a wide variety of topics concerning Yiddish language and culture during interviews lasting anywhere from 2.5 to 16 hours.  In all, the project produced 5,755 hours of audiotaped sessions with the native speakers and ca. 100,000 pages of questionnaires.  The documents are covered with hand-written linguistic field notes that were taken during the interviews in a mix of English, Yiddish, and a linguistic notation system developed for the project that uses only characters that the computers of the day could handle.  No verbatim transcriptions of the interviews were ever made.

Examples of questionnaire pages, with linguistic field notes in English, Yiddish and a special linguistic notation system developed for the project.

In the late 1960s and early 1970s, about half of the data collected by the project was coded onto punch cards and read onto computer tapes in order to create lists that would facilitate creation of maps of linguistic features.  These were later published in the multi-volume Language and Culture Atlas of Ashkenazic Jewry.

Current scholars want to manipulate the data for further study, but the original punch cards and computer tapes vanished decades ago.  No one thought of preserving them.  If anyone had, reading them today would have presented an interesting challenge for digital archaeologists.  Instead, all we have left are printouts of the data on the green-and-white striped pin-fed paper that evokes memories for people of a certain age.

Example of a printout from the original computer database on green-and-white striped pin-fed paper.

Columbia University Libraries (CUL) obtained a grant from the National Endowment for the Humanities (NEH) in 2015 to start recreating the database.  We scanned each printout page to create TIFF images, then put them through OCR (optical character recognition) and mark-up with ABBYY FineReader to generate new machine-readable tables.  The pages were first zoned and analyzed to identify the tables of data on each page, and the software was then given a few hours of “training” on the text in each of the series to improve accuracy.  After full machine reading and some cleanup, all of the pages were exported as MS Excel spreadsheets and put through additional cleanup processes.  Scholars can now search and manipulate the data once again.
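To give a flavor of what post-OCR cleanup can involve, here is a minimal Python sketch. It is not the project's actual process: the column names, the sample rows, and the table of look-alike characters are all invented for illustration, and it assumes the exported tables have been saved as CSV. It corrects letters that OCR commonly confuses with digits in columns expected to be numeric, and flags anything it cannot fix for a human to check against the page image.

```python
import csv
import io

# Assumption: letter shapes that OCR often confuses with digits (illustrative only).
DIGIT_LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)


def clean_numeric_cell(cell):
    """Return (cleaned_value, needs_review) for a cell expected to hold only digits."""
    candidate = cell.strip().translate(DIGIT_LOOKALIKES)
    return candidate, not candidate.isdigit()


def clean_table(rows, numeric_columns):
    """Clean the named columns in every row and list the cells that need manual review."""
    cleaned, review = [], []
    for row_number, row in enumerate(rows, start=1):
        new_row = dict(row)
        for column in numeric_columns:
            value, needs_review = clean_numeric_cell(row[column])
            new_row[column] = value
            if needs_review:
                # Keep the original text so a person can compare it with the scan.
                review.append((row_number, column, row[column]))
        cleaned.append(new_row)
    return cleaned, review


# Invented sample data; the real tables use the project's own columns and codes.
SAMPLE = "location,question,response\n5O234,0l7,gut\n51?18,020,git\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned, review = clean_table(rows, ["location", "question"])

for row in cleaned:
    print(row)
print("Needs manual review:", review)
```

Only the columns named as numeric are touched, so text responses are left exactly as the OCR produced them. Anything the look-alike table cannot resolve is reported rather than guessed at.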

The handwritten notes that served as the input to the computer database contain additional information that was never coded in.  We have also digitized those as page images.  Our new site allows scholars to move between the tables and the questionnaire pages to make sure they have all the information relevant to their research. (See the user’s guide.) For more information on this project, check out this interview with Michelle Chesner, Norman E. Alexander Librarian for Jewish Studies at Columbia University.

Luckily, the original audiotapes were preserved.

CUL digitized the tapes some years ago in a multi-year effort with generous support from NEH, private foundations, the New York State Conservation/Preservation Program, and EYDES (Evidence of Yiddish Documented in European Societies, a project of the German Förderverein für Jiddische Sprache und Kultur).  The audio files are available online on the EYDES site (www.eydes.de).

One of our next aims is to raise money to link the audio files and the digital data.  Columbia’s LCAAJ site will continue to evolve and add more information and more functionality to keep this re-created database relevant for new researchers.

Links cited in this post:

  • Language and Culture Atlas of Ashkenazic Jewry https://clio.columbia.edu/catalog/1231536
  • LCAAJ in Columbia Digital Library Collections https://dlc.library.columbia.edu/lcaaj
  • LCAAJ User’s Guide http://guides.library.columbia.edu/lcaaj
  • In Geveb Journal of Yiddish Studies https://ingeveb.org/blog/yiddish-linguistics-and-digital-humanities-a-conversation-with-michelle-chesner-about-the-digitization-of-the-language-and-culture-atlas-of-ashkenazi-jewry-archive-at-columbia-university
  • Evidence of Yiddish Documented in European Societies www.eydes.de

Hearing Voices from a Broken Disc

Hearing the voices of people who lived in another century brings them close to us, but early recording technology makes hearing them a challenge. In the first half of the 20th century a common recording method was to use discs with a lacquer surface. Sound waves caused a stylus to vibrate and cut grooves into the lacquer while the disc turned. The recording was played back by running another stylus through the grooves and amplifying the sound. The inner core of the discs was metal, cardboard, or even glass. Playing these old recordings is a problem – the lacquer deteriorates over time, developing cracks and sometimes detaching from the core, and of course glass is easily broken.

Until a few years ago, a broken record was a lost cause – while conservators can repair many types of damage, they cannot put broken glass recordings back together again. But in 2013 scientists from Lawrence Berkeley National Laboratory developed IRENE (Image, Reconstruct, Erase Noise, Etc.), a digital imaging system that can make a picture of the grooves on a disc and then transform the images into digital sound files. Carl Haber, the lead scientist and a Columbia graduate, was named a MacArthur Fellow for his work. (For more on Haber and how he developed IRENE, see this article in Columbia College Today).
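The idea of turning a picture of a groove into sound can be illustrated with a toy Python sketch. This is not IRENE's actual software. It assumes that an image has already been reduced to a list of side-to-side groove positions measured at even time steps, and that the sound corresponds to how quickly the groove moves sideways, so the positions are differenced and scaled to the range -1 to 1. The function name, sample rate, and test tone are all made up for the example.

```python
import math


def groove_to_audio(displacement, sample_rate):
    """Turn groove positions into audio samples by taking their rate of change.

    displacement: sideways groove positions at evenly spaced moments (assumed input).
    sample_rate: how many positions were measured per second of playback.
    """
    # Difference between neighboring positions, scaled to units per second.
    velocity = [(b - a) * sample_rate for a, b in zip(displacement, displacement[1:])]
    peak = max((abs(v) for v in velocity), default=0.0)
    if peak == 0.0:
        # A perfectly straight groove (or no data) is silence.
        return [0.0] * len(velocity)
    # Normalize so the loudest sample has magnitude 1.
    return [v / peak for v in velocity]


# Made-up example: one second of a groove wiggling at 440 cycles per second.
SAMPLE_RATE = 8000
TONE_HZ = 440.0
trace = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
audio = groove_to_audio(trace, SAMPLE_RATE)
print(len(audio), "samples, peak magnitude", max(abs(s) for s in audio))
```

A real system also has to find the groove in the image, follow it around the disc, and deal with dust, scratches, and missing pieces, none of which this sketch attempts.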


Glass disc, WNEW Join the News Reel, 10 February 1944, American Bureau for Medical Aid to China 1937-2005, Rare Book & Manuscript Library, Columbia University

Like many other libraries and archives, Columbia has its share of glass and other fragile recordings. When IRENE became available from the Northeast Document Conservation Center, we sent off this disc from 1944 to test the new service. The disc had shattered and small fragments along the edges of the breaks had been completely lost. Using IRENE, each surviving fragment was separately imaged, and then the entire recording was digitally reassembled. Pops and clicks can be heard where bits of the lacquer were missing, but this recording of WNEW’s Join the News Reel from 10 February 1944, broken decades ago, now speaks once more.

Listen here:

Learn more about IRENE at NEDCC.


The IRENE system at the Northeast Document Conservation Center, mounted on a vibration-damping pneumatic air table. Photo courtesy of Northeast Document Conservation Center.