Text and Image Scanning and Machine Reading (OCR)
at the Digital Humanities Center

The Digital Humanities Center (DHC) computer lab in 305 Butler provides Columbia students, faculty, and staff with 10 scanners for copying images and for copying and reading texts in a variety of forms: books, loose pages, photographs, slides, negatives, microfilm, microfiche, and microprint. (Visitors can use four public scanners: one in room 300, another in room 304, and two more in room 401.) The reading process for texts is known as optical character recognition (OCR). It runs by default when you scan, and it produces output that can be searched, annotated, extracted, and edited, which greatly increases its value as a tool for research and learning. The powerful Abbyy FineReader OCR software available at the DHC can produce highly accurate text for most of the world’s languages, including Chinese, Japanese, and Korean. The DHC is also closing the few remaining gaps by acquiring software that can read South Asian and Arabic scripts.

Five large-format 14 x 17-inch Fujitsu scanners can easily accommodate most book sizes as well as large stacks of loose sheets, so a user can create a PDF of a 300-page book in about half an hour. An overhead Minolta scanner handles even larger formats (up to 18 x 24 inches) and is ideal for brittle material that might be damaged on a flatbed machine. Three smaller Fujitsu scanners are well suited to quickly scanning stacks of loose pages and producing copies of documents with FineReader or Adobe Acrobat Professional, as well as to digitizing smaller-format books. All of the scanners can process images, but the two 14 x 17-inch Epson XL10000 scanners are best for producing high-quality, high-density images of opaque or transparent material, and they can handle multiple images at the same time. Finally, a ScanPro 2000 scanner can handle most types of microform and, for some microfilm, can be set to process a series of images automatically. Depending on the quality of the original, the resulting images of those microforms can often be successfully OCRed as well. (Two other ScanPro scanners are available in the Periodical Reading Room in 401 Butler.)

Printed guides for each of these scanners are available at the DHC, and staff members are on duty to train and assist you. The lab is open Monday 11-6, Tuesday through Thursday 11-9, Friday 11-6, and Saturday and Sunday 12-6. To reserve a particular scanner in advance, call 212-854-7547 during opening hours.