Book reviews in nineteenth-century periodicals seem like the perfect data for doing computer-assisted disciplinary history. The body of a review records the words used by early generations of literary critics, while the paratext provides semi-structured information about how those critics read, evaluated, and classified: section headings label the topic or genre of the books under review alongside bibliographic information. Studied in aggregate, such material could provide valuable insight into the long history of literary criticism. Yet there’s a significant obstacle to this work: important metadata created by nineteenth-century authors and editors is captured erratically (if at all) within full-text databases and the periodical indexes that reference them.
My project aims to tackle this dilemma and to develop a method for doing this kind of disciplinary history. To do so, I’m constructing a medium-sized collection of metadata that draws on both unsupervised and supervised models of reading. Working with a corpus of three key nineteenth-century British periodicals over a 35-year period (1866–1900), the project collects metadata on the reviewed works: capturing the review metadata as it exists in current databases and indexes, and using more granular data extraction to capture section headings like “new books,” “new novels,” or “critical notices.” I then pair this metadata with computer-assisted readings of the full texts, generating “topic models” of frequently co-occurring word clusters using MALLET, a toolkit for probabilistic modeling. While the topic models offer the possibility of reading across a large number of unlabeled texts, the metadata provides a way of slicing up the models according to how the reviews were originally labeled and organized. The end goal is a set of metadata that can be navigated in an interface or downloaded as flat CSV files.
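To illustrate how that pairing might work in practice, here is a minimal Python sketch. It assumes a topic model already trained with MALLET (e.g., via `train-topics --output-doc-topics doc-topics.txt`) and review metadata exported from Zotero to CSV with a column linking each record to its text file; the file names, column names, and topic count are illustrative assumptions, not the project’s actual layout.

```python
import pandas as pd

NUM_TOPICS = 50  # hypothetical: must match the --num-topics passed to MALLET

# MALLET's --output-doc-topics file: one row per document, with a doc index,
# the source name, and then one proportion per topic (recent MALLET versions;
# older versions interleave topic ids and proportions, so adjust if needed).
topic_cols = [f"topic_{i}" for i in range(NUM_TOPICS)]
doc_topics = pd.read_csv(
    "doc-topics.txt",
    sep="\t",
    comment="#",
    header=None,
    names=["doc_id", "path"] + topic_cols,
)

# MALLET records source names as file: URIs; strip the prefix before merging.
doc_topics["path"] = doc_topics["path"].str.replace("file:", "", regex=False)

# Review metadata exported from Zotero as CSV; the "path" column linking each
# record to its plain-text file is an assumed convention, not Zotero's default.
metadata = pd.read_csv("review_metadata.csv")  # journal, date, section, path

merged = doc_topics.merge(metadata, on="path")

# Slice the model by the reviews' original section headings: mean topic
# proportions per heading show, e.g., which topics dominate "new novels"
# as against "critical notices".
by_section = merged.groupby("section")[topic_cols].mean()
print(by_section.round(3))
```

The same merged table could be grouped by journal or by date instead, which is what makes the curated metadata, rather than the model alone, the pivot of the analysis.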
Though the case study will be of most practical use to Victorianists, the project aims to address questions of interest to literary historians more generally. What patterns emerge when we look at an early literary review’s subject headings over time? What can we learn from using machine learning to sift through a loose, baggy category like “contemporary literature” as reviewers used it during the decades of specialization and discipline formation at the end of the century? Critical categories, and the vocabularies built around them, present a particularly thorny problem for literary interpretation and for the classification of “topics” (see work by Andrew Goldstone and Ted Underwood, or John Laudun and Jonathan Goodwin). I hope to assuage some of these anxieties by leveraging the information already provided by nineteenth-century review section headings, which themselves index, organize, and classify their contents.
Much of the first phase of this project is already underway: I’ve collected 418 review sections from three prominent Victorian periodicals (The Fortnightly Review, The Contemporary Review, and The Westminster Review), totaling nearly 1,230 individual reviews. I’ve extracted and stored the bibliographic metadata in Zotero, and I’m batch-cleaning the texts of the reviews to prepare them for topic modeling and for further extraction of bibliographic citations. I’ve also begun topic modeling a subsection of the “fiction” section of The Contemporary Review. Some of the preliminary results are exciting: for instance, the relatively late emergence of “fiction” as a separate category within the broader category of “literature” reviews in The Contemporary Review.
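Since the batch-cleaning step is still in progress, here is a minimal sketch of the kind of normalization pass I have in mind, assuming one plain-text review per file; the directory layout and the specific OCR-noise patterns are illustrative assumptions rather than the project’s actual script.

```python
import re
from pathlib import Path

RAW_DIR = Path("texts/raw")      # assumed layout: one review per plain-text file
CLEAN_DIR = Path("texts/clean")
CLEAN_DIR.mkdir(parents=True, exist_ok=True)

def clean(text: str) -> str:
    """Normalize an OCR'd review before topic modeling."""
    text = text.replace("\u00ad", "")        # drop soft hyphens
    text = re.sub(r"-\n(\w)", r"\1", text)   # rejoin words hyphenated at line breaks
    text = re.sub(r"[|~^_]+", " ", text)     # strip common OCR noise characters
    text = re.sub(r"\s+", " ", text)         # collapse runs of whitespace
    return text.strip()

for path in RAW_DIR.glob("*.txt"):
    cleaned = clean(path.read_text(encoding="utf-8", errors="replace"))
    (CLEAN_DIR / path.name).write_text(cleaned, encoding="utf-8")
```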
The next phases will require further data wrangling as I prepare the corpus of metadata and the full texts for modeling. In the immediate future, I plan to improve my script for extracting the section headers and the titles of reviewed works (a sketch of this step appears below). Once this is done, I’ll generate a set of topic models for the entire corpus, then use the enriched metadata to sort and analyze the results by sub-set (by journal, by review section or genre heading, and by date). Most of the work of the project lies in pre-processing the data; running the topic models themselves will be relatively quick. This will leave me time to refine the topic models (disambiguating “topics,” refining the stopword list) and to work out the best method for collating the topic results with the existing metadata. Finally, I plan to spend the last stages of the project experimenting with ways to visualize this collection of topic models and metadata. Goldstone and Goodwin have created interfaces for visualizing topic models of academic journals that I’ll be building on in displaying my data from the Victorian reviews.
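As an illustration of the header-extraction step mentioned above, here is a minimal sketch assuming headings such as “New Books” appear on lines of their own in the page text; the pattern list is a placeholder, since the actual headings vary by journal and year.

```python
import re

# Illustrative heading patterns only: the real script would load a
# per-journal list of headings rather than hard-coding them here.
HEADING_RE = re.compile(
    r"^\s*(NEW BOOKS|NEW NOVELS|CRITICAL NOTICES|CONTEMPORARY LITERATURE)[.:]?\s*$",
    re.IGNORECASE | re.MULTILINE,
)

def extract_sections(page_text: str) -> list[tuple[str, str]]:
    """Split a review section's plain text into (heading, body) pairs."""
    pieces = HEADING_RE.split(page_text)
    # With a capturing group, split() yields [preamble, h1, body1, h2, body2, ...]
    return [
        (pieces[i].strip().title(), pieces[i + 1].strip())
        for i in range(1, len(pieces) - 1, 2)
    ]
```

Normalizing matched headings with `.title()` means variants like “CRITICAL NOTICES” and “Critical Notices” collate together when the extracted labels are joined back to the metadata.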
While relatively modest in scale (three periodicals, a 35-year period), this narrower scope will, I hope, make the project achievable and a test case for how topic modeling can be used more strategically when paired with curated metadata. For my own research, this work is essential. My goal with the project, however, is not just to provide a way to read and study the review section over time, but to offer a portable methodology useful to intellectual historians, genre and narrative theorists, and literary sociologists. By structuring the project around metadata and methodology, I hope also to make a small bid for treating the accessibility and re-usability of data as just as important as the models made from it.