The discussion around the digital humanities often seems focused on transforming texts into data. Turning language into information, humanists are told, renders their increasingly archaic materials available to a range of computational methods. Most of these methods, however, are not part of our training, and so humanists look toward the other side of the two-cultures divide. Graphs, code, and algorithms were not part of the deal when we decided to study the fuzzy arts. Yet we recognize that the world is now data-driven, and we, naturally, want to go along for the ride.
So we give ground to the number crunching and to the people who seem to know it best. We take our cues from them, adopting their methods, and we try visualizing our material, because our narratives seem to have become so impoverished.
Don’t get me wrong, I think there is great potential in thinking about texts as data and incorporating computational methods. At the same time, however, the move to the digital should not necessarily mean relinquishing the techniques, approaches, and procedures inherited through the humanities. We should not give up ground to the quantitative so easily. I want to argue that thinking about data as text, by scholars trained in “the old way,” might be as productive and important as adapting our texts to the digital and rendering them as data.
By way of reflecting on this, I want to provide two excerpts from things I have been reading recently.
Karin Barber has written a theoretical reflection on what a comparative historical anthropology of texts would look like. She is specifically interested in oral and written culture in Africa, but her discussion is quite global. She starts with a basic proposition: “A text is a tissue of words” (1). Defining text beyond its materiality is critical for her, as she is trying to think of the oral and the written simultaneously. Textuality, then, is in effect “the idea of weaving or fabricating — connectedness, the quality of having been put together, of having been made by human ingenuity” (21), not a property of its medium. Leaving the question of “the human” aside for now, I think keeping her basic argument in mind can help us think of data and code as texts, to essentially think of the oral, the written, and the digital together.
The basic recognition that I think is useful, following Barber, is that texts are things, say things, and do things. That texts, whether oral, written, or digital, differ in modality is not an indication that they are fundamentally different.
The second text is from the Natural Language Processing with Python textbook.
Here, the understanding of “text” is quite similar, although this definition compels us to think more about the progressive levels of a structure. The topic, of course, is taking something already recognizable as a text and rendering it available to computational methods. In the example immediately following this one, the method is determining lexical diversity.
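For readers who do not have the textbook at hand: lexical diversity there is simply the ratio of distinct word tokens to total tokens. A minimal sketch in plain Python (the sample sentence is my own illustrative input, not the book’s corpus):

```python
def lexical_diversity(tokens):
    """Ratio of distinct tokens to total tokens, as in the NLTK textbook."""
    return len(set(tokens)) / len(tokens)

# Illustrative input; the book applies the same measure to whole corpora.
tokens = "the quick brown fox jumps over the lazy dog the".split()
print(lexical_diversity(tokens))  # 8 unique tokens / 10 total = 0.8
```

A higher score means the text repeats itself less; the measure is sensitive to text length, which is one reason the book applies it comparatively across texts.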
Isn’t this what “theory” has been doing for decades?
You might say that this is a bad example because it does not show how “data” can be read as text. However, my main point is that we should not let the materiality of the text, that is, its medium, trick us into surrendering a textualist orientation to the material. The materiality has changed, but the object is still fundamentally a text: an intentional weaving together of structural elements that must manage the limitations and opportunities of the form itself in order to produce meaning.
This has been more of a musing than an argument. Accordingly, I would like to end with a question.
What would a text-based, or a textualist approach to data look like?