Map Club: Reflections on Teaching Self-Teaching in Digital Scholarship

This academic year, through my internship with the Center for Spatial Research and the Digital Social Science Center, I set out to demystify digital mapping. I organized a series of fast-paced hack sessions focused on play, exploration, and the rapid acquisition of skills. To evoke exploration and inclusivity, I named the series Map Club.

Map Club represents an approach to learning. It seeks to hone the capacity to adapt to change, to encourage fearlessness in the face of new technology, and to nourish the value of experimentation without a specific goal. I color this description with a rhetorical intrepidity because I believe humility, determination, and bravery are the best traits to muster when digging into unfamiliar modes of making. Through Map Club, I wanted to leverage individual autonomy and creativity to teach attendees how to be self-taught. I hoped to achieve this by creating a space for collective, unstructured exploration, within which attendees could teach themselves.

Since its inception this summer, Map Club has met for 14 sessions and has explored 10 different mapping and visualization tools. Attendees have written code in JavaScript, Python, CartoCSS, and a bit of GLSL. We have fostered a space of creativity, collaboration, and digital empowerment, while continuing to grapple with the roadblocks that surface in new, less structured endeavors.

At the same time, this model has been neither unanimously fulfilling nor consistently easy. Map Club has suffered from low attendance, mixed feedback, and inconsistent interest. Here, I would like to examine some of the reasons behind its irregular reception, as well as suggest some ideas for addressing them.

📍Structure

In an effort to combat disorientation, each Map Club session this semester was loosely divided into three sections:

  • (20 minutes) Setup. Downloading a text editor, setting up a local server, and ensuring that example files display properly in the browser.
  • (60 minutes) Self-paced making. Unstructured time for attendees to experiment with the tool or library of the day.
  • (10 minutes) Sharing. Go-around to exhibit screenshots, cool glitches, and creative compositions.

While this schedule does help to divide up session time, it does not supplant the comforting sense of structure provided by a knowledgeable workshop leader. Though some students regularly stayed for entire sessions, others left early and never returned.

📍Challenges

Based on attendee feedback, as well as my own observation, I believe the inefficacy of the initial Map Club model has three consistent causes.

  1. Attendees new to code have a harder time adopting it as a medium. Everybody learns differently. In the absence of prior experience, jumping into a new programming language without a guided tutorial can feel confusing and disorienting.
  2. Unstructured time is not necessarily productive. Sometimes, the biggest challenge is figuring out what to do. Even for attendees who do have experience with code, determining how to spend the hour can become its own obstacle.
  3. An undefined goal is not the best stimulus. In choosing to attend a scheduled meeting, many attendees hope to avoid the obstacles and glitches that come from figuring out new platforms or libraries on their own. To them, an unguided workshop can seem pointless.

📍Looking forward

Future Map Club sessions can improve by providing certain types of guidance for attendees, without encroaching upon the self-paced nature of learning-by-hacking.

  1. Provide a starter kit for new Map Club members. A bundled tutorial introducing basic digital mapping concepts gives new attendees material to help them spend the session in a more valuable way.
  2. Provide basic templates for download (when applicable). Even experienced attendees benefit from the time saved.
  3. Provide a list of tool-specific challenges. To make the session as productive as possible, put together a list of potential ideas, or challenges, for members to independently explore.
  4. Be available for questions. Even though these sessions are self-driven, nobody should be left in the dark. Leverage other attendees’ knowledge, too.
  5. Emphasize the value of mistakes. Some of the coolest visual output this semester came from members who took novel approaches to producing digital maps — Ruoran’s GeoJSON/Cartogram mashup, for instance, or Rachael’s vibrant approach to tiling. Encourage attendees to relish the proverbial journey and focus on editing, manipulating, and experimenting. De-emphasizing an end goal helps to alleviate the impetus to finish something.
  6. Include some guided workshops. To combat fatigue induced by Map Club’s ambiguous structure, I inserted several guided workshops into the series throughout the semester. Aside from keeping the momentum going, certain tools or frameworks (such as D3.js or QGIS) benefit from a step-by-step introduction.

📍Final thoughts

As an alternate workshop model, I believe that Map Club has the capacity to position technology as an ephemeral means to an end rather than a capability to master. By emphasizing what is pliant, inessential, and surprising about digital platforms, instead of what is inaccessible and opaque, my hope is that this series can foreground the process of learning as an end in itself.

To view the full repository of Map Club materials, sessions, and tutorials, click here. For recaps of each session, visit the “map club” tag on the Digital Social Science Center blog.

A Medium-Scale Approach to the History of Literary Criticism: Machine-Reading the Review Column, 1866-1900

Book reviews in nineteenth-century periodicals seem like the perfect data for doing computer-assisted disciplinary history. The body of a review records the words used by early generations of literary critics, while the paratext provides semi-structured information about how those critics read, evaluated, and classified: section headings label the topic or genre of the books under review, alongside bibliographic information. Such material, studied in aggregate, could provide valuable insight into the long history of literary criticism. Yet there is a significant obstacle to this work: important metadata created by nineteenth-century authors and editors is captured erratically (if at all) within full-text databases and the periodical indexes that reference them.

My project aims to tackle this dilemma and develop a method for doing this kind of disciplinary history. To do so, I’m constructing a medium-sized collection of metadata that draws on both unsupervised and supervised models of reading. Working with a corpus of three key nineteenth-century British periodicals over a 35-year period (1866–1900), the project collects metadata on the reviewed works: capturing the review metadata as it exists in current databases and indexes, and using more granular data extraction to capture section headings like “new books,” “new novels,” or “critical notices.” I then pair this metadata with computer-assisted readings of the full texts, generating “topic models” of frequently co-occurring word clusters using MALLET, a toolkit for probabilistic modeling. While the topic models offer the possibility of reading over a larger number of unlabeled texts, the metadata provides a way of slicing up these topic models based on the way the reviews were originally labeled and organized. The end goal is to create a set of metadata that might be navigated in an interface or downloaded (as flat CSV files).
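
To make the “slicing” step concrete, here is a minimal pandas sketch of how topic proportions might be joined to curated metadata and grouped by the reviewers’ own section headings. The file names, the column layout of the doc-topics file (which varies across MALLET versions), and the metadata column names are assumptions for illustration, not the project’s actual code.

```python
import pandas as pd

# Assumed inputs: MALLET's --output-doc-topics file and a CSV of curated
# review metadata keyed by the same document name/path ("source").
doc_topics = pd.read_csv("doc_topics.txt", sep="\t", comment="#", header=None)
num_topics = doc_topics.shape[1] - 2
doc_topics.columns = ["doc_id", "source"] + [f"topic_{i}" for i in range(num_topics)]

metadata = pd.read_csv("review_metadata.csv")  # journal, date, section_heading, source

joined = doc_topics.merge(metadata, on="source", how="left")

# Average topic weights for each original section heading,
# e.g. "new novels" vs. "critical notices".
topic_cols = [f"topic_{i}" for i in range(num_topics)]
by_section = joined.groupby("section_heading")[topic_cols].mean()
print(by_section.head())
```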

Though the case study will be of practical use for Victorianists, the project aims to address questions of interest to literary historians more generally. What patterns emerge when we look at an early literary review’s subject headings over time? What can we learn from using machine learning to sift through a loose, baggy category like “contemporary literature” as it was used by reviewers during the four decades of specialization and discipline formation at the end of the century? Critical categories, and the vocabularies used to describe them, present a particularly thorny problem for literary interpretation and for the classification of “topics” (see work by Andrew Goldstone and Ted Underwood or John Laudun and Jonathan Goodman). I hope to assuage some of these anxieties by leveraging the information already provided by nineteenth-century review section headings, which themselves index, organize, and classify their contents.

Much of the first phase of this project is already underway: I’ve collected nearly 418 review sections from three prominent Victorian periodicals (The Fortnightly Review, The Contemporary Review, and The Westminster Review), containing nearly 1,230 individual reviews in total. I’ve extracted and stored the bibliographic metadata in Zotero, and I’m in the process of batch-cleaning the texts of the reviews to prepare them for topic modeling and for further extraction of bibliographic citations. I’ve also begun topic modeling a subsection of the “fiction” section of The Contemporary Review. Some of the preliminary results are exciting: for instance, the relatively late emergence of “fiction” as its own category, separate from the broader category of “literature” reviews, in The Contemporary Review.

The next phases will require further data wrangling as I prepare the corpus of metadata and the full texts for modeling. In the immediate future, I plan to improve my script for extracting the section headers and the titles of reviewed works. Once this is done, I’ll generate a set of topic models for the entire corpus, then use the enriched metadata to sort and analyze the results by sub-set (by journal, review section or genre title, and date). Most of the work of the project comes in pre-processing the data for the topic models; running the topic models themselves will be relatively quick. This will give me time to refine the topic models (disambiguating “topics,” refining the stopwords list) and to work out the best method for collating the topic results with the existing metadata. Finally, I plan to spend the last stages of the project experimenting with the best ways to visualize this topic model and metadata collection. Goldstone and Goodman have created tools for visualizing topic models of academic journals that I’ll be building on in displaying my data from the Victorian reviews.

While relatively modest in scale (three periodicals over a 35-year period), this narrower scope will, I hope, make the project achievable and a test case for how topic modeling can be used more strategically when paired with curated metadata. For my own research, this work is essential. My goal with the project, however, is not just to provide a way to read and study the review section over time, but to offer a portable methodology useful to intellectual historians, genre and narrative theorists, and literary sociologists. By structuring the project around metadata and methodology, I also hope to make a small bid for treating the accessibility and re-usability of data as just as important as the models made from it.

A Reflection on My Internship with DCIP

This fall semester I joined the Digital Center Intern Program (DCIP) as an instructor intern. My internship is primarily focused on developing lesson plans for and hosting weekly R Open Labs. The internship has allowed me to try different teaching approaches and explore different topics in R, and it has been an intellectually challenging and rewarding experience. The highlight has been discussing with people from diverse academic backgrounds how to use R in their research. I learned a lot about applications of statistical analysis from these discussions, and it felt wonderful to help people.

At the beginning of the semester, I started R Open Lab as a very structured instructional session and covered the basic usage of R. Later on, after talking with other librarians, I decided to make the open lab more free-flowing and put more emphasis on discussion instead of instruction. I found that, by getting participants more engaged in conversation, I was able to better understand their needs and help them with their research.

The internship offered a great opportunity for me to see for myself how R and statistics can be used as research tools. For example, one of the open lab’s regular participants used R to conduct sentiment analysis to gain insights about stress measurement and management in medical research. Another participant extracted information from Russian literature and conducted text analysis to understand the political situation of different periods. I had not realized how broadly statistics is used across fields until I talked with these people.

Considering the participants’ interests and needs, I plan to cover more plotting, data cleaning, and data scraping next semester. Since people coming to the open lab often have very different levels of familiarity with R, I also hope to encourage more peer learning at the open lab next semester.

This internship motivated me to gain a deeper understanding of R and enhanced my teaching skills. It is an amazing program and I had an incredibly fulfilling experience. I really look forward to the future work in the program and I hope to do better next semester.

Introduction to Semi-Automated Literary Mapping

Literary mapping presents exciting possibilities for criticism and the digital humanities, but it is hampered by a seemingly intractable technical problem. A critic interested in mapping must rely on either full hand-coding, which takes too much time and labor to be useful at scale, or full automation, which is frequently too imprecise to be of any use at all. As a result, mapping projects are usually either narrow but reliable or broad but dubious.

I am therefore developing an interface that will combine the best aspects of both approaches in a process of semi-automated literary mapping. This interface will combine automatic pre-processing of text data, to identify locations and suggest geotags, with a backend framework designed to speed up the process of cleaning the resulting data by hand. The end product will work something like this: the interface prompts its user for a plain text file, then presents the user sequentially with each of the identified “locations”; for each of these locations, the user confirms whether it is indeed a location, and chooses among the top map hits from Geonames or a similar API; from here, other options are available, such as “Tag all similar locations” or “Categorize location” to add flexibility for individual projects. When the text has been fully processed, the results are presented as a web map and are exportable for use in GIS mapping software like QGIS or ArcGIS. The entire process is simple, but the gains in the speed and ease of mapping literary locations could be considerable.
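
To make the geotag-suggestion step concrete, here is a rough Python sketch of how an interface like this might pull the top map hits for a candidate location from the Geonames search API. The username is a placeholder (Geonames requires a registered account), and this is an illustration of the idea rather than the interface’s actual code.

```python
import requests

GEONAMES_USER = "demo"  # placeholder: Geonames requires a registered username

def top_geonames_hits(place_name, max_rows=5):
    """Return the top candidate matches for a place name from the Geonames
    search endpoint, for the user to confirm or reject in the interface."""
    resp = requests.get(
        "http://api.geonames.org/searchJSON",
        params={"q": place_name, "maxRows": max_rows, "username": GEONAMES_USER},
        timeout=30,
    )
    resp.raise_for_status()
    return [
        (hit["name"], hit.get("countryName", ""), hit["lat"], hit["lng"])
        for hit in resp.json().get("geonames", [])
    ]

# Example: candidates = top_geonames_hits("Springfield")
```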

While much work remains to be done, some of the fundamental backend components have already been built. I already have a working Python prototype of the automation process using the Stanford NLP Group’s Named Entity Recognizer and the Geonames API, which I presented in a workshop as part of Columbia’s Art of Data Visualization series in spring of 2016. I further developed and employed this prototype in a collaborative project studying the medieval Indian text Babur-nama. Using automatic location extraction and geolocation, ArcGIS, and some minor hand-cleaning of geodata by myself and my partners, my team detected clusters in point data from the text, deriving from these clusters a rough sense of the regions implicit in Babur’s spatial imagination (which contrast suggestively with the national borders drawn in colonial maps of the period).

This project demonstrates the viability of the method while also underscoring the importance of preprocessing the data and expediting the hand-cleaning with geotag recommendations.
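
As a side note on the location-extraction step mentioned above, one common way to call the Stanford Named Entity Recognizer from Python is through NLTK, sketched below. The model and jar paths are placeholders pointing at a local Stanford NER download; this shows the general pattern rather than the prototype itself.

```python
from nltk import word_tokenize  # may require nltk.download("punkt") once
from nltk.tag import StanfordNERTagger

# Placeholder paths to a local download of the Stanford NER classifier and jar.
tagger = StanfordNERTagger(
    "stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz",
    "stanford-ner/stanford-ner.jar",
)

def candidate_locations(text):
    """Return every token the 3-class Stanford NER model labels as LOCATION."""
    tagged = tagger.tag(word_tokenize(text))
    return [token for token, label in tagged if label == "LOCATION"]
```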

The next phases of the project will focus on building the framework and improving the location extraction and geotag recommendation algorithms. I will begin developing the interface itself, using either Flask or a combination of Django and GeoDjango for the interface’s skeleton. I will then incorporate the two elements of the automatic pre-processing that I have already developed: location recognition and Geonames-based geotagging. From here, I will turn to the “output” stage, allowing the data to be exported for use in GIS software and, hopefully, displayed in the framework itself. Finally, I will use the prototype in a test case from my own field of literary study, twentieth-century American fiction. If time remains in the year, I will devote it to developing methods to use previously input data to recommend a geotag, thereby making the pre-processing “smarter” the longer it is used.

For me, this is a project born of necessity: I need it for my own work, but it does not yet exist. I would be very surprised if I am alone in this. I am excited by this project, therefore, because I see it as achievable and as having the potential for significant impact in the field of literary mapping: if everyone else doing this kind of work is facing the same problems, an interface that helps them circumvent those problems could be a first step for many humanists working in this field.

I am a fourth-year PhD Student in the Department of English and Comparative Literature studying 20th-century American fiction, ecocriticism, and DH. I started working on location extraction and literary spatial data about a year ago; before that I was working on RogetTools, a Python framework for word categorization and semantic analysis based on Roget’s hierarchy of semantic categories. More at my website.

Perceptual Bases for Virtual Reality: Part 2, Video

This is Part 2 of a post about the perceptual bases for virtual reality. Part 1 deals with the perceptual cues related to the spatial perception of audio.

The chief goal of the most recent virtual reality hardware is to simulate depth perception in the viewer. Depth perception in humans arises when the brain reconciles the images from each eye, which differ slightly as a result of the separation of the eyes in space. VR headsets position either a real or a virtual display (a single LCD panel showing a split screen) in front of each eye. Software sends the headset a pair of views of the same 3D scene, rendered from the perspectives of two cameras in the virtual space separated by the distance between the user’s eyes. Accurate measurement and propagation of this inter-pupillary distance (IPD) is important for effective immersion. The optics inside the Oculus Rift, for instance, are designed to tolerate software changes to the effective IPD within a certain operational range without requiring physical calibration. With all these factors taken into account, when the user allows their eyes to focus on a point beyond the surface of the headset display, they will hopefully experience a perceptually fused view of the scene with the appropriate sense of depth that arises from stereopsis.
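
As a toy illustration of this two-camera setup (not code from any SDK), the sketch below derives left- and right-eye view matrices from a single head-view matrix by offsetting each eye half the IPD along the horizontal axis. The 64 mm default is a commonly cited average, used here only as a placeholder.

```python
import numpy as np

def eye_view_matrices(head_view, ipd_m=0.064):
    """Derive left/right eye view matrices from a single head-view matrix by
    offsetting each eye half the inter-pupillary distance along the x axis."""
    def x_translation(dx):
        t = np.eye(4)
        t[0, 3] = dx
        return t
    # Moving the camera by -d is equivalent to translating the world by +d,
    # hence each view matrix is premultiplied by the opposite offset.
    left = x_translation(+ipd_m / 2) @ head_view
    right = x_translation(-ipd_m / 2) @ head_view
    return left, right
```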

However, stereoscopic cues are not the only perceptual cues that contribute to depth perception. For example, the widely understood motion parallax effect is a purely monocular depth cue: as we move our head, we expect objects closer to us to appear to move faster than those farther away. Many of these cues are experiential truisms: objects farther away seem smaller, opaque objects occlude those behind them, and so on. Father Ted explains it best to his perennially hapless colleague Father Dougal in this short clip.

Others are less obvious, though well known to 2D artists, like the effects that texture and shading have on depth perception. Nevertheless, each of these cues needs to be activated by a convincing VR rendering and implemented in either the client application code or the helper libraries provided by the device vendor (for instance, the Oculus Rift SDK). Here, I discuss three additional contingencies that impact the sense of VR immersion beyond typical depth-perception cues, to show the importance of carefully understanding human perception in order to produce convincing virtual worlds.

Barrel distortion

As this photograph, taken from the perspective of the user of an Oculus Rift, shows, the image rendered to each eye is radially distorted.
[Image: view from inside an Oculus Rift]
This kind of bulging distortion is known as barrel distortion. It is intentionally applied (using a special shader) by either the client software or the vendor SDK to increase the effective field of view (FOV) of the user, and the lenses used in the Oculus Rift then correct for this distortion. The net result is an effective FOV of about 110 degrees in the case of the Oculus Rift DK1. This approaches the effective stereoscopic FOV for humans, which is between 114 and 130 degrees. Providing visual stimuli in the remaining visual field (our peripheral vision) is important for the perception of immersion, so other VR vendors are working on solutions that increase the effective FOV. One solution is to provide high-resolution display panels that are curved or tilted in such a way as to encompass more of the real FOV of the user (e.g. StarVR). Another solution is to exploit Fresnel lenses (e.g. Wearality Sky), which can provide a larger effective FOV than regular lenses in a more compact package suitable for use with a smartphone. Both of these methods have drawbacks: the additional cost of larger panels increases the total cost of the ‘wrap-around’ method, while Fresnel lenses produce ‘milky’ images and their optical effects are more difficult to model in software than those of regular lenses.
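
The pre-distortion itself boils down to a radial scaling of normalized screen coordinates, along the lines of the sketch below. The coefficients are arbitrary placeholders rather than any headset’s calibrated values, and a real renderer would do this per pixel in a shader rather than in Python.

```python
def barrel_distort(x, y, k1=0.22, k2=0.24):
    """Radial (barrel) distortion of centered, normalized screen coordinates
    in [-1, 1]. k1 and k2 are placeholder coefficients, not calibrated values."""
    r2 = x ** 2 + y ** 2
    scale = 1.0 + k1 * r2 + k2 * r2 ** 2  # polynomial radial scaling
    return x * scale, y * scale

# A point halfway toward the corner gets pushed outward:
print(barrel_distort(0.5, 0.5))
```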

‘Smoothness’

An exceptionally important factor in the perception of immersion in virtual reality is the sense of smoothness of scene updates in response to both user movement in the real world and avatar movement in the virtual world. Perhaps the biggest bottleneck to tackle in this process is the rendering pipeline. For this reason, high-end gaming setups are the norm in the recommended system specifications for virtual reality: builds with at least 8GB of system RAM, a processor at least as capable as an Intel Core i5, and a mid- to high-range PCIe graphics card with at least 4GB of on-board VRAM are de rigueur. NVIDIA has partnered with component and system manufacturers to develop a commerce-led set of informal standards known as ‘VR Ready’.

Even if the rendering pipeline can supply frames to the display at a rate and reliability sufficient for the perception of fluid motion, the motion-detection subsystem must also provide feedback to the application at a sufficiently high rate, so that motions in the real world can be translated to motions in the virtual scene in good time. The Oculus Rift has an innovative, very high-resolution head-tracking system that fuses accelerometer, gyroscope, and magnetometer data with computer-vision data from a head-tracking camera, which infers the position of an array of infrared markers in real space. Interestingly, even very smooth motions in the virtual world can induce nausea and break the perception of immersion if those motions cannot be reconciled with normal human behavior. So, for instance, in cutscene animations, care must be taken not to move the virtual viewpoint in ways that do not correspond to the constraints of human body motion; rotating the viewpoint around the axis of the neck in excess of 360 degrees, for example, produces disorientation and confusion.

Contextual depth cues can remedy confounding aspects of common game mechanics

Apart from those aspects of rendering that are application-invariant, VR game programming poses special problems for maintaining user immersion and comfort because of the visual conventions of video game user interfaces. This excellent talk by video game developer and UI designer Riho Kroll outlines solutions to some potentially problematic representations of popular game mechanics.

Kroll gives the example of objective markers in a first-person game, which are designed to guide the player to a target location on the map corresponding to the current game objective. Normally, objective markers are scale-invariant and unshaded, and therefore lack some of the important cues that allow the player to locate them in the virtual z-plane. Furthermore, objective markers tend to be excluded from occlusion reckonings. The consequence is that if the player’s view of the spatial context of an objective marker is completely occluded by another game object, almost all of the depth-perception cues for the location of the marker are unavailable. Kroll describes an inelegant but well-implemented solution: under such conditions of extreme occlusion, an orthogonal grid describing the z-plane is blend-superimposed over the viewport. This grid recedes into the distance, behaving as expected according to the conventions of perspective, and thereby provides a crucial and sufficient depth-perception cue in otherwise adversarial circumstances.

Automating the Boring Stuff!

Hey!

I am Harsh Vardhan Tiwari, a first-year Master’s student in Financial Engineering. I am working on web scraping, which is a technique of writing code to extract data from the internet. There are several packages available for this purpose in various programming languages; I am primarily using the Beautiful Soup 4 package in Python. There are various resources available online for exploring the functionality of Beautiful Soup, but the two resources I found the most helpful are:

  1. http://www.crummy.com/software/BeautifulSoup/bs4/doc/
  2. Web Scraping with Python: Collecting Data from the Modern Web by Ryan Mitchell

My project basically involves writing a fully automated program to download and archive data, mostly in PDF format, from about 80 webpages containing about 1,000 PDF documents in total. Imagine how boring it would be to download them manually, and more so if these webpages are updated regularly and you need to perform this task on a monthly basis. It would take hours and hours of work, visiting each webpage and clicking on all the PDF attachments. And even worse if you have to repeat this regularly! But do we actually need to do this? The answer is NO!

We have this powerful tool called Beautiful Soup in Python that can help us automate this task with ease. About a hundred lines of code can accomplish it. I will now give you an overall outline of what the code could look like.

Step 1: Import the Modules

So this script parses the webpage and downloads all the PDFs in it. I used Beautiful Soup, but you can use mechanize or whatever you want. The original code appears as screenshots in this post, so rough reconstructions of each step follow below.
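
Here is a reconstruction of the imports such a script typically needs (requests and Beautiful Soup 4 are assumed here; the original code may differ).

```python
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
```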


Step 2: Input Data

Now you enter your data, like the URL (that contains the PDFs) and the download path (where the PDFs will be saved). I also added headers to make the request look a bit more legitimate; you can add your own, though it’s not really necessary. Beautiful Soup is then used to parse the webpage for links, as sketched below.
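
Continuing the sketch from Step 1, the inputs might look like this; the URL and download folder are placeholders.

```python
# Placeholder inputs: swap in the real listing page and a local folder.
url = "http://example.com/reports"           # page that links to the PDFs
download_path = "pdfs"                       # where downloaded files will go
headers = {"User-Agent": "Mozilla/5.0"}      # makes the request look like a browser

os.makedirs(download_path, exist_ok=True)

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")  # parse the page for links
```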


Step 3: The Main Program

This part of the program actually parses the webpage for links, checks whether each link has a .pdf extension, and then downloads the file. I also added a counter so you know how many PDFs have been downloaded. A reconstruction of this step follows.
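
Continuing the sketch, the main loop could look something like this.

```python
# Walk every link on the page, keep the ones ending in .pdf, and save them.
count = 0
for link in soup.find_all("a", href=True):
    href = link["href"]
    if href.lower().endswith(".pdf"):
        pdf_url = urljoin(url, href)         # handles relative links
        filename = os.path.join(download_path, os.path.basename(href))
        pdf = requests.get(pdf_url, headers=headers)
        with open(filename, "wb") as f:
            f.write(pdf.content)
        count += 1
        print(f"Downloaded {count}: {filename}")
```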


Step 4: Now Just to Take Care of Exceptions

Nothing much to say here: this step just wraps the download in exception handling so your program fails gracefully (crashes pretty, you might say) when a link misbehaves. A sketch is below.
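
One way to do this is to wrap the download step from the loop above in a try/except, so that a single bad link doesn’t end the whole run.

```python
        # Inside the loop above, replace the bare download with:
        try:
            pdf = requests.get(pdf_url, headers=headers, timeout=30)
            pdf.raise_for_status()
            with open(filename, "wb") as f:
                f.write(pdf.content)
            count += 1
        except requests.RequestException as err:
            print(f"Skipping {pdf_url}: {err}")
```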


Conclusion

This post covers the case where you have to download all the PDFs on a given webpage. You can easily extend it to multiple webpages. In reality, different webpages have different formats, and it may not be as easy to identify the PDFs; in the next post I will cover the different webpage formats I encountered and what I needed to do to identify all the PDFs in them.

Thanks for reading till the end and hope you found this helpful!

Blog Post 2.0

Well, I am naming this Blog Post 2.0, as there has been a serious revamping of my project goals. First, the work on the 3D-modelling software is done, and I will not be talking about it any further. My whole focus will be on the data collection software, i.e. Suma. Oh wait, that’s off the table as well.

Will (my advisor) and I have concluded that, after numerous failed attempts at implementing Suma, we will build our own platform for data collection, one molded to the needs of the Science and Engineering Library and scalable enough that other libraries can, with some amount of work, implement it as well.

So far, I have created an array of six library computers, and librarians can click on a computer to indicate that it is being used, or double-click to indicate that the seat is occupied but the PC is not being used (a personal laptop is being used instead).

The next things I am working on are adding a way to indicate that a certain PC is non-functional, and storing that information across different data collection sessions.

The website is up and running (alpha version) at www.columbia.edu/~nk2639.

Lastly, let me introduce you to an amazing website that I used for the initial development of this project: codepen.io. It allows you to type HTML5, CSS, and JS code in the same portal and renders the result as soon as you finish typing, so it is quite a good platform for testing.

Hoping to complete a lot more by Blog Post 2.1

App4Apis (Update)

Phase: Final stages of completion

As we head into spring break, we want to update the status of the project. Before diving into the details, here is a quick introduction to the project.

App4Apis: a one-stop solution for accessing APIs that take parameters in the query and return a JSON object. We have two types of configurations. The first is a list of preset APIs (Geocode, Human Resources Archive, Internet Archive) whose structure we already know; we provide a form for filling in the information needed for querying. The second is more generic: the user provides an example URL (including the query parameters), from which we identify the API request pattern. This pattern is then used to query the API with a larger dataset in the next step. We let the user download the results, or the results can be sent by email (helpful for particularly large datasets).
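
As a rough illustration of the generic configuration, the sketch below splits an example URL into a base plus query parameters and then re-issues the request once per row of user data. The function and variable names are invented for the illustration; the actual App4Apis implementation may differ.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

import requests

def build_template(example_url):
    """Split an example request URL into its parts and query parameters
    (a simplified stand-in for the pattern detection described above)."""
    parts = urlparse(example_url)
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return parts, params

def run_batch(example_url, rows):
    """Re-issue the request once per row, overriding matching parameters."""
    parts, params = build_template(example_url)
    results = []
    for row in rows:                          # each row: {param name: new value}
        query = urlencode({**params, **row})
        results.append(
            requests.get(urlunparse(parts._replace(query=query)), timeout=30).json()
        )
    return results
```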

Status: in the last blog post, our to-do list was:

  1. Finish a few pending screens toward our goal.
  2. Integrate Geocode API into preset.
  3. Develop the functionality to support user to upload a file in the ad hoc query case.
  4. Cosmetic changes and make the website visually appealing.
  5. End to end exhaustive testing.
  6. Deploy!

From that list, we have finished the layout of the pending screens, completed the integration of Geocode into the preset list, and made the website more visually appealing. We are in the process of end-to-end testing of the preset workflow before moving on to the ad hoc query case. The redesigned screens of the website are attached at the end of this post. Any feedback is welcome.

We are also able to send the results to the user by email in chunks of 1,000 results per CSV file. This lets users submit a task to the system and receive the results at a later point in time, which enables the system to handle large inputs and gives users the option of not waiting for the results.
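
The chunking itself can be as simple as the sketch below, which writes a list of flat result dictionaries into CSV files of at most 1,000 rows each. The file names and the flat-dictionary assumption are mine for illustration, not the project’s code.

```python
import csv

def write_in_chunks(results, chunk_size=1000, prefix="results"):
    """Write a list of flat result dicts to CSV files of at most chunk_size
    rows each, mirroring the 1,000-results-per-file batches described above."""
    filenames = []
    for i in range(0, len(results), chunk_size):
        chunk = results[i:i + chunk_size]
        name = f"{prefix}_{i // chunk_size + 1}.csv"
        with open(name, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=chunk[0].keys())
            writer.writeheader()
            writer.writerows(chunk)
        filenames.append(name)
    return filenames  # e.g. attach these files to the outgoing email
```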

Current tasks at hand are:

  1. Make the email sending process asynchronous: currently we can send an email, but we cannot schedule it for a later point in time and let the user exit the screen. Users can leave the application after submitting their tasks, but we do not yet offer that as an explicit option. I am working on accepting the input as a task and scheduling it for later.
  2. Complete testing of the preset APIs workflow.
  3. Complete testing of the ad-hoc requests workflow. 
  4. Deploy.

I am expecting to finish the first three tasks by the end of this month so that we can turn our focus to deployment at the start of next month.

Thanks,

Rohit Bharadwaj Gernapudi

The website screens are:

[Screenshots: Capture_1, Capture_2, Capture_3, Capture_4]

Motivated Object-oriented programming: Build something from scratch!

This semester, I am putting together a 3D data visualization tool using Processing, to demonstrate the usefulness of this language for rapid prototyping of ideas for 3D graphics applications, including virtual reality. The code, documentation, and issue tracking are, as of recently, hosted on Github here.

If you’ve ever taken a programming course or encountered an intermediate online tutorial in a language that encourages object-oriented programming (OOP), you have probably seen the paradigm introduced through a motivating example (or two). You might have designed Dogs that subclass Animals, which emit ‘woof!’s; Cars with max_speeds; Persons with boolean genders, and so on…

But these are toy examples: underwhelming, too straightforward to help in practice, and not all that interesting. Once you’ve completed the typical OOP introduction, you often end up with code that performs a function already implemented elsewhere a hundred times over, and with greater efficiency. That in itself is not so bad: we all need pedagogical examples simple enough to introduce in a 75-minute lecture. What’s worse is that you’ve written code that you simply don’t care about. And that’s a recipe for demoralization. For the beginning programmer (yours truly), OOP becomes a convenient abstraction that bored you to death once or twice (and went ‘woof!’).

Nothing motivates design like the task of modeling a system with which you are otherwise quite familiar. Daniel Shiffman’s free online book, The Nature of Code, focuses on the simulation of natural (physical) systems and gently introduces OOP as a means of modeling “the real world” (rather than a toy example of a Car with an Engine). Of course, Shiffman’s models are simplistic too, relying on basic mechanics and vector math to animate their construction. Nevertheless, his examples and exercises leverage your best guesses about how the world works and challenge you to implement them in code, which is the nature of (many kinds of) programming: to take the big world and, in code, make a small world that (invariably imperfectly) reflects the large.

My advice, then, is to dive right in with a project that you care about, preferably a project that requires many “moving parts” such as interdependent entities (nodes that talk to and consume others), user extensibility, and large amounts of object reuse: a model of a mini-universe of sorts. The model doesn’t have to be physical: it can be of social relationships, knowledge, data. All it has to do is matter.

In the remainder of this post I will sketch the design of the project I am currently working on.

Project design

The goal of the software is to generate 3D data visualizations from quantitative and qualitative data ingested from a CSV file. The display of the visualization should be separate from its construction, so that ultimately different display ‘engines’ can be swapped in and out to allow for the presentation of the visualization on, for example, the computer screen, a VR headset, a smartphone, or even in the form of a 3D-printed model.

As it stands, the engine has the structure depicted below. Incidentally, there is a well-documented and ‘popular’ domain-specific language for describing the relationships between objects, the Unified Modeling Language (UML), which can generate similar-looking diagrams from code, but taking it on requires its own post. So the picture below is a rough approximation of the engine’s design rather than a reproducible blueprint (such as that provided by UML and the like).

There is exactly one Scene in the application, which contains a list of PrimitiveGroups, which themselves contain Primitives. A Primitive corresponds to a single data point: one row of the CSV input. A Primitive has a location in 3D space, as well as a velocity, which allows for the animated restructuring of the Primitives on the fly. Some simple primitives are included: a sphere (PrimitiveSphere), a cube (PrimitiveCube). New Primitives must subclass the Primitive class (which should never be instantiated: it is an abstract class). Primitives must have display() and update() methods. The display() method contains the calls to Processing’s draw functions (e.g. box()). At this point, you realize that Primitive should (and can) be implemented as a Java interface. After all, Processing.org is Java at base. The Scene also contains an Axis object which can be switched on or off.

How does the engine generate the Primitives per the contents of the data file? And according to what rules? In many ways, this is the heart of any visualization engine. The concept of a DataBinding is introduced.

A DataBinding realizes a one-way mapping from the columns of a data source (i.e. kinds of data) to the properties of a Primitive, returning a PrimitiveGroup that contains one Primitive for every row in the data source (read by the DataHandler, a very thin wrapper around Processing’s Table object).

The mapping is specified by the contents of a DataBindingSchema, a hashmap (read in from a YAML file; see examples 1 and 2) in which the keys are properties of Primitives and the values are column names in the data source. In effect, the DataBindingSchema specifies how the visual properties of Primitives respond to the data stored in the CSV file being read in. The DataBinding also has a validation method that throws a custom exception when the DataBindingSchema refers to column names and/or primitive properties that do not exist; it will ultimately also do type-checking.
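
To illustrate the idea outside of Processing, here is a small Python sketch of the same validation logic: a YAML schema maps primitive properties to CSV column names, and the check fails if either side refers to something that doesn’t exist. The property list and file layout are hypothetical stand-ins, not the engine’s own.

```python
import csv

import yaml  # PyYAML

class BindingError(Exception):
    """Raised when the schema refers to unknown columns or properties."""

# Properties a hypothetical Primitive exposes; the real engine's list differs.
PRIMITIVE_PROPERTIES = {"x", "y", "z", "size", "color"}

def validate_schema(schema_path, csv_path):
    """Check that every schema key names a known primitive property and every
    schema value names a real column in the CSV header."""
    with open(schema_path) as f:
        schema = yaml.safe_load(f)           # {primitive property: column name}
    with open(csv_path, newline="") as f:
        columns = set(next(csv.reader(f)))   # header row of the data source

    unknown_props = set(schema) - PRIMITIVE_PROPERTIES
    unknown_cols = set(schema.values()) - columns
    if unknown_props or unknown_cols:
        raise BindingError(
            f"unknown properties: {sorted(unknown_props)}; unknown columns: {sorted(unknown_cols)}"
        )
    return schema
```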

 

Papal Documents Project Part #3 by Yanchen Liu

This post focuses on my work to digitize and transcribe Western MS 82, a major canon law text. I describe some of the goals of this project and indicate why this particular text should be of interest to the scholarly community, and hence why its introduction into broader circulation is particularly warranted. I then conclude with a brief note updating my work to produce a new, expanded version of the Libraries’ webpage Papal Documents: A Finding Aid.

Western MS 82, currently preserved in the Rare Book & Manuscript Library of Columbia University, is the most deluxe of the six surviving manuscripts of the Collectio Sinemuriensis, or Semur Collection. Modern-day scholars generally consider the second recension of this collection to be the earliest “Gregorian Reform” canonical collection, which embodies the spirit of the great Church reform movement of the eleventh and twelfth centuries. According to Linda Fowler-Magerl, the initial version of this canonical collection was composed at Reims at the close of the tenth century. While we know something of this first version, many of its canons have not survived. The second version, however, is better attested. There are six surviving examples of the second iteration, which date from the second half of the eleventh century and the early twelfth century. The earliest, most complete of these extant manuscripts, MS Semur-en-Auxois, BM M. 13, has given the collection its name. Columbia’s Western MS 82 seems to have been copied during the first decades of the twelfth century in northern France. To date there has been no critical edition or systematic study of either the Collectio Sinemuriensis or of Western MS 82.

Columbia acquired Western MS 82 in 2004. At the end of last year (2015) the manuscript was scanned into high-resolution images by the Preservation and Digital Conversion Division of Columbia. My project on this manuscript aims at producing a digital transcription that preserves nearly all the scribal and spelling features of the manuscript, providing historians, paleographers, and medievalists not only with the contents of the full text but also with textual clues that can serve as valuable data for paleographical, linguistic, and historical investigations.

Western MS 82 is a parchment manuscript comprising 15 quires and 119 folios, with the first folio and the last two folios of the last extant quire missing. On the verso of the first flyleaf, an eighteenth-century annotation indicates that this manuscript contains “Notitia provinciarum ecclesiasticarum Galliae“, also known as the Notitia Galliarum, and “Collectio canonum“. The texts on the first nine folios, composed of the Notitia Galliarum and the capitulatio, a list of the rubrics of the canons in the collection, are laid out in two columns. The remaining text on ff. 10-119, i.e., the canons themselves, is written in a single column. The canons are grouped into three books. Each book begins with an exaggerated and decorated initial (on folios 10r, 49v and 99v) that is nine to twenty-one lines high. The bodies of the canons are copied in a neat book hand. However, the rubrics of the canons, which are written in red ink, are very probably later additions, as they are often written in uneven lines at the end or even on the edge of paragraphs, and are enclosed by a curled line drawn on their left side to distinguish them from the canons. The script has the appearance of early Proto-Gothic.

Compared with the other manuscripts of the Collectio Sinemuriensis, one of the most significant and intriguing features of Western MS 82 is that it opens with the Notitia provinciarum ecclesiasticarum Galliae, a list of the metropolitan cities and provinces of Gaul.

[MS 82 picture 2]

We do not have the first half of the list, owing to the missing first folio. Nevertheless, the surviving second half of the document, as well as the very fact that the compiler(s) of Western MS 82 chose to incorporate it, raises many questions and invites further examination. This document, also known as the Notitia Galliarum, was originally composed between the late fourth and the very early fifth century, before the massive Germanic invasions at the end of the first decade of the fifth century isolated much of Gaul from the remainder of the fracturing Western Empire. There is still wide debate as to whether the Notitia Galliarum was initially created out of administrative interests by a political institution or by a local church. Nevertheless, the earliest manuscript of the document, dating to the seventh century, contains the rubric “ut ordo exposcit pontificum,” suggesting that at least since the early Middle Ages the Notitia Galliarum has been regarded and employed as a religious document mapping an ecclesiastical space.

[MS 82 picture 3]

The Notitia appears in several medieval canonical collections. There are, nevertheless, several peculiar facts about the specific version included in Western MS 82. In the first place, of all the surviving manuscripts of the Collectio Sinemuriensis, only Western MS 82 incorporates this document. Further, while the versions that appear in other texts have been updated, this version appears to have retained its late antique form almost entirely. In other words, it was not updated to represent the actual provincial configuration of early twelfth-century France. Only two new cities, civitas Nivernensium and civitas Nundunum, were added to what is found in the oldest extant version, included in a seventh-century manuscript. Some cities included in the list were not actually bishoprics during the eleventh and twelfth centuries, e.g., civitas Oscismorum, civitas Diablintum, civitas Bononensium and civitas Tungrorum. Why did the patron of Western MS 82 want such a “dated” text to stand at the beginning of the manuscript? Why did he invoke the ancient divisions of Gaul? What kind of “conceptual territory” did he envision this canonical manuscript to represent? There may never be precise answers, but two hypotheses can be ventured.

The first possibility is that this text is a product of territorial conflicts between ecclesiastical institutions in northern France, which were certainly common in the high Middle Ages. The patron of Western MS 82 might have requested the incorporation of old Roman texts, such as the Notitia Galliarum, to buttress his territorial claims. The second possibility is that this ancient document, together with the lands of Gaul it delineates, may have had nothing to do with real space; rather, the patron of Western MS 82, by incorporating an ancient text that depicts the administrative system of the area, sought to invoke a sense of authority.

Both of these possibilities point to an emphasis on the authority and legitimacy manifested through antiquitas. Such emphasis is further accentuated in this manuscript through canons like the one that opens the second book (ff. 49v – 50r), where the initial letter P is exaggeratedly decorated (twenty-one lines high, in the form of a bird or dragon vomiting tendrils and flowers). The red-ink rubric of the canon reads “Quod non liceat apostolicis successoribus constituta predecessorum infringere.” This canon, possibly drafted by Hincmar of Reims, is ascribed to Pope Symmachus (r. 498-514) and prohibits the successors of the ancient popes from abrogating the administrative and legal decisions of their predecessors.

[MS 82 picture 1]

At the same time, Western MS 82, through this specific canon, appears to have employed antiquitas to suppress, rather than buttress, the legislative power and ecclesiastical rights of the contemporary papacy. Such a feature would seem to distinguish this canon law manuscript from the kind of canonical collection likely to be favored by the reform papacy of the eleventh and twelfth centuries, which generally asserted the intrinsic juridical power of the popes. Hence, modern scholars’ widely shared view of this canon law manuscript as essentially a product of the Gregorian Reform may oversimplify the character of Western MS 82. Last but not least, this canon, together with other canons in this manuscript, appears to indicate a connection between Western MS 82 and Reims. These facts seem to point us to the historical political tension between the archiepiscopal see of Reims and the papacy, and to the power relations between Rome and Reims during the Middle Ages.

The digital transcription of Western MS 82 will hopefully make this manuscript more accessible to such investigation. Framing the transcription with XML tags in accordance with the rules of the Text Encoding Initiative will also, I hope, enable users of the project to navigate and position themselves easily within the canons, to search for specific terms and variants within the codex and retrieve data in a more orderly fashion, and eventually to conduct comparisons between Western MS 82 and the other surviving manuscripts of the Collectio Sinemuriensis more easily.

In closing, I should note that my project of updating the Papal Documents finding aid is under its final review. Most of the entries now have an annotation that introduces and summarizes the work. In addition, the structure of the whole document has been adjusted to enhance its navigability, and the “Background Bibliography” section has been considerably augmented to help researchers and students grasp the significance of individual works, and the history and terminology relevant to the study of papal documents.