Digital Humanities at Tufts: How Data Mining Has Revolutionized the Study of Classics

The application of computer science to classics has permitted researchers to confirm whether Plato actually wrote the texts attributed to him. Flickr/lentina_x.


Digital Humanities is the use of computational tools to uncover new information from various topics in the humanities. In an interdisciplinary venture, the Tufts University Classics Department is now utilizing these tools to learn about a time far removed from the technology-driven 21st century.

The study of classics usually involves close reading: the careful study and interpretation of text passages. Once these texts are digitized, powerful data mining techniques can be employed to analyze large quantities of data at a time, allowing classics researchers to uncover patterns and information that have, until now, been inaccessible to them. Beyond providing machine-generated knowledge, digitization makes classical texts more accessible. Researchers can now easily compare classical data sets spanning centuries and large geographical expanses. Scholars have discovered thousands of Greek inscriptions and published translations of them, sometimes in multiple versions, across highly disparate and specialized publications, so the streamlined access that these databases provide proves invaluable to the field.


One method of encoding classical texts is the treebank: a collection of sentences annotated with their syntactic structure, often displayed as tree diagrams that map the relationships between words. The result is a data set that can then be mined. Dr. Marie-Claire Beaulieu, a professor of classics at Tufts who is currently applying computational tools to her work, employs treebanks to analyze the rhythmic structure of ancient Greek poetry found in inscriptions from the 4th century BCE. She explains that, by using treebank analysis, “you can actually have full transparency, go right back to the data at any point and verify those data and also produce an argument that does not rely on rhetoric, rather, [one that] is data driven.”
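To make the idea of a treebank as a minable data set more concrete, here is a minimal sketch in Python. It is an illustration only, not the format used by Beaulieu or by any particular treebank project: the `Token` class, the toy English sentence, and the relation labels (`PRED`, `SBJ`, `OBJ`, `ATR`) are all assumptions chosen for readability. Each word records which other word governs it, and that is enough to ask simple questions of the data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    index: int     # 1-based position of the word in the sentence
    form: str      # the word as written
    head: int      # index of the governing word; 0 marks the root
    relation: str  # syntactic role relative to the head (illustrative labels)


# Toy sentence annotated by hand for illustration; not taken from a real treebank.
SENTENCE = [
    Token(1, "the", 2, "ATR"),
    Token(2, "poet", 3, "SBJ"),
    Token(3, "sings", 0, "PRED"),
    Token(4, "ancient", 5, "ATR"),
    Token(5, "songs", 3, "OBJ"),
]


def dependents(sentence, head_index):
    """Return the words directly governed by the word at head_index."""
    return [token for token in sentence if token.head == head_index]


def relation_counts(sentences):
    """Count how often each syntactic relation occurs across sentences."""
    return Counter(token.relation for sentence in sentences for token in sentence)


root = next(token for token in SENTENCE if token.head == 0)
print(root.form, [token.form for token in dependents(SENTENCE, root.index)])
print(relation_counts([SENTENCE]))
```

Because every annotation is stored explicitly, any count produced this way can be traced back to the individual words behind it, which is the kind of transparency Beaulieu describes.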

Classics researchers also apply data mining to stylometry and text reuse. Stylometry, the analysis of linguistic style, is often used to attribute authorship to documents. “People have established that some of Plato’s dialogues were not actually by Plato,” Beaulieu tells Enigma, “so they calculate how he uses certain grammatical constructions, certain wordings, and compare texts that are established as Plato’s with texts that are doubtful. Doing this manually would be quite an enterprise – people have done it for centuries – but doing it in a computerized way gives some interesting results and, once again, data driven results.” Additionally, data mining techniques are being used to identify patterns of text reuse, in which ancient authors quote or allude to other texts in their own works. This involves writing software that can analyze large numbers of digitized texts and recognize when one is quoting or alluding to another. By tracing instances of text reuse, scholars can gain greater insight into the classical tradition, that is, how later generations have been influenced by classical literature and culture.
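Both ideas can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method used by the researchers quoted here: the function names are hypothetical, the stylometric "profile" is just the relative frequency of a hand-picked list of marker words compared by cosine similarity, and text reuse is approximated by looking for identical three-word sequences. Real studies use far richer features, and exact matching of this kind would catch quotations but not looser allusions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (letters only, any alphabet)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def profile(text, markers):
    """Relative frequency of each marker word in the text (a crude style profile)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[marker] / total for marker in markers]


def cosine_similarity(a, b):
    """Similarity of two profiles: 1.0 means identical proportions, 0.0 means no overlap."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def shared_ngrams(text_a, text_b, n=3):
    """Word sequences of length n that appear in both texts (candidate quotations)."""
    def ngrams(text):
        tokens = tokenize(text)
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return ngrams(text_a) & ngrams(text_b)


# Toy English strings for illustration only.
markers = ["the", "and", "of"]
secure = profile("the cat and the dog", markers)
doubtful = profile("the bird and the fish of the sea", markers)
print(cosine_similarity(secure, doubtful))
print(shared_ngrams("sing goddess the wrath of Achilles",
                    "he wrote: sing goddess the wrath, and stopped"))
```

In an authorship question, a doubtful text whose profile sits far from the profiles of securely attributed texts would be flagged for closer reading; the number alone does not settle the matter.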


The Perseus Digital Library at Tufts is a digital platform that amasses such digitized classical texts. Perseus has been in the making since 1985, when Dr. Gregory Crane, Tufts professor of classics and computer science and current Editor in Chief of the Digital Library, began its development. It catalogs ancient texts primarily pertaining to the history, culture and literature of ancient Greece and Rome. Dr. Beaulieu, the associate editor of Perseus, co-directs Perseids, a project under the Perseus Digital Library. Perseids is a collaborative online platform that enables researchers to peer-review, translate, and annotate ancient texts, allowing for a rich dialogue between scholars. “Constituting these large data sets is really enabling research at a scale that was impossible before and with a degree of accuracy that is also unprecedented,” Beaulieu says.

The Visible Worlds Project is an upcoming Digital Humanities collaboration among Tufts University, Brown University and Université Lyon II. This project, which is to be conducted in Athens, Larissa, and Thasos in May 2015, aims to train students to utilize technology in the field for Classics research. The participants' transcriptions of inscriptions at these sites will then be marked up and analyzed using EpiDoc, an XML markup scheme for ancient documents. “The use of EpiDoc allows our transcriptions to be incredibly detailed, providing a much closer examination,” says Julia Lenzi, a graduate student at Tufts who has been selected to participate in the Visible Worlds Project. By aggregating information contained in inscriptions from these archaeological sites, the scholars of the Visible Worlds Project hope to use data mining techniques to uncover connections between different pieces of text written over centuries. These links could then be used to interpret details about the history and the culture of the society that thrived in the area. “We’re crossing the boundaries of classics into other disciplines and showing how the ancient world is a whole connected network … [since] you can deal with the whole of the pre-modern world in connected ways,” Beaulieu explains.
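To give a sense of the detail such markup can carry, the following Python sketch reads a tiny EpiDoc-style fragment. EpiDoc builds on TEI XML, and the elements used here (`lb` for a line break, `supplied` for text restored by the editor, `unclear` for damaged letters) come from that vocabulary. Everything else is an assumption for illustration: the fragment is invented placeholder lettering, not a real inscription, and the `read_lines` function is a hypothetical helper rather than part of any EpiDoc tool.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI namespace, as ElementTree spells it

# Invented placeholder fragment for illustration; not a real inscription.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0" type="edition">
  <ab>
    <lb n="1"/>ΑΒΓ<supplied reason="lost">ΔΕ</supplied>
    <lb n="2"/>ΖΗ<unclear>Θ</unclear>
  </ab>
</div>"""


def read_lines(xml_text):
    """Return {line number: text}, wrapping editor-restored text in square brackets."""
    root = ET.fromstring(xml_text)
    lines = {}               # line number -> list of text chunks
    state = {"line": None}   # the line currently being filled

    def add(text):
        if text and state["line"] is not None:
            lines[state["line"]].append(text)

    def walk(elem):
        tag = elem.tag.replace(TEI, "")
        if tag == "lb":          # a new physical line on the stone begins
            state["line"] = elem.get("n")
            lines[state["line"]] = []
        if tag == "supplied":    # restored text is shown in brackets
            add("[")
        add(elem.text)
        for child in elem:
            walk(child)
            add(child.tail)      # text that follows the child element
        if tag == "supplied":
            add("]")

    walk(root)
    # Collapse the indentation whitespace left over from the XML layout.
    return {n: " ".join("".join(chunks).split()) for n, chunks in lines.items()}


print(read_lines(SAMPLE))
```

In this sketch, unclear letters are passed through unchanged; a fuller version could mark them as well. The point is that the markup keeps the editor's judgments (what is restored, what is damaged) separate from the letters themselves, so they remain available to later analysis.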

Members of the Visible Worlds Project plan to go to the ancient theater of Larissa during summer 2015. Foklon Xiotakis/Flickr.

The field of Digital Humanities has also influenced the role of classics in the classroom. “Students are doing professional work, just as in the sciences: students in computer science are producing code all the time that could be used professionally, and students in biology are doing lab work with their professors. We want to bring that same approach to the humanities,” Beaulieu says. By applying computer science to this discipline, classics students are now able to collect, edit, and annotate textual sources. Their contributions, when added to a digital library such as Perseus, are utilized by specialists in the field, giving unprecedented responsibility to student work. Beaulieu believes that this heightened responsibility is integral to the development of the field as a whole. “I think the future of Digital Humanities is in the classroom,” she says, “and it is reconnecting with our students and reorganizing our curriculum to include students in the production of knowledge. This is something that is very very important to all of us … [to understand] how do we teach classics in a meaningful way and how do we involve students in a way they have never been involved before in the humanities.”

Mastering the technologies used in Digital Humanities can initially be challenging for classics students, whose field of study does not otherwise require technical knowledge. Even so, Lenzi, who has been involved in Digital Humanities research at Tufts since 2013, anticipates many long-term benefits from participation in these projects: “By learning new technological skills, classics students at Tufts are able to become better interdisciplinary learners and are able to engage with new and exciting platforms for examining texts. I believe these skills will distinguish us once we graduate from Tufts.” Beaulieu agrees, noting that “Tufts really is a leader in this field, especially in digital classics. And, yes, it is very exciting to be here because we’re on the front lines … the challenge right now might be to navigate just how rich this field is becoming.”


 

 

Sarah Kalinowski
Lead Editor.
Sarah Kalinowski is a senior majoring in biopsychology and minoring in cognitive and brain sciences. She can be reached at sarah.kalinowski@tufts.edu.
Anu Gamage
Staff Writer.
