Why We Need Digital Hermeneutics
I recently started teaching the sixth iteration of my course Digital Hermeneutics as part of Berkeley’s Digital Humanities Summer Minor, offered by Arts and Humanities and the Social Sciences D-Lab (Data Lab). The course teaches the practices of data science, and text analysis in particular, within the history of hermeneutics. Hermeneutics is just a fancy word for interpretation, both in the practical sense of a sustained methodological effort, and in the philosophical sense of asking what interpretation is in the first place.
As humanists have been dealing with these questions for millennia, hermeneutics provides a rich framework for understanding how we interpret, understand, and ascribe meaning to texts—and more broadly, the world around us. In a Western context, the term comes from the Greek word "hermēneuein," meaning to interpret. It is also related to Hermes, the messenger god in Greek mythology who served as a translator between the divine and human realms and was considered the gods' interpreter.
Being a mediator gave Hermes a significant degree of power. He is often depicted as a cunning negotiator using his wit and charm to outsmart others. As an intermediary between the gods and humans, he could deliver messages that contained hidden meanings or trickery. In other words, interpretation was from the very start posed as a problem. How do we know whether Hermes was telling the truth? Or, put more broadly: how do we know which interpretation is the correct one?
Throughout the ages, these questions have been necessitated by different historical developments. Thinking about hermeneutics, after all, means asking the question: when is something important enough to interpret? When religion dictated social life, we wanted to interpret the Bible. When the law became the central organizing principle of society, lawyers were needed to interpret it. And as the Romantic era shifted our focus from God as the divine creator to the genius of human authors, the literary critic emerged to interpret their art.
Each of these eras led to the development of hermeneutic approaches, with the common question being: how can we ascertain a manner of interpretation that is authoritative, sound—perhaps even scientific?
Teaching Hermeneutics Through Reddit
In Digital Hermeneutics, students draw from this history of insights when analyzing the texts that 21st-century people are most familiar with—the kind found on social media platforms. Specifically, they are looking at Reddit, a social platform organized around particular communities with shared interests. These are groups of people talking about specific topics and using language in their own particular ways. Students pick a community that captures their interest. They then engage in a project that asks a hermeneutic question on two levels: how does this community interpret the world, and how can they, as researchers, interpret that community's language in turn?
In the process, students use text analysis tools such as topic modeling and word embeddings to find themes and topics in their data and to figure out which posts are most emblematic of a theme. The goal is to learn how these analytical approaches allow them to explore data and develop research questions. The course also offers a critical framework for asking how these approaches open up or close down particular avenues for interpretation.
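To give a sense of what that exploration can look like in practice, here is a minimal sketch of one such workflow, assuming a handful of post texts already collected from a single subreddit and using scikit-learn's LDA topic model. The toy posts, the number of topics, and the printed summary are illustrative stand-ins rather than material from the course.

```python
# A minimal, illustrative sketch: fit a small topic model over a few Reddit-style
# posts and surface the post that loads most heavily on each topic.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical stand-ins for post texts collected from one community.
posts = [
    "AITA for skipping my sister's wedding to finish my thesis?",
    "My partner keeps reading my texts, is that a red flag?",
    "How do I tell my roommate their late-night gaming keeps me awake?",
    "Skipped the family reunion again, grandma is upset, am I the asshole?",
]

# Turn the posts into a document-term matrix of word counts.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(posts)

# Fit a topic model; the number of topics is an analyst's judgment call.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(doc_term)  # rows: posts, columns: topic proportions

# For each topic, list its top words and the single most emblematic post.
vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top_words = [vocab[i] for i in weights.argsort()[::-1][:5]]
    emblematic = doc_topics[:, k].argmax()
    print(f"Topic {k}: {', '.join(top_words)}")
    print(f"  Most emblematic post: {posts[emblematic]}")
```

Even in a sketch this small, the analyst's choices (how many topics to fit, which words to drop) shape what becomes visible, which is exactly the kind of opening up and closing down of interpretive avenues that students are asked to examine.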
When we are close-reading a post, we note different things about it than when we use computational methods to discover larger patterns and themes across posts. Moving back and forth between such close and distant readings means tracing the famous hermeneutic circle: understanding a text as a whole is established through understanding its individual parts, and conversely, understanding each individual part is only possible in reference to the whole.
Interpretation in the Age of AI
It is difficult to ask questions about interpretation without acknowledging that we now live in a time when the need for, and the object of, interpretation seem to be shifting again. This is due to the advent of generative large language models (LLMs) like ChatGPT—models that are built on supercharged versions of the tools students use in this course.
When these LLMs were built (or perhaps "grown") using large datasets, this was done with an aim similar to the one students have in my course: teaching a machine to uncover patterns and themes in human language. Big social data like that on Reddit has been instrumental in the development of AI chatbots like ChatGPT, Bard, and Claude. The computational understanding these models have of the world is built on expansive corpora of social interactions, enabling them to offer conversational responses to a wide range of queries. The output of ChatGPT is thus a mixture of different hermeneutic layers: when we read it, we are interpreting a computational interpretation of human interpretations of the world.
Again, it is worth stressing that these models could not have been created without the immense amounts of data that we, as a society, have supplied over the years. AI companies such as OpenAI have used this data to train their models, and those models are now the hottest commodity in the computer science landscape. They are already used across many industries and for different tasks: from coding and design to churning out marketing texts or work emails. These models enable us to do all kinds of tasks more rapidly, and they contribute to the general intensification of production and labor. Their surging popularity also implies that we will be dealing with a whole lot more text, often without knowing whether it was written by a human or an AI.
One response to this has been increased restrictions on access to language data. The few social media platforms that allowed researchers to freely collect textual data through Application Programming Interfaces (APIs) have begun to aggressively monetize that data. Reddit and Twitter, arguably the two most popular social platforms for text analysis, now only grant access to their data at substantial cost. We are moving further from an open web toward a closed one, and it will become increasingly difficult to do the kind of work that I am asking students to do with recent data.
This matters because language, like society, is constantly evolving. While AI companies will likely have the budget to pay for recent language data and grow their newer models on it, the capacity for individual researchers to do so seems to be vanishing. But such research remains fundamentally important to understand societal norms. The most popular communities students analyze in my course revolve around relationships—Am I the Asshole, Relationship Advice, and so on. These are communities discussing the boundaries of the socially and morally permissible, and are great sites to analyze the kinds of values, biases, ideologies, and myths that permeate our language. What social beliefs do we hold most strongly? When do we hurt others? When is something a red flag? Thinking deeply about these norms is especially necessary as they are encoded and perpetuated by LLMs.
This leads me back to the main question: why does interpretation matter? My hope is that, in using computational tools to analyze online discourse communities, students can untangle some of the ways in which people make sense of the world around them—and how they themselves, as researchers, can in turn make sense of those webs of meaning. Interpretation is a complicated, circular process that never really ends. But crucially, it allows us to reflect on the onset of linguistic automation that we are currently living through.