The Statistics of Tufts Confessions

There’s a special part of Facebook where Jumbo secrets echo out into the internet: where serious personal anecdote meets dim-witted college bravado, where weird late-night stories from the ZBT basement are inscribed on a public wall, and where the unheard truths of otherwise light-hearted, studious pre-meds surface to the world.

I hate how you’re just born out of nowhere, forced to go to school and get an education so you can get a job. What if I wanted to be a duck?

– Anonymous

On an average day, there are about 60 confessions broadcast on the Tufts Confessions Facebook page. The page receives about 1132 confessions in a given month and, since its creation in 2013, it has posted nearly 25,000 anonymous confessions from Tufts students. Clearly, we have a lot to say. But what are we all talking about, and is it important?

Patterns of Confessional Language

From a bird’s eye view, Tufts confessions pretty much look like what you think they’d look like.

The most common phrases rank the types of confessions made: as a whole, people are confessing about their feelings, their sexual encounters, their romantic lives and their experiences of novelty, interestingly in that order of frequency. To put it another way, by numbers alone, confessions describing fleeting desires or encounters are more popular than those recounting lasting emotional experiences. Can you say millennial?

On the other hand, pointwise mutual information, which scores words that appear together more often than chance alone would predict, picks out the distinct subjects of conversation – and it’s hardly surprising that we’re talking about fossil fuels, COMP 40, Winter Bash (never forget), bi-curiosity and race. If someone knew nothing about this population of people, this contextual information would give an accurate overview. Even under anonymity, we are exactly who we think we are. Our conversations in secret and in daylight are both so very... Tuftsy. So what’s so interesting about our confessions?
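For the curious, here is a minimal sketch of how pointwise mutual information can be computed for adjacent word pairs. It is not the code behind this post: the function name `bigram_pmi`, the whitespace tokenization and the `min_count` cutoff are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_pmi(documents, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ). A high score means the
    pair shows up together far more often than its individual word
    frequencies would suggest.

    Illustrative assumptions: lowercase whitespace tokenization, and pairs
    seen fewer than `min_count` times are skipped to avoid noisy scores.
    """
    unigrams = Counter()
    bigrams = Counter()
    for text in documents:
        tokens = text.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scores = {}
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / total_bigrams
        p_first = unigrams[first] / total_unigrams
        p_second = unigrams[second] / total_unigrams
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores
```

On a toy set of made-up sentences, a pair like ("fossil", "fuels") should outscore a pair like ("fuels", "are"), because "are" turns up next to almost everything and so carries little information about its neighbors.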

Sentiment analysis tells us the (literal) emotional shape of our confessions – bell-curved, but slightly left-leaning. It makes sense that our sentiments aren’t perfectly centered – Tufts Confessions is no stranger to “negative sentiment”. But there is an important gradient between negative and positive polarity, between “the worst part” and “the most beautiful”: contextual uncertainty and social overload. These are phrases of uncertainty like “I don’t know” and “I’m not sure”, and phrases of social scale like “lots of people” and “so many people”, expressed with the same sentiment as their context – “at this school”. As students at this school, we are facing both uncertainty and overload in our social environment.

The Truth About Our Confessions

Here’s a statistical breakdown of the latent topics in Tufts Confessions – they are quite telling (trigger warning):

Most of these word patterns may seem obvious, but some are truly intriguing. “ATO” appears under the topic labels of both ‘Someone Hot’… and ‘Sexual Assault’. The word “marriage” co-occurs with words like “skinny,” “eating” and “obsessed” in the topic of ‘Body.’ But above all, one thing truly pops out. Loneliness is thematically more frequent in our confessions than body image, sexual assault, partying and academic life combined. Forget everything you know about campus social issues – one thing statistically underlies all of them. And this is definitely not cooked up.

Even so, surely these topics (or problems) aren’t mutually exclusive? For example, how does loneliness in a confession relate to other themes like body image? It turns out we can actually model this. Like moving weather systems, pixels in YouTube videos, and the internet, natural language is a dynamic system. Think of a confession as a series of probabilistic transitions between topics (says the Markov model of natural language). Beyond just a string of words about orgies and toilet paper, a personal confession is a flow of natural language through different ‘confessional topics’. Not surprisingly, the mention of a particular topic most typically leads back into itself – suggesting a self-contained discourse on a particular issue.

However, it is worth noting that besides these self-loops, ‘loneliness’ is the most probabilistically likely state transition in this topic network. Fascinating (in a sad way).
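To make the topic-transition idea concrete, here is a rough sketch of a first-order Markov model over topic labels. It assumes each confession has already been reduced to an ordered list of topic labels (for instance, one per sentence), which this post does not spell out; the function names and the toy labels are hypothetical.

```python
from collections import defaultdict


def transition_probabilities(topic_sequences):
    """Estimate P(next topic | current topic) from labeled sequences.

    Each element of `topic_sequences` is one confession, given as an
    ordered list of topic labels (an assumed preprocessing step).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sequence in topic_sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1

    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probabilities[current] = {
            topic: count / total for topic, count in followers.items()
        }
    return probabilities


def most_likely_non_self_transition(probabilities, topic):
    """Return the likeliest next topic other than `topic` itself, or None."""
    candidates = {
        nxt: p for nxt, p in probabilities.get(topic, {}).items() if nxt != topic
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

Self-loops are the diagonal of this transition table, so setting them aside and asking which topic each row most often leads to is one way to arrive at a claim like the one above.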

It’s Only Human…

While these data insights are interesting, it’s important to remember the inherent subjectivities in interpreting human data. Because models are just that – interpretations of an imperfect human world (hopefully through non-sketchy, analytical methods). Are my statistics 100% spot-on? Absolutely not. Am I lying to you? Up to you to decide. Either way, this is the first data-driven observation through the looking glass of hidden truths in our personal confessions (which I hope engenders some discussion) – more to come soon.

This post was originally published on our “Tufts Trends” blog on February 28, 2015.

Soubhik Barari
Soubhik Barari is a senior majoring in Computer Science and Mathematics. He can be reached at
