Beautiful Stories From Anonymous People (a.k.a. Beautiful / Anonymous) is a podcast with a simple premise: "One phone call. One hour. No names. No holds barred." The show was created by comedian Chris Gethard in 2016, and every episode features Gethard and an anonymous caller who phones in for a one-hour conversation. The caller can hang up at any time, but Gethard must stay on the line for the full hour.
What began as an experimental idea has evolved into a rich show that captures moments of real human connection. Sometimes these moments are funny, as one might expect from a podcast hosted by a comedian. More often, though, they hit different notes. Conversations span a wide variety of topics: anxiety about the future, the death of a loved one, relationship struggles and triumphs, and, sometimes, the regularity of one's bowel movements.
In a society that often feels increasingly disconnected, these weekly calls are a helpful reminder of our shared humanity.
It probably goes without saying that I'm a fan of the podcast, and Gethard's work in general. He wears many hats on the show: comedian, confidant, empathizer, friend. But if there's one thing Gethard is not, it's a lover of mathematics.
Gethard's negative feelings about math are well-documented. Even his listeners are aware of them. For example, in one episode Gethard chats with a deaf caller with the aid of a sign language interpreter. At one point, the topic of lipreading comes up. Here's what the caller has to say (emphasis added):
I'm actually a pretty bad lipreader, even though I do use it. That's something with lipreading, that's another pet peeve that a lot of deaf people have, is expecting to lipread. I mean, we do it out of survival, but I almost look at lipreading the way that you talk about how much you hate math.
This disdain for mathematics even creeps into Gethard's other works! In his 2018 book Lose Well, while writing about some of his experiences teaching improv classes, Gethard writes (again, emphasis added):
My students were very good, but they had developed the bad habit of turning their art form into a math problem to solve. Art can never be math.
To be honest, I can't blame him. Unfortunately, most people have had at least one traumatic math experience during their school years. But if you're like Chris Gethard — if math makes you feel varying degrees of anxiety, repulsion, existential dread, and / or rage — I'd like to invite you to turn the tables with me, and examine some of Gethard's work through a mathematical lens.
Gethard may not have a high opinion of math, but what does math have to say about him? Let's find out!
As mentioned above, one Beautiful / Anonymous episode features a deaf caller. This experience inspired Gethard to provide transcripts of the show for fans who might not be able to listen to them. In 2019, he released transcripts for 10 episodes curated as part of a beginner's guide to the podcast.
Unfortunately for Gethard, in so doing he opened a Pandora's box of material for analysis. I went through all of those transcripts and crunched the numbers, trying to answer questions like:
In what follows, I'd like to offer some answers to all of these questions. Some caveats and clarifications:
With that said, let's check out the results!
One basic question we can explore is how much each person speaks during a call. We don't have timestamps available on the transcripts, so instead we'll use word count as a proxy.
Here's how the number of words spoken by Gethard compares to the number of words spoken by the caller, for each of the ten episode transcripts:
Figure 1: Word counts per episode for Gethard and the caller.
My biggest takeaway from this graph is that Gethard seems to have gotten much more comfortable creating space for his guests. He spoke the most in the very first episode: 6,394 words, more than 67% of the total.
Overall, though, his guests tend to say more than he does. Gethard only speaks a majority of the words in three of these ten episodes, and across the ten transcribed episodes his share of the words stands at around 44%. If we exclude his particularly chatty first episode, his share drops to 42%.
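Here's a minimal sketch of how word counts and shares like these could be computed. It assumes each transcript has already been parsed into (speaker, text) pairs; the speaker labels and the toy lines are invented for illustration and are not taken from the real transcripts.

```python
from collections import Counter


def word_counts(lines):
    """Count whitespace-separated words per speaker for one transcript.

    `lines` is an iterable of (speaker, text) pairs (an assumed layout).
    """
    counts = Counter()
    for speaker, text in lines:
        counts[speaker] += len(text.split())
    return counts


def share(counts, speaker):
    """Return the fraction of all words spoken by `speaker`."""
    total = sum(counts.values())
    return counts[speaker] / total if total else 0.0


# Toy transcript, invented for illustration only.
episode = [
    ("CHRIS", "Hello, how are you doing today?"),
    ("CALLER", "I am doing fine, thanks. It has been a long week, though."),
    ("CHRIS", "Tell me more."),
]

counts = word_counts(episode)
print(dict(counts), f"{share(counts, 'CHRIS'):.0%}")
```

Summing the per-episode counters before calling `share` gives an overall share across episodes, and leaving one episode out of that sum gives the share without it.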
Another thing we can do with these transcripts is analyze the sentiment of the text. As people are talking, do their sentences have a positive sentiment, a negative sentiment, or a neutral sentiment?
This may seem like a difficult thing to pin down, especially if we'd like to automate the analysis. Some statements are straightforward: we can probably agree that "I love you" should count as having positive sentiment, and "I hate you" should have negative sentiment.
But what about a sentence like "Sometimes I love you, and sometimes I hate you"? Is this positive? Negative? Neutral? And how do you teach a computer to assess the sentiment so that we don't have to do it manually for every transcript?
Fortunately, there's an off-the-shelf solution here that works well. It's called VADER Sentiment Analysis (short for Valence Aware Dictionary and sEntiment Reasoner). It's basically a giant lexicon of words along with their associated sentiment. The text was originally pulled from social media, but, according to the maintainers of the project, it "is also generally applicable to sentiment analysis in other domains."
VADER can take a text and analyze it sentence-by-sentence for positive, negative, and neutral sentiment. It spits out a few different stats, but the one we'll use gives each sentence a score between -1 and 1. The closer to 1, the more positive the sentiment; the closer to -1, the more negative the sentiment.
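How does a lexicon of word scores end up as a single number between -1 and 1? As far as I can tell, the open-source VADER implementation adds up the adjusted valence of the words in a sentence and then squashes that sum with a simple normalization. The sketch below shows that squashing step only. The constant `alpha=15` is my recollection of the library's default, so treat it as an assumption, and the raw sums in the loop are made up.

```python
import math


def normalize(raw_score, alpha=15.0):
    """Squash an unbounded sum of word valences into the interval (-1, 1).

    alpha controls how quickly the score saturates; 15 is assumed here.
    """
    return raw_score / math.sqrt(raw_score * raw_score + alpha)


# Made-up raw sums: larger magnitudes approach -1 or 1 but never reach them.
for raw in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(raw, round(normalize(raw), 4))
```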
To make things concrete, here are some examples of Gethard statements, along with their associated sentiment score:
|Chris Gethard Quote||Sentiment Score|
|That's been my whole career for about 15 years is depressing people and calling it comedy.||-0.0258|
|Fart, bananas, potatoes needs to be a T-shirt.||0|
|I wanted to reach over the counter, and grab the person, and just fling him across the room.||0.0258|
Figure 2: Sentiment scores for different things Chris Gethard has said.
VADER's not perfect. In particular, some of the "neutral" examples don't seem all that neutral. But in the aggregate this can provide us with a useful metric for how positive or negative someone's speaking patterns might be.
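Turning a score into a positive, negative, or neutral label requires cutoffs. The VADER documentation suggests treating scores of 0.05 and above as positive and -0.05 and below as negative; whether the analysis here used exactly those cutoffs is an assumption on my part. Under that assumption, a minimal sketch of the labeling and counting looks like this, using the three scores from Figure 2 as input:

```python
from collections import Counter


def bucket(compound, threshold=0.05):
    """Label a compound score; the 0.05 cutoff is an assumed convention."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_counts(scored_sentences):
    """Count (speaker, label) pairs from (speaker, compound score) pairs."""
    counts = Counter()
    for speaker, score in scored_sentences:
        counts[(speaker, bucket(score))] += 1
    return counts


# The three scores shown in Figure 2; the speaker label is illustrative.
figure_2 = [("CHRIS", -0.0258), ("CHRIS", 0.0), ("CHRIS", 0.0258)]
print(sentiment_counts(figure_2))
```

With these cutoffs all three quotes land in the neutral bucket, which fits the observation that some "neutral" examples don't seem all that neutral.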
Let's once again take a look at how Gethard compares to his callers. How many times do they each talk about things with positive, negative, or neutral sentiment? Here's a breakdown by sentence and by episode:
Figure 3: Sentiment counts per episode for Gethard and the caller.
As you can see, positive sentiment greatly outweighs negative sentiment across episodes. And for his part, Gethard plays the extremes pretty evenly: he speaks a majority of sentences with extreme negative sentiment in six out of ten episodes, and the majority of sentences with extreme positive sentiment in six out of ten episodes.
We can take this even further by plotting the sentiment of every line in an episode's transcript. Here's how the episodes look:
Figure 4: Sentiment of every line in an episode. The larger the circle, the more words in the line.
As you can see, most circles lie in the upper half, i.e. they have positive sentiment. So if you're looking for a show that's positive on the whole, this one fits the bill.
It's become somewhat of a tradition on the show for people to apologize to Chris Gethard's mom, Sally, whenever they curse. So before we move on from sentiment, here's a related question: who owes Sally a bigger apology, Chris or his callers?
There are a number of ways to try to detect profanity in a piece of text: the one I used is called profanity-check. Using this tool, I was able to categorize every sentence in every conversation as either having profanity or not. Here are the results, again broken down by speaker:
Figure 5: Profanity counts per episode for Gethard and the caller.
Sorry, Sally. It looks like your son takes the crown when it comes to profane language. In 9 out of 10 transcripts, he had the majority of sentences marked as profane. The only caller with a dirtier mouth was the Australian caller featured in the "Aussie Best Friend" episode.
Gethard may take comfort in the fact that the profanity checker is easily offended, so sentences like "I'm not trying to be a jerk" and "That sucks" both get marked as profane. While this may indicate that the above counts are too high, the ratio between Gethard and his guests is probably about right, since the profanity checker has a low bar regardless of who is speaking.
Let's move on to our next question: can we find any distinguishing characteristics of the way Gethard speaks compared to his callers?
One way to answer this question is to identify short phrases that appear relatively frequently in the transcripts. Here are the most common two-, three-, and four-word phrases spoken by Gethard and the callers as a group:
|Chris Gethard||Callers|
|going to (said 160 times)||I was (said 291 times)|
|want to (said 144 times)||and I (said 270 times)|
|and I (said 126 times)||it was (said 199 times)|
|I think (said 107 times)||going to (said 197 times)|
|of the (said 106 times)||a lot (said 177 times)|
|to be (said 103 times)||I don't (said 175 times)|
|in the (said 98 times)||I think (said 174 times)|
|I don't (said 96 times)||kind of (said 159 times)|
|this is (said 93 times)||to be (said 136 times)|
|a lot (said 93 times)||lot of (said 131 times)|
|have to (said 92 times)||and then (said 127 times)|
|on the (said 80 times)||of the (said 123 times)|
|do you (said 75 times)||in the (said 121 times)|
|lot of (said 75 times)||you know (said 106 times)|
|that you (said 74 times)||don't know (said 102 times)|
|and then (said 71 times)||that I (said 96 times)|
|I was (said 65 times)||but I (said 95 times)|
|I have (said 64 times)||so I (said 91 times)|
|in a (said 64 times)||have to (said 90 times)|
|sounds like (said 63 times)||I mean (said 88 times)|
Figure 6: Common phrases for Gethard and his callers. Phrases are restricted to the twenty most frequent, and each phrase must have appeared an average of at least once per episode.
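Counting phrases like the ones in Figure 6 comes down to sliding a window of n words over each line and tallying what shows up. Here's a rough sketch; the tokenizer, the function names, and the sample lines are my own assumptions rather than the code behind the table. Note that it lowercases everything, so "I was" would be counted as "i was".

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep apostrophes so "don't" stays one token."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def common_phrases(lines, n=2, top=20, min_count=1):
    """Return up to `top` n-word phrases that occur at least `min_count` times.

    Phrases are counted within a line, never across two lines.
    """
    counts = Counter()
    for text in lines:
        counts.update(" ".join(gram) for gram in ngrams(tokenize(text), n))
    return [(p, c) for p, c in counts.most_common(top) if c >= min_count]


# Invented lines, for illustration only.
sample = ["I think it sounds like a lot", "I think that is a lot of work"]
print(common_phrases(sample, n=2, min_count=2))
```

Setting `min_count` to the number of episodes (ten here) would mirror the rule in the caption that a phrase must appear an average of at least once per episode.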
One difference you may notice is how often common phrases include the word "I." The data suggests that Gethard is good at centering the conversation on the caller and their experiences. Callers say "I" in eight of their top twenty most common two-word phrases, but Gethard only says "I" in five of his.
The phrases in the table above are common to Gethard and his callers, but they're also just plain common. Knowing that Gethard says "and I" 126 times across these ten episode transcripts doesn't really tell us that much about what makes Gethard sound like Gethard.
Another way we can study speech patterns in these transcripts is through collocations, a linguistic term for sequences of words that occur disproportionately often.
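One common way to make "disproportionately often" precise is pointwise mutual information (PMI): compare how often two words appear side by side with how often they would if they were scattered independently. The sketch below uses PMI with a minimum count; this is an assumed scoring choice, not necessarily the measure behind the lists that follow, and the sample sentence is invented. Libraries such as NLTK ship ready-made collocation finders with several scoring measures.

```python
import math
from collections import Counter


def pmi_collocations(tokens, min_count=2, top=20):
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_pairs = max(n_words - 1, 1)
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs get unreliable, inflated PMI scores
        p_pair = count / n_pairs
        p_independent = (unigrams[w1] / n_words) * (unigrams[w2] / n_words)
        scored.append((math.log2(p_pair / p_independent), f"{w1} {w2}"))
    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:top]]


# Invented sample text, for illustration only.
words = "new york is big and new york is loud and the city is big".split()
print(pmi_collocations(words))
```

In this toy text "new york" ranks first because "new" and "york" never appear apart, while "is" also shows up next to other words.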
Here are the two-word collocations for Chris Gethard and his callers:
|Chris Gethard||Callers|
|sounds like||don't know|
|feel like||yeah yeah|
|little bit||little bit|
|don't know||feel like|
|don't want||mmhmm affirmative|
|bank tech||high school|
|new york||don't want|
|high school||I'll tell|
|minutes left||mental health|
|would imagine||every day|
|bromance test||I've got|
|yeah yeah||you've got|
|that's cool||even though|
|sign language||last night|
|york city||social media|
|year old||wood stove|
|I've never||uncanny valley|
|make sure||what's going|
|Ron Paul||junior year|
|uncanny valley||electric blanket|
Figure 7: Collocations for Gethard and his callers.
A few observations:
Collocations do a better job at telling us about some unique topics that come up on Beautiful / Anonymous. But, except maybe for the way Gethard seems to talk about New York a lot, they don't really answer the original question of how we can distinguish between what Gethard says and what his callers say. For this, we need a slightly different approach.
Collocations tell us whether a sequence of words appears more often than is typical. But we're not interested in comparing word frequency to an absolute average; we're interested in comparing Gethard's word frequency to the average frequency of his callers. This is a slightly different problem, since, for instance, his callers may themselves bring up topics more often than is typical (high school, for example).
So how can we do this? One way is to try to find linguistic tells that will accurately predict when Chris is talking as opposed to one of his callers.
This is a common task within the field of machine learning. Without taking us too far afield, here's the idea: basically, you tell a computer about certain features of a text that you think may help predict the speaker. For example, Chris Gethard may be more likely to talk about New York and New Jersey than his callers.
You can give the computer dozens or hundreds of these features to consider, as well as a training set of data to look at (basically a random subset of all of the transcripts). The computer will then learn how best to balance these features so that, given a new line, it has a high probability of correctly guessing who said it.
I threw the kitchen sink at my computer in terms of features I thought might have some predictive power. In the end, this yielded an algorithm that could predict whether or not Gethard was the speaker of a random line with about 70% accuracy. Here are the top twenty most informative features for one of my training sessions:
Figure 8: Twenty most informative features for distinguishing between Gethard and his callers.
The numbers in the above chart indicate the strength of each predictor. For example, a line where the first word is "wow" is 12.9 times more likely to have come from Gethard, while a line containing the word "definitely" is 14.1 times more likely to have come from one of his callers.
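Ratios of this "N times more likely" kind are what a naive Bayes classifier typically reports for its most informative features: the probability of seeing a feature in one speaker's lines divided by the probability of seeing it in the other's. Here's a minimal sketch of that calculation. The two feature types, the add-0.5 smoothing, and the six sample lines are all assumptions made for illustration; the real analysis used many more features and real transcripts.

```python
from collections import Counter, defaultdict


def features(line):
    """Extract two toy feature types: words contained, and the first word."""
    words = [w.strip(".,!?") for w in line.lower().split()]
    feats = {f"contains({w})" for w in words if w}
    if words and words[0]:
        feats.add(f"first_word({words[0]})")
    return feats


def likelihood_ratios(labeled_lines, smoothing=0.5):
    """Map each feature to (likelier speaker, how many times likelier)."""
    line_totals = Counter()
    feature_counts = defaultdict(Counter)
    for label, line in labeled_lines:
        line_totals[label] += 1
        for feat in features(line):
            feature_counts[feat][label] += 1
    labels = sorted(line_totals)
    ratios = {}
    for feat, by_label in feature_counts.items():
        # Smoothed estimate of P(feature present | speaker).
        probs = {
            lab: (by_label[lab] + smoothing) / (line_totals[lab] + 2 * smoothing)
            for lab in labels
        }
        best = max(labels, key=probs.get)
        worst = min(labels, key=probs.get)
        ratios[feat] = (best, probs[best] / probs[worst])
    return ratios


# Invented lines, for illustration only.
data = [
    ("CHRIS", "Wow, that sounds hard."),
    ("CHRIS", "Wow. Tell me more."),
    ("CHRIS", "That sounds like a lot."),
    ("CALLER", "I definitely think so."),
    ("CALLER", "It was definitely hard."),
    ("CALLER", "I was tired."),
]
ratios = likelihood_ratios(data)
print(ratios["first_word(wow)"], ratios["contains(definitely)"])
```

Smoothing keeps a feature that one speaker never uses from producing a division by zero, at the cost of slightly understating the most lopsided ratios.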
The data here is consistent with what we've already seen. Some observations:
One last thing before we wrap up. We've used computers to search for predictors of Gethard's speech patterns, but can we go further? Can we train a machine to talk like Chris Gethard?
I tried, and you can be the judge.
I took all of Gethard's lines from the ten conversations, and trained a text-generating neural network on them. Want to see how I did? Let's play a game.
Figure 9: Chris vs. Computer: who said it? This 10 question quiz pulls from 24 examples, so you can play it a few times to see different sentences.
How did you do? If you didn't get 100%, don't worry. The robots aren't coming for Gethard's job anytime soon.
I intentionally selected the cream of the algorithm-generated crop. There was plenty of text that was more clearly nonsense. Here's an example:
It sounds like you're going to look into it. I didn't want to have to ask. I had to know your dad gets an except and then also an environment while there in 1996 and when you say it too more about that. You must be a good point. I think that's a tragic baby. I listen for her new life minutes left.
And another one:
Oh, wow. Now, I feel like me. I don't want to make it about me in a way that he never counting the deaf community to me on the started point of things out loud about it, and then you are a strength to the guy everywhere.
You and your husband are like, "Why is your kids?"?
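To get a feel for why machine-generated text drifts into this kind of nonsense, it helps to look at a much simpler generator. The sketch below is a word-level Markov chain, which is a stand-in and not the neural network used for the quiz: it only remembers the last two words and picks a word that followed them somewhere in the training lines. The training lines are invented for illustration.

```python
import random
from collections import defaultdict


def build_chain(lines, order=2):
    """Map each run of `order` words to the words seen right after it."""
    chain = defaultdict(list)
    for line in lines:
        words = line.split()
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Random-walk the chain, stopping early at a state with no followers."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Invented training lines, for illustration only.
corpus = [
    "it sounds like you had a long week",
    "it sounds like a lot of work to me",
    "you had a lot of time to think about it",
]
chain = build_chain(corpus)
print(generate(chain, length=15, seed=1))
```

Every three-word stretch of the output appeared somewhere in the training lines, but nothing ties the start of a sentence to its end, which is roughly how locally plausible and globally baffling text comes about.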
Having crunched the numbers, the verdict is in. Mathematics has a pretty positive view of Gethard, even if the feelings aren't reciprocated.
He walks the walk by refraining from talking the talk. He gives space to his callers, frames the conversation around them, and maintains an overall positive sentiment. This is borne out in his conversational tells (his penchant for asking questions, favoring "your" over "my", etc.).
Gethard frequently talks about his show as giving a platform for everyday people, and the data backs up this claim. If you haven't listened to the show, I'd encourage you to give it a go. At its best, it captures what it means to be human at this point in history. Each episode is its own master class in listening and empathy.
Though, as I think his mother would agree, Gethard could cut down on the profanity a bit.
A Beginner's Guide to Beautiful Anonymous, by Chris Gethard.
Building a Better Profanity Detection Library with scikit-learn, by Victor Zhou.
Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit, by Steven Bird, Ewan Klein, and Edward Loper.
Vader Sentiment on GitHub.