Beautiful / Analysis

Beautiful Stories From Anonymous People (a.k.a. Beautiful / Anonymous) is a podcast with a simple premise: "One phone call. One hour. No names. No holds barred." Comedian Chris Gethard created the show in 2016, and every episode features Gethard and an anonymous caller who phones in for a one-hour conversation. The caller can hang up at any time, but Gethard must stay on the line for the full hour.

What began as an experimental idea has evolved into a rich show that captures moments of real human connection. Sometimes these moments are funny, as one might expect from a podcast hosted by a comedian. More often, though, they hit different notes. Conversations span a wide variety of topics: anxiety about the future, the death of a loved one, relationship struggles and triumphs, and, sometimes, the regularity of one's bowel movements.

In a society that often feels increasingly disconnected, these weekly calls are a helpful reminder of our shared humanity.

It probably goes without saying that I'm a fan of the podcast, and Gethard's work in general. He wears many hats on the show: comedian, confidant, empathizer, friend. But if there's one thing Gethard is not, it's a lover of mathematics.

Gethard's negative feelings about math are well documented. Even his listeners are aware of them. For example, in one episode Gethard chats with a deaf caller with the aid of a sign language interpreter. At one point, the topic of lipreading comes up. Here's what the caller has to say (emphasis added):

I'm actually a pretty bad lipreader, even though I do use it. That's something with lipreading, that's another pet peeve that a lot of deaf people have, is expecting to lipread. I mean, we do it out of survival, but I almost look at lipreading the way that you talk about how much you hate math.

This disdain for mathematics even creeps into Gethard's other works! In his 2018 book Lose Well, while writing about some of his experiences teaching improv classes, Gethard writes (again, emphasis added):

My students were very good, but they had developed the bad habit of turning their art form into a math problem to solve. Art can never be math.

To be honest, I can't blame him. Unfortunately, most people have had at least one traumatic math experience during their school years. But if you're like Chris Gethard — if math makes you feel varying degrees of anxiety, repulsion, existential dread, and / or rage — I'd like to invite you to turn the tables with me, and examine some of Gethard's work through a mathematical lens.

Gethard may not have a high opinion of math, but what does math have to say about him? Let's find out!

Beautiful Basics

As mentioned above, one Beautiful / Anonymous episode features a deaf caller. This experience inspired Gethard to provide transcripts of the show for fans who might not be able to listen to the episodes. In 2019, he released transcripts for 10 episodes curated as part of a beginner's guide to the podcast.

Unfortunately for Gethard, in so doing he opened a Pandora's box of material for analysis. I went through all of those transcripts and crunched the numbers, trying to answer questions like:

How much talking does Gethard do compared to his callers?
How does the sentiment of a typical call change? In general are the conversations positive, negative, or neutral in their content?
Given a random line from a transcript, what are some clues that the line came from Gethard as opposed to someone calling in?
To what extent can we train a computer to talk like Chris Gethard?

In what follows, I'd like to offer some answers to all of these questions. Some caveats and clarifications:

Gethard opens and closes every episode with a few minutes of talking directly to the listener. He also reads ads for a few minutes. I've removed all the intro, outro, and ad text from this analysis. The only text I've considered here is the text that comes from conversation with the caller.
Because I only have ten episodes to analyze, rather than the entire library (which, as of this writing, numbers over 170 episodes), some of the results you'll see here are biased by the small sample size. More on this below.

With that said, let's check out the results!

Conversation Domination

One basic question we can explore is how much each person speaks during a call. We don't have timestamps available on the transcripts, so instead we'll use word count as a proxy.

Here's how the number of words spoken by Gethard compares to the number of words spoken by the caller, for each of the ten episode transcripts:

Word counts per episode for Gethard (Chris) and the caller.

My biggest takeaway from this graph is that Gethard seems to have gotten much more comfortable creating space for his guests. He spoke the most in the very first episode: 6,394 words, more than 67% of the total.

Overall, though, his guests tend to say more than he does. Gethard only speaks a majority of the words in three of these ten episodes, and across the ten transcribed episodes his share of the words stands at around 44%. If we exclude his particularly chatty first episode, his share drops to 42%.
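
As a rough sketch of the word-count proxy, each speaker's share can be computed by splitting every line on whitespace and tallying the words per speaker. The (speaker, text) input format and the sample lines below are assumptions made for illustration; they are not taken from the transcripts.

```python
from collections import Counter


def word_shares(lines):
    """Return each speaker's share of the total word count.

    `lines` is a list of (speaker, text) pairs; words are split on whitespace.
    """
    counts = Counter()
    for speaker, text in lines:
        counts[speaker] += len(text.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {speaker: n / total for speaker, n in counts.items()}


# Made-up lines for illustration, not quotes from the transcripts.
example = [
    ("Chris", "Hello, who is this?"),
    ("Caller", "Hi Chris, I have been waiting a long time to talk to you."),
    ("Chris", "Wow, thank you for calling."),
]
shares = word_shares(example)
```

On these made-up lines, Chris accounts for 9 of the 22 words, or roughly 41%.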

Getting Sentimental

Another thing we can do with these transcripts is analyze the sentiment of the text. As people are talking, do their sentences have a positive sentiment, a negative sentiment, or a neutral sentiment?

This may seem like a difficult thing to pin down, especially if we'd like to automate the analysis. Some statements are straightforward: we can probably agree that "I love you" should count as having positive sentiment, and "I hate you" should have negative sentiment.

But what about a sentence like "Sometimes I love you, and sometimes I hate you"? Is this positive? Negative? Neutral? And how do you teach a computer to assess the sentiment so that we don't have to do it manually for every transcript?

Fortunately, there's an off-the-shelf solution here that works well. It's called VADER Sentiment Analysis (short for Valence Aware Dictionary and sEntiment Reasoner). It's basically a giant lexicon of words along with their associated sentiment. The text was originally pulled from social media, but, according to the maintainers of the project, it "is also generally applicable to sentiment analysis in other domains."

VADER can take a text and analyze it sentence-by-sentence for positive, negative, and neutral sentiment. It spits out a few different stats, but the one we'll use gives each sentence a score between -1 and 1. The closer to 1, the more positive the sentiment; the closer to -1, the more negative the sentiment.
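
To give a feel for how a lexicon can turn into a single score between -1 and 1, here is a toy sketch. It is not VADER itself: the four-word lexicon and its values are hypothetical, and the squashing step (dividing the summed valence by the square root of its square plus 15) is modeled on how VADER is understood to normalize its compound score, which should be treated as an assumption.

```python
import math

# Hypothetical mini-lexicon: these words and valences are made up for illustration.
TOY_LEXICON = {"love": 3.0, "great": 3.0, "hate": -3.0, "awful": -3.0}


def toy_compound(sentence, lexicon=TOY_LEXICON, alpha=15.0):
    """Sum the word valences, then squash the total into the interval (-1, 1)."""
    words = [w.strip(".,!?\"'").lower() for w in sentence.split()]
    total = sum(lexicon.get(w, 0.0) for w in words)
    # alpha controls how quickly the score saturates toward -1 or 1 (assumed value).
    return total / math.sqrt(total * total + alpha)


positive = toy_compound("I love you")
negative = toy_compound("I hate you")
mixed = toy_compound("Sometimes I love you, and sometimes I hate you.")
```

With this toy lexicon, "I love you" and "I hate you" get scores of equal size and opposite sign, and the mixed sentence cancels out to 0. The real VADER also accounts for things like negation, intensifiers, and punctuation, all of which this sketch ignores.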

To make things concrete, here are some examples of Gethard statements, along with their associated sentiment score:

Chris Gethard Quote	Score
"That's been my whole career for about 15 years is depressing people and calling it comedy."	-0.0258
"Fart, bananas, potatoes needs to be a T-shirt."	0.0000
"I wanted to reach over the counter, and grab the person, and just fling him across the room."	0.0258

Sentiment scores for different things Chris Gethard has said.

VADER's not perfect. In particular, some of the "neutral" examples don't seem all that neutral. But in the aggregate this can provide us with a useful metric for how positive or negative someone's speaking patterns might be.

Let's once again take a look at how Gethard compares to his callers. How many times do they each talk about things with positive, negative, or neutral sentiment? Here's a breakdown by sentence and by episode:

Sentiment counts per episode for Gethard (Chris) and the caller, broken down by range of sentiment.

As you can see, positive sentiment greatly outweighs negative sentiment across episodes. And for his part, Gethard plays the extremes pretty evenly: he speaks a majority of sentences with extreme negative sentiment in six out of ten episodes, and the majority of sentences with extreme positive sentiment in six out of ten episodes.
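
The tallies behind a breakdown like this can be sketched as follows. The cutoff of 0.05 on either side of zero is the convention suggested in the VADER project's documentation for labeling a score positive, negative, or neutral; whether the charts here use exactly that cutoff is an assumption. The scores 0.64 and -0.51 are made up, while -0.0258, 0.0, and 0.0258 are the scores from the example quotes above.

```python
from collections import Counter


def sentiment_label(score, threshold=0.05):
    """Bucket a compound score into positive, negative, or neutral."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def tally(scored_sentences):
    """Count (speaker, label) pairs from a list of (speaker, score) pairs."""
    return Counter(
        (speaker, sentiment_label(score)) for speaker, score in scored_sentences
    )


# Hypothetical input: one (speaker, compound score) pair per sentence.
example = [
    ("Chris", 0.64),
    ("Chris", -0.0258),
    ("Caller", -0.51),
    ("Caller", 0.0),
    ("Caller", 0.0258),
]
counts = tally(example)
```

Under this cutoff, all three example quotes from the table land in the neutral bucket, which matches how they are described above.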

We can take this even further by plotting the sentiment of every line in an episode's transcript. Here's how the episodes look:

Sentiment of every line in an episode, for Gethard (Chris) and the caller. The larger the circle, the more words in the line.

As you can see, most circles lie in the upper half, i.e. they have positive sentiment. So if you're looking for a show that's positive on the whole, this one fits the bill.

Sorry Sally

It's become somewhat of a tradition on the show for people to apologize to Chris Gethard's mom, Sally, whenever they curse. So before we move on from sentiment, here's a related question: who owes Sally a bigger apology, Chris or his callers?

There are a number of ways to try to detect profanity in a piece of text: the one I used is called profanity-check. Using this tool, I was able to categorize every sentence in every conversation as either having profanity or not. Here are the results, again broken down by speaker:

Profanity counts per episode for Gethard (Chris) and the caller.

Sorry, Sally. It looks like your son takes the crown when it comes to profane language. In 9 out of 10 transcripts, he had the majority of sentences marked as profane. The only caller with a dirtier mouth was the Australian caller featured in the "Aussie Best Friend" episode.

Gethard may take comfort in the fact that the profanity checker is easily offended, so sentences like "I'm not trying to be a jerk," or "That sucks" both get marked as profane. While this may indicate that the above counts are too high, the ratio between Gethard and his guests is probably about right, since the profanity checker has a low bar regardless of who is speaking.

Whose Line is it Anyway?

Let's move on to our next question: can we find any distinguishing characteristics of the way Gethard speaks compared to his callers?

One way to answer this question is to identify short phrases that appear relatively frequently in the transcripts. Here are the most common two-, three-, and four-word phrases spoken by Gethard and the callers as a group:

2-word phrases

Chris	Caller
going to (said 160 times)	I was (said 291 times)
want to (said 144 times)	and I (said 270 times)
and I (said 126 times)	it was (said 199 times)
I think (said 107 times)	going to (said 197 times)
of the (said 106 times)	a lot (said 177 times)
to be (said 103 times)	I don't (said 175 times)
in the (said 98 times)	I think (said 174 times)
I don't (said 96 times)	kind of (said 159 times)
this is (said 93 times)	to be (said 136 times)
a lot (said 93 times)	lot of (said 131 times)
have to (said 92 times)	and then (said 127 times)
on the (said 80 times)	of the (said 123 times)
do you (said 75 times)	in the (said 121 times)
lot of (said 75 times)	you know (said 106 times)
that you (said 74 times)	don't know (said 102 times)
and then (said 71 times)	that I (said 96 times)
I was (said 65 times)	but I (said 95 times)
I have (said 64 times)	so I (said 91 times)
in a (said 64 times)	have to (said 90 times)
sounds like (said 63 times)	I mean (said 88 times)

Common phrases for Gethard and his callers. Phrases are restricted to the twenty most frequent, and each phrase must have appeared an average of at least once per episode.
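
Counting phrases like these takes only a few lines. The sketch below slides a window of n words across each line and keeps the most frequent results. It assumes a speaker's lines are available as plain strings, and it splits on whitespace without normalizing case or punctuation; the sample lines are made up.

```python
from collections import Counter


def top_ngrams(lines, n=2, k=20):
    """Return the k most common n-word phrases across a speaker's lines."""
    counts = Counter()
    for text in lines:
        words = text.split()
        # Slide a window of n words across the line; phrases never span two lines.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)


# Made-up lines for illustration.
sample_lines = [
    "I think that I was going to call",
    "I was going to say I think so",
]
two_word = top_ngrams(sample_lines, n=2)
three_word = top_ngrams(sample_lines, n=3)
```

Changing n from 2 to 3 or 4 gives the longer phrases, and a minimum-frequency rule like the one in the caption could be applied by filtering the counts before ranking.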

One difference you may notice is how often common phrases include the word "I." The data suggests that Gethard is good at centering the conversation on the caller and their experiences. Callers say "I" in eight of their top twenty most common two-word phrases, but Gethard only says "I" in five of his.

The phrases in the table above are common to Gethard and his callers, but they're also just plain common. Knowing that Gethard says "and I" 126 times across these ten episode transcripts doesn't really tell us that much about what makes Gethard sound like Gethard.

Another way we can study speech patterns in these transcripts is through collocations, a linguistic term for sequences of words that occur disproportionately often.
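
One common way to put a number on "disproportionately often" is pointwise mutual information (PMI), which compares how often two words appear side by side with how often they would if they were unrelated. The sketch below is a minimal version of that idea under stated assumptions: the scoring measure actually used for the tables here is not specified, and the sample text is made up.

```python
import math
from collections import Counter


def bigram_pmi(words, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    n_words = len(words)
    n_pairs = max(len(words) - 1, 1)
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # very rare pairs get unreliable scores
        p_pair = count / n_pairs
        p_a = unigrams[a] / n_words
        p_b = unigrams[b] / n_words
        # PMI is positive when the pair occurs more often than chance predicts.
        scores[(a, b)] = math.log2(p_pair / (p_a * p_b))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up text for illustration.
sample = "new york is big and new york is loud and the city is big".split()
ranked = bigram_pmi(sample)
```

In this sample, "new york" comes out on top: both words are rare on their own, yet they always appear together. Frequent filler pairs score lower because their individual words show up everywhere.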

Here are the two-word collocations for Chris Gethard and his callers:

Chris	Caller
sounds like	don't know
feel like	yeah yeah
little bit	little bit
don't know	feel like
don't want	mmhmm affirmative
bank tech	high school
new york	don't want
high school	I'll tell
minutes left	mental health
would imagine	every day
bromance test	I've got
yeah yeah	you've got
that's cool	even though
sign language	last night
york city	social media
year old	wood stove
I've never	uncanny valley
make sure	what's going
Ron Paul	junior year
uncanny valley	electric blanket

Collocations for Gethard and his callers.

A few observations:

Chris Gethard talks about New York City a lot.
People talk about high school even more than Gethard talks about New York City. The phrase appears at least once in nine out of the ten episodes!
Because of the small sample size, it's possible for a phrase to show up here even if it's only discussed in a single episode. Among Gethard's collocations, "Bank Tech," "bromance test," "sign language," "Ron Paul," and "uncanny valley" only appear in a single episode each.
Because none of the callers are robots, none of them actually say "mmhmm affirmative." This is a case of metadata in the transcripts messing with the analysis. Since sounds like "mmhmm" can be ambiguous, the transcripts include a parenthetical describing what the sound means. In other words, the collocation "mmhmm affirmative" corresponds to something like "Mm-hmm (affirmative)" in the transcripts.

Collocations do a better job at telling us about some unique topics that come up on Beautiful / Anonymous. But, except maybe for the way Gethard seems to talk about New York a lot, they don't really answer the original question of how we can distinguish between what Gethard says and what his callers say. For this, we need a slightly different approach.

Transcript Training

Collocations tell us whether a sequence of words appears more often than is typical. But we're not interested in comparing word frequency to an absolute average; we're interested in comparing Gethard's word frequency to the average frequency of his callers. This is a slightly different problem, since, for instance, his callers may themselves bring up topics more often than is typical (high school, for example).

So how can we do this? One way is to try to find linguistic tells that will accurately predict when Chris is talking as opposed to one of his callers.

This is a common task within the field of machine learning. Without taking us too far afield, here's the idea: basically, you tell a computer about certain features of a text that you think may help predict the speaker. For example, Chris Gethard may be more likely to talk about New York and New Jersey than his callers.

You can give the computer dozens or hundreds of these features to consider, as well as a training set of data to learn from (basically a random subset of all of the transcripts). The computer will then learn how to best weigh and balance these features so that, given a new line, it has a high probability of correctly guessing who said it.
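
To make the idea concrete, here is a deliberately small sketch of this kind of classifier: a simplified naive Bayes-style scorer that looks only at the features present in a line. The feature names, the smoothing constants, and every sample line are hypothetical; this is not the model or the feature set used for the results below. It learns from one set of labeled lines and is then checked against lines it has not seen.

```python
import math
from collections import Counter, defaultdict


def features(line):
    """Turn a line into a small set of hand-picked, hypothetical features."""
    words = [w.strip(".,!?").lower() for w in line.split()]
    words = [w for w in words if w]
    feats = {"contains(" + w + ")" for w in words}
    if words:
        feats.add("first_word=" + words[0])
    return feats


def train(labeled_lines):
    """Count how often each feature appears in each speaker's lines."""
    feature_counts = defaultdict(Counter)
    line_counts = Counter()
    for speaker, text in labeled_lines:
        line_counts[speaker] += 1
        for feat in features(text):
            feature_counts[speaker][feat] += 1
    return feature_counts, line_counts


def predict(model, text):
    """Pick the speaker with the highest smoothed log score."""
    feature_counts, line_counts = model
    total_lines = sum(line_counts.values())
    best_speaker, best_score = None, -math.inf
    for speaker, n_lines in line_counts.items():
        score = math.log(n_lines / total_lines)
        for feat in features(text):
            # Adding 0.5 keeps unseen features from zeroing out the score.
            score += math.log((feature_counts[speaker][feat] + 0.5) / (n_lines + 1.0))
        if score > best_score:
            best_speaker, best_score = speaker, score
    return best_speaker


def accuracy(model, labeled_lines):
    """Fraction of lines whose speaker is guessed correctly."""
    correct = sum(predict(model, text) == speaker for speaker, text in labeled_lines)
    return correct / len(labeled_lines)


# Made-up lines for illustration, split into training and held-out sets.
training_lines = [
    ("Chris", "Wow, what's your plan?"),
    ("Chris", "You're doing great."),
    ("Chris", "Wow, your story is wild."),
    ("Caller", "My job is definitely hard."),
    ("Caller", "I was in my car."),
    ("Caller", "My sister definitely knows."),
]
held_out_lines = [
    ("Chris", "Wow, what's your job?"),
    ("Caller", "My car is definitely old."),
]
model = train(training_lines)
held_out_accuracy = accuracy(model, held_out_lines)
```

Measuring accuracy on held-out lines, rather than on the lines the model learned from, is what makes an accuracy figure meaningful. With only a handful of invented lines, the number produced here says nothing about real transcripts.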

I threw the kitchen sink at my computer in terms of features I thought might have some predictive power. In the end, this yielded an algorithm that could predict whether or not Gethard was the speaker of a random line with about 70% accuracy. Here are the top twenty most informative features for one of my training sessions:

Twenty most informative features for distinguishing between Gethard and his callers.

The numbers in the above chart indicate the strength of each predictor. For example, a line where the first word is "wow" is 12.9 times more likely to have come from Gethard, while a line containing the word "definitely" is 14.1 times more likely to have come from one of his callers.
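
A ratio like "12.9 times more likely" can be read as a comparison of two rates: how often a feature shows up in Gethard's lines versus how often it shows up in the callers' lines. The sketch below computes such a ratio for one hypothetical feature, the first word of a line. The smoothing constant and the sample lines are assumptions, and the exact estimator behind the chart may differ.

```python
from collections import Counter


def first_word(line):
    """Return the lowercased first word of a line, without trailing punctuation."""
    words = line.lower().split()
    return words[0].strip(".,!?") if words else ""


def likelihood_ratio(lines, value, speaker_a="Chris", speaker_b="Caller"):
    """Estimate P(first word = value | speaker_a) / P(first word = value | speaker_b).

    Adds 0.5 to each count so an unseen value never produces a zero probability.
    """
    hits = Counter()
    totals = Counter()
    for speaker, text in lines:
        totals[speaker] += 1
        if first_word(text) == value:
            hits[speaker] += 1
    p_a = (hits[speaker_a] + 0.5) / (totals[speaker_a] + 1.0)
    p_b = (hits[speaker_b] + 0.5) / (totals[speaker_b] + 1.0)
    return p_a / p_b


# Made-up lines for illustration.
sample_lines = [
    ("Chris", "Wow, that is a lot."),
    ("Chris", "Wow."),
    ("Chris", "What's your plan?"),
    ("Caller", "My mom said so."),
    ("Caller", "Definitely."),
    ("Caller", "Wow, okay."),
    ("Caller", "I think so."),
]
wow_ratio = likelihood_ratio(sample_lines, "wow")
```

A ratio above 1 means the feature points toward the first speaker, and a ratio below 1 points toward the second. In this invented sample, a line opening with "wow" is about twice as likely to come from Chris.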

The data here is consistent with what we've already seen. Some observations:

Lines that center the conversation on the other person are more likely to come from Gethard. Notice that when the most common word in a line is "your" or when the first word is "you're", the speaker is much more likely to be Gethard; on the other hand, his guests are much more likely to have "my" as the first or most common word in a line.
Similarly, lines with "what's" in them are much more likely to be spoken by Gethard. Perhaps his focus on callers means that he tends to ask more questions than they do.
Wow, Chris Gethard loves saying "wow!"

RoboGeth

One last thing before we wrap up. We've used computers to search for predictors of Gethard's speech patterns, but can we go further? Can we train a machine to talk like Chris Gethard?

I tried, and you can be the judge.

I took all of Gethard's lines from the ten conversations, and trained a text-generating neural network on them. Want to see how I did? Let's play a game.

Chris vs. Computer: who said it? This 10 question quiz pulls from 24 examples, so you can play it a few times to see different sentences.

How did you do? If you didn't get 100%, don't worry. The robots aren't coming for Gethard's job anytime soon.

I intentionally selected the cream of the algorithm-generated crop. There was plenty of text that was more clearly nonsense. Here's an example:

It sounds like you're going to look into it. I didn't want to have to ask. I had to know your dad gets an except and then also an environment while there in 1996 and when you say it too more about that. You must be a good point. I think that's a tragic baby. I listen for her new life minutes left.

And another one:

Oh, wow. Now, I feel like me. I don't want to make it about me in a way that he never counting the deaf community to me on the started point of things out loud about it, and then you are a strength to the guy everywhere.

And finally:

You and your husband are like, "Why is your kids?"?
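
A neural network is beyond the scope of a short example, but the basic idea of generating new text from someone's lines can be illustrated with a much simpler stand-in: a word-level Markov chain. To be clear, this is not the method described above. It only records which words follow which, then walks those links at random. The sample lines are made up.

```python
import random
from collections import defaultdict


def build_chain(lines):
    """Map each word to the list of words that follow it in the training lines."""
    chain = defaultdict(list)
    for text in lines:
        words = text.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, start, max_words=20, seed=None):
    """Walk the chain from `start`, picking each next word at random."""
    rng = random.Random(seed)
    words = [start]
    # Stop at the length limit or when the last word has no known follower.
    while len(words) < max_words and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)


# Made-up lines for illustration.
sample_lines = [
    "it sounds like you had a hard year",
    "it sounds like a lot of work",
]
chain = build_chain(sample_lines)
sentence = generate(chain, "it", max_words=8, seed=0)
```

Because each step looks at only one previous word, a chain like this drifts into nonsense even faster than the examples above. That is a useful reminder of how much context it takes to sound like a real person.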

Conclusion

Having crunched the numbers, the verdict is in. Mathematics has a pretty positive view of Gethard, even if the feelings aren't reciprocated.

He walks the walk by refraining from talking the talk. He gives space to his callers, frames the conversation around them, and maintains an overall positive sentiment. This is borne out in his conversational tells (his penchant for asking questions, favoring "your" over "my", etc.).

Gethard frequently talks about his show as giving a platform for everyday people, and the data backs up this claim. If you haven't listened to the show, I'd encourage you to give it a go. At its best, it captures what it means to be human at this point in history. Each episode is its own master class in listening and empathy.

Though, as I think his mother would agree, Gethard could cut down on the profanity a bit.

Sources:

A Beginner's Guide to Beautiful Anonymous, by Chris Gethard.
Building a Better Profanity Detection Library with scikit-learn, by Victor Zhou.
Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit, by Steven Bird, Ewan Klein, and Edward Loper.
Sentiment analysis with NLTK / VADER — Comments on Lee Hsien Loong’s Facebook post, by Sharon Woo.
Simplifying Sentiment Analysis using VADER in Python (on Social Media Text), by Parul Pandey.
Vader Sentiment on GitHub.

August 2019 · 16 min read
