Who is the Leading Role in Friends TV Show?

By using the transcript of the Friends TV show, we will analyze the main roles in the show. Also, strongest emotions of the characters are studied. This is a fun way to wrangle some data and visualize it.

dplyr
friends
tidyr
ggplot2
ggimage
Author
Affiliations

Ali Emre Karagül

Published

December 5, 2024

Mailchimp Subscription Modal

Introduction

There are some TV shows out there that watchers and followers cannot decide who the main/leading character is. For instance, Game of Thrones, Friends, The Office … The examples can go a little longer. Screen time might be a good indicator of the main character. In Game of Thrones TV show, for example, Tyrion Lannister (aka the Imp) has the most screen time. That’s why I believe he is the main character of the show, instead of Daenerys Targaryen, or Jon Snow, or any other Stark kids.

(i) GoT
(ii) Friends
Figure 1

What about the Friends? The TV show “Friends” has captivated audiences for decades with its humor, relatable characters, and unforgettable moments. Have you ever wondered who the leading character of Friends is? Or how their dialogue trends change over time? Today, we will find out with data. Using the friends R package, we analyzed the show’s main characters to uncover insights into their dialogue patterns. This is a fun way to practice R, wrangle some data, and visualize it. Along with the friends package, we will use the dplyr, tidyr, ggplot2, and ggimage packages.

Getting Started

Let’s start by loading the necessary libraries and datasets.

The friends package contains three datasets: friends, friends_emotions, and friends_info. The friends dataset contains dialogue transcripts, while friends_emotions provides annotations of emotions associated with dialogues. The friends_info dataset contains metadata about the show’s episodes.

Datasets in Friends Package

friends: Contains dialogue transcripts.

friends_emotions: Annotations of emotions associated with dialogues.

friends_info: Metadata about the show’s episodes.

Let’s load and define all of these datasets. You can see the first 6 rows of each dataset by using the head() function. Execute and try the code. You might try other functions like tail(), str(), summary() to gain insight about the datasets.

Preprocessing Data

If you understood the dataset, we can continue to the next step. We will preprocess the data to gain insights from the show. First of all, let’s find out who the 6 main characters are. We will do this by counting the number of distinct texts each character has in the dataset. The output will be the top 6 characters with the most dialogue.

top_6 <- df_friends %>%
  group_by(speaker) %>%
  summarise(count = n_distinct(text)) %>%
  top_n(6, count) %>%
  pull(speaker)

print(top_6)
[1] "Chandler Bing"  "Joey Tribbiani" "Monica Geller"  "Phoebe Buffay" 
[5] "Rachel Green"   "Ross Geller"   

Ok, we can see the 6 characters with the most utterances in the show. Obviously, these are Rachel, Ross, Chandler, Joey, Monica, and Phoebe. I am not interested in other characters in the show for this analysis. So, I will filter the dataset for only these 6 characters, and name this table df_main_roles. Finally, I will create a new dataset, showing each one of these characters’ dialogue count per season, called uttr_count_per_season. See its first 10 rows in the output below.

df_main_roles <- df_friends %>%
  filter(speaker %in% top_6)

uttr_count_per_season <- df_main_roles %>%
  group_by(speaker, season) %>%
  summarise(count = n_distinct(text)) %>%
  arrange(desc(count))

head(uttr_count_per_season, 10)
# A tibble: 10 × 3
# Groups:   speaker [4]
   speaker        season count
   <chr>           <int> <int>
 1 Rachel Green        7   997
 2 Rachel Green        8   970
 3 Ross Geller         3   966
 4 Chandler Bing       6   941
 5 Ross Geller         1   930
 6 Rachel Green        6   909
 7 Ross Geller         8   908
 8 Chandler Bing       4   895
 9 Joey Tribbiani      5   886
10 Rachel Green        4   879

Main Role of Each Season

So we already have an opinion about the main character for each season. For instance, Rachel Green is the character with the most utterances in seasons 7 and 8, those also being the top two among all seasons and characters. Is Rachel our main character? uttr_count_per_season is still too raw to make a decision. Let’s visualize the data to see the trends more clearly. Following code snippet will create a line plot showing the utterance count of each character per season.

main_role_per_season <- uttr_count_per_season %>%
  spread(key = season, value = count) %>%
  ungroup() %>% # ungroup to use rowMeans because  when it is being applied to a grouped data frame, it will return multiple values instead of a single value for each group
  mutate(average = as.integer(rowMeans(.[,2:11], na.rm = TRUE))) %>%
  arrange(desc(average))

print(main_role_per_season)
# A tibble: 6 × 12
  speaker      `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10` average
  <chr>      <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>   <int>
1 Rachel Gr…   797   737   830   879   824   909   997   970   856   712     851
2 Ross Gell…   930   838   966   781   798   826   769   908   827   807     845
3 Chandler …   800   732   787   895   844   941   788   645   870   654     795
4 Monica Ge…   830   749   787   751   838   833   854   750   859   657     790
5 Joey Trib…   620   616   724   774   886   840   857   850   817   668     765
6 Phoebe Bu…   622   629   747   672   754   728   746   711   767   648     702

Do you have a gist? Let’s draw a line plot to understand better. Let me add some images to the plot, too, in order to make it funnier. First, add the image file names to the uttr_count_per_season dataset. As I plotted this beforehand, I know exactly where the images would look the best. It is the 8th season, because that is where all characters are distinct from each other. So, let’s set the image location to the 8th season by defining img_loc. Also I will create a color palette for each character to use in the plots later. Finally, plot the line plot with images.

uttr_count_per_season <- uttr_count_per_season %>%
  mutate(avatar = paste0(speaker, ".png"))

img_loc <- uttr_count_per_season %>%
  filter(season == 8)


speaker_colors <- c(
  "Rachel Green" = "#4363d8",   
  "Ross Geller" = "#ffe119",    
  "Chandler Bing" = "#ff0000",
  "Joey Tribbiani" = "#3cb44b",
  "Monica Geller" = "#f58231", 
  "Phoebe Buffay" = "#911eb4"   
)


ggplot(uttr_count_per_season, aes(x = season, y = count, group = speaker, color = speaker)) +
  geom_line(linewidth = 1.5) + 
  scale_color_manual(values = speaker_colors) +  
  geom_image(data = img_loc, aes(x = season, y = count, group = speaker, image = avatar), 
             size = 0.04, inherit.aes = FALSE) +  
  labs(title = "Top Speakers per Season", x = "Season", y = "Count") +
  theme(legend.position = "bottom") +
  scale_x_continuous(breaks = 1:10, limits = c(1, 10))

ggplot is a powerful tool for data visualization. This plot gives a lot of information already. So, first of all, Ross had the scene most for the first 3 seasons. He lost it to Chandler in season 4. But the most democratic season was the 5th, where the utterances of all characters were the closest to each other. And about season 5, I also like that Joey had the most scenes. After season 6, though, Rachel took the lead and kept it until the end of the show. So, we can say, Ross started as the main character, but mid seasons were really a race for all characters. And obviously Rachel won this race.

Emotions

Another interesting analysis could be about the emotions of the characters. If you haven’t investigated emotions data, go top and wrangle. We will join the df_friends and df_emotions datasets on season, episode, scene and utterance columns to see the most common emotions of the main characters. There are many utterances without a tag of emotion, so filter any NA in emotions. Also filter all the other speakers, except our main 6 characters. To understand better let’s also print the unique emotions listed in the table.

d_w_emotions <- left_join(df_friends, friends::friends_emotions, by = c("season", "episode", "scene", "utterance")) %>%
  filter(!is.na(emotion)) %>%
  filter(speaker %in% top_6)


unique(d_w_emotions$emotion)
[1] "Mad"      "Neutral"  "Joyful"   "Scared"   "Sad"      "Powerful" "Peaceful"

There are 7 distinct emotions. Using that joint dataset, d_w_emotions, we will find out who has each emotion the most. The output will be a bar plot showing the most common emotions per speaker. For our plot, we need to count the unique texts for each speaker and emotion. Meaning, there will be a count for each speaker (6 in total) and emotion (7 in total). So, there will be 6*7=42 rows in the table.

emotion_counts <- d_w_emotions %>%
  group_by(speaker, emotion) %>%
  summarise(count = n_distinct(text)) %>%
  arrange(desc(count))
emotion_counts
# A tibble: 42 × 3
# Groups:   speaker [6]
   speaker        emotion count
   <chr>          <chr>   <int>
 1 Chandler Bing  Neutral   506
 2 Ross Geller    Neutral   447
 3 Monica Geller  Neutral   425
 4 Joey Tribbiani Neutral   422
 5 Phoebe Buffay  Neutral   395
 6 Chandler Bing  Joyful    375
 7 Phoebe Buffay  Joyful    372
 8 Rachel Green   Neutral   366
 9 Joey Tribbiani Joyful    355
10 Monica Geller  Joyful    314
# ℹ 32 more rows

Then, we will filter the most common emotion for each speaker. Finally, we will add images to the plot to make it more fun. So we will have 7 rows, one for each emotion, and there will be speakers, who feels that emotion the most in the show, next to each emotion.

highest_emotion_counts <- emotion_counts %>%
  group_by(emotion) %>%
  filter(count == max(count)) %>%
  mutate(avatar = paste0(speaker, ".png"))

print(highest_emotion_counts)
# A tibble: 7 × 4
# Groups:   emotion [7]
  speaker        emotion  count avatar            
  <chr>          <chr>    <int> <chr>             
1 Chandler Bing  Neutral    506 Chandler Bing.png 
2 Chandler Bing  Joyful     375 Chandler Bing.png 
3 Ross Geller    Scared     295 Ross Geller.png   
4 Rachel Green   Mad        236 Rachel Green.png  
5 Chandler Bing  Peaceful   168 Chandler Bing.png 
6 Joey Tribbiani Powerful   141 Joey Tribbiani.png
7 Monica Geller  Sad        122 Monica Geller.png 

Nice, let’s plot this:

ggplot(highest_emotion_counts, aes(x = emotion, y = count, fill = speaker)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  scale_fill_manual(values = speaker_colors) +
  labs(title = "Most Common Emotions per Speaker", x = "Emotion", y = "Count") +
  geom_image(aes(x = emotion, y = count - (count/2), group = speaker, image = avatar),
             position = position_dodge(width = 0.9), size = 0.1, inherit.aes = FALSE)

Here, we simply use ggplot() function to create a bar plot. We use geom_bar() to create the bars, and geom_image() to add images to the plot. We also use scale_fill_manual() to set the colors of the bars. There are two aes() calls in this code: first defines the aestetic features of the plot, such as x and y axes, and the second one defines the image location. In fact, all functions take the aes() in the ggplot() by default. To override it, we use inherit.aes = FALSE in the geom_image() function. Also, in the geom_image() function, we use y = count - (count/2) adds the avatars to the middle of each bar.

When the plot is investigated, the most joyful, neutral, and peaceful character is the same person: Chandler. The maddest character is Rachel, no surprise. The saddest character is Monica (I guess her relationships are too deep for her). The most powerful character is Joey, which perfectly fits. Finally, the most scared character is Ross. That last one is no surprise, either. A clear image of his eyebrows is just in my mind right now. Just see the video if you need to remember.

Conclusion

In this post, we analyzed the main characters of the “Friends” TV show using the friends package in R. We found that Rachel Green is the leading character based on the number of dialogues per season. We also explored the most common emotions of the main characters, revealing interesting insights about their personalities. We used ggplot2 and ggimage for visualization and dplyr and tidyr for data wrangling. This exploration of “Friends” transcript data highlights the value of analyzing entertainment datasets. I really enjoyed the process. Hope you did, too.