Who Mentions Who in the Office?

Connor Rothschild

Last updated on Jan 25, 2021 5 min read R, D3.js

This project explores the relationships between different characters in the classic TV show The Office. Using transcript data newly released in Bradley H. Lindblad’s schrute package, I’d like to see who mentions who in the Office. Is one character more popular than the others?

library(schrute)
library(tidyverse)
library(cr)
set_cr_theme()

Let’s take a look at the transcripts:

transcripts <- schrute::theoffice

knitr::kable(transcripts[1:3,])

index	season	episode	episode_name	director	writer	character	text	text_w_direction	imdb_rating	total_votes	air_date
1	1	1	Pilot	Ken Kwapis	Ricky Gervais;Stephen Merchant;Greg Daniels	Michael	All right Jim. Your quarterlies look very good. How are things at the library?	All right Jim. Your quarterlies look very good. How are things at the library?	7.6	3706	2005-03-24
2	1	1	Pilot	Ken Kwapis	Ricky Gervais;Stephen Merchant;Greg Daniels	Jim	Oh, I told you. I couldn’t close it. So…	Oh, I told you. I couldn’t close it. So…	7.6	3706	2005-03-24
3	1	1	Pilot	Ken Kwapis	Ricky Gervais;Stephen Merchant;Greg Daniels	Michael	So you’ve come to the master for guidance? Is this what you’re saying, grasshopper?	So you’ve come to the master for guidance? Is this what you’re saying, grasshopper?	7.6	3706	2005-03-24

By using tidytext, we can split the transcripts into their constituent parts (words).

transcripts_tokenized <- transcripts %>%
  tidytext::unnest_tokens(word, text)

knitr::kable(transcripts_tokenized[1:3,])

index	season	episode	episode_name	director	writer	character	text_w_direction	imdb_rating	total_votes	air_date	word
1	1	1	Pilot	Ken Kwapis	Ricky Gervais;Stephen Merchant;Greg Daniels	Michael	All right Jim. Your quarterlies look very good. How are things at the library?	7.6	3706	2005-03-24	all
1	1	1	Pilot	Ken Kwapis	Ricky Gervais;Stephen Merchant;Greg Daniels	Michael	All right Jim. Your quarterlies look very good. How are things at the library?	7.6	3706	2005-03-24	right
1	1	1	Pilot	Ken Kwapis	Ricky Gervais;Stephen Merchant;Greg Daniels	Michael	All right Jim. Your quarterlies look very good. How are things at the library?	7.6	3706	2005-03-24	jim

We can now use the text to see who mentions who. But first, let’s construct a vector with a list of characters to keep in the analysis. There are 485 characters in the transcripts, so its important we filter only those of relevance:

keep_characters <- transcripts %>% 
  group_by(character) %>% 
  count() %>% 
  arrange(desc(n)) %>% 
  head(9) %>% 
  pull(character)

knitr::kable(keep_characters)

x
Michael
Dwight
Jim
Pam
Andy
Angela
Kevin
Erin
Oscar

This is an optional decision. One may be interested in seeing which characters talk about Jim most, including those characters who are otherwise less relevant. I decide to filter according to the main cast so that comparisons between characters (e.g., through a chord diagram) is feasible.

Who Mentions Who?

Jim: A Case Study

Who is talking to who in the Office?

Now that we have keep_characters, we can filter according to it and spit out who mentions who among the most relevant Office characters.

transcripts_tokenized %>% 
  filter(character %in% keep_characters) %>% 
  mutate(jim = ifelse(word == "jim", 1, 0)) %>% 
  group_by(character) %>% 
  summarise(jim = sum(jim)) %>% 
  arrange(desc(jim)) %>% 
  mutate(character = reorder(character, jim)) %>% 
  ggplot(ggplot2::aes(character, jim)) +
  geom_col() +
  coord_flip() +
  fix_bars() +
  labs(title = "Who Mentions Jim?",
       subtitle = "Counts of 'Jim' in The Office Transcripts",
       x = element_blank(),
       y = "Mentions")

The takeaway here is that Dwight mentions Jim the most, followed by Michael. No surprise there! What I find interesting is that only three characters really talk about/to Jim. After Dwight, Michael, and Pam (and Jim referencing himself, apparently), the mention rate for Jim’s name drops from over 200 to only 60 mentions. It seems as if the writers of the Office intentionally made Jim a subject of conversation among only a few characters!

Replicate for the rest of the cast

Next, we replicate that process for the rest of the cast. There is probably a better way to do this.

data_chord <- transcripts_tokenized %>% 
  filter(character %in% keep_characters) %>% 
  mutate(jim = ifelse(word == "jim", 1, 0)) %>% 
  mutate(michael = ifelse(word == "michael", 1, 0)) %>% 
  mutate(dwight = ifelse(word == "dwight", 1, 0)) %>% 
  mutate(pam = ifelse(word == "pam", 1, 0)) %>% 
  mutate(andy = ifelse(word == "andy", 1, 0)) %>% 
  mutate(angela = ifelse(word == "angela", 1, 0)) %>% 
  mutate(kevin = ifelse(word == "kevin", 1, 0)) %>% 
  mutate(erin = ifelse(word == "erin", 1, 0)) %>% 
  mutate(oscar = ifelse(word == "oscar", 1, 0)) %>% 
  # mutate(ryan = ifelse(word == "ryan", 1, 0)) %>% 
  # mutate(darryl = ifelse(word == "darryl", 1, 0)) %>% 
  # mutate(phyllis = ifelse(word == "phyllis", 1, 0)) %>% 
  # mutate(kelly = ifelse(word == "kelly", 1, 0)) %>% 
  # mutate(toby = ifelse(word == "toby", 1, 0)) %>% 
  group_by(character) %>% 
  summarise_at(vars(jim:oscar), funs(sum))

Visualize

Now, let’s make a chord diagram!

We first have to convert the data frame into a format chordDiagram will recognize.

circlize_data <- as.data.frame(data_chord) %>% 
  pivot_longer(jim:oscar, names_to = "to", values_to = "value") %>% 
  rename(from = 'character') %>% 
  mutate(to = str_to_title(to))

This process pivots each row of data into a value-key combination, so that the data looks like this:

from	to	value
Andy	Jim	48
Andy	Michael	40
Andy	Dwight	80
Andy	Pam	34
Andy	Andy	42
Andy	Angela	33

Using that data, we can create a chord diagram quite easily, using a single command from the circlize library. This chapter is helpful.

library(circlize)
chordDiagram(circlize_data, grid.col = c("#B997C7", "#824D99", "#4E78C4", "#57A2AC", "#7EB875", "#D0B541", "#E67F33", "#CE2220", "#521A13"))

Make It Interactive

With nine people, some of the data can get easily concealed (how often did Angela mention Michael’s name?). One way to fix this is to make the visualization interactive, so that a user can hover over chords to see relationships between characters.

First, we conduct some data cleaning. I found that the rownames and column names have to be of the same order; let’s do a little manipulation to get there:

int_chord <- as.data.frame(data_chord)

rownames(int_chord) <- int_chord$character

row.order <- c("Jim", "Michael", "Dwight", "Pam", "Andy", "Angela", "Kevin", "Erin", "Oscar")
               #, "Ryan", "Darryl", "Phyllis", "Kelly", "Toby")
int_chord <- int_chord[row.order,]

Next, we load Matt Flor’s chorddiag package, and construct a matrix according to its function’s liking:

# devtools::install_github("mattflor/chorddiag")
library(chorddiag)

m <- as.matrix(int_chord[-1])

dimnames(m) <- list(have = int_chord$character,
                    prefer = str_to_title(colnames(int_chord[-1])))

Finally, we add a color palette and construct the diagram.

groupColors <- c("#B997C7", "#824D99", "#4E78C4", "#57A2AC", "#7EB875", "#D0B541", "#E67F33", "#CE2220", "#521A13")

p <- chorddiag(m, 
               groupColors = groupColors, 
               groupnamePadding = 35,
               tickInterval = 50,
               groupnameFontsize = 12)
p

# save the widget
# library(htmlwidgets)
# saveWidget(p, file="chord_interactive.html")

Play around with the diagram here!

R visualization interactive D3.js

Connor Rothschild

Undergraduate at Rice University

I’m a senior at Rice University interested in public policy, data science and their intersection. I’m most passionate about translating complex data into informative and entertaining visualizations.

Who Mentions Who in the Office?

Who Mentions Who?

Jim: A Case Study

Replicate for the rest of the cast

Visualize

Make It Interactive

Connor Rothschild

Undergraduate at Rice University

Related