Presidential Debate Watson Tone Analysis

With the emotionally polarizing 2016 political landscape, I thought it would be cool to rationally analyze each candidate’s tone during each of the debates. Using IBM Watson’s Tone Analyzer, I set out to quantify the emotional range of each candidate.

View Results

Separate Debates

Debate Summary

First, to use the Tone Analyzer, I needed the transcribed speech text of each debate, so I built a parser to extract the speech text from a given article URL.

I arbitrarily chose The Washington Post as my data source for the speech transcripts and used BeautifulSoup to parse the HTML.

The bulk of the text extractor:

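import re

import bs4


def _key_to_re(k):
    """A speaker is identified with
    {speaker_name}:  
    """
    # Escape the name so it is matched literally
    return r"{k}: ".format(k=re.escape(k))

# `article` is the raw HTML of the page; `transcript` maps each speaker
# name to a dict with a "dialog" list. Both are set up elsewhere.
soup = bs4.BeautifulSoup(article, "html.parser")

paragraphs = soup.find_all("p")

speaker = None
for p in paragraphs:
    text = p.text
    for name in transcript:
        search = _key_to_re(name)
        found_name = re.match(search, text)
        if found_name:
            speaker = name
            # Strip only the leading "NAME: " label
            text_with_speaker_removed = re.sub(search, "", text, count=1)
            transcript[name]["dialog"].append(text_with_speaker_removed)
            break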
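
The loop above only stores paragraphs that begin with a speaker label. The sketch below illustrates the same prefix matching on plain strings instead of parsed HTML, so it runs without BeautifulSoup. It also attributes unlabeled paragraphs to the most recent speaker, which is an assumption about how the `speaker` variable is meant to be used, not something the excerpt shows. The function name `split_by_speaker` and the sample lines are hypothetical.

```python
import re


def split_by_speaker(paragraphs, names):
    """Group paragraph text by speaker label (hypothetical helper).

    A labeled paragraph starts with "NAME: ". Unlabeled paragraphs are
    assumed to continue the most recent speaker's turn.
    """
    transcript = {name: [] for name in names}
    # One compiled pattern per speaker; re.escape guards against names
    # that contain regex metacharacters.
    patterns = {name: re.compile(re.escape(name) + r": ") for name in names}

    speaker = None
    for text in paragraphs:
        for name, pattern in patterns.items():
            if pattern.match(text):
                speaker = name
                # Strip only the leading label, not later occurrences.
                text = pattern.sub("", text, count=1)
                break
        # Paragraphs seen before any known speaker are skipped.
        if speaker is not None:
            transcript[speaker].append(text)
    return transcript


# Made-up sample lines, for illustration only.
sample = [
    "HOLT: Good evening.",
    "CLINTON: Thank you.",
    "It is good to be here.",
    "TRUMP: Thanks.",
]
result = split_by_speaker(sample, ["CLINTON", "TRUMP"])
```

With this approach, moderators should be included in `names` as well. Otherwise a moderator's paragraph that follows a candidate's turn would be attributed to that candidate.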

With the text for each speaker saved in a list, it was time to use Watson.

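def run_analyzer(self, content):
    """Return Watson's document-level tone for a string or list of strings."""
    # A list of paragraphs is joined into a single document
    if isinstance(content, list):
        content_text = ". ".join(content)
    else:
        content_text = content
    response = self.analyzer.tone(content_text, sentences=False)
    if response:
        return response["document_tone"]
    return None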

(Passing sentences=False analyzes the text as one document, not as separate sentences.)
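
To make the input handling concrete without calling the Watson service, here is a sketch that runs the same logic against a stand-in analyzer. `StubAnalyzer` and `ToneRunner` are hypothetical names. The stub only mimics the shape of the response used above (a dict with a `document_tone` key) and returns no real scores.

```python
class StubAnalyzer:
    """Stand-in for the Watson client (hypothetical, no network calls)."""

    def __init__(self):
        self.last_text = None

    def tone(self, text, sentences=False):
        # Record the text that would have been sent to the service.
        self.last_text = text
        return {"document_tone": {"tone_categories": []}}


class ToneRunner:
    """Hypothetical wrapper mirroring run_analyzer above."""

    def __init__(self, analyzer):
        self.analyzer = analyzer

    def run_analyzer(self, content):
        # A list of paragraphs is joined into one document string.
        if isinstance(content, list):
            content_text = ". ".join(content)
        else:
            content_text = content
        response = self.analyzer.tone(content_text, sentences=False)
        if response:
            return response["document_tone"]
        return None


stub = StubAnalyzer()
runner = ToneRunner(stub)
document_tone = runner.run_analyzer(["First paragraph", "Second paragraph"])
print(stub.last_text)
```

Note that joining with ". " adds a period even when a paragraph already ends with one, so the text sent to the service can contain doubled punctuation.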

The response data returned by Watson needed some cleaning:

printer.pprint(transcript["TRUMP"])
{ 'tone_categories': [ { 'category_id': 'emotion_tone',
                         'category_name': 'Emotion Tone',
                         'tones': [ { 'score': 1.0,
                                      'tone_id': 'anger',
                                      'tone_name': 'Anger'},
                                    { 'score': 0.102859,
                                      'tone_id': 'disgust',
                                      'tone_name': 'Disgust'},
                                    { 'score': 1.0,
                                      'tone_id': 'fear',
                                      'tone_name': 'Fear'},
                                    { 'score': 0.081609,
                                      'tone_id': 'joy',
                                      'tone_name': 'Joy'},
                                    { 'score': 0.106559,
                                      'tone_id': 'sadness',
                                      'tone_name': 'Sadness'}]},
                                      ....
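
As an illustration of the cleaning involved, the sketch below flattens one `document_tone` dict into flat rows using only the standard library. The function name `flatten_tones` is hypothetical, and the sample reuses the first two emotion scores from the output above.

```python
def flatten_tones(document_tone):
    """Turn nested tone categories into a flat list of row dicts."""
    rows = []
    for category in document_tone["tone_categories"]:
        for tone in category["tones"]:
            rows.append({
                "category": category["category_name"],
                "tone_name": tone["tone_name"],
                "score": tone["score"],
            })
    return rows


# Truncated sample copied from the response shown above.
sample = {
    "tone_categories": [
        {
            "category_id": "emotion_tone",
            "category_name": "Emotion Tone",
            "tones": [
                {"score": 1.0, "tone_id": "anger", "tone_name": "Anger"},
                {"score": 0.102859, "tone_id": "disgust", "tone_name": "Disgust"},
            ],
        }
    ]
}

rows = flatten_tones(sample)
for row in rows:
    print(row["category"], row["tone_name"], row["score"])
```

Each row carries one tone and one score, which is the shape a DataFrame or a bar chart needs.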

Next, I opened an IPython notebook to clean the data so I could graph it.

from bokeh.charts import (
    Bar,
    show,
    output_file,
    output_notebook,
    )
from bokeh import plotting
import pandas as pd
import simplejson as json

C = "CLINTON"
T = "TRUMP"

tones = json.load(open("the-first-trump-clinton-presidential-debate-transcript-annotated.json", "r"))
tones_2 = json.load(open("everything-that-was-said-at-the-second-donald-trump-vs-hillary-clinton-debate-highlighted.json","r"))
tones_3 = json.load(open("the-final-trump-clinton-debate-transcript-annotated.json","r"))

def tones_to_df(tones_json):
    """Transforms the response dict to a DataFrame"""

    # The parameter is named tones_json so it does not shadow the json module
    emotional_tones = []
    for k in tones_json:
        if k in ("CLINTON", "TRUMP"):
            want = tones_json[k]["tone_categories"]
            for w in want:
                list_of_d = w["tones"]
                for d in list_of_d:
                    d["candidate"] = k
                    emotional_tones.append(d)
    df = pd.DataFrame(emotional_tones)
    df = df[["candidate", "score", "tone_name"]]
    return df

# load dataframes in list
df_tones = []
for t in (tones, tones_2, tones_3):
    df_tones.append(tones_to_df(t))

def show_tone(df, name, tone):
    """
    helper function to view specific tone
    from a given debate
    """
    return df[ (df["tone_name"] == tone) & (df["candidate"] == name)]

# let's make sure the DataFrame is formatted correctly
show_tone(df_tones[0], C, "Tentative")
show_tone(df_tones[0], T, "Tentative")

	candidate	score	tone_name
7	CLINTON	0.274	Tentative
20	TRUMP	0.112	Tentative

To plot the DataFrame for each debate, I used the Bokeh library, which creates quick, clean charts.

def create_grouped_bar_from_df(df, num):
    b = Bar(df, "tone_name", values='score',
            group="candidate",
            title="2016 Presidential Debate Tone Analysis as per IBM Watson - Debate %s" % num,
            plot_height = 600,
            plot_width = 800,
            legend="top_right",
            color = ["#70a6ff", "#ed5757"])
    show(b)

for i, df in enumerate(df_tones):
    create_grouped_bar_from_df(df, i+1)

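[Chart: 2016 Presidential Debate Tone Analysis as per IBM Watson - Debate 1]

such anger, such fear

[Chart: 2016 Presidential Debate Tone Analysis as per IBM Watson - Debate 2]

[Chart: 2016 Presidential Debate Tone Analysis as per IBM Watson - Debate 3]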

Finally, I merged the 3 debate DataFrames together to perform an aggregate analysis of each candidate’s overall tone.

# All transcripts combined in one df
df_all = pd.concat(df_tones)
# Get mean and median
df_all_mean = pd.DataFrame(df_all.groupby(["candidate", "tone_name"])["score"].mean()).reset_index()
df_all_median = pd.DataFrame(df_all.groupby(["candidate", "tone_name"])["score"].median()).reset_index()

# Add the aggregate statistic name to each df
df_all_mean["aggregate"] = "mean"
df_all_median["aggregate"] = "median"
df_aggregate = pd.concat([df_all_mean, df_all_median])
b = Bar(df_aggregate,
        "tone_name",
        values="score",
        group=["aggregate", "candidate"],
        title="Aggregate Debate Tone Analysis",
        plot_height = 600,
        plot_width = 800,
        legend="top_right",
        ylabel="Score",
        color = ["#70a6ff", "#ed5757", "#025df4", "#f20404"])
show(b)

[Chart: Aggregate Debate Tone Analysis]
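
For readers less familiar with `groupby`, the sketch below shows what the mean and median aggregation computes, using only the standard library. The function name `aggregate_scores`, the candidate labels, and the scores are made up for illustration. They are not results from the debates.

```python
from collections import defaultdict
from statistics import mean, median


def aggregate_scores(rows):
    """Return {(candidate, tone_name): {"mean": ..., "median": ...}}."""
    grouped = defaultdict(list)
    for candidate, tone_name, score in rows:
        grouped[(candidate, tone_name)].append(score)
    return {
        key: {"mean": mean(scores), "median": median(scores)}
        for key, scores in grouped.items()
    }


# Hypothetical scores for one tone across three debates.
rows = [
    ("CANDIDATE_A", "Joy", 0.25),
    ("CANDIDATE_A", "Joy", 0.25),
    ("CANDIDATE_A", "Joy", 1.0),
    ("CANDIDATE_B", "Joy", 0.5),
    ("CANDIDATE_B", "Joy", 0.5),
    ("CANDIDATE_B", "Joy", 0.5),
]
summary = aggregate_scores(rows)
print(summary[("CANDIDATE_A", "Joy")])
```

With only three debates per candidate, a single extreme score pulls the mean well away from the median, which is one reason to plot both.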

Takeaways:

Overall, both candidates’ tones were fairly negative in all the debates.
The Joy tone never rose above ~0.08 / 1.00 for either candidate.
Neither Clinton nor Trump was Confident throughout the debates?
Clinton was more Analytical, Fearful and Tentative than Trump.
Trump was more Extraverted and full of Disgust.
The tone with the largest difference between the two candidates was Analytical.
- Clinton’s mean Analytical score was 0.503813.
- Trump’s mean Analytical score was 0.105237.

[Image: fear_anger]

View on GitHub.