Text Analysis on Election '08 stump speeches

By Raif Majeed February 3, 2008

If you've seen the news, you'll know that there are lots of words flying around nowadays -- political speeches, debates, ads, etc. If you're trying to understand the words and decide how to vote, it can be overwhelming. However, if you look at the words as data, you can suddenly get interesting new insights.

Here's a packaged workbook that shows a text analysis of recent stump speeches by the four major remaining presidential candidates (Hillary Clinton, John McCain, Barack Obama, and Mitt Romney). To give you a flavor of the analysis, its dashboard shows the most common 2-word phrases uttered in each candidate's speech.
You can adjust the quick filter under the dashboard to limit the view to phrases of a certain length in characters (the space between the words counts as one character). I want to keep this post politically neutral, so I'll let you dig in with Tableau (or the free Tableau Reader) and make your own discoveries. I'm sure you'll be surprised by some of the results, as I was.
The speeches were pulled from candidate websites; each was in a different forum -- for instance, Hillary Clinton was speaking in a church and Mitt Romney was speaking to auto workers in Michigan, which accounts for some of the unusual phrases you see.
To get the texts into a form that Tableau could understand, I used a quick Perl script to strip non-word characters (except whitespace, apostrophes, and hyphens), split the text on whitespace, and output the result as a CSV. To get the 2-word analysis, I left-joined the resulting CSV against itself with an offset-by-one ON condition ("[current].[Position]+1=[next].[Position]", where [current] and [next] are table aliases), so each word is paired with the word that follows it. I used context filters and dashboards liberally to generate what you see.
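For readers who want to try the same preparation outside Tableau, here is a minimal sketch of the steps above in Python. It is not the original Perl script. The function names, the "Word" column name, the 1-based positions, and the lowercasing of words are all assumptions made for illustration; only the character filtering, the whitespace split, and the Position+1 join condition come from the description above.

```python
import csv
import io
import re
from collections import Counter

# Matches anything that is NOT a word character, whitespace,
# an apostrophe, or a hyphen (the characters the post says were kept).
NON_WORD = re.compile(r"[^\w\s'-]")


def tokenize(text):
    """Return (position, word) rows for one speech.

    Assumptions: positions start at 1 and words are lowercased.
    """
    cleaned = NON_WORD.sub("", text)
    return [(pos, word.lower())
            for pos, word in enumerate(cleaned.split(), start=1)]


def to_csv(rows):
    """Serialize the rows as CSV text (column names are hypothetical)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Position", "Word"])
    writer.writerows(rows)
    return buf.getvalue()


def left_join_next(rows):
    """Emulate the self left join on current.Position + 1 = next.Position.

    The last word has no successor, so its partner is None,
    just as a left join would produce a NULL there.
    """
    by_position = dict(rows)
    return [(word, by_position.get(pos + 1)) for pos, word in rows]


def bigram_counts(rows):
    """Count 2-word phrases, skipping the unmatched final row."""
    pairs = left_join_next(rows)
    return Counter(f"{a} {b}" for a, b in pairs if b is not None)


if __name__ == "__main__":
    rows = tokenize("Yes, we can. Yes, we can!")
    print(to_csv(rows))
    print(bigram_counts(rows).most_common(3))
```

Because the phrase is built as the two words joined by a single space, its length in characters includes that space, which matches how the quick filter counts.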
I encourage you to play around with the workbook in Tableau and see what patterns you can find. Enjoy!

Comments

Submitted by James B. on

Ah, a church... for a second, I thought you'd swapped out Clinton with Huckabee. :)

Submitted by Raif M. on

Corporate...blog.... Must...remain...neutral.... :^)

In all seriousness, James raises a good point. As the campaign goes on I'd like to add additional speeches to the data set to get a more representative sample over time.

On the other hand, phrase variations as a function of time, audience, and location are interesting in themselves and I hope to capture the extent of that variability too.

Submitted by sandeep dambla (not verified) on

What does the data set look like for this analysis? I mean the raw data file that we import into Tableau. Could anyone please share the data file? It would be a great help.
