WikiLeaks has provided people like me with an abundance of material to download, analyze, and visualize, and ultimately a basis for sharing insights into the behavior of elites. In this case, the material is the set of emails from the Democratic National Committee (DNC). Using this data source, we can mine specific aspects of the dataset with a simple search term on the WikiLeaks site. For this post and the accompanying visualizations, I have chosen to examine the DNC's treatment of Bernie Sanders, who emerged as a serious contender for the Democratic nomination.
Many of these emails revealed that the DNC was consciously favoring Hillary Clinton over the upstart Sanders. In this post, I will examine the linkages between insiders at the DNC and outside contacts such as reporters and campaign personnel. To do this, I'll use Gephi, the open source network analysis tool, followed by Sigma.js to present the final networks on the web. The initial goal is to understand the relationships in the network using analytic measures such as centrality, modularity, connected components, and degree. With these measures, we can better understand how information flowed into and out of the DNC via email.
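To make two of those measures concrete before turning to Gephi, here is a minimal sketch in plain Python of degree and connected components on a toy email network. The addresses and the `emails` list are hypothetical placeholders, not records from the actual DNC dataset, and the helper functions are my own illustration rather than anything Gephi exposes.

```python
from collections import defaultdict, deque

# Hypothetical (sender, recipient) pairs; NOT taken from the actual DNC emails.
emails = [
    ("staffer_a", "staffer_b"),
    ("staffer_a", "reporter_x"),
    ("staffer_b", "staffer_a"),
    ("reporter_x", "staffer_a"),
    ("staffer_c", "staffer_d"),
]

def degrees(edges):
    """Count messages sent (out-degree) and received (in-degree) per address."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for sender, recipient in edges:
        out_deg[sender] += 1
        in_deg[recipient] += 1
    return dict(out_deg), dict(in_deg)

def connected_components(edges):
    """Group addresses linked by any chain of emails, ignoring direction."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        # Breadth-first search collects everyone reachable from `start`.
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components

out_deg, in_deg = degrees(emails)
components = connected_components(emails)
```

In this toy network, `staffer_a` has the highest in-degree and out-degree, and `staffer_c` and `staffer_d` form a separate component because they never exchange mail with anyone else. The same reading applies at scale: high-degree addresses are the busiest senders and receivers, and separate components are groups with no email path between them.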
What we'll wind up with is essentially a meta-view of the DNC's email activity. This initial pass using network analysis will not focus on the content of the emails; for that, we'll do some subsequent text mining to understand both the content and the tone of the exchanges. I hope to tie these two pieces together so that we can ultimately understand who was saying what about Sanders, and to whom. Let's get started with the network analysis by providing some background on the graph statistics to be employed.