17:00 PM

State of the Union Text Analysis

The State of the Union, or its equivalent initial speech from a US president to Congress, is likely to provide ideas about where that particular president would like to steer the country on his watch. By analyzing these speeches across all presidents who have delivered one (a few died before having the opportunity), we should be able to see which presidents had similar thought processes and political beliefs across a 225-year period. We should also be able to detect significant changes in how these speeches were delivered, and what topics were central to each speech. To follow through on this, I have taken the first available speech to Congress for each president, and analyzed it using a mix of text extraction, text processing, and data visualization approaches. Let's see what this process reveals about the individual politicians as well as any larger changes that might have occurred over the last two-plus centuries.
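One common way to compare speeches for similarity is cosine similarity over word counts. The sketch below is only an illustration of that idea, not necessarily the method used in the full post; the three short "speeches" are made-up strings, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Lowercase the text and count each word (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up snippets standing in for full speeches (assumption, not real text).
speech_1 = term_counts("commerce and peace with all nations")
speech_2 = term_counts("peace and commerce among the states")
speech_3 = term_counts("railroads telegraph industry")

sim_12 = cosine_similarity(speech_1, speech_2)
sim_13 = cosine_similarity(speech_1, speech_3)
```

In practice one would likely remove stop words such as "and" and "the" before counting, since they inflate similarity between any two English texts.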

Read More

19:20 PM

DNC Bernie Sanders Emails

Wikileaks has provided people like myself with an abundance of material to download, analyze, visualize, and ultimately to share insights on the behaviors of the elites, in this case the emails from the Democratic National Committee or DNC. Using this data source, we have the ability to mine specific aspects of the entire dataset using a simple search term on the Wikileaks site. For this post, and the accompanying visualizations, I have chosen to examine the DNC's treatment of Bernie Sanders, who materialized into a serious contender for the Democratic nomination.

It was revealed through many of these emails that the DNC was consciously favoring Hillary Clinton over the upstart Sanders. In this post, I will examine the linkages between insiders at the DNC and outside contacts such as reporters and campaign personnel. To do this, I'll employ Gephi, the open source network analysis tool, followed by Sigma.js for visualizing the final networks on the web. The initial goal will be to understand the relationships in the network, using a variety of analytic measures such as centrality, modularity, connected components, and degree. Using these measures, we will be able to better understand how data flowed both into and out of the DNC via the email channel.
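To make two of these measures concrete, here is a minimal sketch of degree and connected components computed from a list of sender and recipient pairs. Gephi calculates these for you; this plain Python version only shows what the numbers mean. The email pairs and the names in them are hypothetical and are not taken from the Wikileaks data.

```python
from collections import defaultdict, deque

# Hypothetical (sender, recipient) pairs; not from the actual DNC emails.
emails = [
    ("staffer_a", "staffer_b"),
    ("staffer_a", "reporter_x"),
    ("staffer_b", "reporter_x"),
    ("staffer_c", "campaign_y"),
]

def build_adjacency(pairs):
    """Build an undirected adjacency map from (sender, recipient) pairs."""
    adj = defaultdict(set)
    for sender, recipient in pairs:
        adj[sender].add(recipient)
        adj[recipient].add(sender)
    return adj

def degrees(adj):
    """Degree = number of distinct contacts each person exchanged email with."""
    return {node: len(neighbors) for node, neighbors in adj.items()}

def connected_components(adj):
    """Group nodes that can reach each other, using breadth-first search."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

adj = build_adjacency(emails)
node_degrees = degrees(adj)
components = connected_components(adj)
```

This sketch treats the graph as undirected; a directed version would track in-degree and out-degree separately to distinguish who sends from who receives.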

What we'll wind up with is essentially a meta-view of the DNC's email activities. Our initial pass at the data using network analysis will not focus on the content of the emails; for that, we'll do some subsequent text mining to help us understand both the content and tone of the email exchanges. I hope to be able to tie these two pieces together, so that we may ultimately understand who was saying what about Sanders, and who it was being communicated to. Let's get started with the network analysis by providing some background on the graph statistics to be employed.

Read More

21:50 PM

Wikileaks and the Podesta Emails

Thank goodness for Julian Assange and Wikileaks, as well as the others who have dared fight the established political forces in this country. Thanks to their efforts, the veil has been lifted and we can all see how manipulative and crooked these folks are as they do their level best to fleece the average citizen and make themselves wealthy beyond their wildest dreams. So it is with Hillary Clinton in the 2016 campaign, as the recent hacks of the John Podesta emails have confirmed. For full details, you can start here: https://wikileaks.org/podesta-emails/.

Podesta, Hillary Clinton's campaign chairman and a long-time associate of the Clintons, has been exposed as a master manipulator, working with many others behind the scenes to tilt the campaign in Clinton's favor. Thanks to Wikileaks, we can see very clearly the efforts of a host of players to do everything in their power to discredit Bernie Sanders and Donald Trump in an effort to put their candidate in the White House. The individual emails lay bare the machinations of the Democratic National Committee in scurrilous detail, and make for entertaining reading. Of course, many of Hillary Clinton's supporters will dismiss any notions of wrongdoing courtesy of the rather pathetic pronouncements of FBI Director Comey, but the evidence is plentiful, regardless of the FBI's "official" position.

In this post, I'll take a network graph view of the players involved, using data from the GDELT Project (http://gdeltproject.org). This will help shed light on the primary participants, how they interrelate, and who the "targets" of their mischief are. At some point, I'll also work up a text analysis of the email content, but that's for another post.

Read More

20:06 PM

Visualizing CEPA Education Data, Part 2

A couple of months back, I wrote about investigating CEPA academic achievement data, provided through the CEPA project at Stanford University (Sean F. Reardon, Demetra Kalogrides, Andrew Ho, Ben Shear, Kenneth Shores, Erin Fahle. (2016). Stanford Education Data Archive. http://purl.stanford.edu/db586ns4974). Finally, I've got part two, wherein I use Exploratory, the powerful R-based tool for data wrangling, analysis, and visualization. Exploratory is built for folks like me who love the power of R but have not been immersed in the oftentimes complex world of R coding. The Exploratory front end makes using R a pleasure, as I hope this post will help illustrate.

Exploratory takes the essential R dataframe as a starting point, and then allows you to easily manipulate your data using a multitude of R packages included with the base Exploratory install. This post will walk through some simple analysis using CEPA data within Exploratory. We'll begin by setting the stage with a screenshot of the Exploratory workspace.

On the left side are the dataframes belonging to this project, while the center is dedicated to the primary workspace where all tables and charts will be viewed. Off to the right is where any actions are stored - filters, functions, and so on that act upon the dataframe. In this case, we're viewing the Summary tab for a particular dataframe. We can also look at a tabular view of the data by clicking the Table icon:

Finally, we have the option to see our data in visual form by selecting the Viz tab icon:

Read More

13:57 PM

2016 Detroit Jazz Fest Moods Network

One of the great events of the summer in Detroit is the annual Detroit Jazz Festival, an epic event in the jazz world, with many of the world's foremost musicians convening in Detroit for an entirely free set of performances. It is in fact the world's largest free jazz festival, and may well be the best jazz weekend regardless of price.

For 2016, the festival plays host to the likes of the legendary bassist Ron Carter, Brad Mehldau, John Scofield, Randy Weston, Billy Harper, and a host of other musicians both international and local. So I thought it fitting to blend my love of jazz with my affection for network graphs, by using user tags from the All Music website. These tags are labels given to each musician based on listener perceptions of their work, and provide interesting information to use in building a graph.

The initial graph creation was done in Gephi, followed by deployment using sigma.js, which allows us to use the web to probe and explore the graph to find interesting patterns in the data. The Force Atlas 2 algorithm was used to create the layout, with the nodes colored based on their modularity class, a form of clustering based on similar characteristics. When clustering works very well, nodes of the same color will stand apart from other color groups; in this instance, we are partially successful in this regard. We'll learn more about this shortly.
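For readers curious about what sits behind a tag-based graph and a modularity score, here is a rough sketch. It assumes one plausible construction, where two musicians are linked whenever they share a tag, and it scores a hand-picked grouping using the standard modularity formula. It does not reproduce Gephi's community detection, which searches for the grouping itself. The musician names, tags, and partition below are all made up for illustration.

```python
from itertools import combinations

# Hypothetical tag data; musician names and mood tags are invented.
tags = {
    "musician_a": {"elegant", "reflective"},
    "musician_b": {"elegant", "intimate"},
    "musician_c": {"fiery", "intense"},
    "musician_d": {"fiery", "driving"},
}

def shared_tag_edges(tags_by_node):
    """Link two musicians with weight = number of tags they share."""
    edges = {}
    for a, b in combinations(sorted(tags_by_node), 2):
        shared = tags_by_node[a] & tags_by_node[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges

def modularity(edges, community_of):
    """Modularity Q of a given partition on a weighted undirected graph.

    Q = sum over communities of (internal weight / m) - (total strength / 2m)^2,
    where m is the total edge weight. Assumes the graph has at least one edge.
    """
    m = sum(edges.values())
    strength = {}   # weighted degree of each node
    internal = {}   # edge weight falling inside each community
    for (a, b), weight in edges.items():
        strength[a] = strength.get(a, 0) + weight
        strength[b] = strength.get(b, 0) + weight
        if community_of[a] == community_of[b]:
            c = community_of[a]
            internal[c] = internal.get(c, 0) + weight
    total = {}      # summed strength of each community
    for node, s in strength.items():
        c = community_of[node]
        total[c] = total.get(c, 0) + s
    return sum(internal.get(c, 0) / m - (total[c] / (2 * m)) ** 2 for c in total)

edges = shared_tag_edges(tags)
# Hand-picked grouping (assumption): a and b together, c and d together.
partition = {"musician_a": 0, "musician_b": 0, "musician_c": 1, "musician_d": 1}
q = modularity(edges, partition)
```

A higher Q means the grouping captures more within-group links than chance would predict, which corresponds to color groups standing clearly apart in the layout.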

For those who want to interact with the network and draw their own conclusions, here you are:


Let's start with a view of the entire network:

Jazz Moods Network

Read More
