01 Jul 2017, 10:30 AM

Leviathan Always Grows

Anyone who pays even the least bit of attention knows that the U.S. federal government continues to grow, and grow, and grow. This fact cannot be disputed; using the government's own financial data, one can quickly see the magnitude of growth every year. Some may be confused by the unique language spoken near the Potomac, where budget "cuts" are typically not cuts at all, but merely reductions in the rate of growth versus the prior year. In almost all cases, not only does the total budget grow significantly each year, but nearly all components of the budget grow as well. Periodically, certain departments or agencies may see a year-over-year reduction in their budget, although this is the exception to the rule of near-continuous growth.

To illustrate this growth and how disproportionate it is to the world you and I live in, I have constructed a budget tracker dashboard in Tableau Public that allows anyone to select individual departments to see just how rapid their growth has been in the 1962-2016 budget period. This is supported by figures showing the growth rate relative to the government's own CPI inflation calculator (admittedly a flawed measure, but a commonly referenced one), as well as budget shares over time, and trend charts displaying annual patterns. It's a fun tool to explore budget growth, and see where things have really gotten out of control.
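For anyone who wants to see the arithmetic behind a comparison of budget growth to CPI, here is a minimal Python sketch. The function name `real_growth` and every number in it are hypothetical placeholders chosen for easy math, not actual budget or CPI figures, and the Tableau dashboard itself may compute things differently.

```python
def real_growth(budget_start, budget_end, cpi_start, cpi_end):
    """Compare nominal budget growth to CPI growth between two years.

    Returns (nominal_ratio, inflation_ratio, real_ratio), where real_ratio
    is how many times the budget grew after adjusting for CPI.
    """
    nominal_ratio = budget_end / budget_start
    inflation_ratio = cpi_end / cpi_start
    return nominal_ratio, inflation_ratio, nominal_ratio / inflation_ratio


# Hypothetical placeholder values, NOT real budget or CPI data.
nominal, inflation, real = real_growth(
    budget_start=100.0,   # budget in the first year (any currency unit)
    budget_end=3600.0,    # budget in the last year
    cpi_start=30.0,       # CPI index in the first year
    cpi_end=240.0,        # CPI index in the last year
)
print(f"nominal growth: {nominal:.1f}x")
print(f"CPI growth:     {inflation:.1f}x")
print(f"real growth:    {real:.1f}x")
```

A real ratio above 1.0 means the department's budget outpaced CPI over the period; a ratio below 1.0 means it lagged.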


07 Apr 2017, 5:00 PM

State of the Union Text Analysis

The State of the Union, or its equivalent initial speech from a U.S. president to Congress, is likely to provide ideas about where that particular president would like to steer the country on his watch. By analyzing these speeches across all presidents who have delivered one (a few died before having the opportunity), we should be able to see which presidents had similar thought processes and political beliefs across a 225-year period. We should also be able to detect significant changes in how these speeches were delivered, and what topics were central to each speech. To follow through on this, I have taken the first available speech to Congress for each president, and analyzed it using a mix of text extraction, text processing, and data visualization approaches. Let's see what this process reveals about the individual politicians as well as any larger changes that might have occurred over the last two-plus centuries.
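One common way to measure how similar two speeches are is cosine similarity over word counts. The Python sketch below illustrates that idea only; it is not necessarily the method used in the full post. The helper names and the three short texts are made-up placeholders, not quotations from any actual speech.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each word (a simple bag of words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up placeholder texts, NOT excerpts from real speeches.
speeches = {
    "speech_a": "commerce and peace with all nations",
    "speech_b": "peace and commerce with foreign nations",
    "speech_c": "taxes budget deficit spending",
}
counts = {name: word_counts(text) for name, text in speeches.items()}

sim_ab = cosine_similarity(counts["speech_a"], counts["speech_b"])
sim_ac = cosine_similarity(counts["speech_a"], counts["speech_c"])
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

In practice you would likely drop common stop words ("and", "with") and weight terms before comparing, since raw counts let filler words dominate the score.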


15 Nov 2016, 7:20 PM

DNC Bernie Sanders Emails

Wikileaks has provided people like me with an abundance of material to download, analyze, and visualize, and ultimately to use for sharing insights on the behaviors of the elites; in this case, the material is the emails from the Democratic National Committee, or DNC. Using this data source, we have the ability to mine specific aspects of the entire dataset using a simple search term on the Wikileaks site. For this post, and the accompanying visualizations, I have chosen to examine the DNC's treatment of Bernie Sanders, who materialized into a serious contender for the Democratic nomination.

It was revealed through many of these emails that the DNC was consciously favoring Hillary Clinton over the upstart Sanders. In this post, I will examine the linkages between insiders at the DNC and outside contacts such as reporters and campaign personnel. To do this, I'll employ Gephi, the open-source network analysis tool, followed by Sigma.js for visualizing the final networks on the web. The initial goal will be to understand the relationships in the network, using a variety of analytic measures such as centrality, modularity, connected components, and degrees. Using these measures, we will be able to better understand how data flowed both into and out of the DNC via the email channel.

What we'll wind up with is essentially a meta-view of the DNC's email activities. Our initial pass at the data using network analysis will not focus on the content of the emails; for that, we'll do some subsequent text mining to help us understand both the content and tone of the email exchanges. I hope to be able to tie these two pieces together, so that we may ultimately understand who was saying what about Sanders, and who it was being communicated to. Let's get started with the network analysis by providing some background on the graph statistics to be employed.
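To make two of the graph statistics mentioned above concrete, here is a small Python sketch that computes degrees and connected components for a toy email network. This is a stand-in for illustration, not the Gephi workflow used in the post. The edge list and node names are hypothetical, and treating each email as an undirected link is a simplifying assumption.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (sender, recipient) pairs."""
    graph = defaultdict(set)
    for sender, recipient in edges:
        graph[sender].add(recipient)
        graph[recipient].add(sender)
    return graph


def degrees(graph):
    """Degree of a node = number of distinct contacts it exchanged email with."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def connected_components(graph):
    """Group nodes into components using breadth-first search."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical placeholder edges, NOT taken from the actual email data.
edges = [
    ("staff_a", "staff_b"),
    ("staff_a", "reporter_x"),
    ("staff_b", "reporter_x"),
    ("staff_c", "campaign_y"),
]
graph = build_graph(edges)
node_degrees = degrees(graph)
components = connected_components(graph)
print(node_degrees)
print(components)
```

Nodes with high degree are the busiest correspondents, while each connected component is a group of people linked to one another by some chain of emails but cut off from the rest of the network.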


07 Nov 2016, 9:50 PM

Wikileaks and the Podesta Emails

Thank goodness for Julian Assange and Wikileaks, as well as the others who have dared fight the established political forces in this country. Thanks to their efforts, the veil has been lifted and we can all see how manipulative and crooked these folks are as they do their level best to fleece the average citizen and make themselves wealthy beyond their wildest dreams. So it is with Hillary Clinton in the 2016 campaign, as the recent hacks of the John Podesta emails have confirmed. For full details, you can start here: https://wikileaks.org/podesta-emails/.

Podesta, Hillary Clinton's campaign chairman and a long-time associate of the Clintons, has been exposed as a master manipulator, working with many others behind the scenes to tilt the campaign in Clinton's favor. Thanks to Wikileaks, we can see very clearly the efforts of a host of players to do everything in their power to discredit Bernie Sanders and Donald Trump in an effort to put their candidate in the White House. The individual emails lay bare the machinations of the Democratic National Committee in scurrilous detail, and make for entertaining reading. Of course, many of Hillary Clinton's supporters will dismiss any notions of wrongdoing courtesy of the rather pathetic pronouncements of FBI Director Comey, but the evidence is plentiful, regardless of the FBI's "official" position.

In this post, I'll take a network graph view of the players involved, using data from the GDELT Project (http://gdeltproject.org). This will help shed light on the primary participants, how they interrelate, and who the "targets" of their mischief are. At some point, I'll also work up a text analysis of the email content, but that's for another post.


25 Oct 2016, 8:06 PM

Visualizing CEPA Education Data, Part 2

A couple of months back, I wrote about investigating CEPA academic achievement data, provided through the CEPA project at Stanford University (Sean F. Reardon, Demetra Kalogrides, Andrew Ho, Ben Shear, Kenneth Shores, Erin Fahle. (2016). Stanford Education Data Archive. http://purl.stanford.edu/db586ns4974). Finally, I've got part two, wherein I use Exploratory, the powerful R-based tool for data wrangling, analysis, and visualization. Exploratory is built for folks like me, who love the power of R but have not been immersed in the oftentimes complex world of R coding. The Exploratory front end makes using R a pleasure, as I hope this post will help illustrate.

Exploratory takes the essential R dataframe as a starting point, and then allows you to easily manipulate your data using a multitude of R packages included with the base Exploratory install. This post will walk through some simple analysis using CEPA data within Exploratory. We'll begin by setting the stage with a screenshot of the Exploratory workspace.

On the left side are the dataframes belonging to this project, while the center is dedicated to the primary workspace where all tables and charts will be viewed. Off to the right is where any actions are stored: the filters, functions, and so on that act upon the dataframe. In this case, we're viewing the Summary tab for a particular dataframe. We can also look at a tabular view of the data by clicking the Table icon:

Finally, we have the option to see our data in visual form by selecting the Viz tab icon:

