10:30 AM

Leviathan Always Grows

Anyone who pays even the least bit of attention knows that the U.S. federal government continues to grow, and grow, and grow. This fact cannot be disputed; using the government's own financial data, one can quickly see the magnitude of growth every year. Some may be confused by the unique language spoken near the Potomac, where budget "cuts" are typically not cuts at all, but merely reductions in the rate of growth versus the prior year. In almost all cases, not only does the total budget grow significantly each year, but nearly all components of the budget grow as well. Periodically, certain departments or agencies may see a year-over-year reduction in their budget, although this is the exception to the rule of near-continuous growth.
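To make the distinction concrete, here is a minimal sketch that separates an actual cut from a mere slowdown in growth. The function name and all figures are hypothetical, chosen only for illustration; they are not actual budget data.

```python
def classify_change(two_years_ago, last_year, this_year):
    """Label this year's budget change relative to the prior year's growth."""
    prior_rate = last_year / two_years_ago - 1
    current_rate = this_year / last_year - 1
    if this_year < last_year:
        return "actual cut"          # spending really went down
    if current_rate < prior_rate:
        return "slower growth"       # still growing, just less quickly
    return "steady or faster growth"

# Hypothetical figures (in billions), for illustration only.
print(classify_change(100.0, 108.0, 112.0))  # grew 8%, then about 3.7%
print(classify_change(100.0, 108.0, 105.0))  # spending fell below last year
```

Only the second case is a cut in the everyday sense; the first is the kind of "cut" that still leaves the budget larger than the year before.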

To illustrate this growth and how disproportionate it is to the world you and I live in, I have constructed a budget tracker dashboard in Tableau Public that allows anyone to select individual departments to see just how rapid their growth has been in the 1962-2016 budget period. This is supported by figures showing the growth rate relative to the government's own CPI inflation calculator (admittedly a flawed measure, but a commonly referenced one), as well as budget shares over time, and trend charts displaying annual patterns. It's a fun tool to explore budget growth, and see where things have really gotten out of control.
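For readers who want to see the arithmetic behind comparing budget growth to inflation, here is a rough sketch. It is not the calculation used in the dashboard, just one common way to deflate a nominal figure by a price index. The function names and all numbers are assumptions for illustration, not actual budget or CPI values.

```python
def nominal_growth(start, end):
    """Total nominal growth as a ratio (2.0 means the budget doubled)."""
    return end / start

def real_growth(start, end, cpi_start, cpi_end):
    """Growth ratio after removing the rise in the price index."""
    inflation = cpi_end / cpi_start
    return (end / start) / inflation

def annualized_rate(ratio, years):
    """Convert a total growth ratio into a compound annual rate."""
    return ratio ** (1 / years) - 1

# Hypothetical figures for illustration only.
budget_start, budget_end = 100.0, 4000.0
cpi_start, cpi_end = 30.0, 240.0
years = 54  # 1962 to 2016

nominal = nominal_growth(budget_start, budget_end)
real = real_growth(budget_start, budget_end, cpi_start, cpi_end)

print(f"Nominal growth: {nominal:.1f}x")
print(f"Real growth:    {real:.1f}x")
print(f"Real annual rate: {annualized_rate(real, years):.2%}")
```

A real growth ratio above 1.0 means the budget outpaced the price index over the period.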

17:00 PM

State of the Union Text Analysis

The State of the Union, or its equivalent initial speech from a US president to Congress, is likely to provide ideas about where that particular president would like to steer the country on his watch. By analyzing these speeches across all presidents who have delivered one (a few died before having the opportunity), we should be able to see which presidents had similar thought processes and political beliefs across a 225-year period. We should also be able to detect significant changes in how these speeches were delivered, and what topics were central to each speech. To follow through on this, I have taken the first available speech to Congress for each president, and analyzed it using a mix of text extraction, text processing, and data visualization approaches. Let's see what this process reveals about the individual politicians as well as any larger changes that might have occurred over the last 2+ centuries.
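One simple way to measure how similar two speeches are is to count words in each and compare the counts with cosine similarity. The sketch below is an assumption about how such a comparison could be done, not necessarily the method used in the full analysis, and the text snippets are placeholders rather than quotations from any actual speech.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def cosine_similarity(a, b):
    """Cosine similarity between two word-count Counters (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Placeholder snippets, not quotations from any actual speech.
speeches = {
    "speech_a": "commerce and peace with foreign nations",
    "speech_b": "peace and commerce among the states",
    "speech_c": "taxes jobs health care",
}
counts = {name: Counter(tokenize(text)) for name, text in speeches.items()}

sim_ab = cosine_similarity(counts["speech_a"], counts["speech_b"])
sim_ac = cosine_similarity(counts["speech_a"], counts["speech_c"])
print(f"a vs b: {sim_ab:.2f}")
print(f"a vs c: {sim_ac:.2f}")
```

In practice one would likely remove common stop words such as "and" or "the" first, since they inflate similarity between any two English texts.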

20:06 PM

Visualizing CEPA Education Data, Part 2

A couple of months back, I wrote about investigating CEPA academic achievement data, provided through the CEPA project at Stanford University (Sean F. Reardon, Demetra Kalogrides, Andrew Ho, Ben Shear, Kenneth Shores, Erin Fahle. (2016). Stanford Education Data Archive. http://purl.stanford.edu/db586ns4974). Finally, I've got part two, wherein I use Exploratory, the powerful R-based tool for data wrangling, analysis, and visualization. Exploratory is a great fit for folks like me who love the power of R but have not been immersed in the oftentimes complex world of R coding. The Exploratory front end makes using R a pleasure, as I hope this post will help illustrate.

Exploratory takes the essential R dataframe as a starting point, and then allows you to easily manipulate your data using a multitude of R packages included with the base Exploratory install. This post will walk through some simple analysis using CEPA data within Exploratory. We'll begin by setting the stage with a screenshot of the Exploratory workspace.

On the left side are the dataframes belonging to this project, while the center is dedicated to the primary workspace where all tables and charts will be viewed. Off to the right is where any actions are stored - filters, functions, and so on that act upon the dataframe. In this case, we're viewing the Summary tab for a particular dataframe. We can also look at a tabular view of the data by clicking the Table icon:

Finally, we have the option to see our data in visual form by selecting the Viz tab icon:

13:57 PM

2016 Detroit Jazz Fest Moods Network

One of the great events of the summer in Detroit is the annual Detroit Jazz Festival, an epic event in the jazz world, with many of the world's foremost musicians convening in Detroit for an entirely free set of performances. It is in fact the world's largest free jazz festival, and may well be the best jazz weekend regardless of price.

For 2016, the festival plays host to the likes of the legendary bassist Ron Carter, Brad Mehldau, John Scofield, Randy Weston, Billy Harper, and a host of other musicians both international and local. So I thought it fitting to blend my love of jazz with my affection for network graphs, by using user tags from the All Music website. These tags are labels given to each musician based on listener perceptions of their work, and provide interesting information to use in building a graph.

The initial graph creation was done in Gephi, followed by deployment using sigma.js, which allows us to use the web to probe and explore the graph to find interesting patterns in the data. The Force Atlas 2 algorithm was used to create the layout, with the nodes colored based on their modularity class, a form of clustering based on similar characteristics. When clustering works very well, nodes of the same color will stand apart from other color groups; in this instance, we are partially successful in this regard. We'll learn more about this shortly.
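Before any layout or clustering can happen, the tags have to be turned into edges. The sketch below shows one way that could be done: connect two musicians whenever they share a tag, weighting the edge by the number of shared tags. This is an assumption about the graph structure, not a description of the actual Gephi project, and the musicians and tags are placeholders rather than real All Music data.

```python
from itertools import combinations

# Placeholder musicians and mood tags, not real All Music data.
tags_by_musician = {
    "musician_a": {"elegant", "refined", "warm"},
    "musician_b": {"elegant", "warm", "playful"},
    "musician_c": {"fiery", "intense"},
    "musician_d": {"intense", "playful"},
}

def shared_tag_edges(tags_by_node):
    """Build weighted edges between every pair of nodes that share a tag."""
    edges = {}
    for left, right in combinations(sorted(tags_by_node), 2):
        weight = len(tags_by_node[left] & tags_by_node[right])
        if weight:
            edges[(left, right)] = weight
    return edges

edges = shared_tag_edges(tags_by_musician)
for (left, right), weight in sorted(edges.items()):
    print(left, right, weight)
```

An edge list like this can be saved as a CSV with Source, Target, and Weight columns and imported into a tool such as Gephi, where the layout and modularity steps are applied.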

If you want to interact with the network and draw your own conclusions, here you are:


Let's start with a view of the entire network:

Jazz Moods Network

14:45 PM

Visualizing CEPA Education Data, Part 1

In this post, we'll begin walking through the massive educational achievement dataset provided through the CEPA project at Stanford University (Sean F. Reardon, Demetra Kalogrides, Andrew Ho, Ben Shear, Kenneth Shores, Erin Fahle. (2016). Stanford Education Data Archive. http://purl.stanford.edu/db586ns4974). This archive provides a wealth of data on educational achievement across grade levels and academic years, and is supported with a vast array of socioeconomic indicators that can be used for deeper analysis.

Our initial steps to analyze and visualize the data will begin with Microsoft Excel for data prep (use your tool of choice) before moving on to Exploratory and Trelliscope for visual analysis of the data. Both Exploratory and Trelliscope are based on R, the open source statistical framework that will facilitate multiple analytical paths. Exploratory has its own GUI that uses many of R's most powerful analytic packages, while Trelliscope will be employed from within RStudio.

Here's a link to Exploratory:

and Trelliscope:
