The State of the Union address, or the equivalent first speech a US president gives to Congress, is likely to reveal where that president hopes to steer the country during his time in office. By analyzing these speeches for every president who delivered one (a few died before having the opportunity), we should be able to see which presidents shared similar thought processes and political beliefs across a 225-year period. We should also be able to detect significant changes in how these speeches were delivered and which topics were central to each one. To that end, I have taken the first available speech to Congress from each president and analyzed it using a mix of text extraction, text processing, and data visualization approaches. Let's see what this process reveals about the individual politicians, as well as any larger changes that have occurred over the last two-plus centuries.
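The post doesn't specify which text-processing method was used to compare speeches. One common way to put a number on how "similar" two speeches are is cosine similarity between their word-count vectors, under a simple bag-of-words model. The sketch below is an illustration of that idea and not necessarily the method used here. It relies only on the Python standard library, and the tiny stopword list and the sample sentences are hypothetical placeholders, not text from any actual speech.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "and", "to", "a", "in", "our", "we", "is"}


def word_counts(text: str) -> Counter:
    """Lowercase the text, split it into alphabetic tokens, and count the non-stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0 = no shared words)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(speeches: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise similarity for every pair of speakers, keyed by (name_a, name_b)."""
    counts = {name: word_counts(text) for name, text in speeches.items()}
    names = sorted(counts)
    return {
        (a, b): cosine_similarity(counts[a], counts[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


if __name__ == "__main__":
    # Placeholder snippets, NOT quotations from real speeches.
    sample = {
        "President A": "We must strengthen the union and protect commerce.",
        "President B": "Commerce and the union must be protected and strengthened.",
        "President C": "Railroads, tariffs, and the frontier demand attention.",
    }
    for pair, score in similarity_matrix(sample).items():
        print(pair, round(score, 3))
```

Note that plain word matching treats "protect" and "protected" as different words. A fuller pipeline would usually add stemming or TF-IDF weighting before comparing speeches.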
A couple of months back, I wrote about investigating CEPA academic achievement data (provided through the CEPA project at Stanford University: Sean F. Reardon, Demetra Kalogrides, Andrew Ho, Ben Shear, Kenneth Shores, Erin Fahle. (2016). Stanford Education Data Archive. http://purl.stanford.edu/db586ns4974). Finally, here is part two, in which I use Exploratory, a powerful R-based tool for data wrangling, analysis, and visualization. Exploratory lets people like me, who love the power of R but have not been immersed in the often complex world of R coding, put that power to work. Its front end makes using R a pleasure, as I hope this post will illustrate.
Exploratory takes the R dataframe as its starting point and then lets you manipulate your data easily, using the many R packages included in the base Exploratory install. This post walks through some simple analysis of CEPA data within Exploratory. We'll begin by setting the stage with a screenshot of the Exploratory workspace.
On the left are the dataframes belonging to this project, while the center is the primary workspace where all tables and charts are displayed. On the right is where actions are stored: the filters, functions, and other steps that act on the dataframe. Here we're viewing the Summary tab for a particular dataframe. We can also see a tabular view of the data by clicking the Table icon:
Finally, we have the option to see our data in visual form by selecting the Viz tab icon:
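The actions panel on the right of the workspace can be thought of as an ordered pipeline. Each step takes the dataframe produced by the previous step and returns a new one, so the view in the center reflects the source data with every step reapplied in order. The Python sketch below mimics that idea with plain lists of dictionaries. It is a conceptual illustration only, not how Exploratory is implemented (Exploratory works with R). The helper names (`filter_rows`, `add_column`, `apply_steps`), the columns `state`, `grade`, and `mean_score`, and the values are all hypothetical, not CEPA data.

```python
from typing import Callable

Row = dict[str, object]
Step = Callable[[list[Row]], list[Row]]


def filter_rows(predicate: Callable[[Row], bool]) -> Step:
    """Return a step that keeps only the rows matching the predicate."""
    return lambda rows: [r for r in rows if predicate(r)]


def add_column(name: str, fn: Callable[[Row], object]) -> Step:
    """Return a step that adds a computed column without mutating the input rows."""
    return lambda rows: [{**r, name: fn(r)} for r in rows]


def apply_steps(rows: list[Row], steps: list[Step]) -> list[Row]:
    """Replay every step in order, starting from the untouched source data."""
    for step in steps:
        rows = step(rows)
    return rows


if __name__ == "__main__":
    # Illustrative placeholder values, not real achievement data.
    source = [
        {"state": "CA", "grade": 3, "mean_score": 0.10},
        {"state": "CA", "grade": 8, "mean_score": -0.05},
        {"state": "NY", "grade": 3, "mean_score": 0.20},
    ]
    steps = [
        filter_rows(lambda r: r["grade"] == 3),
        add_column("above_zero", lambda r: r["mean_score"] > 0),
    ]
    for row in apply_steps(source, steps):
        print(row)
```

Because the source rows are never modified, removing or reordering a step only requires replaying the list. That is roughly the experience of editing steps in the actions panel.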
In this post, we'll begin walking through the massive educational achievement dataset provided through the CEPA project at Stanford University (Sean F. Reardon, Demetra Kalogrides, Andrew Ho, Ben Shear, Kenneth Shores, Erin Fahle. (2016). Stanford Education Data Archive. http://purl.stanford.edu/db586ns4974). This archive provides a wealth of data on educational achievement across grade levels and academic years, and is supported with a vast array of socioeconomic indicators that can be used for deeper analysis.
We'll start with Microsoft Excel for data prep (use your tool of choice) before moving on to Exploratory and Trelliscope for visual analysis of the data. Both Exploratory and Trelliscope are based on R, the powerful open source statistical framework, which will let us pursue multiple analytical paths. Exploratory has its own GUI built on many of R's most capable analytic packages, while Trelliscope will be used from within RStudio.
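Whatever tool handles the prep, a typical early step is collapsing detailed rows into summaries that are easier to chart. The sketch below shows one such step, averaging a score column by academic year and grade with Python's standard `csv` module. The column names `year`, `grade`, and `mean_score` are assumptions for illustration. The actual SEDA files use their own variable names, which should be checked against the archive's documentation.

```python
import csv
import io
from collections import defaultdict


def mean_by_year_and_grade(csv_text: str) -> dict[tuple[str, str], float]:
    """Average the (hypothetical) mean_score column for each (year, grade) pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        score = row["mean_score"].strip()
        if not score:  # skip blank cells rather than treating them as zero
            continue
        key = (row["year"], row["grade"])
        totals[key] += float(score)
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


if __name__ == "__main__":
    # Made-up rows, used only to exercise the function.
    sample = "year,grade,mean_score\n2009,3,0.2\n2009,3,0.4\n2009,4,\n2010,3,-0.1\n"
    print(mean_by_year_and_grade(sample))
```

Skipping blanks instead of counting them as zero keeps missing cells from dragging averages down. A (year, grade) pair with no scores at all simply drops out of the result.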
Here's a link to Exploratory: