Stone Village

Data Blog

Jeff Jenkins, MBA, Data Scientist, Founder of DataXploits

Topics to be featured:

Data Science and Analysis

Data Visualizations

Social Listening / Monitoring

See something cool? Something you can use?

   Find the R code at the GitHub link at the bottom of the page. Or,

   Data Analytics services are available through

Straight Data on Federal Spending - July 2016 post

Have you heard political arguments about one party or the other preferring social spending versus defense spending? It may be interesting to look at some data! Each blog post aims to be built around interesting data sets or twists on visualization or interpretation. Here are some visualizations of data on tax and deficit levels, and on the sources of US government revenue and expenditures.

- First, a quick side note on levels of US taxation relative to Gross Domestic Product (GDP).

Federal tax receipts lagged GDP growth (output to be taxed) by 8.65%, 1961 - 2015

(US OMB & World Bank data)
Note that since 2000 the growth in tax receipts has shown a similar pattern of swings as before, but with much greater magnitude. Lately it has been reverting to the earlier normal.
DataXploits TaxSources

Sources of Federal Revenue. Figures are a % of total US revenue.

DataXploits TaxSources

Spending levels, by US Office of Management and Budget.

Spend categories shown as used by US government.

Note some large shifts: Social Security starts in the 1940s and continues to expand, and Medicare begins in the mid-1960s.
DataXploits TaxReceipts percentage

Federal Tax Receipts, % by Source

Federal Spending with Social Security netted out, % by Category

Looking at the above chart of Federal Spending, I thought I had a good picture of the split between defense and social spending categories.

Clearly, Defense spending as a % of total Federal spending is declining while Social spending is increasing.

The scale of the spending was equalized across the long time period by using % of Federal Spending. That is a perfectly valid way to avoid adjusting the data for inflation, but it occurred to me that large changes in the spending allocations (the rise of Social Security as both a source of tax collections and the largest single outlay) might be skewing the message.

Looking at this unique category, I see that the inflow and outflow of Social Security money is nearly a wash in the federal budget: only slightly more is paid out than is collected each year. Since the net impact is very small, what does the picture look like if I just pull Social Security out of the data?
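As a rough illustration of the netting-out step described above, here is a minimal Python sketch (not the R code used for this post). It takes one simple reading of "pulling Social Security out": drop that category and recompute every other category's share of what remains. The function name `shares_excluding`, the category names, and the amounts are all invented placeholders, not OMB figures.

```python
def shares_excluding(outlays, excluded="Social Security"):
    """Return each remaining category's share (%) of total outlays
    after dropping the excluded category."""
    remaining = {name: amount for name, amount in outlays.items() if name != excluded}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no spending left after exclusion")
    return {name: 100.0 * amount / total for name, amount in remaining.items()}


# Invented round numbers in arbitrary units (NOT actual OMB data).
example_outlays = {
    "Defense": 200.0,
    "Social Security": 300.0,
    "Medicare and Health": 300.0,
    "Other": 200.0,
}

grand_total = sum(example_outlays.values())
with_ss = {name: 100.0 * amount / grand_total for name, amount in example_outlays.items()}
without_ss = shares_excluding(example_outlays)

for name in without_ss:
    print(f"{name}: {with_ss[name]:.1f}% with, {without_ss[name]:.1f}% without Social Security")
```

Every remaining category's share rises once the largest single outlay is removed, which is why percentages from the two charts should not be compared directly with each other.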

Federal Spending - Social Security excluded

(excluded because there is nearly equivalent offsetting revenue for the US government, so the ins and outs become 'noise' in this analysis)
DataXploits FedlOutlay_netted

This better description of Spending shows the scale and relationship of spending categories with less distortion.

Defense spending has come down to 26% of total (non-Social Security) govt spending, down from a staggering percentage in earlier decades.

Social spending continues expanding and in 2015 was about 50% of US government spending. 1993 was the turning point when Social spending surpassed Defense spending as a % of US tax receipts. Some amount of Medicare expense should be subtracted from this 50%, because Medicare's offsetting income is obscured in this data.
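One way to locate a turning point like the one described above is to scan two share series for the first period in which one exceeds the other. The Python sketch below is only an illustration, not the R code behind the post. The helper name `first_crossover` is hypothetical, the periods are placeholder indices rather than calendar years, and the percentages are invented.

```python
def first_crossover(periods, rising, falling):
    """Return the first period in which `rising` exceeds `falling`,
    or None if that never happens."""
    for period, r, f in zip(periods, rising, falling):
        if r > f:
            return period
    return None


# Placeholder period indices and invented shares (NOT actual OMB data).
periods = [1, 2, 3, 4, 5]
social_share = [30.0, 33.0, 36.0, 40.0, 43.0]
defense_share = [42.0, 40.0, 38.0, 37.0, 35.0]

turning_point = first_crossover(periods, social_share, defense_share)
print("First period where social exceeds defense:", turning_point)
```

With real data the two series can cross more than once, so it is worth checking that the first crossover is also the last before calling it a turning point.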

I think further digging into sub-categories of government spending will show that the real driver of the expansion in Social spending is the cost of medical care.

Map Your Data for Valuable Insight - May 2016 post

Data sets of public interest are increasingly available for download. Roughly 22 police departments across the country have so far started posting Crime data for public usage.

Mapping is such a powerful visualization tool that it frequently brings out insight that would be hard to access otherwise. This example uses Crime Report data from Austin, Texas, but the same approach applied to your business data (Sales, Revenue, Addressable Market, etc.) can enable valuable understanding. Two types of maps are shown: all data in one map, and a faceted version with the two reporting years shown separately. These map arrangements compare two points in time, but other categories and more time periods can be depicted, and the area included in the maps can be zoomed in or scaled outward to include relevant geographic areas.

This data is a sampling of the Austin crime reports. As with so many data sets, cleanup is necessary: only ~8% of entries in the raw data have geo coordinates. Other important factors must usually be accommodated; for example, from 2008 to 2015 (the years whose stats are available), Austin had substantial population growth. Further analysis with this data set might call for some processing, such as converting instances to crimes per 100,000 residents (the cluster in the center of the map is the most population-dense sector). Another caveat is that the classification method for sexual assaults changed during this period, making 2008 stats not totally comparable to 2015.
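The two cleanup steps mentioned above (keeping only entries that have geo coordinates, and converting counts to crimes per 100,000 residents) could be sketched as follows. This is a Python illustration, not the R code used for the maps. The field names `year`, `latitude`, and `longitude`, the sample records, and the population figures are all assumptions made up for the example; they are not taken from the Austin data set or from census data.

```python
from collections import Counter


def has_coordinates(report):
    """True when a report carries both a latitude and a longitude."""
    return report.get("latitude") is not None and report.get("longitude") is not None


def rate_per_100k(count, population):
    """Convert a raw count into incidents per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


# Invented sample records (field names are assumptions, not the real schema).
reports = [
    {"year": 2008, "latitude": 30.27, "longitude": -97.74},
    {"year": 2008, "latitude": None, "longitude": None},
    {"year": 2015, "latitude": 30.30, "longitude": -97.70},
    {"year": 2015, "latitude": 30.25, "longitude": -97.75},
    {"year": 2015, "latitude": None, "longitude": None},
]

# Placeholder populations, NOT actual figures for Austin.
population_by_year = {2008: 800_000, 2015: 1_000_000}

geocoded = [r for r in reports if has_coordinates(r)]
counts = Counter(r["year"] for r in geocoded)
rates = {year: rate_per_100k(counts[year], population_by_year[year]) for year in counts}

for year in sorted(rates):
    print(year, f"{rates[year]:.3f} mappable reports per 100,000 residents")
```

Because only a small fraction of the raw entries can be mapped, a rate computed this way describes the mappable sample rather than the true crime rate, and is most useful for comparing the two years on an equal footing.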

Using R, I have categorized a sampling of the crime reports and filtered to one crime type: Drugs. In the next post, I'll be exploring a method to generate a dashboard that allows the crime type to be selected and (hopefully) allows the map to scale at the push of a button.

I'm generating this visualization using R. Another great tool for generating map visualizations (or a wide range of other useful visualizations) is Tableau. Tableau is easy to use but moderately expensive; R, on the other hand, is free but does require some investment of time to build capability.

DataXploits ATX crime DataXploits ATX crime

See something cool? Something you can use? You can find the code at the GitHub link below. Or,

Data Analytics services are available through

DataXploits code to reproduce visualizations

 Social Monitoring
-Customer Sentiment appearing in social media
-Twitter, Facebook (public posts)
-Measure engagement generated by your social marketing campaigns

  web services by:
  Stone Village