Solar Installations in the United States


Sarah Unbehaun
May 2018

Why look at solar data?

A map of solar installations across the U.S. suggests that their distribution varies more than we might expect based only on solar irradiation or population centers.

In [12]:
us_plot
Out[12]:

The Data

Data by county on:

  • Solar installations (NREL Open PV)
  • Direct normal irradiance (DNI), PV-suitable small buildings (NSRDB)
  • Electricity prices and utility territories (EIA)
  • Population and median income (Census)
  • 2004-2012 presidential elections (USGS, via Hello World Data)
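
Because every source is reported by county, the tables can be combined with a join on a shared county identifier. The following is only a sketch of that idea, not the notebook's actual preparation code: the FIPS key, the column names, and all values are made up for illustration, and treating counties that are missing from the installations table as having zero installations is an assumption.

```python
import pandas as pd

# Hypothetical per-county tables keyed by FIPS code.
# FIPS codes are kept as strings so leading zeros survive.
census = pd.DataFrame({
    "fips": ["06037", "25017", "48201"],
    "population": [10_000_000, 1_600_000, 4_600_000],   # made-up values
    "median_income": [61_000, 92_000, 57_000],           # made-up values
})
installs = pd.DataFrame({
    "fips": ["06037", "25017"],
    "installations": [1200, 300],                        # made-up values
})
prices = pd.DataFrame({
    "fips": ["06037", "25017", "48201"],
    "price_cents_per_kwh": [17.5, 19.8, 11.2],           # made-up values
})

# Left joins from the census table keep every county,
# even those with no reported installations.
counties = (
    census
    .merge(installs, on="fips", how="left")
    .merge(prices, on="fips", how="left")
)

# Assumption: a county absent from the installations table has none.
counties["installations"] = counties["installations"].fillna(0).astype(int)

print(counties)
```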

Feature Importance - Random Forest regression

With solar installations per 1000 small buildings as the dependent variable, most variation is still explained by basic geographic and population features.

Top predictors:

  • Direct Normal Irradiance (DNI)
  • Population density
  • Number of small buildings

Other predictors:

  • Voting percentages (Democratic or other, 2004, 2008, and 2012)
  • Median income
  • Installation cost per watt
  • Electricity price
  • Percent of small buildings suitable for solar


$R^{2}$ on test data = 0.64
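
To illustrate how a feature ranking and a test-set $R^{2}$ of this kind can be obtained, here is a minimal sketch using scikit-learn's `RandomForestRegressor`. It runs on synthetic data, so it will not reproduce the 0.64 above. The feature names, the data-generating formula, and the hyperparameters are all assumptions chosen for illustration, not the settings used in this analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500  # number of synthetic "counties"

# Hypothetical county features (synthetic, for illustration only).
dni = rng.uniform(3.0, 8.0, n)
pop_density = rng.lognormal(4.0, 1.0, n)
median_income = rng.normal(55_000, 12_000, n)
small_buildings = rng.integers(1_000, 200_000, n)

# Assumed data-generating process: the expected installation rate
# depends only on DNI and population density.
rate_per_1000 = np.maximum(0.5 * dni + 0.8 * np.log(pop_density), 0.0)
installs = rng.poisson(rate_per_1000 * small_buildings / 1000)

# Dependent variable: installations per 1000 small buildings.
y = installs / small_buildings * 1000

feature_names = ["dni", "pop_density", "median_income", "small_buildings"]
X = np.column_stack([dni, pop_density, median_income, small_buildings])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# score() returns R^2 for regressors; evaluate on held-out data.
test_r2 = model.score(X_test, y_test)

# Impurity-based importances, normalized to sum to 1.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"R^2 on test data: {test_r2:.2f}")
for name, importance in ranked:
    print(f"{name:16s} {importance:.3f}")
```

Impurity-based importances can overstate features with many distinct values, so a ranking like this is best read as a rough guide rather than a precise attribution.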

The following slides present maps for inspecting these data more closely in two of the states with the highest number of solar installations: California and Massachusetts. The color scale shows the logarithm of the number of solar installations. Hover over each county to see its other features. Below each map, a time series graph shows installations per month in that state. The dataset only reliably included solar installations through 2015*, so the number of installations through the middle of 2017 was predicted using an ARIMA model.

* Although anyone can contribute to the Open PV database and it claims to be "real time", the vast majority of contributions have been made by NREL themselves after cleaning data for their Tracking the Sun report (data last updated 2016) or by energy companies, consultants, or utilities, which may only report data periodically.

In [145]:
show(c)
In [146]:
show(ca)
In [12]:
show(m)
In [13]:
show(ma)

Conclusions


A large part of the variation in solar installations seems to be explained by solar irradiation, population, and the number of buildings. However, the voting percentages were a crude proxy for political sentiment (and, by extension, attitudes towards renewable energy) and for the likelihood of solar installation incentives in a county. A better analysis could be carried out using more detailed information about renewable energy sentiment (which may not be available at a county level for the entire country) and solar incentives, for example by quantifying the information contained in DSIRE, the Database of State Incentives for Renewables and Efficiency.