Quickstart¶

Installation¶

To install the thermostat package for the first time, we highly recommend that you create a virtual environment or a conda environment in which to install it. You may choose to skip this step, but do so at the risk of corrupting your existing python environment. Isolating your python environment will also make it easier to debug.

# if using virtualenvwrapper (see https://virtualenvwrapper.readthedocs.org/en/latest/install.html)
$ mkvirtualenv thermostat
(thermostat)$ pip install thermostat

# if using conda (see note below - conda is distributed with Anaconda)
$ conda create --yes --name thermostat pandas
(thermostat)$ pip install thermostat

If you already have an environment, use the following:

# if using virtualenvwrapper
$ workon thermostat
(thermostat)$

# if using conda
$ source activate thermostat
(thermostat)$

To deactivate the environment when you’ve finished, use the following:

# if using virtualenvwrapper
(thermostat)$ deactivate
$

# if using conda
(thermostat)$ source deactivate
$

Check to make sure you are on the most recent version of the package.

>>> import thermostat; thermostat.get_version()
'1.0.0'

If you are not on the correct version, you should upgrade:

$ pip install thermostat --upgrade

The command above will update dependencies as well. If you wish to skip this, use the --no-deps flag:

$ pip install thermostat --upgrade --no-deps

Previous versions of the package are available on GitHub.

Note

If you experience issues installing python packages with C extensions, such as numpy or scipy, we recommend installing and using the free Anaconda Python distribution by Continuum Analytics. It contains many of the numeric and scientific packages used by this package and has installers for Python 2.7 and 3.5 for Windows, Mac OS X and Linux.

Once you have verified a correct installation, import the necessary methods and set a directory for finding and storing data.

Note

If you suspect a package version conflict or error, you can verify the versions of the packages you have installed against the package versions in thermostatreqnotes.txt.

To list your package versions, use:

$ pip freeze

or (if you’re using Anaconda):

$ conda list
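
The comparison can also be scripted. The following is a minimal sketch, assuming Python 3.8 or later and assuming the requirements file uses pip-style `name==version` lines (the actual layout of thermostatreqnotes.txt is not confirmed here). `parse_pins` and `find_mismatches` are hypothetical helper names, not part of the thermostat package.

```python
from importlib import metadata  # standard library in Python 3.8+


def parse_pins(lines):
    """Parse pip-style 'name==version' lines into a dict (hypothetical helper)."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)} for packages whose versions differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Example with a made-up pin; in practice, read the lines from the requirements file.
example_pins = parse_pins(["pandas==0.18.0  # example pin", "", "# a comment"])
```

An empty result from `find_mismatches` would suggest that the installed versions match the pinned ones.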

Script setup and imports¶

Import the few built-in python packages and methods we will be using in this tutorial as follows.

import sys
import os
import warnings
from os.path import expanduser

Also make sure to import the methods we will be using from the thermostat package.

from thermostat.importers import from_csv
from thermostat.exporters import metrics_to_csv
from thermostat.stats import compute_summary_statistics
from thermostat.stats import summary_statistics_to_csv

Set the data_dir variable as a convenience. We will refer to this directory and save our results in it. You should also move all downloaded and extracted files used in this tutorial into this directory before using them. You may, of course, choose to use a different directory, which you can set here, or override it entirely by replacing it where it appears in the tutorial.

data_dir = os.path.join(expanduser("~"), "thermostat_tutorial")
# or data_dir = "/full/path/to/custom/directory/"
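
If the directory does not exist yet, it can be created before any files are moved into it. This is a small sketch; `ensure_data_dir` is a hypothetical helper name and is not part of the thermostat package.

```python
import os


def ensure_data_dir(path):
    """Create path (and any missing parents) if needed, then return it."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


# Usage: data_dir = ensure_data_dir(data_dir)
```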

Optional Setup¶

If you wish to follow the progress of downloading and caching external weather files, which will be the most time-consuming portion of this tutorial, you may wish at this point to configure logging. The example here will work within most ipython or script environments. If you have a more complicated logging setup, you may need to use something other than the root logger, which this uses.

import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
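
As a variation, the root logger can be given a message format with timestamps, which may make long weather downloads easier to follow. This is only a sketch of one possible setup using the standard library; the format string is an arbitrary choice, not something the thermostat package requires.

```python
import logging

# Configure the root logger once, with timestamps on every message.
# Note: basicConfig does nothing if the root logger already has handlers.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# Raise verbosity to DEBUG when more detail is needed.
logging.getLogger().setLevel(logging.DEBUG)
```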

Note

The thermostat package depends on the eemeter package for weather data fetching. The eemeter package automatically creates its own cache directory in which it keeps cached versions of weather source data. This speeds up the (generally I/O bound) NOAA weather fetching routine on subsequent internal calls to fetch the same weather data (i.e. getting outdoor temperature data for thermostats that map to the same weather station).

For more information, see the eemeter package.

Note

US Census Bureau ZIP Code Tabulation Areas (ZCTA) are used to map USPS ZIP codes to outdoor temperature data. If the automatic mapping is unsuccessful for one or more of the ZIP codes in your dataset, the reason is likely to be the discrepancy between “true” USPS ZIP codes and the US Census Bureau ZCTAs. “True” ZIP codes are not used because they do not always map well to location (for example, ZIP codes for P.O. boxes). You may need to first map ZIP codes to ZCTAs, or these thermostats will be skipped. There are roughly 32,000 ZCTAs and roughly 42,000 ZIP codes, so there are many fewer ZCTAs than ZIP codes.
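
A pre-processing step for this mapping might look like the sketch below. The lookup table and its entries are made up for illustration; a real table would have to come from a ZIP-to-ZCTA crosswalk that you supply. `to_zcta` is a hypothetical helper, not part of the thermostat or eemeter packages.

```python
# Hypothetical lookup table; the entry below is made up for illustration only.
ZIP_TO_ZCTA = {
    "00001": "00002",
}


def to_zcta(zipcode, known_zctas, zip_to_zcta=ZIP_TO_ZCTA):
    """Return a ZCTA for zipcode, or None if no mapping is known."""
    zipcode = zipcode.strip().zfill(5)  # restore leading zeros, e.g. "2" -> "00002"
    if zipcode in known_zctas:
        return zipcode  # already a valid ZCTA
    return zip_to_zcta.get(zipcode)
```

Thermostats for which this returns `None` would still be skipped, so it can be useful to list them before running the full calculation.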

Computing individual thermostat-season metrics¶

After importing the package methods, load the example thermostat data, or provide data of your own. See Input data for more detailed file format information.

Fabricated example data from 35 thermostats in various climate zones is available for download here.

Loading the thermostat data below will take more than a few minutes, even if the weather cache is enabled (see note above). This is because loading thermostat data involves downloading hourly weather data from a remote source - in this case, the NCDC.

The following loads a lazy iterator over the thermostats. The thermostats will be loaded into memory as necessary in the following steps.

metadata_filename = os.path.join(data_dir, "examples/metadata.csv")
thermostats = from_csv(metadata_filename, verbose=True)

To calculate savings metrics, iterate through thermostats and save the results. Uncomment the commented lines if you would like to store the thermostats in memory for inspection. Note that this could eat up your application memory and is only recommended for debugging purposes.

metrics = []
# saved_thermostats = []
for thermostat in thermostats:
    outputs = thermostat.calculate_epa_field_savings_metrics()
    metrics.extend(outputs)
    # saved_thermostats.append(thermostat)

The single-thermostat metrics should be output to CSV and converted to dataframe format.

output_filename = os.path.join(data_dir, "thermostat_example_output.csv")
metrics_df = metrics_to_csv(metrics, output_filename)

The output CSV will be saved in your data directory and should very nearly match the output CSV provided in the example data.

See Output data for more detailed file format information.
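
As a quick sanity check on the saved file, the rows can be counted by season type. The sketch below uses only the standard library and relies on the `heating_or_cooling` label format shown in the Output data section (e.g. ‘heating_2012-2013’); the two-row example file is made up, and `count_core_day_sets` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_core_day_sets(csv_file):
    """Count output rows by season type, taken from the heating_or_cooling label."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        label = row["heating_or_cooling"]  # e.g. "heating_2012-2013"
        counts[label.split("_", 1)[0]] += 1
    return counts


# Made-up two-row file for illustration; pass open(output_filename) in practice.
example = io.StringIO(
    "ct_identifier,heating_or_cooling\n"
    "example-001,heating_2012-2013\n"
    "example-001,cooling_2013\n"
)
counts = count_core_day_sets(example)
```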

Computing summary statistics¶

Once you have obtained output for each individual thermostat in your dataset, use the stats module to compute summary statistics, which are formatted for submission to the EPA. The example below works with the output file from the tutorial above and can be modified to use your data.

Compute statistics across all thermostats.

# Uses the metrics_df created in the previous section.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")

    stats = compute_summary_statistics(metrics_df)

    # If you want to have advanced filter outputs, use this instead
    # stats_advanced = compute_summary_statistics(metrics_df, advanced_filtering=True)

Save these results to file.

Each row of the saved CSV will represent one type of output, with one row per statistic per output. Each column in the CSV will represent one subset of thermostats, as determined by grouping by EIC climate zone and applying various filtering methods. National weighted averages will be available near the top of the file.

At this point, you will also need to provide an alphanumeric product identifier for the connected thermostat; e.g. a combination of the connected thermostat service plus one or more connected thermostat device models that comprises the data set.

product_id = "INSERT ALPHANUMERIC PRODUCT ID HERE"
stats_filepath = os.path.join(data_dir, "thermostat_example_stats.csv")
stats_df = summary_statistics_to_csv(stats, stats_filepath, product_id)

# or with advanced filter outputs
# stats_advanced_filepath = os.path.join(data_dir, "thermostat_example_stats_advanced.csv")
# stats_advanced_df = summary_statistics_to_csv(stats_advanced, stats_advanced_filepath, product_id)

National savings are computed by weighted average of percent savings results grouped by climate zone. Heavier weights are applied to results in climate zones which, regionally, tend to have longer runtimes. Weightings used are available for download.
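
The weighting can be illustrated with a small worked example. The zone names, means, and weights below are made up, and normalizing by the sum of the weights of the zones present is an assumption of this sketch rather than a confirmed detail of the package; the real weights are the ones in the downloadable file.

```python
def national_weighted_mean(zone_means, zone_weights):
    """Weighted average of per-zone mean percent savings."""
    total_weight = sum(zone_weights[zone] for zone in zone_means)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    weighted_sum = sum(zone_means[z] * zone_weights[z] for z in zone_means)
    return weighted_sum / total_weight


# Made-up numbers for illustration only.
zone_means = {"zone_a": 10.0, "zone_b": 20.0}
zone_weights = {"zone_a": 1.0, "zone_b": 3.0}
result = national_weighted_mean(zone_means, zone_weights)  # (10*1 + 20*3) / 4
```

Here the zone with the larger weight pulls the result toward its own mean, giving 17.5 rather than the unweighted 15.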

More information¶

For additional information on package usage, please see the API documentation.

Input data¶

Input data should be specified using the following formats. One CSV should specify thermostat summary metadata (e.g. unique identifiers, location, etc.). Another CSV (or CSVs) should contain runtime information, linked to the metadata CSV by the thermostat_id column.

Example files here.

Thermostat Summary Metadata CSV format¶

Columns¶

Name	Data Format	Units	Description
`thermostat_id`	string	N/A	A uniquely identifying marker for the thermostat.
`equipment_type`	enum, {0..5}	N/A	The type of controlled HVAC heating and cooling equipment. [1]
`zipcode`	string, 5 digits	N/A	The ZIP code in which the thermostat is installed [2].
`utc_offset`	string	N/A	The UTC offset of the times in the corresponding interval data CSV. (e.g. “-0700”)
`interval_data_filename`	string	N/A	The filename of the interval data file corresponding to this thermostat. Should be specified relative to the location of the metadata file.

Each row should correspond to a single thermostat.

Nulls should be specified by leaving the field blank.

All interval data for a particular thermostat should use the same, single UTC offset provided in the metadata file.
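
A metadata file in this format can be produced and checked with the standard library, as in the sketch below. The example row is made up, `validate_metadata_row` is a hypothetical helper, and its checks are assumptions drawn from the table above; in particular, it treats any blank field as a problem, which is stricter than the format requires, and it assumes offsets always look like the “-0700” example.

```python
import csv
import io
import re

METADATA_COLUMNS = [
    "thermostat_id", "equipment_type", "zipcode",
    "utc_offset", "interval_data_filename",
]


def validate_metadata_row(row):
    """Return a list of problems found in one metadata row (empty if none)."""
    problems = []
    if not row.get("thermostat_id"):
        problems.append("thermostat_id is blank")
    if row.get("equipment_type") not in {"0", "1", "2", "3", "4", "5"}:
        problems.append("equipment_type must be an integer from 0 to 5")
    if not re.fullmatch(r"\d{5}", row.get("zipcode") or ""):
        problems.append("zipcode must be exactly 5 digits")
    if not re.fullmatch(r"[+-]\d{4}", row.get("utc_offset") or ""):
        problems.append("utc_offset should look like -0700")
    return problems


# Made-up example row, written and read back through an in-memory CSV.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=METADATA_COLUMNS)
writer.writeheader()
writer.writerow({
    "thermostat_id": "example-001",
    "equipment_type": "1",
    "zipcode": "02138",
    "utc_offset": "-0500",
    "interval_data_filename": "interval_data_example-001.csv",
})
buffer.seek(0)
rows = list(csv.DictReader(buffer))
problems = validate_metadata_row(rows[0])
```

Reading ZIP codes as strings, as `csv.DictReader` does, preserves leading zeros that would be lost if the column were parsed as an integer.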

Thermostat Interval Data CSV format¶

Columns¶

Name	Data Format	Units	Description
`thermostat_id`	string	N/A	Uniquely identifying marker for the thermostat.
`date`	YYYY-MM-DD (ISO-8601)	N/A	Date of this set of readings.
`cool_runtime`	decimal or integer	minutes	Daily runtime of cooling equipment.
`heat_runtime`	decimal or integer	minutes	Daily runtime of heating equipment. [3]
`auxiliary_heat_HH`	decimal or integer	minutes	Hourly runtime of auxiliary heat equipment (HH=00-23).
`emergency_heat_HH`	decimal or integer	minutes	Hourly runtime of emergency heat equipment (HH=00-23).
`temp_in_HH`	decimal, to nearest 0.5	°F	Hourly average conditioned space temperature over the period of the reading (HH=00-23).
`heating_setpoint_HH`	decimal, to nearest 0.5	°F	Hourly average thermostat heating setpoint temperature over the period of the reading (HH=00-23).
`cooling_setpoint_HH`	decimal, to nearest 0.5	°F	Hourly average thermostat cooling setpoint temperature over the period of the reading (HH=00-23).

Each row should correspond to a single daily reading from a thermostat.
Nulls should be specified by leaving the field blank.
Zero values should be specified as 0, rather than as blank.
If data is missing for a particular row of one column, data should still be provided for other columns in that row. For example, if runtime is missing for a particular date, please still provide indoor conditioned space temperature and setpoints for that date, if available.
Runtimes should be less than or equal to 1440 min (1 day).
Dates should be specified in the ISO 8601 date format (e.g. 2015-05-19).
All temperatures should be specified in °F (to the nearest 0.5°F).
If only a single setpoint is used for the thermostat (that is, no distinction is made between heating and cooling setpoints), copy the same setpoint data into both the heating and cooling setpoint columns.
All runtime data MUST have the same UTC offset, as provided in the corresponding metadata file.
Outdoor temperature data need not be provided; it will be fetched automatically from NCDC using the eemeter package.
Dates should be consecutive.
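
Several of these rules can be checked before running the package. The sketch below, which assumes Python 3.7 or later, builds the expected column names from the table above and checks dates and daily runtimes. `check_interval_rows` is a hypothetical helper, the column order is an assumption, and only a subset of the rules is covered.

```python
from datetime import date, timedelta

HOURS = ["%02d" % h for h in range(24)]  # "00" .. "23"
HOURLY_PREFIXES = [
    "auxiliary_heat", "emergency_heat", "temp_in",
    "heating_setpoint", "cooling_setpoint",
]
INTERVAL_COLUMNS = (
    ["thermostat_id", "date", "cool_runtime", "heat_runtime"]
    + ["%s_%s" % (prefix, hh) for prefix in HOURLY_PREFIXES for hh in HOURS]
)


def check_interval_rows(rows):
    """Return a list of (row_index, message) problems for parsed CSV rows."""
    problems = []
    previous = None
    for i, row in enumerate(rows):
        try:
            current = date.fromisoformat(row["date"])
        except ValueError:
            problems.append((i, "date is not a valid ISO 8601 date"))
            previous = None
            continue
        if previous is not None and current - previous != timedelta(days=1):
            problems.append((i, "date does not follow the previous row by one day"))
        previous = current
        for column in ("cool_runtime", "heat_runtime"):
            value = row.get(column, "")
            # Blank means null and is allowed; otherwise check the daily limit.
            if value != "" and not 0 <= float(value) <= 1440:
                problems.append((i, "%s outside 0-1440 minutes" % column))
    return problems
```

The rows can come from `csv.DictReader`, which yields each field as a string and leaves blank fields as empty strings.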

[1]

Options for equipment_type:

0: Other – e.g. multi-zone multi-stage, modulating. Note: module will not output savings data for this type.
1: Single stage heat pump with electric resistance aux and/or emergency heat (i.e., strip heat)
2: Single stage heat pump without additional and/or supplemental heating sources (excludes aux/emergency heat as well as dual fuel systems, i.e., heat pump plus gas- or oil-fired furnace)
3: Single stage non heat pump with single-stage central air conditioning
4: Single stage non heat pump without central air conditioning
5: Single stage central air conditioning without central heating

[2]

Will be used for matching with a weather station that provides external dry-bulb temperature data. This temperature data will be used to determine the bounds of the heating and cooling season over which metrics will be computed. For more information on the mapping between ZIP codes and weather stations, please see eemeter.weather.location.

[3]	Should not include runtime for auxiliary or emergency heat - this should be provided separately in the columns emergency_heat_HH and auxiliary_heat_HH.

Output data¶

Individual thermostat-season¶

The following columns are an intermediate output generated for each thermostat-season.

Columns¶

Name	Data Format	Units	Description
General outputs
`sw_version`	string	N/A	Software version.
`ct_identifier`	string	N/A	Identifier for thermostat as provided in the metadata file.
`equipment_type`	enum {0..5}	N/A	Equipment type of this thermostat (1, 2, 3, 4, or 5).
`heating_or_cooling`	string	N/A	Label for the core day set (e.g. ‘heating_2012-2013’).
`zipcode`	string, 5 digits	N/A	ZIP code provided in the metadata file.
`station`	string, USAF ID	N/A	USAF identifier for station used to fetch hourly temperature data.
`climate_zone`	string	N/A	EIC climate zone (consolidated).
`start_date`	date	ISO-8601	Earliest date in input file.
`end_date`	date	ISO-8601	Latest date in input file.
`n_days_both_heating_and_cooling`	integer	# days	Number of days not included as core days due to presence of both heating and cooling.
`n_days_insufficient_data`	integer	# days	Number of days not included as core days due to missing data.
`n_core_cooling_days`	integer	# days	Number of days meeting criteria for inclusion in core cooling day set.
`n_core_heating_days`	integer	# days	Number of days meeting criteria for inclusion in core heating day set.
`n_days_in_inputfile_date_range`	integer	# days	Number of potential days in inputfile date range.
`baseline10_core_cooling_comfort_temperature`	float	°F	Baseline comfort temperature as determined by 10th percentile of indoor temperatures.
`baseline90_core_cooling_comfort_temperature`	float	°F	Baseline comfort temperature as determined by 90th percentile of indoor temperatures.
`regional_average_baseline_cooling_comfort_temperature`	float	°F	Baseline comfort temperature as determined by regional average.
`regional_average_baseline_heating_comfort_temperature`	float	°F	Baseline comfort temperature as determined by regional average.
Model outputs
`percent_savings_baseline_percentile`	float	percent	Percent savings as given by hourly average CTD or HTD method with 10th or 90th percentile baseline
`avoided_daily_mean_core_day_runtime_baseline_percentile`	float	minutes	Avoided average daily runtime for core cooling days
`avoided_total_core_day_runtime_baseline_percentile`	float	minutes	Avoided total runtime for core cooling days
`baseline_daily_mean_core_day_runtime_baseline_percentile`	float	minutes	Baseline average daily runtime for core cooling days
`baseline_total_core_day_runtime_baseline_percentile`	float	minutes	Baseline total runtime for core cooling days
`percent_savings_baseline_regional`	float	percent	Percent savings as given by hourly average CTD or HTD method with 10th or 90th percentile regional baseline
`avoided_daily_mean_core_day_runtime_baseline_regional`	float	minutes	Avoided average daily runtime for core cooling days
`avoided_total_core_day_runtime_baseline_regional`	float	minutes	Avoided total runtime for core cooling days
`baseline_daily_mean_core_day_runtime_baseline_regional`	float	minutes	Baseline average daily runtime for core cooling days
`baseline_total_core_day_runtime_baseline_regional`	float	minutes	Baseline total runtime for core cooling days
`mean_demand`	float	°F	Average cooling demand
`alpha`	float	minutes/Δ°F	The fitted slope of cooling runtime to demand regression
`tau`	float	°F	The fitted intercept of cooling runtime to demand regression
`mean_sq_err`	float	N/A	Mean squared error of regression
`root_mean_sq_err`	float	N/A	Root mean squared error of regression
`cv_root_mean_sq_err`	float	N/A	Coefficient of variation of root mean squared error of regression
`mean_abs_err`	float	N/A	Mean absolute error
`mean_abs_pct_err`	float	N/A	Mean absolute percent error
Runtime outputs
`total_core_cooling_runtime`	float	minutes	Total core cooling equipment runtime
`total_core_heating_runtime`	float	minutes	Total core heating equipment runtime
`total_auxiliary_heating_core_day_runtime`	float	minutes	Total core auxiliary heating equipment runtime
`total_emergency_heating_core_day_runtime`	float	minutes	Total core emergency heating equipment runtime
`daily_mean_core_cooling_runtime`	float	minutes	Average daily core cooling runtime
`daily_mean_core_heating_runtime`	float	minutes	Average daily core heating runtime
Resistance heat outputs
`rhu_00F_to_05F`	decimal	0.0=0%, 1.0=100%	Resistance heat utilization for hourly temperature bin \(0 \leq T_{out} < 5\)
`rhu_05F_to_10F`	decimal	0.0=0%, 1.0=100%	Resistance heat utilization for hourly temperature bin \(5 \leq T_{out} < 10\)
`rhu_10F_to_15F`	decimal	0.0=0%, 1.0=100%	Resistance heat utilization for hourly temperature bin \(10 \leq T_{out} < 15\)
`rhu_15F_to_20F`	decimal	0.0=0%, 1.0=100%	Resistance heat utilization for hourly temperature bin \(15 \leq T_{out} < 20\)
`rhu_20F_to_25F`	decimal	0.0=0%, 1.0=100%	Resistance heat utilization for hourly temperature bin \(20 \leq T_{out} < 25\)
`rhu_25F_to_30F`	decimal	0.0=0%, 1.0=100%	Resistance heat utilization for hourly temperature bin \(25 \leq T_{out} < 30\)
`rhu_30F_to_35F`	decimal	0.0=0%, 1.0=100%	Resistance heat utilization for hourly temperature bin \(30 \leq T_{out} < 35\)
`rhu_35F_to_40F`	decimal	0.0=0%, 1.0=100%	Resistance heat utilization for hourly temperature bin \(35 \leq T_{out} < 40\)
`rhu_40F_to_45F`	decimal	0.0=0%, 1.0=100%	Resistance heat utilization for hourly temperature bin \(40 \leq T_{out} < 45\)
`rhu_45F_to_50F`	decimal	0.0=0%, 1.0=100%	Resistance heat utilization for hourly temperature bin \(45 \leq T_{out} < 50\)
`rhu_50F_to_55F`	decimal	0.0=0%, 1.0=100%	Resistance heat utilization for hourly temperature bin \(50 \leq T_{out} < 55\)
`rhu_55F_to_60F`	decimal	0.0=0%, 1.0=100%	Resistance heat utilization for hourly temperature bin \(55 \leq T_{out} < 60\)
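
The bin boundaries in the resistance heat columns are half-open, so a temperature of exactly 5 °F falls in the 5 to 10 bin. The sketch below maps an outdoor temperature to the matching column name; `rhu_column_for` is a hypothetical helper for illustration and does not compute the utilization itself.

```python
import math


def rhu_column_for(t_out):
    """Return the rhu_* column whose bin contains t_out (°F), or None if out of range."""
    if not 0 <= t_out < 60:
        return None
    lower = int(math.floor(t_out / 5.0)) * 5  # lower edge of the 5 °F bin
    return "rhu_%02dF_to_%02dF" % (lower, lower + 5)
```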

Summary Statistics¶

For each real- or integer-valued column (“###”) from the individual thermostat-season output, the following summary statistics are generated.

(For readability, these columns are actually rows.)

Columns¶

Name	Description
`###_n`	Number of samples
`###_upper_bound_95_perc_conf`	95% confidence upper bound on mean value
`###_mean`	Mean value
`###_lower_bound_95_perc_conf`	95% confidence lower bound on mean value
`###_sem`	Standard error of the mean
`###_10q`	1st decile (10th percentile, q=quantile)
`###_20q`	2nd decile
`###_30q`	3rd decile
`###_40q`	4th decile
`###_50q`	5th decile
`###_60q`	6th decile
`###_70q`	7th decile
`###_80q`	8th decile
`###_90q`	9th decile
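
To make the meaning of these statistics concrete, the sketch below computes them for a small made-up sample using the standard library (Python 3.8 or later). The 1.96 multiplier (a normal approximation) and the default quantile interpolation of `statistics.quantiles` are assumptions of this sketch; the package may use different methods, so values need not match its output exactly.

```python
import math
import statistics


def summarize(values):
    """Compute n, mean, SEM, a 95% interval on the mean, and deciles."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # sample std dev / sqrt(n)
    half_width = 1.96 * sem  # normal approximation (an assumption)
    deciles = statistics.quantiles(values, n=10)  # nine cut points
    summary = {
        "n": n,
        "mean": mean,
        "sem": sem,
        "lower_bound_95_perc_conf": mean - half_width,
        "upper_bound_95_perc_conf": mean + half_width,
    }
    for i, cut in enumerate(deciles, start=1):
        summary["%dq" % (i * 10)] = cut  # keys "10q" .. "90q"
    return summary


# Made-up percent savings values for illustration.
summary = summarize([2.0, 4.0, 6.0, 8.0, 10.0])
```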

The following general columns are also output:

Columns¶

Name	Description
`sw_version`	Software version
`product_id`	Alphanumeric product identifier
`n_thermostat_core_day_sets_total`	Number of relevant rows from thermostat module output before filtering
`n_thermostat_core_day_sets_kept`	Number of relevant rows from thermostat module not filtered out
`n_thermostat_core_day_sets_discarded`	Number of relevant rows from thermostat module filtered out

The following national weighted percent savings columns are also available.

National savings are computed by weighted average of percent savings results grouped by climate zone. Heavier weights are applied to results in climate zones which, regionally, tend to have longer runtimes. Weightings used are available for download.

Columns¶

Name	Description
`percent_savings_baseline_percentile_mean_national_weighted_mean`	National weighted mean percent savings as given by baseline_percentile method.
`percent_savings_baseline_percentile_q10_national_weighted_mean`	National weighted 10th percentile percent savings as given by baseline_percentile method.
`percent_savings_baseline_percentile_q20_national_weighted_mean`	National weighted 20th percentile percent savings as given by baseline_percentile method.
`percent_savings_baseline_percentile_q30_national_weighted_mean`	National weighted 30th percentile percent savings as given by baseline_percentile method.
`percent_savings_baseline_percentile_q40_national_weighted_mean`	National weighted 40th percentile percent savings as given by baseline_percentile method.
`percent_savings_baseline_percentile_q50_national_weighted_mean`	National weighted 50th percentile percent savings as given by baseline_percentile method.
`percent_savings_baseline_percentile_q60_national_weighted_mean`	National weighted 60th percentile percent savings as given by baseline_percentile method.
`percent_savings_baseline_percentile_q70_national_weighted_mean`	National weighted 70th percentile percent savings as given by baseline_percentile method.
`percent_savings_baseline_percentile_q80_national_weighted_mean`	National weighted 80th percentile percent savings as given by baseline_percentile method.
`percent_savings_baseline_percentile_q90_national_weighted_mean`	National weighted 90th percentile percent savings as given by baseline_percentile method.
`percent_savings_baseline_percentile_lower_bound_95_perc_conf_national_weighted_mean`	National weighted mean percent savings lower bound as given by a 95% confidence interval and the baseline_percentile method.
`percent_savings_baseline_percentile_upper_bound_95_perc_conf_national_weighted_mean`	National weighted mean percent savings upper bound as given by a 95% confidence interval and the baseline_percentile method.
`percent_savings_baseline_regional_mean_national_weighted_mean`	National weighted mean percent savings as given by baseline_regional method.
`percent_savings_baseline_regional_q10_national_weighted_mean`	National weighted 10th percentile percent savings as given by baseline_regional method.
`percent_savings_baseline_regional_q20_national_weighted_mean`	National weighted 20th percentile percent savings as given by baseline_regional method.
`percent_savings_baseline_regional_q30_national_weighted_mean`	National weighted 30th percentile percent savings as given by baseline_regional method.
`percent_savings_baseline_regional_q40_national_weighted_mean`	National weighted 40th percentile percent savings as given by baseline_regional method.
`percent_savings_baseline_regional_q50_national_weighted_mean`	National weighted 50th percentile percent savings as given by baseline_regional method.
`percent_savings_baseline_regional_q60_national_weighted_mean`	National weighted 60th percentile percent savings as given by baseline_regional method.
`percent_savings_baseline_regional_q70_national_weighted_mean`	National weighted 70th percentile percent savings as given by baseline_regional method.
`percent_savings_baseline_regional_q80_national_weighted_mean`	National weighted 80th percentile percent savings as given by baseline_regional method.
`percent_savings_baseline_regional_q90_national_weighted_mean`	National weighted 90th percentile percent savings as given by baseline_regional method.
`percent_savings_baseline_regional_lower_bound_95_perc_conf_national_weighted_mean`	National weighted mean percent savings lower bound as given by a 95% confidence interval and the baseline_regional method.
`percent_savings_baseline_regional_upper_bound_95_perc_conf_national_weighted_mean`	National weighted mean percent savings upper bound as given by a 95% confidence interval and the baseline_regional method.
