API

thermostat.importers

thermostat.importers.from_csv(metadata_filename, verbose=False)

Creates Thermostat objects from data stored in CSV files.

Parameters:
  • metadata_filename (str) – Path to a file containing the thermostat metadata.
  • verbose (boolean) – Set to True to output a more detailed log of import activity.
Returns:

thermostats – Thermostats imported from the given CSV input files.

Return type:

iterator over thermostat.Thermostat objects

thermostat.importers.get_single_thermostat(thermostat_id, zipcode, equipment_type, utc_offset, interval_data_filename)

Load a single thermostat directly from an interval data file.

Parameters:
  • thermostat_id (str) – A unique identifier for the thermostat.
  • zipcode (str) – The zipcode of the thermostat, e.g. “01234”.
  • equipment_type (str) – The equipment type of the thermostat.
  • utc_offset (str) – A string representing the UTC offset of the interval data, e.g. “-0700”. Could also be “Z” (UTC), or just “+7” (equivalent to “+0700”), or any other timezone format recognized by the library method dateutil.parser.parse.
  • interval_data_filename (str) – The path to the CSV in which the interval data is stored.
Returns:

thermostat – The loaded thermostat object.

Return type:

thermostat.Thermostat
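
The utc_offset formats listed above can be illustrated with a small stand-alone parser. This is a hypothetical helper (parse_utc_offset) written for illustration only. The parameter itself accepts anything dateutil.parser.parse recognizes, which covers far more formats than this sketch does.

```python
import re
from datetime import timedelta


def parse_utc_offset(offset: str) -> timedelta:
    """Hypothetical helper: convert a UTC offset string to a timedelta.

    Handles only "Z", "+7", "-0700" and "+07:00"-style strings; it is not
    the parser the library uses.
    """
    if offset == "Z":
        return timedelta(0)  # "Z" means UTC itself
    match = re.fullmatch(r"([+-])(\d{1,2})(?::?(\d{2}))?", offset)
    if match is None:
        raise ValueError(f"unrecognized UTC offset: {offset!r}")
    sign = -1 if match.group(1) == "-" else 1
    hours = int(match.group(2))
    minutes = int(match.group(3) or 0)  # "+7" has no minutes part
    return sign * timedelta(hours=hours, minutes=minutes)
```

With this helper, "+7" and "+0700" both yield a seven-hour offset, matching the equivalence stated in the parameter description.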

thermostat.exporters

thermostat.exporters.metrics_to_csv(metrics, filepath)

Writes metrics outputs to the file specified.

Parameters:
  • metrics (list of dict) – List of outputs from the method thermostat.core.Thermostat.calculate_epa_field_savings_metrics().
  • filepath (str) – Path at which to write the output CSV file.
Returns:

df – DataFrame containing data output to CSV.

Return type:

pd.DataFrame

thermostat.core

class thermostat.core.CoreDaySet(name, daily, hourly, start_date, end_date)

Bases: tuple

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

__getstate__()

Exclude the OrderedDict from pickling

__repr__()

Return a nicely formatted representation string

daily

Alias for field number 1

end_date

Alias for field number 4

hourly

Alias for field number 2

name

Alias for field number 0

start_date

Alias for field number 3
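
CoreDaySet is a namedtuple-style class, so its fields can be read either by name or by the positions listed above. The sketch below builds a stand-in with collections.namedtuple, using plain Python lists where the library stores pandas boolean Series; the values are hypothetical and the class is not the library's own.

```python
from collections import namedtuple
from datetime import date

# Stand-in with the same field order as thermostat.core.CoreDaySet
CoreDaySet = namedtuple("CoreDaySet", ["name", "daily", "hourly", "start_date", "end_date"])

# Hypothetical two-day set: the daily mask keeps day 1 only,
# and the hourly mask has 24 entries per day
day_set = CoreDaySet(
    name="cooling_ALL",
    daily=[True, False],
    hourly=[True] * 24 + [False] * 24,
    start_date=date(2015, 7, 1),
    end_date=date(2015, 7, 2),
)

# day_set.daily and day_set[1] refer to the same object ("Alias for field number 1")
n_days = sum(day_set.daily)  # number of days included in the set
```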

class thermostat.core.Thermostat(thermostat_id, equipment_type, zipcode, station, temperature_in, temperature_out, cooling_setpoint, heating_setpoint, cool_runtime, heat_runtime, auxiliary_heat_runtime, emergency_heat_runtime)

Bases: object

Main thermostat data container. Each parameter that contains time-series data should be a pandas.Series with a DatetimeIndex, and all such indexes should be equivalent.

Parameters:
  • thermostat_id (object) – An identifier for the thermostat. Can be anything, but should be identifying (e.g., an ID provided by the manufacturer).
  • equipment_type ({ 0, 1, 2, 3, 4, 5 }) –
    • 0: Other - e.g. multi-zone multi-stage, modulating. Note: module will not output savings data for this type.
    • 1: Single stage heat pump with aux and/or emergency heat
    • 2: Single stage heat pump without aux or emergency heat
    • 3: Single stage non heat pump with single-stage central air conditioning
    • 4: Single stage non heat pump without central air conditioning
    • 5: Single stage central air conditioning without central heating
  • zipcode (str) – Installation ZIP code for the thermostat.
  • station (str) – USAF identifier for weather station used to pull outdoor temperature data.
  • temperature_in (pandas.Series) – Contains internal temperature data in degrees Fahrenheit (F), with resolution of at least 0.5F. Should be indexed by a pandas.DatetimeIndex with hourly frequency (i.e. freq='H').
  • heating_setpoint (pandas.Series) – Contains target temperature (setpoint) data used to control heating equipment, in degrees Fahrenheit (F), with resolution of at least 0.5F. Should be indexed by a pandas.DatetimeIndex with hourly frequency (i.e. freq='H').
  • cooling_setpoint (pandas.Series) – Contains target temperature (setpoint) data used to control cooling equipment, in degrees Fahrenheit (F), with resolution of at least 0.5F. Should be indexed by a pandas.DatetimeIndex with hourly frequency (i.e. freq='H').
  • temperature_out (pandas.Series) – Contains outdoor temperature data as observed by a relevant weather station, in degrees Fahrenheit (F), with resolution of at least 0.5F. Should be indexed by a pandas.DatetimeIndex with hourly frequency (i.e. freq='H').
  • cool_runtime (pandas.Series) – Daily runtimes for cooling equipment controlled by the thermostat, measured in minutes. No datapoint should exceed 1440 minutes, which would indicate more than a day of runtime (impossible). Should be indexed by a pandas.DatetimeIndex with daily frequency (i.e. freq='D').
  • heat_runtime (pandas.Series) – Daily runtimes for heating equipment controlled by the thermostat, measured in minutes. No datapoint should exceed 1440 minutes, which would indicate more than a day of runtime (impossible). Should be indexed by a pandas.DatetimeIndex with daily frequency (i.e. freq='D').
  • auxiliary_heat_runtime (pandas.Series) – Hourly runtimes for auxiliary heating equipment controlled by the thermostat, measured in minutes. Auxiliary heat runtime is counted when both resistance heating and the compressor are running (for heat pump systems). No datapoint should exceed 60 minutes, which would indicate more than an hour of runtime (impossible). Should be indexed by a pandas.DatetimeIndex with hourly frequency (i.e. freq='H').
  • emergency_heat_runtime (pandas.Series) – Hourly runtimes for emergency heating equipment controlled by the thermostat, measured in minutes. Emergency heat runtime is counted when resistance heating is running while the compressor is not (for heat pump systems). No datapoint should exceed 60 minutes, which would indicate more than an hour of runtime (impossible). Should be indexed by a pandas.DatetimeIndex with hourly frequency (i.e. freq='H').
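
The bounds stated above (daily runtimes of at most 1440 minutes, hourly runtimes of at most 60 minutes) can be checked before constructing a Thermostat. The helper below is a hypothetical pre-flight check on plain lists, not part of the library; it assumes missing readings are represented as None and additionally flags negative values, which are equally impossible.

```python
from typing import Iterable, List, Optional

MINUTES_PER_DAY = 1440   # upper bound for cool_runtime / heat_runtime (daily)
MINUTES_PER_HOUR = 60    # upper bound for auxiliary / emergency heat runtime (hourly)


def out_of_range(runtimes: Iterable[Optional[float]], limit: float) -> List[int]:
    """Return positions of runtimes that are negative or exceed `limit` minutes.

    Missing readings (None) are skipped rather than flagged.
    """
    return [
        i
        for i, value in enumerate(runtimes)
        if value is not None and not 0 <= value <= limit
    ]


daily_bad = out_of_range([300.0, 1500.0, None, 0.0], MINUTES_PER_DAY)   # -> [1]
hourly_bad = out_of_range([12.0, 61.0, 60.0], MINUTES_PER_HOUR)         # -> [1]
```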

calculate_epa_field_savings_metrics(core_cooling_day_set_method='entire_dataset', core_heating_day_set_method='entire_dataset', climate_zone_mapping=None)

Calculates metrics for connected thermostat savings as defined by the specification developed by the EPA Energy Star program and stakeholders.

Parameters:
  • core_cooling_day_set_method ({"entire_dataset", "year_end_to_end"}, default: "entire_dataset") –

    Method by which to find core cooling day sets.

    • “entire_dataset”: all core cooling days in the dataset (days with >= 1 hour of cooling runtime and no heating runtime).
    • “year_end_to_end”: groups all core cooling days (days with >= 1 hour of total cooling and no heating) from January 1 to December 31 into independent core cooling day sets.
  • core_heating_day_set_method ({"entire_dataset", "year_mid_to_mid"}, default: "entire_dataset") –

    Method by which to find core heating day sets.

    • “entire_dataset”: all core heating days in the dataset (days with >= 1 hour of heating runtime and no cooling runtime).
    • “year_mid_to_mid”: groups all core heating days (days with >= 1 hour of total heating and no cooling) from July 1 to June 30 into independent core heating day sets.
  • climate_zone_mapping (filename, default: None) –

    Filename of a mapping from zipcode to climate zone. If None is provided, the default zipcode to climate zone mapping provided in the tutorial is used.

Returns:

metrics – list of dictionaries of output metrics; one per set of core heating or cooling days.

Return type:

list

get_baseline_cooling_demand(core_cooling_day_set, temp_baseline, tau)

Calculate baseline cooling demand for a particular core cooling day set and fitted physical parameters.

\(\text{daily CTD base}_d = \frac{\sum_{n=1}^{24} [\tau_c - \text{hourly } \Delta T \text{ base cool}_{d.n}]_{+}}{24}\), where

\(\text{hourly } \Delta T \text{ base cool}_{d.n} (^{\circ} F) = \text{base cool} T_{d.n} - \text{hourly outdoor} T_{d.n}\), and

\(d\) is the core cooling day; \(\left(001, 002, 003 ... x \right)\),

\(n\) is the hour; \(\left(01, 02, 03 ... 24 \right)\),

\(\tau_c\) (cooling), determined earlier, is a constant that is part of the CT/home’s thermal/HVAC cooling run time model, and

\([]_{+}\) indicates that the term is zero if its value would be negative.

Parameters:
  • core_cooling_day_set (thermostat.core.CoreDaySet) – Core cooling days over which to calculate baseline cooling demand.
  • temp_baseline (float) – Baseline comfort temperature.
  • tau (float) – Value of \(\tau_c\) from the fitted demand model.
Returns:

baseline_cooling_demand – A series containing baseline daily cooling demand for the core cooling day set.

Return type:

pandas.Series

get_baseline_cooling_runtime(baseline_cooling_demand, alpha)

Calculate baseline cooling runtime given baseline cooling demand and fitted physical parameters.

\(RT_{\text{base cool}} (\text{minutes}) = \alpha_c \cdot \text{daily CTD base}_d\)

Parameters:
  • baseline_cooling_demand (pandas.Series) – A series containing estimated daily baseline cooling demand.
  • alpha (float) – Value of \(\alpha_c\), the slope of the fitted line.
Returns:

baseline_cooling_runtime – A series containing estimated daily baseline cooling runtime.

Return type:

pandas.Series
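
The two baseline formulas above (daily CTD base and baseline cooling runtime) can be worked through on plain lists of 24 hourly outdoor temperatures per day. This is a sketch, not the library's code: the function names, the list-of-lists input layout and the numbers are hypothetical, and the baseline temperature is treated as a single constant, as supplied through temp_baseline.

```python
from typing import List, Sequence


def daily_ctd_base(hourly_outdoor: Sequence[Sequence[float]],
                   temp_baseline: float, tau_c: float) -> List[float]:
    """daily CTD base_d = (1/24) * sum over hours of [tau_c - (T_base - T_out)]_+"""
    demand = []
    for day in hourly_outdoor:
        total = sum(max(tau_c - (temp_baseline - t_out), 0.0) for t_out in day)
        demand.append(total / 24)
    return demand


def baseline_cooling_runtime(demand: Sequence[float], alpha_c: float) -> List[float]:
    """RT base cool (minutes) = alpha_c * daily CTD base_d"""
    return [alpha_c * d for d in demand]


# One hypothetical day: 12 hours at 70F outdoors and 12 hours at 90F
day = [70.0] * 12 + [90.0] * 12
demand = daily_ctd_base([day], temp_baseline=75.0, tau_c=2.0)   # -> [8.5]
runtime = baseline_cooling_runtime(demand, alpha_c=20.0)        # -> [170.0]
```

The 70F hours contribute nothing: tau_c - (75 - 70) is negative, so the [ ]+ clipping sets those terms to zero, and only the 90F hours (17 degrees each) count toward demand.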

get_baseline_heating_demand(core_heating_day_set, temp_baseline, tau)

Calculate baseline heating demand for a particular core heating day set and fitted physical parameters.

\(\text{daily HTD base}_d = \frac{\sum_{n=1}^{24} [\text{hourly } \Delta T \text{ base heat}_{d.n} - \tau_h]_{+}}{24}\), where

\(\text{hourly } \Delta T \text{ base heat}_{d.n} (^{\circ} F) = \text{base heat} T_{d.n} - \text{hourly outdoor} T_{d.n}\), and

\(d\) is the core heating day; \(\left(001, 002, 003 ... x \right)\),

\(n\) is the hour; \(\left(01, 02, 03 ... 24 \right)\),

\(\tau_h\) (heating), determined earlier, is a constant that is part of the CT/home’s thermal/HVAC heating run time model, and

\([]_{+}\) indicates that the term is zero if its value would be negative.

Parameters:
  • core_heating_day_set (thermostat.core.CoreDaySet) – Core heating days over which to calculate baseline heating demand.
  • temp_baseline (float) – Baseline comfort temperature.
  • tau (float) – Value of \(\tau_h\) from the fitted demand model.
Returns:

baseline_heating_demand – A series containing baseline daily heating demand for the core heating days.

Return type:

pandas.Series

get_baseline_heating_runtime(baseline_heating_demand, alpha)

Calculate baseline heating runtime given baseline heating demand and fitted physical parameters.

\(RT_{\text{base heat}} (\text{minutes}) = \alpha_h \cdot \text{daily HTD base}_d\)

Parameters:
  • baseline_heating_demand (pandas.Series) – A series containing estimated daily baseline heating demand.
  • alpha (float) – Value of \(\alpha_h\), the slope of the fitted line.
Returns:

baseline_heating_runtime – A series containing estimated daily baseline heating runtime.

Return type:

pandas.Series
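
The heating counterparts follow the same pattern with the sign of the comparison reversed: demand accrues when the gap between the baseline temperature and the outdoor temperature exceeds tau_h. A hypothetical sketch on plain lists, with temp_baseline as a single constant and made-up numbers:

```python
from typing import List, Sequence


def daily_htd_base(hourly_outdoor: Sequence[Sequence[float]],
                   temp_baseline: float, tau_h: float) -> List[float]:
    """daily HTD base_d = (1/24) * sum over hours of [(T_base - T_out) - tau_h]_+"""
    return [
        sum(max((temp_baseline - t_out) - tau_h, 0.0) for t_out in day) / 24
        for day in hourly_outdoor
    ]


# One hypothetical winter day: 8 hours at 30F and 16 hours at 60F outdoors
day = [30.0] * 8 + [60.0] * 16
demand = daily_htd_base([day], temp_baseline=70.0, tau_h=6.0)   # -> [14.0]
runtime = [35.0 * d for d in demand]  # RT base heat = alpha_h * daily HTD base_d -> [490.0]
```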

get_cooling_demand(core_cooling_day_set)

Calculates a measure of cooling demand using the hourlyavgCTD method.

Starting with an assumed value of zero for Tau \((\tau_c)\), calculate the daily Cooling Thermal Demand \((\text{daily CTD}_d)\) as follows:

\(\text{daily CTD}_d = \frac{\sum_{n=1}^{24} [\tau_c - \text{hourly} \Delta T_{d.n}]_{+}}{24}\), where

\(\text{hourly} \Delta T_{d.n} (^{\circ} F) = \text{hourly indoor} T_{d.n} - \text{hourly outdoor} T_{d.n}\), and

\(d\) is the core cooling day; \(\left(001, 002, 003 ... x \right)\),

\(n\) is the hour; \(\left(01, 02, 03 ... 24 \right)\),

\(\tau_c\) (cooling) is the \(\Delta T\) associated with \(CTD=0\) (zero cooling runtime), and

\([]_{+}\) indicates that the term is zero if its value would be negative.

For the set of all core cooling days in the CT interval data file, use ratio estimation to calculate \(\alpha_c\), the home’s responsiveness to cooling, which should be positive.

\(\alpha_c \left(\frac{\text{minutes}}{^{\circ} F}\right) = \frac{RT_\text{actual cool}}{\sum_{d=1}^{x} \text{daily CTD}_d}\), where

\(RT_\text{actual cool}\) is the sum of cooling run times for all core cooling days in the CT interval data file.

For the set of all core cooling days in the CT interval data file, find the value of \(\tau_c\) that minimizes the sum of squared differences between the daily run times reported by the CT and the calculated daily cooling run times \(\alpha_c \cdot \text{daily CTD}_d\).

Next, recalculate \(\alpha_c\) (in accordance with the above step) and record the model’s parameters \(\left(\alpha_c, \tau_c \right)\).

Parameters:core_cooling_day_set (thermostat.core.CoreDaySet) – Core day set over which to calculate cooling demand.
Returns:
  • demand (pd.Series) – Daily demand in the core cooling day set as calculated using the method described above.
  • tau (float) – Estimate of \(\tau_c\).
  • alpha (float) – Estimate of \(\alpha_c\)
  • mse (float) – Mean squared error in runtime estimates.
  • rmse (float) – Root mean squared error in runtime estimates.
  • cvrmse (float) – Coefficient of variation of root mean squared error in runtime estimates.
  • mape (float) – Mean absolute percent error
  • mae (float) – Mean absolute error
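
The fitting procedure above (ratio estimate of alpha_c for a given tau_c, choose tau_c to minimize squared runtime error, then recompute alpha_c) can be sketched in plain Python. Assumptions: inputs are lists of per-day hourly indoor and outdoor temperatures plus daily runtimes, and tau_c is chosen by a simple grid search over a hypothetical candidate range, so no starting value of zero is needed. The library may use a numerical optimizer instead; the function names here are illustrative.

```python
from typing import List, Sequence, Tuple


def daily_ctd(indoor: Sequence[Sequence[float]], outdoor: Sequence[Sequence[float]],
              tau_c: float) -> List[float]:
    """daily CTD_d = (1/24) * sum over hours of [tau_c - (T_in - T_out)]_+"""
    return [
        sum(max(tau_c - (t_in - t_out), 0.0) for t_in, t_out in zip(day_in, day_out)) / 24
        for day_in, day_out in zip(indoor, outdoor)
    ]


def fit_cooling_model(indoor: Sequence[Sequence[float]], outdoor: Sequence[Sequence[float]],
                      runtime: Sequence[float], taus: Sequence[float]) -> Tuple[float, float]:
    """Return (tau_c, alpha_c), choosing tau_c from `taus` by least squared runtime error."""
    best = None
    for tau in taus:
        demand = daily_ctd(indoor, outdoor, tau)
        if sum(demand) == 0:
            continue  # ratio estimate of alpha is undefined without any demand
        alpha = sum(runtime) / sum(demand)  # ratio estimation of alpha_c
        sse = sum((alpha * d - rt) ** 2 for d, rt in zip(demand, runtime))
        if best is None or sse < best[0]:
            best = (sse, tau, alpha)  # alpha is recomputed for each candidate tau
    if best is None:
        raise ValueError("no candidate tau produced nonzero cooling demand")
    _, tau_c, alpha_c = best
    return tau_c, alpha_c


# Hypothetical data: indoor 75F all day; three days at 80F, 85F and 90F outdoors
indoor = [[75.0] * 24] * 3
outdoor = [[80.0] * 24, [85.0] * 24, [90.0] * 24]
runtime = [50.0, 100.0, 150.0]  # minutes of cooling per day
tau_c, alpha_c = fit_cooling_model(indoor, outdoor, runtime,
                                   taus=[x / 2 for x in range(-10, 11)])
```

Because these runtimes are exactly proportional to the demand at tau_c = 0, the search recovers tau_c = 0.0 and alpha_c = 10.0 minutes per degree F. Heating is analogous, with the clipped term [hourly delta T - tau_h]+ in place of [tau_c - hourly delta T]+.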

get_core_cooling_day_baseline_setpoint(core_cooling_day_set, method='tenth_percentile', source='temperature_in')

Calculate the core cooling day baseline setpoint (comfort temperature).

Parameters:
  • core_cooling_day_set (thermostat.core.CoreDaySet) – Core cooling days over which to calculate baseline cooling setpoint.
  • method ({"tenth_percentile"}, default: "tenth_percentile") –

    Method to use in calculation of the baseline.

    • “tenth_percentile”: 10th percentile of source temperature (either cooling setpoint or indoor temperature).
  • source ({"cooling_setpoint", "temperature_in"}, default "temperature_in") – The source of temperatures to use in baseline calculation.
Returns:

baseline – The baseline cooling setpoint for the core cooling days as determined by the given method.

Return type:

float
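
The “tenth_percentile” method reduces the source temperatures on core cooling days to a single number. A sketch, assuming linear interpolation between sorted values (the same convention as numpy.percentile's default); the library's exact interpolation rule is not stated here, and the temperatures are hypothetical.

```python
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical hourly indoor temperatures from core cooling days
temperature_in = [72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0, 81.0, 82.0]
cooling_baseline = percentile(temperature_in, 10)   # -> 73.0
```

The heating counterpart, get_core_heating_day_baseline_setpoint, uses the 90th percentile instead, which corresponds to percentile(values, 90) in this sketch.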

get_core_cooling_days(method='entire_dataset', min_minutes_cooling=30, max_minutes_heating=0)

Determine core cooling days from data associated with this thermostat.

Parameters:
  • method ({"entire_dataset", "year_end_to_end"}, default: "entire_dataset") –

    Method by which to find core cooling days.

    • “entire_dataset”: all cooling days in the dataset (days with >= 30 min of cooling runtime and no heating runtime).
    • “year_end_to_end”: groups all cooling days (days with >= 30 min of total cooling and no heating) from January 1 to December 31 into individual core cooling day sets.
  • min_minutes_cooling (int, default 30) – Minimum number of minutes of cooling runtime per day required for inclusion in the core cooling day set.
  • max_minutes_heating (int, default 0) – Number of minutes of heating runtime per day beyond which the day is considered part of a shoulder season (and is therefore not part of the core cooling day set).
Returns:

core_cooling_day_sets – List of core day sets detected; each core day set carries pandas Series of boolean values, intended to be used as selectors or masks on the thermostat data at hourly and daily frequencies.

A value of True at a particular index indicates inclusion of the data at that index in the core day set. If method is “entire_dataset”, the name of the core day set is “cooling_ALL”; if method is “year_end_to_end”, names of core day sets are of the form “cooling_YYYY”.

Return type:

list of thermostat.core.CoreDaySet objects
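
The selection rule and the “year_end_to_end” grouping can be sketched on a list of per-day records. This is illustrative only: the (date, cooling minutes, heating minutes) tuple layout and the function name are hypothetical, the result maps set names to lists of dates rather than to boolean masks, and missing runtime (None) is simply treated as disqualifying.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical per-day record: (day, cooling minutes, heating minutes)
Day = Tuple[date, Optional[float], Optional[float]]


def core_cooling_day_sets(days: Sequence[Day], method: str = "entire_dataset",
                          min_minutes_cooling: float = 30,
                          max_minutes_heating: float = 0) -> Dict[str, List[date]]:
    """Map each core cooling day set name to the days it contains."""
    if method not in ("entire_dataset", "year_end_to_end"):
        raise ValueError(f"unknown method: {method}")
    sets: Dict[str, List[date]] = defaultdict(list)
    for day, cool, heat in days:
        if cool is None or heat is None:
            continue  # missing runtime: day cannot qualify
        if cool >= min_minutes_cooling and heat <= max_minutes_heating:
            # "year_end_to_end" groups January 1 .. December 31 by calendar year
            name = "cooling_ALL" if method == "entire_dataset" else f"cooling_{day.year}"
            sets[name].append(day)
    return dict(sets)


days = [
    (date(2014, 7, 1), 120.0, 0.0),   # qualifies
    (date(2014, 10, 1), 45.0, 10.0),  # some heating too: shoulder season
    (date(2015, 6, 30), 20.0, 0.0),   # under 30 minutes of cooling
    (date(2015, 7, 4), 300.0, 0.0),   # qualifies
]
all_days = core_cooling_day_sets(days)
by_year = core_cooling_day_sets(days, method="year_end_to_end")
```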

get_core_day_set_n_days(core_day_set)

Returns number of days in the core day set.

get_core_heating_day_baseline_setpoint(core_heating_day_set, method='ninetieth_percentile', source='temperature_in')

Calculate the core heating day baseline setpoint (comfort temperature).

Parameters:
  • core_heating_day_set (thermostat.core.CoreDaySet) – Core heating days over which to calculate baseline heating setpoint.
  • method ({"ninetieth_percentile"}, default: "ninetieth_percentile") –

    Method to use in calculation of the baseline.

    • “ninetieth_percentile”: 90th percentile of source temperature (either heating setpoint or indoor temperature).
  • source ({"heating_setpoint", "temperature_in"}, default "temperature_in") – The source of temperatures to use in baseline calculation.
Returns:

baseline – The baseline heating setpoint for the core heating days as determined by the given method.

Return type:

float

get_core_heating_days(method='entire_dataset', min_minutes_heating=30, max_minutes_cooling=0)

Determine core heating days from data associated with this thermostat.

Parameters:
  • method ({"entire_dataset", "year_mid_to_mid"}, default: "entire_dataset") –

    Method by which to find core heating day sets.

    • “entire_dataset”: all heating days in the dataset (days with >= 30 min of heating runtime and no cooling runtime); this is the default.
    • “year_mid_to_mid”: groups all heating days (days with >= 30 min of total heating and no cooling) from July 1 to June 30 (inclusive) into individual core heating day sets. May overlap with core cooling day sets.
  • min_minutes_heating (int, default 30) – Number of minutes of heating runtime per day required for inclusion in core heating day set.
  • max_minutes_cooling (int, default 0) – Number of minutes of cooling runtime per day beyond which the day is considered part of a shoulder season (and is therefore not part of the core heating day set).
Returns:

core_heating_day_sets – List of core day sets detected; each core day set carries pandas Series of boolean values, intended to be used as selectors or masks on the thermostat data at hourly and daily frequencies.

A value of True at a particular index indicates inclusion of the data at that index in the core day set. If method is “entire_dataset”, the name of the core day set is “heating_ALL”; if method is “year_mid_to_mid”, names of core day sets are of the form “heating_YYYY-YYYY”.

Return type:

list of thermostat.core.CoreDaySet objects
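
Under “year_mid_to_mid”, a heating day belongs to the season running from July 1 through June 30, and the set is named after both calendar years. A minimal sketch of that labeling rule (the helper name is hypothetical):

```python
from datetime import date


def heating_season_name(day: date) -> str:
    """Name of the July 1 - June 30 heating season containing `day`."""
    start_year = day.year if day.month >= 7 else day.year - 1
    return f"heating_{start_year}-{start_year + 1}"
```

For example, both July 1, 2014 and June 30, 2015 fall in “heating_2014-2015”.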

get_heating_demand(core_heating_day_set)

Calculates a measure of heating demand using the hourlyavgCTD method.

\(\text{daily HTD}_d = \frac{\sum_{n=1}^{24} [\text{hourly} \Delta T_{d.n} - \tau_h]_{+}}{24}\), where

\(\text{hourly} \Delta T_{d.n} (^{\circ} F) = \text{hourly indoor} T_{d.n} - \text{hourly outdoor} T_{d.n}\), and

\(d\) is the core heating day; \(\left(001, 002, 003 ... x \right)\),

\(n\) is the hour; \(\left(01, 02, 03 ... 24 \right)\),

\(\tau_h\) (heating) is the \(\Delta T\) associated with \(HTD=0\), reflecting that homes with no heat running tend to be warmer than the outdoors, and

\([]_{+}\) indicates that the term is zero if its value would be negative.

For the set of all core heating days in the CT interval data file, use ratio estimation to calculate \(\alpha_h\), the home’s responsiveness to heating, which should be positive.

\(\alpha_h \left(\frac{\text{minutes}}{^{\circ} F}\right) = \frac{RT_\text{actual heat}}{\sum_{d=1}^{x} \text{daily HTD}_d}\), where

\(RT_\text{actual heat}\) is the sum of heating run times for all core heating days in the CT interval data file.

For the set of all core heating days in the CT interval data file, find the value of \(\tau_h\) that minimizes the sum of squared differences between the daily run times reported by the CT and the calculated daily heating run times \(\alpha_h \cdot \text{daily HTD}_d\).

Next, recalculate \(\alpha_h\) (in accordance with the above step) and record the model’s parameters \(\left(\alpha_h, \tau_h \right)\).

Parameters:core_heating_day_set (thermostat.core.CoreDaySet) – Core day set over which to calculate heating demand.
Returns:
  • demand (pd.Series) – Daily demand in the core heating day set as calculated using the method described above.
  • tau (float) – Estimate of \(\tau_h\).
  • alpha (float) – Estimate of \(\alpha_h\)
  • mse (float) – Mean squared error in runtime estimates.
  • rmse (float) – Root mean squared error in runtime estimates.
  • cvrmse (float) – Coefficient of variation of root mean squared error in runtime estimates.
  • mape (float) – Mean absolute percent error
  • mae (float) – Mean absolute error
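
The fit statistics returned alongside tau and alpha compare modeled daily runtime (alpha times daily demand) with the runtime reported by the thermostat. The sketch below uses conventional definitions, which are assumptions rather than the library's documented formulas: CV(RMSE) as RMSE divided by mean actual runtime, and MAPE expressed as a percentage of actual runtime. Core days always have nonzero runtime, so the MAPE division is safe there.

```python
import math
from typing import Dict, Sequence


def runtime_fit_stats(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Error statistics for predicted vs. actual daily runtimes (minutes)."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    return {
        "mse": mse,
        "rmse": rmse,
        "cvrmse": rmse / mean_actual,  # RMSE relative to mean actual runtime
        "mape": 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / n,  # assumes a != 0
        "mae": sum(abs(e) for e in errors) / n,
    }


stats = runtime_fit_stats(actual=[100.0, 200.0], predicted=[110.0, 190.0])
```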

get_ignored_days(core_day_set)

Determine how many days are ignored for a particular core day set.

Returns:
  • n_both (int) – Number of days excluded from core day set because of presence of both heating and cooling runtime.
  • n_days_insufficient (int) – Number of days excluded from core day set because of null runtime data.

get_inputfile_date_range(core_day_set)

Returns number of days of data provided in input data file.

get_resistance_heat_utilization_bins(core_heating_day_set)

Calculates resistance heat utilization metrics in temperature bins of 5 degrees between 0 and 60 degrees Fahrenheit.

Parameters:core_heating_day_set (thermostat.core.CoreDaySet) – Core heating day set for which to calculate resistance heat utilization.
Returns:RHUs – Resistance heat utilization for each temperature bin, ordered ascending by temperature bin. Returns None if the thermostat does not control the appropriate equipment.
Return type:numpy.array or None

total_auxiliary_heating_runtime(core_day_set)

Calculates total auxiliary heating runtime.

Parameters:core_day_set (thermostat.core.CoreDaySet) – Core day set for which to calculate total runtime.
Returns:total_runtime – Total auxiliary heating runtime.
Return type:float

total_cooling_runtime(core_day_set)

Calculates total cooling runtime.

Parameters:core_day_set (thermostat.core.CoreDaySet) – Core day set for which to calculate total runtime.
Returns:total_runtime – Total cooling runtime.
Return type:float

total_emergency_heating_runtime(core_day_set)

Calculates total emergency heating runtime.

Parameters:core_day_set (thermostat.core.CoreDaySet) – Core day set for which to calculate total runtime.
Returns:total_runtime – Total emergency heating runtime.
Return type:float

total_heating_runtime(core_day_set)

Calculates total heating runtime.

Parameters:core_day_set (thermostat.core.CoreDaySet) – Core day set for which to calculate total runtime.
Returns:total_runtime – Total heating runtime.
Return type:float

thermostat.regression

thermostat.regression.runtime_regression(daily_runtime, daily_demand, method)

Least squares regression of runtime against a measure of demand.

Parameters:
  • daily_runtime (pd.Series with pd.DatetimeIndex) – Daily runtimes for a particular heating or cooling season.
  • daily_demand (pd.Series with pd.DatetimeIndex) – A daily demand measure for each day in the heating or cooling season.
Returns:

  • slope (float) – The slope parameter found by the regression to minimize squared error.
  • intercept (float) – The intercept parameter found by the regression to minimize squared error.
  • mean_sq_err (float) – The mean squared error of the regression.
  • root_mean_sq_err (float) – The root mean squared error of the regression.
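
An ordinary least-squares fit of daily runtime against daily demand, returning the four quantities listed above, can be sketched without external dependencies. The function name is hypothetical, and the sketch omits the method argument because its options are not described here.

```python
import math
from typing import Sequence, Tuple


def runtime_regression_sketch(daily_runtime: Sequence[float],
                              daily_demand: Sequence[float]) -> Tuple[float, float, float, float]:
    """OLS fit runtime = slope * demand + intercept; returns slope, intercept, MSE, RMSE."""
    n = len(daily_demand)
    mean_x = sum(daily_demand) / n
    mean_y = sum(daily_runtime) / n
    sxx = sum((x - mean_x) ** 2 for x in daily_demand)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(daily_demand, daily_runtime))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(daily_demand, daily_runtime)]
    mean_sq_err = sum(r ** 2 for r in residuals) / n
    return slope, intercept, mean_sq_err, math.sqrt(mean_sq_err)


# Hypothetical season: runtime rises 20 minutes per unit of demand, plus 5 minutes
slope, intercept, mse, rmse = runtime_regression_sketch([25.0, 45.0, 65.0], [1.0, 2.0, 3.0])
```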

thermostat.stats

thermostat.stats.combine_output_dataframes(dfs)

Combines output dataframes. Useful when combining output from batches.

Parameters:dfs (list of pd.DataFrame) – Output dataFrames to combine into one.
Returns:out – Dataframe with combined output metadata.
Return type:pd.DataFrame
thermostat.stats.compute_summary_statistics(metrics_df, target_baseline_method='baseline_percentile', advanced_filtering=False)

Computes summary statistics for the output dataframe. Computes the following statistics for each real-valued or integer-valued column in the output dataframe: mean, standard error of the mean, and deciles.

Parameters:
  • metrics_df (pd.DataFrame) – Output for which to compute summary statistics.
  • label (str) – Name for this set of thermostat outputs.
  • target_baseline_method ({"baseline_percentile", "baseline_regional"}, default "baseline_percentile") – Baselining method by which samples will be filtered according to bad fits.
Returns:

stats – An ordered dict containing the summary statistics. Column names are as follows, in which ### is a placeholder for the name of the column:

  • mean: ###_mean
  • standard error of the mean: ###_sem
  • 10th quantile: ###_10q
  • 20th quantile: ###_20q
  • 30th quantile: ###_30q
  • 40th quantile: ###_40q
  • 50th quantile: ###_50q
  • 60th quantile: ###_60q
  • 70th quantile: ###_70q
  • 80th quantile: ###_80q
  • 90th quantile: ###_90q
  • number of non-null core day sets: ###_n

The following general values are also output:

  • label: label
  • number of total core day sets: n_total_core_day_sets

Return type:

collections.OrderedDict
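
For a single column, the naming scheme above can be illustrated as follows. The sketch is hypothetical: it takes a plain list of per-core-day-set values (None for missing), computes deciles with statistics.quantiles using its default “exclusive” method, and takes the standard error of the mean as the sample standard deviation divided by the square root of n. The library's quantile convention may differ, and the column name is made up.

```python
import math
import statistics
from collections import OrderedDict
from typing import Optional, Sequence


def column_summary(name: str, values: Sequence[Optional[float]]) -> OrderedDict:
    """Mean, SEM, deciles and non-null count for one output column, keyed ###_suffix."""
    data = [v for v in values if v is not None]  # drop null core day sets
    n = len(data)
    stats = OrderedDict()
    stats[f"{name}_mean"] = statistics.mean(data)
    # standard error of the mean: sample standard deviation / sqrt(n)
    stats[f"{name}_sem"] = statistics.stdev(data) / math.sqrt(n)
    # nine cut points = 10th, 20th, ..., 90th quantiles
    for pct, value in zip(range(10, 100, 10), statistics.quantiles(data, n=10)):
        stats[f"{name}_{pct}q"] = value
    stats[f"{name}_n"] = n
    return stats


# "percent_savings" is a hypothetical column name; None marks a missing value
summary = column_summary("percent_savings", [10.0, 20.0, None, 30.0, 40.0])
```

The general values (label and n_total_core_day_sets) would be added once per output rather than per column.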

thermostat.stats.get_filtered_stats(df, row_filter, label, heating_or_cooling, target_columns, target_baseline_method)

thermostat.stats.summary_statistics_to_csv(stats, filepath, product_id)

Write metric statistics to CSV file.

Parameters:
  • stats (list of dict) – List of outputs from thermostat.stats.compute_summary_statistics()
  • filepath (str) – Filepath at which to save the summary statistics
  • product_id (str) – A combination of the connected thermostat service plus one or more connected thermostat device models that comprises the data set.
Returns:

df – A pandas dataframe containing the output data.

Return type:

pandas.DataFrame

thermostat.parallel

thermostat.parallel.schedule_batches(metadata_filename, n_batches, zip_files=False, batches_dir=None)

Batch scheduler for large sets of thermostats. Can either create zipped directories ready to be sent to separate processors for parallel processing, or unpackaged metadata dataframes for more flexible processing.

Parameters:
  • metadata_filename (str) – Full path to location of file containing CSV formatted metadata for
  • n_batches (int) – Number of batches desired. Should be <= the number of available thermostats.
  • zip_files (boolean) – If True, create zipped directories of metadata and interval data. Each batch will be named batch_XXXXX.zip, and will contain a directory named data, which contains metadata and interval data for the batch. Must supply batches_dir argument to use this option.
  • batches_dir (str) – Path to directory in which to save created batches. Ignored for zip_files=False.
Returns:

batches – If zip_files is True, then returns list of names of created zip files. Otherwise, returns list of metadata dataframes containing batches.

Return type:

list of str or list of pd.DataFrame
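
The batching step can be pictured as splitting the thermostats listed in the metadata into n_batches roughly equal groups, each destined for a file named batch_XXXXX.zip. The sketch below is a hypothetical illustration on a plain list of IDs. The contiguous splitting rule, zero-based numbering and five-digit zero padding are assumptions inferred from the batch_XXXXX pattern, not documented library behavior.

```python
from typing import List, Sequence, Tuple


def split_into_batches(thermostat_ids: Sequence[str],
                       n_batches: int) -> List[Tuple[str, List[str]]]:
    """Split IDs into n_batches contiguous, near-equal groups with zip-style names."""
    if not 1 <= n_batches <= len(thermostat_ids):
        raise ValueError("n_batches should be between 1 and the number of thermostats")
    size, extra = divmod(len(thermostat_ids), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        end = start + size + (1 if i < extra else 0)  # first `extra` batches get one more
        batches.append((f"batch_{i:05d}.zip", list(thermostat_ids[start:end])))
        start = end
    return batches


batches = split_into_batches(["t1", "t2", "t3", "t4", "t5"], n_batches=2)
```

The range check mirrors the documented requirement that n_batches be no larger than the number of available thermostats.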