QC Module
The QC module includes quality control functions from Pecos; see https://pecos.readthedocs.io for more details.
| Function | Description |
|---|---|
| check_timestamp | Check time series for missing, non-monotonic and duplicate timestamps |
| check_missing | Check for missing data |
| check_corrupt | Check for corrupt data |
| check_range | Check for data that is outside expected range |
| check_delta | Check for stagnant data and/or abrupt changes in the data using the difference between max and min values (delta) within a rolling window |
| check_outlier | Check for outliers using normalized data within a rolling window |
- mhkit.qc.check_timestamp(data, frequency, expected_start_time=None, expected_end_time=None, min_failures=1, exact_times=True)
- Check time series for missing, non-monotonic and duplicate timestamps.
- Parameters:
- data (pandas DataFrame) – Data used in the quality control test, indexed by datetime 
- frequency (int or float) – Expected time series frequency, in seconds 
- expected_start_time (Timestamp, optional) – Expected start time. If not specified, the minimum timestamp is used 
- expected_end_time (Timestamp, optional) – Expected end time. If not specified, the maximum timestamp is used 
- min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1 
- exact_times (bool, optional) – Controls how missing times are checked. If True, times are expected to occur at regular intervals (specified in frequency) and the DataFrame is reindexed to match the expected frequency. If False, times only need to occur once or more within each interval (specified in frequency) and the DataFrame is not reindexed. 
 
- Returns:
- dictionary – Results include cleaned data, mask, and test results summary 
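The timestamp checks above can be illustrated with a minimal pandas sketch. `sketch_check_timestamp` is a hypothetical helper written for illustration only, not the MHKiT or Pecos implementation; it assumes a `DatetimeIndex`, mimics only the `exact_times=True` behavior (reindexing onto a regular grid), and ignores `expected_start_time`, `expected_end_time` and `min_failures`. The dictionary keys it returns are also assumptions.

```python
import pandas as pd


def sketch_check_timestamp(data, frequency):
    """Hypothetical helper (not MHKiT's implementation): flag duplicate,
    non-monotonic and missing timestamps for a regular `frequency` in seconds."""
    idx = data.index
    # Duplicates: every repeat of a timestamp after its first occurrence
    duplicates = idx[idx.duplicated()]
    # Non-monotonic: timestamps earlier than the latest timestamp seen before them
    times = pd.Series(idx)
    prev_max = times.cummax().shift(1)
    non_monotonic = idx[(times < prev_max).to_numpy()]
    # Missing: points on the expected regular grid that never appear in the data
    expected = pd.date_range(idx.min(), idx.max(), freq=pd.Timedelta(seconds=frequency))
    missing = expected.difference(idx)
    # Similar in spirit to exact_times=True: drop repeats, sort, reindex onto the grid
    cleaned = data[~idx.duplicated()].sort_index().reindex(expected)
    return {"cleaned_data": cleaned, "duplicates": duplicates,
            "non_monotonic": non_monotonic, "missing": missing}


# Example: a duplicate at 1 s, an out-of-order time at 2 s and a gap at 3 s
t0 = pd.Timestamp("2024-01-01")
idx = pd.DatetimeIndex([t0 + pd.Timedelta(seconds=s) for s in [0, 1, 1, 4, 2]])
df = pd.DataFrame({"A": [1.0, 2.0, 2.0, 5.0, 3.0]}, index=idx)
result = sketch_check_timestamp(df, frequency=1)
print(result["missing"])
```

After reindexing, the missing time appears as a row of NaN values, which the missing-data check can then pick up.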
 
- mhkit.qc.check_missing(data, key=None, min_failures=1)
- Check for missing data.
- Parameters:
- data (pandas DataFrame) – Data used in the quality control test, indexed by datetime 
- key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test. 
- min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1 
 
- Returns:
- dictionary – Results include cleaned data, mask, and test results summary 
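To show how `min_failures` affects reporting, here is a minimal pandas sketch. `sketch_check_missing` is a hypothetical helper, not the MHKiT implementation: it builds a mask (True meaning the value is present), then reports only runs of at least `min_failures` consecutive missing values. The result keys and the layout of the summary table are assumptions chosen to mirror the three items listed under Returns.

```python
import numpy as np
import pandas as pd


def sketch_check_missing(data, key=None, min_failures=1):
    """Hypothetical helper (not MHKiT's implementation): flag NaN values and
    report only runs of at least `min_failures` consecutive missing values."""
    cols = [key] if key is not None else list(data.columns)
    mask = data[cols].notna()  # True = value present (passes the test)
    rows = []
    for col in cols:
        fail = ~mask[col]
        # A new run starts wherever the pass/fail state changes
        run_id = (fail != fail.shift(fill_value=False)).cumsum().to_numpy()
        for _, run in fail.groupby(run_id):
            if run.iloc[0] and len(run) >= min_failures:
                rows.append({"column": col, "start": run.index[0],
                             "end": run.index[-1], "count": len(run)})
    test_results = pd.DataFrame(rows, columns=["column", "start", "end", "count"])
    return {"cleaned_data": data[cols], "mask": mask, "test_results": test_results}


idx = pd.date_range("2024-01-01", periods=6, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0, np.nan, 6.0]}, index=idx)
result = sketch_check_missing(df, min_failures=2)
print(result["test_results"])
```

With `min_failures=2`, the two-sample gap is reported while the isolated NaN is still marked in the mask but left out of the summary.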
 
- mhkit.qc.check_corrupt(data, corrupt_values, key=None, min_failures=1)
- Check for corrupt data.
- Parameters:
- data (pandas DataFrame) – Data used in the quality control test, indexed by datetime 
- corrupt_values (list of ints or floats) – List of corrupt data values
- key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test. 
- min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1 
 
- Returns:
- dictionary – Results include cleaned data, mask, and test results summary 
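Corrupt values are typically sentinel codes such as -999 written by a logger. A minimal pandas sketch of the idea follows; `sketch_check_corrupt` is a hypothetical helper, not the MHKiT implementation, and it omits `min_failures` reporting. Matching values are marked as failing and replaced by NaN in the cleaned data.

```python
import pandas as pd


def sketch_check_corrupt(data, corrupt_values, key=None):
    """Hypothetical helper (not MHKiT's implementation): flag values that match
    a known corrupt code and replace them with NaN."""
    cols = [key] if key is not None else list(data.columns)
    subset = data[cols]
    mask = ~subset.isin(corrupt_values)  # True = not a corrupt code
    cleaned = subset.where(mask)         # corrupt codes become NaN
    return {"cleaned_data": cleaned, "mask": mask}


idx = pd.date_range("2024-01-01", periods=5, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"A": [-999, 1, 2, -999, 3], "B": [0, 1, 2, 3, 4]}, index=idx)
# Restrict the test to column "A", analogous to passing `key`
result = sketch_check_corrupt(df, corrupt_values=[-999], key="A")
print(result["cleaned_data"])
```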
 
- mhkit.qc.check_range(data, bound, key=None, min_failures=1)
- Check for data that is outside expected range.
- Parameters:
- data (pandas DataFrame) – Data used in the quality control test, indexed by datetime 
- bound (list of floats) – [lower bound, upper bound]; None can be used in place of a lower or upper bound
- key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test. 
- min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1 
 
- Returns:
- dictionary – Results include cleaned data, mask, and test results summary 
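The `bound` convention (a two-element list where either entry may be None) can be sketched with pandas as follows. `sketch_check_range` is a hypothetical helper, not the MHKiT implementation. Two behaviors here are assumptions of the sketch: values exactly equal to a bound pass, and NaN values are not flagged (they are left to the missing-data check).

```python
import pandas as pd


def sketch_check_range(data, bound, key=None):
    """Hypothetical helper (not MHKiT's implementation): flag values outside
    [lower, upper]; a None bound disables that side of the test."""
    cols = [key] if key is not None else list(data.columns)
    subset = data[cols]
    lower, upper = bound
    mask = pd.DataFrame(True, index=subset.index, columns=subset.columns)
    if lower is not None:
        mask &= ~(subset < lower)  # NaN < lower is False, so NaN is not flagged
    if upper is not None:
        mask &= ~(subset > upper)
    return {"cleaned_data": subset.where(mask), "mask": mask}


idx = pd.date_range("2024-01-01", periods=5, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"A": [-5.0, 0.0, 5.0, 10.0, 15.0]}, index=idx)
result = sketch_check_range(df, bound=[0, 10])
result_open = sketch_check_range(df, bound=[0, None])  # no upper bound
print(result["mask"])
```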
 
- mhkit.qc.check_delta(data, bound, window, key=None, direction=None, min_failures=1)
- Check for stagnant data and/or abrupt changes in the data using the difference between max and min values (delta) within a rolling window.
- Parameters:
- data (pandas DataFrame) – Data used in the quality control test, indexed by datetime 
- bound (list of floats) – [lower bound, upper bound]; None can be used in place of a lower or upper bound
- window (int or float) – Size of the rolling window (in seconds) used to compute delta 
- key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test. 
- direction (str, optional) – Options = ‘positive’, ‘negative’, or None
  - If direction is positive, then only identify positive deltas (the min occurs before the max)
  - If direction is negative, then only identify negative deltas (the max occurs before the min)
  - If direction is None, then identify both positive and negative deltas
 
- min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1 
 
- Returns:
- dictionary – Results include cleaned data, mask, and test results summary 
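The delta idea can be sketched with a time-based pandas rolling window: delta is the rolling max minus the rolling min. A delta below the lower bound suggests stagnant data, and a delta above the upper bound suggests an abrupt change. `sketch_check_delta` is a hypothetical helper, not the MHKiT implementation. It ignores `direction` and `min_failures` and flags only the timestamp at the right edge of each offending window; the library may mark points differently. It relies on pandas' default right-closed offset windows, so the first samples see shorter windows (a single sample has a delta of 0 and is flagged by a lower bound in this sketch).

```python
import pandas as pd


def sketch_check_delta(data, bound, window, key=None):
    """Hypothetical helper (not MHKiT's implementation): flag timestamps where
    max - min over the trailing `window` seconds falls outside `bound`."""
    cols = [key] if key is not None else list(data.columns)
    subset = data[cols]
    # Offset-based window covering (t - window, t]; requires a sorted DatetimeIndex
    roll = subset.rolling(pd.Timedelta(seconds=window))
    delta = roll.max() - roll.min()
    lower, upper = bound
    mask = pd.DataFrame(True, index=subset.index, columns=subset.columns)
    if lower is not None:
        mask &= ~(delta < lower)  # stagnant: too little variation
    if upper is not None:
        mask &= ~(delta > upper)  # abrupt change: too much variation
    return {"cleaned_data": subset.where(mask), "mask": mask, "delta": delta}


idx = pd.date_range("2024-01-01", periods=6, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"A": [1.0, 1.0, 1.0, 5.0, 6.0, 7.0]}, index=idx)
result = sketch_check_delta(df, bound=[0.5, 3], window=2)
print(result["delta"]["A"].tolist())
```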
 
- mhkit.qc.check_outlier(data, bound, window=None, key=None, absolute_value=False, streaming=False, min_failures=1)
- Check for outliers using normalized data within a rolling window. The upper and lower bounds are specified in standard deviations. Data are normalized using (data - mean)/std.
- Parameters:
- data (pandas DataFrame) – Data used in the quality control test, indexed by datetime 
- bound (list of floats) – [lower bound, upper bound]; None can be used in place of a lower or upper bound
- window (int or float, optional) – Size of the rolling window (in seconds) used to normalize data. If window is None, data are normalized using the mean and standard deviation of the entire data set (column by column), default = None
- key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test. 
- absolute_value (boolean, optional) – Use the absolute value of the normalized data, default = False
- streaming (boolean, optional) – Indicates if streaming analysis should be used, default = False 
- min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1 
 
- Returns:
- dictionary – Results include cleaned data, mask, and test results summary
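The normalization (data - mean)/std and the bounds in standard deviations can be sketched with pandas. `sketch_check_outlier` is a hypothetical helper, not the MHKiT implementation. It assumes pandas' default sample standard deviation (ddof=1) and ignores `streaming` and `min_failures`. With `window=None` it uses whole-column statistics; otherwise it uses a trailing time-based rolling window. Windows holding a single sample have an undefined std, so they yield NaN and are not flagged.

```python
import pandas as pd


def sketch_check_outlier(data, bound, window=None, key=None, absolute_value=False):
    """Hypothetical helper (not MHKiT's implementation): flag values whose
    normalized score (data - mean) / std falls outside `bound`."""
    cols = [key] if key is not None else list(data.columns)
    subset = data[cols]
    if window is None:
        # Column-by-column mean and sample std of the entire data set
        mean, std = subset.mean(), subset.std()
    else:
        # Trailing rolling statistics over `window` seconds
        roll = subset.rolling(pd.Timedelta(seconds=window))
        mean, std = roll.mean(), roll.std()
    normalized = (subset - mean) / std
    if absolute_value:
        normalized = normalized.abs()
    lower, upper = bound
    mask = pd.DataFrame(True, index=subset.index, columns=subset.columns)
    if lower is not None:
        mask &= ~(normalized < lower)
    if upper is not None:
        mask &= ~(normalized > upper)
    return {"cleaned_data": subset.where(mask), "mask": mask, "normalized": normalized}


idx = pd.date_range("2024-01-01", periods=10, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"A": [10.0] * 9 + [50.0]}, index=idx)
result = sketch_check_outlier(df, bound=[-2, 2])
print(result["normalized"]["A"].round(3).tolist())
```

Here the mean is 14 and the sample std is sqrt(160), so the last value scores about 2.85 standard deviations and is flagged. Note that a large outlier also inflates the std it is measured against, which is one reason a rolling window can be preferable for long records.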