QC Module

The QC module includes quality control functions from Pecos. See https://pecos.readthedocs.io for more details.

Functions	Description
`check_corrupt`	Check for corrupt data
`check_delta`	Check for stagnant data and/or abrupt changes in the data using the difference between max and min values within a rolling window
`check_increment`	Check data increments using the difference between values
`check_missing`	Check for missing data
`check_outlier`	Check for outliers using normalized data within a rolling window
`check_range`	Check for data outside the expected range
`check_timestamp`	Check time series for missing, non-monotonic, and duplicate timestamps
`qc_data_to_dataframe`	Convert qc data structure to pandas dataframe

Note

The function names below follow the convention path.path.function. Only the function name is used when calling the function in MATLAB. For example, to call mhkit.qc.check_timestamp, simply use check_timestamp.

mhkit.qc.check_corrupt(data, vals, options)

Check for corrupt data

Parameters:

data (pandas dataframe or qcdata structure) –
Pandas dataframe indexed by datetime (use py.mhkit_python_utils.pandas_dataframe.timeseries_to_pandas(ts,time,x))

OR

qcdata structure of form:

data.values: 2D array of doubles with arbitrary number of columns

data.time: 1D array of datetimes or posix times
vals (cell array of floats) – Array of at least two corrupt data values. Use a cell array for one value, or pad the array with NaN. (A single-value array becomes a non-iterable scalar in Python and will cause an error.)
key (string (optional)) – Data column name or translation dictionary key. If not specified or set to py.None, all columns are used for the test. To call: check_corrupt(data,vals,"key",key)
min_failures (int (optional)) – Minimum number of consecutive failures required for reporting, default = 1. To call: check_corrupt(data,vals,"min_failures",min_failures)

Returns:

results (qcdata structure of form:) –

results.values: array of doubles: Same shape as input data.values. Elements that failed the QC test are replaced with NaN.
results.mask: array of int64: Same shape as input data.values. Logical mask of QC results (1 = passed, 0 = failed QC test).
results.time: array of datetimes: Same as input times
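
The following is a minimal Python sketch of the idea behind this test, not the Pecos or MHKiT implementation. It flags every element of a single column that equals one of the listed corrupt values, then builds the two outputs described above: a mask (1 = passed, 0 = failed) and a copy of the values with failures replaced by NaN. The function name flag_corrupt and the sample values are hypothetical.

```python
import math


def flag_corrupt(values, corrupt_vals):
    """Return (cleaned, mask) for one column; mask uses 1 = passed, 0 = failed."""
    corrupt = set(corrupt_vals)
    mask = [0 if v in corrupt else 1 for v in values]
    # Failed elements are replaced with NaN, passed elements are kept as-is.
    cleaned = [v if m else math.nan for v, m in zip(values, mask)]
    return cleaned, mask


# Hypothetical column in which -999 and 99999 are sensor error codes.
column = [1.0, -999.0, 3.0, 99999.0, 5.0]
cleaned, mask = flag_corrupt(column, [-999.0, 99999.0])
print(mask)  # [1, 0, 1, 0, 1]
```

The mask and the cleaned values always have the same length as the input column, which mirrors the "same shape as input" wording in the Returns description.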

mhkit.qc.check_outlier(data, bound, options)

Check for outliers using normalized data within a rolling window. Upper and lower bounds are given in standard deviations. Data is normalized using (data-mean)/std.

Parameters:

data (pandas dataframe or qcdata structure) –
Pandas dataframe indexed by datetime (use py.mhkit_python_utils.pandas_dataframe.timeseries_to_pandas(ts,time,x))

OR

qcdata structure of form:

data.values: 2D array of doubles with arbitrary number of columns

data.time: 1D array of datetimes or posix times
bound (cell array of floats) – [lower bound, upper bound] of standard deviations from the mean allowed. NaN or py.None can be used for either bound.
key (string (optional)) – Data column name or translation dictionary key. If not specified or set to py.None, all columns are used for the test. To call: check_outlier(data,bound,"key",key)
window (int (optional)) – Size of the rolling window (in seconds) used to normalize data, default = 3600. If window is set to py.None, data is normalized using the mean and standard deviation of the entire data set (column by column). To call: check_outlier(data,bound,"window",window)
absolute_value (logical (optional)) – Use the absolute value of the normalized data, default = py.True. To call: check_outlier(data,bound,"absolute_value",absolute_value)
min_failures (int (optional)) – Minimum number of consecutive failures required for reporting, default = 1. To call: check_outlier(data,bound,"min_failures",min_failures)
streaming (logical (optional)) – Indicates if streaming analysis should be used, default = py.False. To call: check_outlier(data,bound,"streaming",streaming)

Returns:

results (qcdata structure of form:) –

results.values: array of doubles: Same shape as input data.values. Elements that failed the QC test are replaced with NaN.
results.mask: array of int64: Same shape as input data.values. Logical mask of QC results (1 = passed, 0 = failed QC test).
results.time: array of datetimes: Same as input times
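
To make the normalization concrete, here is a minimal Python sketch of the case where window is py.None, so the mean and standard deviation come from the whole column. It is an illustration of the (data-mean)/std formula above, not the Pecos or MHKiT code. The function name flag_outliers, the use of the sample standard deviation, and the sample values are assumptions.

```python
import statistics


def flag_outliers(values, lower, upper, absolute_value=True):
    """Return a mask (1 = passed, 0 = failed) for one column.

    lower and upper are bounds in standard deviations; None disables a bound.
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    mask = []
    for v in values:
        z = (v - mean) / std  # normalized value
        if absolute_value:
            z = abs(z)
        too_low = lower is not None and z < lower
        too_high = upper is not None and z > upper
        mask.append(0 if (too_low or too_high) else 1)
    return mask


# Hypothetical column with one spike at 25.0.
column = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8]
mask = flag_outliers(column, None, 2.0)
print(mask)  # [1, 1, 1, 1, 0, 1, 1]
```

With a rolling window the same formula would be applied using the mean and standard deviation of each window instead of the whole column.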

mhkit.qc.check_increment(data, bound, options)

Check data increments using the difference between values

Parameters:

data (pandas dataframe or qcdata structure) –
Pandas dataframe indexed by datetime (use py.mhkit_python_utils.pandas_dataframe.timeseries_to_pandas(ts,time,x))

OR

qcdata structure of form:

data.values: 2D array of doubles with arbitrary number of columns

data.time: 1D array of datetimes or posix times
bound (cell array of floats) – [lower bound, upper bound] for min/max difference. NaN or py.None can be used for either bound.
key (string (optional)) – Data column name or translation dictionary key. If not specified or set to py.None, all columns are used for the test. To call: check_increment(data,bound,"key",key)
increment (int (optional)) – Time step shift used to compute the difference, default = 1. To call: check_increment(data,bound,"increment",increment)
absolute_value (logical (optional)) – Use the absolute value of increment data, default = py.True. To call: check_increment(data,bound,"absolute_value",absolute_value)
min_failures (int (optional)) – Minimum number of consecutive failures required for reporting, default = 1. To call: check_increment(data,bound,"min_failures",min_failures)

Returns:

results (qcdata structure of form:) –

results.values: array of doubles: Same shape as input data.values. Elements that failed the QC test are replaced with NaN.
results.mask: array of int64: Same shape as input data.values. Logical mask of QC results (1 = passed, 0 = failed QC test).
results.time: array of datetimes: Same as input times
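
The increment test can be pictured with the short Python sketch below. It is only an illustration of the parameters described above, not the Pecos or MHKiT implementation. Each value is compared with the value `increment` steps earlier, and the difference is checked against the bounds. The function name flag_increments is hypothetical, and treating the first `increment` samples as passing (they have no earlier value to compare with) is an assumption.

```python
def flag_increments(values, lower, upper, increment=1, absolute_value=True):
    """Return a mask (1 = passed, 0 = failed) based on value-to-value differences."""
    mask = [1] * len(values)
    for i in range(increment, len(values)):
        diff = values[i] - values[i - increment]
        if absolute_value:
            diff = abs(diff)
        too_small = lower is not None and diff < lower
        too_large = upper is not None and diff > upper
        if too_small or too_large:
            mask[i] = 0
    return mask


# Hypothetical column with a jump of 6.0 between the third and fourth samples.
column = [1.0, 1.5, 2.0, 8.0, 8.2]
mask = flag_increments(column, None, 3.0)
print(mask)  # [1, 1, 1, 0, 1]
```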

mhkit.qc.check_missing(data, options)

Check for missing data

Parameters:

data (pandas dataframe or qcdata structure) –
Pandas dataframe indexed by datetime (use py.mhkit_python_utils.pandas_dataframe.timeseries_to_pandas(ts,time,x))

OR

qcdata structure of form:

data.values: 2D array of doubles with arbitrary number of columns

data.time: 1D array of datetimes or posix times
key (string (optional)) – Data column name or translation dictionary key. If not specified or set to py.None, all columns are used for the test. To call: check_missing(data,"key",key)
min_failures (int (optional)) – Minimum number of consecutive failures required for reporting, default = 1. To call: check_missing(data,"min_failures",min_failures)

Returns:

results (qcdata structure of form:) –

results.values: array of doubles: Same shape as input data.values. Elements that failed the QC test are replaced with NaN.
results.mask: array of int64: Same shape as input data.values. Logical mask of QC results (1 = passed, 0 = failed QC test).
results.time: array of datetimes: Same as input times
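
The min_failures option is easiest to see on missing data. The Python sketch below is an illustration under stated assumptions, not the Pecos or MHKiT implementation: it treats NaN as missing and reports a run of consecutive missing samples only when the run is at least min_failures long. The function name flag_missing and this reading of min_failures are assumptions drawn from the parameter description above.

```python
import math


def flag_missing(values, min_failures=1):
    """Return a mask (1 = passed, 0 = failed) where NaN counts as missing."""
    failed = [math.isnan(v) for v in values]
    mask = [1] * len(values)
    i = 0
    while i < len(failed):
        if not failed[i]:
            i += 1
            continue
        # Find the end of this run of consecutive failures.
        j = i
        while j < len(failed) and failed[j]:
            j += 1
        # Report the run only if it is long enough.
        if j - i >= min_failures:
            for k in range(i, j):
                mask[k] = 0
        i = j
    return mask


nan = math.nan
column = [1.0, nan, 2.0, nan, nan, nan, 3.0]
print(flag_missing(column))                  # [1, 0, 1, 0, 0, 0, 1]
print(flag_missing(column, min_failures=2))  # [1, 1, 1, 0, 0, 0, 1]
```

Raising min_failures suppresses isolated dropouts while still reporting longer gaps.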

mhkit.qc.check_delta(data, bound, window, options)

Check for stagnant data and/or abrupt changes in the data using the difference between max and min values within a rolling window

Parameters:

data (pandas dataframe or qcdata structure) –
Pandas dataframe indexed by datetime (use py.mhkit_python_utils.pandas_dataframe.timeseries_to_pandas(ts,time,x))

OR

qcdata structure of form:

data.values: 2D array of doubles with arbitrary number of columns

data.time: 1D array of datetimes or posix times
bound (cell array of floats) – [lower bound, upper bound] for min/max delta checking. NaN or py.None can be used for either bound.
window (int or double) – Size of the rolling window (in seconds) used to compute delta
key (string (optional)) – Data column name or translation dictionary key. If not specified or set to py.None, all columns are used for the test. To call: check_delta(data,bound,window,"key",key)
direction (string (optional)) –

Options: 'positive', 'negative', or py.None (default)
If direction is positive, then only identify positive deltas (the min occurs before the max). If direction is negative, then only identify negative deltas (the max occurs before the min). If direction is py.None, then identify both positive and negative deltas.

To call: check_delta(data,bound,window,"direction",direction)
min_failures (int (optional)) – Minimum number of consecutive failures required for reporting, default = 1. To call: check_delta(data,bound,window,"min_failures",min_failures)

Returns:

results (qcdata structure of form:) –

results.values: array of doubles: Same shape as input data.values. Elements that failed the QC test are replaced with NaN.
results.mask: array of int64: Same shape as input data.values. Logical mask of QC results (1 = passed, 0 = failed QC test).
results.time: array of datetimes: Same as input times
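
The delta and direction definitions above can be sketched in Python as follows. This is a simplified illustration, not the Pecos or MHKiT implementation. For each sample it takes the trailing window of samples, computes max minus min, and labels the delta positive when the min occurs before the max. The function name window_deltas, the inclusive trailing window, and the sample data are assumptions.

```python
def window_deltas(times, values, window):
    """Return a (delta, direction) pair for the trailing window ending at each sample.

    times are in seconds; the window covers t - window <= time <= t (assumption).
    """
    out = []
    for i, t in enumerate(times):
        idx = [j for j in range(i + 1) if times[j] >= t - window]
        lo = min(idx, key=lambda j: values[j])  # position of the window minimum
        hi = max(idx, key=lambda j: values[j])  # position of the window maximum
        delta = values[hi] - values[lo]
        if delta == 0:
            direction = None
        else:
            direction = "positive" if lo < hi else "negative"
        out.append((delta, direction))
    return out


# Hypothetical 10-second data that is flat and then jumps upward.
times = [0, 10, 20, 30, 40, 50]
values = [5.0, 5.0, 5.0, 5.0, 9.0, 9.1]
deltas = window_deltas(times, values, 20)

abrupt = [d > 3.0 for d, _ in deltas]      # delta above the upper bound
directions = [direction for _, direction in deltas]
print(abrupt)      # [False, False, False, False, True, True]
print(directions)  # [None, None, None, None, 'positive', 'positive']
```

A lower bound works the same way in reverse: a delta below it suggests stagnant data. Windows at the start of the series contain fewer samples, so a real check would need to decide how to treat them.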

mhkit.qc.check_timestamp(data, freq, options)

Check time series for missing, non-monotonic, and duplicate timestamps

Parameters:

data (pandas dataframe or qcdata structure) –
Pandas dataframe indexed by datetime (use py.mhkit_python_utils.pandas_dataframe.timeseries_to_pandas(ts,time,x))

OR

qcdata structure of form:

data.values: 2D array of doubles with arbitrary number of columns

data.time: 1D array of datetimes or posix times
freq (int) – Expected time series frequency, in seconds
expected_start_time (Timestamp (optional)) – Expected start time in datetime format. Default: None. To call: check_timestamp(data,freq,"expected_start_time",expected_start_time)
expected_end_time (Timestamp (optional)) – Expected end time in datetime format. Default: None. To call: check_timestamp(data,freq,"expected_end_time",expected_end_time)
min_failures (int (optional)) – Minimum number of consecutive failures required for reporting, default = 1. To call: check_timestamp(data,freq,"min_failures",min_failures)
exact_times (logical (optional)) – If py.True, times are expected to occur at regular intervals (specified by freq) and data is reindexed to match the expected frequency. If py.False, times only need to occur once or more within each interval (specified by freq) and data is not reindexed. To call: check_timestamp(data,freq,"exact_times",exact_times)

Returns:

results (qcdata structure of form:) –

results.values: array of doubles: Same shape as input data.values. Elements that failed the QC test are replaced with NaN.
results.mask: array of int64: Same shape as input data.values. Logical mask of QC results (1 = passed, 0 = failed QC test).
results.time: array of datetimes: Same as input times (possibly reindexed by exact_times)
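
The three timestamp problems named above (missing, non-monotonic, and duplicate) can be illustrated with a small Python sketch. It is not the Pecos or MHKiT implementation. It assumes integer POSIX times in seconds, times expected exactly every freq seconds, and an expected range running from the earliest to the latest observed time. The function name inspect_timestamps is hypothetical.

```python
from collections import Counter


def inspect_timestamps(times, freq):
    """Return the missing, non-monotonic, and duplicate timestamps in a series."""
    counts = Counter(times)
    duplicate = sorted(t for t, n in counts.items() if n > 1)
    # A timestamp is non-monotonic if it is earlier than the one before it.
    non_monotonic = [b for a, b in zip(times, times[1:]) if b < a]
    # Expected grid from the earliest to the latest observed time (assumption).
    expected = range(min(times), max(times) + 1, freq)
    missing = [t for t in expected if t not in counts]
    return {"missing": missing, "non_monotonic": non_monotonic, "duplicate": duplicate}


# Hypothetical 10-second series: 20 repeats, 30 is absent, 40 arrives after 50.
times = [0, 10, 20, 20, 50, 40]
report = inspect_timestamps(times, 10)
print(report)
# {'missing': [30], 'non_monotonic': [40], 'duplicate': [20]}
```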

mhkit.qc.qc_data_to_dataframe(data)

Convert qc data structure to pandas dataframe

Parameters:

data (qcdata structure) – QC data structure to convert

Returns:

results (Pandas DataFrame)

mhkit.qc.check_range(data, bound, options)

Check for data that is outside expected range

Parameters:

data (pandas dataframe or qcdata structure) –
Pandas dataframe indexed by datetime (use py.mhkit_python_utils.pandas_dataframe.timeseries_to_pandas(ts,time,x))

OR

qcdata structure of form:

data.values: 2D array of doubles with arbitrary number of columns

data.time: 1D array of datetimes or posix times
bound (cell array of floats) – [lower bound, upper bound] for range checking. NaN or py.None can be used for either bound.
key (string (optional)) – Data column name or translation dictionary key. If not specified or set to py.None, all columns are used for the test. To call: check_range(data,bound,"key",key)
min_failures (int (optional)) – Minimum number of consecutive failures required for reporting, default = 1. To call: check_range(data,bound,"min_failures",min_failures)

Returns:

results (qcdata structure of form:) –

results.values: array of doubles: Same shape as input data.values. Elements that failed the QC test are replaced with NaN.
results.mask: array of int64: Same shape as input data.values. Logical mask of QC results (1 = passed, 0 = failed QC test).
results.time: array of datetimes: Same as input times
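
The Python sketch below illustrates how the bound and key parameters interact in a range test. It is a simplified illustration, not the Pecos or MHKiT implementation. Either bound can be disabled, and key restricts the test to one column, otherwise all columns are tested. The function name flag_range and the column names Hs and Tp are hypothetical.

```python
def flag_range(columns, lower, upper, key=None):
    """Return {column name: mask} with 1 = passed, 0 = failed.

    columns maps a name to a list of floats; None disables a bound.
    """
    names = [key] if key is not None else list(columns)
    masks = {}
    for name in names:
        masks[name] = [
            0 if (lower is not None and v < lower) or (upper is not None and v > upper) else 1
            for v in columns[name]
        ]
    return masks


# Hypothetical two-column data set.
data = {"Hs": [0.5, 1.2, -0.1, 30.0], "Tp": [8.0, 9.5, 7.0, 6.0]}

print(flag_range(data, 0.0, 20.0, key="Hs"))  # {'Hs': [1, 1, 0, 0]}
print(flag_range(data, 0.0, None))            # {'Hs': [1, 1, 0, 1], 'Tp': [1, 1, 1, 1]}
```

In the second call the upper bound is disabled, so only the negative value fails.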