MHKiT Metocean Data
Metocean data is a useful addition to wave and current ocean data for the design for offshore structures, including wave energy converters (WECs). Metocean data primarily includes information pertaining to the wind, wave, and climate conditions within and over the ocean at a given location.
MHKiT’s wave.io.ndbc
module includes functions to request and manipulate metocean data from the National Data Buoy Center, including spectral wave density, standard meteorological data and continuous wind speed. The wave.io.hindcast.wind_toolkit
module includes function to request and manipulate metocean data from the WIND Toolkit datasets including precipitation rate, relative humidity and vertical profiles of wind speed, wind direction, precipitation, temperature, and pressure.
As a demonstration, this notebook will walk through the following steps to compare the surface wind speed and direction at a given location using both NDBC and WIND Toolkit data, and then plot a vertical air temperature profile with WIND Toolkit data.
Request Continous Wind Data from NDBC
Request Surface Wind Data from WIND Toolkit
Compare NDBC and WIND Toolkit Data
Request Temperature Data from WIND Toolkit
Visualize Temperature Data
We will start by importing the necessary python packages (pandas
, numpy
, matplotlib
), and MHKiT submodule (wave.io
).
[1]:
from mhkit.wave.io import ndbc
from mhkit.wave.io.hindcast import wind_toolkit
from mhkit.tidal.graphics import plot_rose
from mhkit.tidal.graphics import plot_joint_probability_distribution
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
1. Request Continuous Wind Data from NDBC
MHKiT can be used to request historical data from the National Data Buoy Center (NDBC). This process is split into the following steps:
Query available NDBC data
Select years of interest
Request data from NDBC
Convert the DataFrames to DateTime Index
Query available NDBC data
Looking at the help for the ndbc.available_data
function (help(ndbc.available_data)
) the function requires a parameter to be specified and optionally the user may provide a station ID as a string. A full list of available historical parameters can be found here although only some of these are currently supported. We are interested in historical continuous wind data 'cwind'
. Additionally, we will specify the buoy number as '46022'
to only return data associated with this site.
[2]:
# Specify the parameter as continuous wind speeds and the buoy number to be 46022
ndbc_dict = {"parameter": "cwind", "buoy_number": "46022"}
available_data = ndbc.available_data(ndbc_dict["parameter"], ndbc_dict["buoy_number"])
available_data
[2]:
id | year | filename | |
---|---|---|---|
1215 | 46022 | 1996 | 46022c1996.txt.gz |
1216 | 46022 | 1997 | 46022c1997.txt.gz |
1217 | 46022 | 1998 | 46022c1998.txt.gz |
1218 | 46022 | 1999 | 46022c1999.txt.gz |
1219 | 46022 | 2000 | 46022c2000.txt.gz |
1220 | 46022 | 2001 | 46022c2001.txt.gz |
1221 | 46022 | 2002 | 46022c2002.txt.gz |
1222 | 46022 | 2003 | 46022c2003.txt.gz |
1223 | 46022 | 2004 | 46022c2004.txt.gz |
1224 | 46022 | 2005 | 46022c2005.txt.gz |
1225 | 46022 | 2006 | 46022c2006.txt.gz |
1226 | 46022 | 2007 | 46022c2007.txt.gz |
1227 | 46022 | 2008 | 46022c2008.txt.gz |
1228 | 46022 | 2009 | 46022c2009.txt.gz |
1229 | 46022 | 2010 | 46022c2010.txt.gz |
1230 | 46022 | 2011 | 46022c2011.txt.gz |
1231 | 46022 | 2012 | 46022c2012.txt.gz |
1232 | 46022 | 2013 | 46022c2013.txt.gz |
1233 | 46022 | 2014 | 46022c2014.txt.gz |
1234 | 46022 | 2015 | 46022c2015.txt.gz |
1235 | 46022 | 2016 | 46022c2016.txt.gz |
1236 | 46022 | 2017 | 46022c2017.txt.gz |
1237 | 46022 | 2018 | 46022c2018.txt.gz |
Select years of interest
The ndbc.available_data
function has returned a DataFrame with columns ‘id’, ‘year’, and ‘filename’. The year column is of type int while the filename and id (5 digit alpha-numeric specifier) are of type string. In this case, the years returned from available_data
span 1996 to the last year the buoy was operational (currently 2018 for 46022 wind data). For demonstration, we have decided we are interested in the data from 2018, so we will create a new years_of_interest
DataFrame which
only contains the year 2018.
[3]:
# Slice the available data to only include 2018 and more recent
years_of_interest = available_data[available_data.year == 2018]
years_of_interest
[3]:
id | year | filename | |
---|---|---|---|
1237 | 46022 | 2018 | 46022c2018.txt.gz |
Request Data from NDBC
The filename column in our years_of_interest
DataFrame and the parameter is needed to request the data. To get the data we can use the ndbc.request_data
function for each buoy id and year in the passed DataFrame. This function will return the parameter data as a dictionary of DataFrames which may be accessed by buoy id and then the year for multiple buoys or just the year for a single buoy.
[4]:
# Get dictionary of parameter data by year
ndbc_dict["filenames"] = years_of_interest["filename"]
requested_data = ndbc.request_data(ndbc_dict["parameter"], ndbc_dict["filenames"])
requested_data
[4]:
{'2018': #YY MM DD hh mm WDIR WSPD GDR GST GTIME
0 2017 12 31 23 0 206 7.8 999 99.0 9999
1 2017 12 31 23 10 202 8.3 999 99.0 9999
2 2017 12 31 23 20 199 8.2 999 99.0 9999
3 2017 12 31 23 30 194 7.6 999 99.0 9999
4 2017 12 31 23 40 188 6.6 999 99.0 9999
... ... .. .. .. .. ... ... ... ... ...
18319 2018 5 8 16 10 107 0.9 999 99.0 9999
18320 2018 5 8 16 20 210 0.3 999 99.0 9999
18321 2018 5 8 16 30 271 1.1 999 99.0 9999
18322 2018 5 8 16 40 275 1.4 999 99.0 9999
18323 2018 5 8 16 50 277 1.4 281 2.4 1642
[18324 rows x 10 columns]}
Convert the DataFrames to DateTime Index
The data returned for each year has a variable number of columns for the year, month, day, hour, minute, and the way the columns are formatted (this is a primary reason for return a dictionary of DataFrames indexed by years). A common step a user will want to take is to remove the inconsistent NDBC date/ time columns and create a standard DateTime index. The MHKiT function ndbc.to_datetime_index
will perform this standardization by parsing the NDBC date/ time columns into DateTime format and
setting this as the DataFrame Index and removing the NDBC date/ time columns. This function operates on a DateFrame therefore we will only pass the relevant year of the ndbc_requested_data
dictionary.
Additionally, NDBC uses ‘99’, ‘999’, and ‘9999’ as NaN values. We will replace these values with numpy.NaN
to prevent errors when manipulating the data.
[5]:
# Convert the header dates to a Datetime Index and remove NOAA date columns for each year
ndbc_dict["2018"] = ndbc.to_datetime_index(
ndbc_dict["parameter"], requested_data["2018"]
)
# Replace 99, 999, 9999 with NaN
ndbc_dict["2018"] = ndbc_dict["2018"].replace({99.0: np.nan, 999: np.nan, 9999: np.nan})
# Display DataFrame of 46022 data from 2018
ndbc_dict["2018"]
[5]:
WDIR | WSPD | GDR | GST | GTIME | |
---|---|---|---|---|---|
date | |||||
2017-12-31 23:00:00 | 206.0 | 7.8 | NaN | NaN | NaN |
2017-12-31 23:10:00 | 202.0 | 8.3 | NaN | NaN | NaN |
2017-12-31 23:20:00 | 199.0 | 8.2 | NaN | NaN | NaN |
2017-12-31 23:30:00 | 194.0 | 7.6 | NaN | NaN | NaN |
2017-12-31 23:40:00 | 188.0 | 6.6 | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... |
2018-05-08 16:10:00 | 107.0 | 0.9 | NaN | NaN | NaN |
2018-05-08 16:20:00 | 210.0 | 0.3 | NaN | NaN | NaN |
2018-05-08 16:30:00 | 271.0 | 1.1 | NaN | NaN | NaN |
2018-05-08 16:40:00 | 275.0 | 1.4 | NaN | NaN | NaN |
2018-05-08 16:50:00 | 277.0 | 1.4 | 281.0 | 2.4 | 1642.0 |
18324 rows × 5 columns
2. Request Surface Wind Data from WIND Toolkit
MHKiT can be used to request historical data from WIND Toolkit. The WIND Toolkit is a high-spatial-resolution dataset of meteorological parameters covering several offshore regions including California, Hawaii, the Northwest Pacific, and the Mid-Atlantic. The offshore datasets span 2000-2019 (or to 2020 for the Mid-Atlantic region).
Included Variables:
Dataset variables included are indexed by ‘lat_lon’ (latitude and longitude), and a ‘time_interval’. The following variables can be accessed through the
request_wtk_point_data
function and are available at both ‘5-minute’ and ‘1-hour’ time intervals and in all offshore regions:
Variable Name |
Units |
Height(s) above sea level (m) |
---|---|---|
Precipitation rate |
mm |
0 |
Roughness length |
meters |
none |
Sea surface temperature |
degrees Celsius |
none |
Inverse Monin-Obukhov length |
per meter |
2 |
Relative humidity |
% |
2 |
Friction velocity |
meters per second |
2 |
Wind speed |
meters per second |
10, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200 |
Wind direction |
degrees true |
10, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200 |
Temperature |
degrees Celsius |
2, 10, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200 |
Pressure |
meters |
0, 100, 200 |
Setting up Access to WIND Toolkit
To access the WIND Toolkit, you will need to configure h5pyd for data access on HSDS. To get your own API key, visit https://developer.nrel.gov/signup/. If you have already accessed WPTO Hindcast data, you may skip this step.
To configure h5pyd type:
hsconfigure
and enter at the prompt:
hs_endpoint = https://developer.nrel.gov/api/hsds
hs_username = None
hs_password = None
hs_api_key = {your key}
Using the WIND Toolkit MHKiT Functions
The process of obtaining and manipulating WIND Toolkit data with MHKiT is split into the following steps: - Choose input parameters - Request data from WIND Toolkit
Choose input parameters
Looking at the help for the wind_toolkit.request_wtk_point_data
function shows four required input parameters: * time interval * parameter * location * year
A full list of available historical parameters can be found here. Here we are interested in historical surface wind data ('windspeed_10m'
, 'winddirection_10m'
) and a vertical temperature profile. For demonstration purposes, not all temperature elevations will be used. Additionally, we will use the latitude and longitude of NDBC buoy '46022'
to later compare to the NDBC dataset. The buoy location can
be found here. The elevation_to_string
can be used to easily combine an array of elevations with a parameter name to create the correctly formatted parameter string. This is demonstrated for the array of temperature parameters.
[6]:
# Input parameters for site of interest
temperatures = wind_toolkit.elevation_to_string(
"temperature", [2, 20, 40, 80, 120, 160]
)
temperatures
[6]:
['temperature_2m',
'temperature_20m',
'temperature_40m',
'temperature_80m',
'temperature_120m',
'temperature_160m']
[7]:
wtk_inputs = {
"time_interval": "1-hour",
"wind_parameters": ["windspeed_10m", "winddirection_10m"],
"temp_parameters": temperatures,
"year": [2018],
"lat_lon": (40.748, -124.527),
}
Visualize WIND Toolkit Region
The offshore WIND Toolkit contains four possible regions, each corresponding to separate hindcast simulations: “Offshore_CA”, “Hawaii”, “NW_Pacific”, “Mid_Atlantic”. This is automatically determined by MHKiT based on the user-defined latitude-longitude pair. Note that the “NW_Pacific” and “Offshore_CA” region overlap. If you desire to use a location in this overlap region, you must specify the preferred region when calling request_wtk_point_data
.
If you are unsure of the appropriate region, region_selection
function will return the region that a given latitude-longitude pair corresponds to. MHKiT also provides the plot_region
function so that users can visualize where their chosen latitude-longitude pair lies in a given region. Note that regions are not rectangular on a latitude-longitude grid. Comparing the location and region is recommended before requesting data. These functions are demonstrated here.
[8]:
requested_region = wind_toolkit.region_selection(wtk_inputs["lat_lon"])
requested_region
[8]:
'Offshore_CA'
[9]:
wind_toolkit.plot_region(requested_region, lat_lon=wtk_inputs["lat_lon"])
[9]:
<Axes: title={'center': 'Extent of the WIND Toolkit Offshore_CA region'}, xlabel='Longitude (deg)', ylabel='Latitude (deg)'>
Request data from the WIND Toolkit
To get the data we can use the wind_toolkit.request_wtk_point_data
function with multiple parameters, locations and years. If multiple locations are input, they must correspond to the same hindcast region. If multiple parameters are input they must be available for the same time interval.
This function will return the parameter data as a DataFrame which may be accessed by parameter name and index of the location. Here we only use the single location defined above, so the relevant DataFrame key is 'wind_speed_10m_0'
.
[10]:
wtk_wind, wtk_metadata = wind_toolkit.request_wtk_point_data(
wtk_inputs["time_interval"],
wtk_inputs["wind_parameters"],
wtk_inputs["lat_lon"],
wtk_inputs["year"],
)
wtk_wind
[10]:
windspeed_10m_0 | winddirection_10m_0 | |
---|---|---|
time_index | ||
2018-01-01 00:00:00+00:00 | 7.78 | 179.199997 |
2018-01-01 01:00:00+00:00 | 7.81 | 190.649994 |
2018-01-01 02:00:00+00:00 | 6.89 | 191.690002 |
2018-01-01 03:00:00+00:00 | 7.58 | 190.720001 |
2018-01-01 04:00:00+00:00 | 8.10 | 183.199997 |
... | ... | ... |
2018-12-31 19:00:00+00:00 | 15.04 | 359.070007 |
2018-12-31 20:00:00+00:00 | 14.59 | 358.470001 |
2018-12-31 21:00:00+00:00 | 14.76 | 358.489990 |
2018-12-31 22:00:00+00:00 | 14.41 | 358.269989 |
2018-12-31 23:00:00+00:00 | 14.34 | 359.769989 |
8760 rows × 2 columns
3. Compare NDBC and WIND Toolkit Data
Wind conditions may be characterized by the speed and direction over a day. A hourly time series of wind speed is a useful way of visualizing the diurnal variation on a give date. A wind rose can visualize variation in wind direction over the same period. Using the historical continuous wind data from NDBC and surface wind data from WIND Toolkit, we can calculate these variables in MHKiT.
Plot the hourly mean wind speeds on January 11, 2018
The NDBC data for January 11, 2018 is given in 10 minute intervals. We can get the instantaneous hourly wind data using the DataFrame resample().nearest()
function. The WIND Toolkit data is already given as instantaneous hourly data. Then, we can plot the speeds vs the hour they occur. Some variation is seen due to difference in method and height.
[11]:
# Get WIND Toolkit and NDBC wind data for 2018-01-11
ndbc_hourly_data = ndbc_dict["2018"].loc["2018-01-11"].resample("h").nearest()
wtk_hourly_wind = wtk_wind.loc["2018-01-11"]
# Plot the timeseries
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlabel("Time, UTC (h)")
ax.set_ylabel("Speed (m/s)")
ax.set_title("Hourly mean wind speeds on January 11, 2018")
ax.grid()
ax.set_ylim([5, 14])
ax.set_xlim([0, 24])
line1 = ax.plot(
ndbc_hourly_data.index.hour,
ndbc_hourly_data["WSPD"].values,
"o",
label="NDBC 4m wind speed",
)
line2 = ax.plot(
wtk_hourly_wind.index.hour,
wtk_hourly_wind["windspeed_10m_0"].values,
"x",
label="WIND Toolkit 10m wind speed",
)
ax.legend()
[11]:
<matplotlib.legend.Legend at 0x232d04a1790>
Create a mean wind speed wind rose for January 11, 2018
To complement the timeseries of wind speed, we can visualize the variation in wind direction over the day with a wind rose. The MHKiT plot_rose
function will do this for us. Some variation is seen between NDBC and WIND Toolkit, possibly due to method and height. Users can further investigate different source of metocean data as desired.
[12]:
# Set the rose bin widths
width_direction = 10 # in degrees
width_velocity = 1 # in m/s
# Plot the wind rose
ax = plot_rose(
ndbc_hourly_data["WDIR"], ndbc_hourly_data["WSPD"], width_direction, width_velocity
)
[13]:
ax2 = plot_rose(
wtk_hourly_wind["winddirection_10m_0"],
wtk_hourly_wind["windspeed_10m_0"],
width_direction,
width_velocity,
)
Plot vertical temperature profile
The air temperature at several vertical levels can be downloaded from the WIND Toolkit dataset. More or less vertical levels can be defined in the parameter input to request_wtk_point_data
to increase resolution or decrease computation expense as desired. Note that times are given in UTC, can optionally be converted to local time by finding the time zone of the chosen latitude/longitude pair.
[14]:
wtk_temp, wtk_metadata = wind_toolkit.request_wtk_point_data(
wtk_inputs["time_interval"],
wtk_inputs["temp_parameters"],
wtk_inputs["lat_lon"],
wtk_inputs["year"],
)
# wtk_temp = wtk_temp.shift(-7) # optionally UTC to local time
# Pick times corresponding to stable and unstable temperature profiles
stable_temp = wtk_temp.at_time("2018-01-11 03:00:00").values[0]
unstable_temp = wtk_temp.at_time("2018-01-11 15:00:00").values[0]
# Find heights from temperature DataFrame columns
heights = []
for s in wtk_temp.keys():
s = s.removeprefix("temperature_")
s = s.removesuffix("m_0")
heights.append(float(s))
heights = np.array(heights)
# Plot the profiles
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlabel("Temperature (C)")
ax.set_ylabel("Height (m)")
ax.set_title("Temperature profiles from January 11, 2018")
ax.grid()
line1 = ax.plot(stable_temp, heights, "o-", label="time=03:00:00 UTC")
line2 = ax.plot(unstable_temp, heights, "x-", label="time=15:00:00 UTC")
ax.legend()
[14]:
<matplotlib.legend.Legend at 0x232d0335690>