{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MHKiT CDIP IO\n", "\n", "MHKiT includes functions to pull data directly from the Coastal Data Information Program ([CDIP](http://cdip.ucsd.edu/m/about/)), an extensive network for monitoring waves and beaches along the coastlines of the United States.\n", "\n", "To run this example of using CDIP data in MHKiT, we will start by importing the plotting package `matplotlib` and the MHKiT wave submodules `graphics` and `io.cdip`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from mhkit.wave import graphics\n", "from mhkit.wave.io import cdip\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quick Start\n", "\n", "The `request_parse_workflow` function is a one-stop shop for pulling and parsing data. It wraps MHKiT's `request_netCDF` function: when a station number is passed, it pulls the buoy data from the CDIP servers and then processes the data for the user. `request_parse_workflow` lets the user slice data between a start and/or end date, or alternatively specify a single year or multiple years of interest. To reduce processing time, the user may specify parameters of interest, and only data for those parameters is returned. By default, `request_parse_workflow` pulls historic data, but the user may also request real-time data. Lastly, because of the long processing time, two-dimensional (2D) data is not returned by default. If __all__ 2D data is desired, the boolean `all_2D_variables` may be set to `True`. However, if only specific 2D variables are needed, it is recommended that the user instead pass them in the `parameters` field. 
For a full list of 1D, 2D, and metadata variables, see the CDIP reference [here](https://docs.google.com/document/d/1Uz_xIAVD2M6WeqQQ_x7ycoM3iKENO38S4Bmn6SasHtY/edit) (see BUOY VARIABLES)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Processing 2D Variables:\n", "\n", "\n", "Returned data: dict_keys(['data', 'metadata']) \n", "\n" ] } ], "source": [ "from mhkit.wave.io import cdip\n", "import matplotlib.pyplot as plt\n", "\n", "# Station, date range, and variables of interest\n", "station_number = '100'\n", "start_date = '2020-04-01'\n", "end_date = '2020-04-30'\n", "parameters = ['waveHs', 'waveTp', 'waveMeanDirection']\n", "\n", "# Request the data from CDIP and parse it into a dictionary\n", "data = cdip.request_parse_workflow(station_number=station_number, parameters=parameters,\n", "                                   start_date=start_date, end_date=end_date)\n", "\n", "print('\\n')\n", "print(f'Returned data: {data.keys()} \\n')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Request Data from CDIP\n", " \n", "In the above example, `request_parse_workflow` is used to both request and parse the CDIP data, calling multiple MHKiT functions in a prespecified workflow for the user. Here, we will run the constituent functions that make up this workflow one at a time.\n", "\n", "To get started, we will take a look at the `request_netCDF` function and what is needed to call it. Requesting the NetCDF file is broken out into its own function to give users flexibility through a modular approach to the data request. This lets the user build custom workflows beyond what `request_parse_workflow` currently does, and it is also useful if the user prefers to work with the requested data in a tool such as xarray instead of the returned dictionary of DataFrames. \n", "\n", "MHKiT can be used to request historic or real-time data from the CDIP buoys. A station table and map for choosing a buoy of interest can be found at http://cdip.ucsd.edu/m/stn_table/. To get started, we will request historic data from station number 100. 
The function will return a NetCDF file containing all historic data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "station_number = '100'\n", "data_type = 'historic'\n", "nc = cdip.request_netCDF(station_number, data_type)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Processing the NetCDF file\n", "\n", "The returned NetCDF file, stored as `nc` above, may now be passed to the `get_netcdf_variables` function to extract the data into a dictionary with a DataFrame of 1D variables and the metadata. The 'data' and 'metadata' entries each hold different types of data, grouped by the prefix of the variable name returned by CDIP. Example prefixes are 'wave', 'sst', and 'gps', which are described in further detail below." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Returned data: dict_keys(['data', 'metadata']) \n", "\n" ] } ], "source": [ "buoy_data = cdip.get_netcdf_variables(nc)\n", "\n", "print(f'Returned data: {buoy_data.keys()} \\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.a The 'data' key\n", "As can be seen above, the function has returned a dictionary with two keys:\n", "\n", " 1. 'data' - dictionary of DataFrames with keys referring to the data type. Possible keys include\n", " a. 'wave' - includes processed variables such as Hs and Tp\n", " b. 'sst' - time series of sea surface temperature\n", " c. 'gps' - time series of buoy latitude and longitude\n", " d. 'dwr' - directional waverider data, containing information about the accelerometer and buoy battery level\n", " e. 'wave2D' - dictionary of DataFrames which are of length time and have columns of wave frequency.\n", " 2. 'metadata' - any other data that is neither of length time nor a 2D variable. This has the same keys as the 'data' key above.\n", " a. 
'meta' - processed variables which started with the meta prefix and are not part of the variable prefixes listed below\n", " b. 'wave' - includes processed variables such as Hs and Tp\n", " c. 'sst' - time series of sea surface temperature\n", " d. 'gps' - time series of buoy latitude and longitude\n", " e. 'dwr' - directional waverider data, containing information about the accelerometer and buoy battery level\n", " f. 'wave2D' - dictionary of DataFrames which are of length time and have columns of wave frequency.\n", " \n", "By indexing the dictionary with `['data']['wave']`, we can see the associated DataFrame for the wave data. The DataFrame is summarized below, showing both the time index and the values of each column." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | waveTime | \n", "waveFlagPrimary | \n", "waveFlagSecondary | \n", "waveHs | \n", "waveTp | \n", "waveTa | \n", "waveDp | \n", "wavePeakPSD | \n", "waveTz | \n", "waveSourceIndex | \n", "
---|---|---|---|---|---|---|---|---|---|---|
2001-01-30 00:17:11 | \n", "9.808138e+08 | \n", "1.0 | \n", "0.0 | \n", "0.97 | \n", "14.285714 | \n", "5.635270 | \n", "275.96875 | \n", "1.255072 | \n", "4.597701 | \n", "1.0 | \n", "
2001-01-30 00:47:10 | \n", "9.808156e+08 | \n", "1.0 | \n", "0.0 | \n", "0.95 | \n", "14.285714 | \n", "5.248230 | \n", "266.12500 | \n", "1.022441 | \n", "4.347826 | \n", "1.0 | \n", "
2001-01-30 01:17:11 | \n", "9.808174e+08 | \n", "1.0 | \n", "0.0 | \n", "0.93 | \n", "15.384616 | \n", "4.967487 | \n", "252.06250 | \n", "0.607862 | \n", "4.166667 | \n", "1.0 | \n", "
2001-01-30 01:47:10 | \n", "9.808192e+08 | \n", "1.0 | \n", "0.0 | \n", "1.03 | \n", "13.333333 | \n", "5.265260 | \n", "275.96875 | \n", "1.728395 | \n", "4.395604 | \n", "1.0 | \n", "
2001-01-30 02:17:10 | \n", "9.808210e+08 | \n", "1.0 | \n", "0.0 | \n", "1.00 | \n", "13.333333 | \n", "5.288306 | \n", "275.96875 | \n", "1.107597 | \n", "4.395604 | \n", "1.0 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
2021-02-19 06:30:00 | \n", "1.613716e+09 | \n", "1.0 | \n", "0.0 | \n", "1.22 | \n", "15.384616 | \n", "7.991556 | \n", "198.03125 | \n", "4.272474 | \n", "6.060606 | \n", "17.0 | \n", "
2021-02-19 07:00:00 | \n", "1.613718e+09 | \n", "1.0 | \n", "0.0 | \n", "1.12 | \n", "14.285714 | \n", "7.378790 | \n", "237.40625 | \n", "2.264136 | \n", "5.797101 | \n", "17.0 | \n", "
2021-02-19 07:30:00 | \n", "1.613720e+09 | \n", "1.0 | \n", "0.0 | \n", "1.13 | \n", "14.285714 | \n", "7.833025 | \n", "237.40625 | \n", "3.071583 | \n", "6.153846 | \n", "17.0 | \n", "
2021-02-19 08:00:00 | \n", "1.613722e+09 | \n", "1.0 | \n", "0.0 | \n", "1.07 | \n", "13.333333 | \n", "7.808791 | \n", "244.43750 | \n", "2.477359 | \n", "6.153846 | \n", "17.0 | \n", "
2021-02-19 08:30:00 | \n", "1.613723e+09 | \n", "1.0 | \n", "0.0 | \n", "1.09 | \n", "15.384616 | \n", "8.456770 | \n", "195.21875 | \n", "2.489777 | \n", "6.666667 | \n", "17.0 | \n", "
345608 rows × 10 columns
\n", "