NEESI DATASET
Numerical Experiments of Estuarine Salt Intrusion (NEESI) dataset.

DATE OF RELEASE
March 16, 2023

CONTACT INFORMATION
Gijs G. Hendrickx
Delft University of Technology
G.G.Hendrickx@tudelft.nl

OVERVIEW
The data set contains the processed data of 1252 simulations using Delft3D
Flexible Mesh (DFM) in which estuaries were designed using a parametric 
design. Every estuary design is based on thirteen (13) input parameters: three
(3) boundary conditions, and ten (10) geomorphological characteristics. The 
output is represented by two (2) variables: (1) the salt intrusion length, 
'L'; and (2) the salt variability, 'V'. Simulations are carried out over a 
span of nine (9) days of which the first eight (8) are considered spin-up; 
i.e., only the last day of the simulation is used for post-processing. The 
salt intrusion length is a depth- and tide-averaged estimate of the salt 
intrusion over this last day; the salt variability is an estimate of the 
difference between the maximum and the minimum salinity over the tide, 
depth- and spatially-averaged. The input settings of the simulations are 
drawn using machine learning techniques:

- MDA (maximum dissimilarity algorithm): runs [0, 100);
- TGP-LLM (treed Gaussian process, limiting linear model): runs [100, 1202);
- GA (genetic algorithm): runs [1202, 1252).

More details on the sampling and subsequent post-processing of the data are 
described in Hendrickx et al. (2023).
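
As a minimal sketch, the run ranges above can be mapped to their sampling
technique as shown below. It assumes zero-based run indices and that the rows
of "output.csv" are stored in run order; the latter is an assumption that is
not stated here and should be verified. The function name is illustrative.

```python
N_RUNS = 1252  # total number of simulations in the data set


def sampling_method(run: int) -> str:
    """Return the sampling technique for a zero-based run index.

    The half-open intervals follow the list above:
    [0, 100) MDA, [100, 1202) TGP-LLM, [1202, 1252) GA.
    """
    if not 0 <= run < N_RUNS:
        raise ValueError(f"run index must be in [0, {N_RUNS}), got {run}")
    if run < 100:
        return "MDA"
    if run < 1202:
        return "TGP-LLM"
    return "GA"
```

For example, sampling_method(1201) returns "TGP-LLM" and sampling_method(1202)
returns "GA".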

REFERENCES
Hendrickx, G.G., Antolinez, J.A.A., and Herman, P.M.J. (2023). Predicting the
  response of complex systems for coastal management. Coastal Engineering.

CONTENTS
- output.csv: 
  Contains the input and output data of all 1252 simulations, i.e. the *.csv-
  file is a 1252 x 15 matrix.

DATA FORMAT
- output.csv:
  *.csv-file including string-headers.

DATA STRUCTURE
The data set contains a single *.csv-file in which fifteen (15) variables are
written. Every line of the file contains a single sample, i.e. simulation. The
included variables are the following (units in square brackets):
 1. 'tidal_range' [m]: The tidal range (twice the tidal amplitude) imposed at 
    the offshore boundary;
 2. 'surge_level' [m]: The storm surge level imposed at the offshore boundary;
 3. 'river_discharge' [m3 s-1]: The river discharge imposed at the upstream 
    boundary;
 4. 'channel_depth' [m]: The depth of the channel (thalweg) at the estuary 
    mouth;
 5. 'channel_width' [m]: The width of the channel (thalweg) at the estuary 
    mouth;
 6. 'channel_friction' [s m-1/3]: The bottom friction in the channel (thalweg) 
    defined as Manning's n;
 7. 'convergence' [m-1]: The convergence of the estuarine width, equivalent to
    the inverse of the convergence length;
 8. 'flat_depth_ratio' [-]: The ratio of the depth of the intertidal flat to
    the tidal amplitude, i.e. half the tidal range (depth = 0.5 * ratio *
    range), with a domain of [-1, 1] (-1: intertidal flat at high water; +1:
    intertidal flat at low water);
 9. 'flat_width' [m]: The width of the intertidal flats, which is equally
    distributed at both sides of the channel (thalweg);
10. 'flat_friction' [s m-1/3]: The bottom friction on the intertidal flats
    defined as Manning's n;
11. 'bottom_curvature' [m-1]: The curvature of the bottom profile in the 
    channel (thalweg) modifying the depth over the lateral axis, while the
    lateral-averaged depth remains the same;
12. 'meander_amplitude' [m]: The amplitude of the sine-wave describing the
    meandering of the estuary;
13. 'meander_length' [m]: The wavelength of the sine-wave describing the
    meandering of the estuary;
14. 'L' [-]: The salt intrusion length normalised by the estuary length, which
    equals 200 km (de-normalisation requires the multiplication by 200 km);
15. 'V' [-]: The salt variability normalised by the offshore salinity, which
    equals 30 psu (de-normalisation requires the multiplication by 30 psu).
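
The de-normalisation of 'L' and 'V' (items 14 and 15) and the depth relation
of item 8 can be sketched in Python as follows. The function and constant
names are illustrative and not part of the data set; the sign of the returned
flat depth simply follows the relation stated in item 8.

```python
ESTUARY_LENGTH_KM = 200.0     # normalisation of 'L' (item 14)
OFFSHORE_SALINITY_PSU = 30.0  # normalisation of 'V' (item 15)


def denormalise(L: float, V: float) -> tuple[float, float]:
    """Convert normalised 'L' and 'V' to [km] and [psu], respectively."""
    return L * ESTUARY_LENGTH_KM, V * OFFSHORE_SALINITY_PSU


def flat_depth(flat_depth_ratio: float, tidal_range: float) -> float:
    """Depth of the intertidal flat [m]: depth = 0.5 * ratio * range."""
    if not -1.0 <= flat_depth_ratio <= 1.0:
        raise ValueError("flat_depth_ratio must lie in [-1, 1]")
    return 0.5 * flat_depth_ratio * tidal_range
```

With a tidal range of 2 m, a ratio of +1 gives a flat depth of 1 m (flat at
low water) and a ratio of -1 gives -1 m (flat at high water).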

DATA CLEANING
The data set contains processed data of 1,252 DFM simulations for which time-
series of the last simulated day have been used to determine the salt 
intrusion length ('L'), and the salt variability ('V'). These time-series have
been stored along the thalweg of the whole estuary, including an extension of
1,000 metres into the offshore domain. Every 625 metres, a virtual observation
location was placed and the data in between has been linearly interpolated.
This results in a discrete representation of a cross-section of the estuary
along the thalweg.

DATA LIMITATIONS
- The numerical simulations reflect idealised settings, so no calibration of
  the model performance was possible. The absolute values of salt
  intrusion/variability should therefore be taken with a grain of salt; we
  recommend focusing on the changes in the output data due to changes in the
  input data, correlations between the variables, etc.
- The input variables are not completely independent of each other, as the
  parametric design required an input check that verified the physical
  validity of the samples, i.e. of the combinations of input variables. These
  checks are included in the Supplementary Information of Hendrickx et al. 
  (2023).
- The storm surge ('surge_level') is suspected to be incorrectly implemented
  in the model simulations.

USAGE
Any software or programming language that can read *.csv-files can be used to
analyse the data. Note that the first line of "output.csv" contains the names
of the input and output variables as listed above.
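
As a minimal sketch, the file can be read with the Python standard library as
shown below. It assumes that the file is comma-delimited and that the header
names match the variable names listed under DATA STRUCTURE without quotes;
both are assumptions to verify against the file. The function name is
illustrative.

```python
import csv

# Expected header names, in the order listed under DATA STRUCTURE.
COLUMNS = [
    "tidal_range", "surge_level", "river_discharge", "channel_depth",
    "channel_width", "channel_friction", "convergence", "flat_depth_ratio",
    "flat_width", "flat_friction", "bottom_curvature", "meander_amplitude",
    "meander_length", "L", "V",
]


def read_neesi(lines):
    """Parse the data set from an open file (or any iterable of lines).

    Returns one dict per simulation, mapping variable name to float.
    """
    reader = csv.DictReader(lines)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [{name: float(row[name]) for name in COLUMNS} for row in reader]


# Usage, with "output.csv" in the working directory:
#     with open("output.csv", newline="") as f:
#         samples = read_neesi(f)
```

The returned list is expected to hold 1252 entries, one per simulation.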

LICENSE
Apache-2.0