TY - DATA
T1 - Dataset of trash and water segmentations in riverine environments
PY - 2024/09/25
AU - Marga Don
AU - Stijn Pinson
AU - Blanca Guillen Cebrian
UR - 
DO - 10.4121/90d13261-b0fe-444a-b408-c5a63db3d887.v1
KW - macroplastic
KW - plastic pollution
KW - marine debris
KW - riverine trash
KW - river
KW - segmentation
KW - detection
N2 -


***General Introduction***

This dataset contains images of trash patches in riverine environments and corresponding segmentations of the trash, any barriers present, and the water.

It is made public both as supplementary data for the conference paper by M. Don, 'Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution', and so that other researchers can use the data in their own work.

The data was collected as part of the operations of The Ocean Cleanup between 2020 and 2023. It comes from multiple locations around the world and comprises 300 images together with annotations of trash.


***Purpose of the data***

The data was collected and annotated to investigate whether trash loads can be estimated from segmentations of trash at locations along rivers, with the goal of assessing debris loads and their dynamics in these rivers, as well as the efficacy of barriers and extraction operations.


***Description of the data in this data set***

The data in this dataset is organized into three folders:


-images. This folder contains 6 subfolders, one for each of the 6 locations from which the images were collected. Each subfolder contains roughly 50 images in JPG format, taken with a mix of 5 MP and 12 MP cameras (hence varying resolutions) across a range of timestamps per location.

   - the name of the folder identifies the location, using identifiers 1-6

   - each image is named <unix timestamp>_<iso timestamp>_<device serial>.jpg (see the filename-parsing sketch after this list)

-annotations. This folder contains the annotations in COCO format, in two files:

   - annotation.json: all annotations in COCO format.

   - split_mapping.json: a file denoting how the dataset is split among the different train/test splits used in the paper by M. Don. The GitHub repository corresponding to the paper contains instructions on generating these splits.

-pretrained_yolo_models. This folder contains the Ultralytics YOLO models used in the conference paper (see the loading sketch after this list).

   - different_train_sizes: all models trained with the different training/validation splits, named train<training%>.pt; for example, train10.pt means 10% of the dataset was used for training.

   - generalization_loc6: models trained on locations 1-5 and finetuned on location 6, named train<nr_images>_epochs<nr_epochs>.pt.

   - trained_one_location: models trained only on data from their respective location, using an 80/20 train/validation split.
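To illustrate the conventions above, here is a minimal Python sketch that parses an image filename and loads annotation.json. It assumes standard COCO keys ('images', 'annotations', 'categories'), that the timestamp and serial fields contain no underscores, and that paths are relative to the dataset root; adapt as needed.

```python
import json

def parse_image_name(filename):
    """Split '<unix timestamp>_<iso timestamp>_<device serial>.jpg'
    into its three fields. Assumes no underscores inside the fields."""
    stem = filename.rsplit(".", 1)[0]            # drop the .jpg extension
    unix_ts, iso_ts, serial = stem.split("_", 2)
    return int(unix_ts), iso_ts, serial

# Load the COCO-format annotation file described above.
with open("annotations/annotation.json") as f:
    coco = json.load(f)

print(len(coco["images"]), "images,", len(coco["annotations"]), "annotations")
print("categories:", {c["id"]: c["name"] for c in coco["categories"]})
```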
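Similarly, a checkpoint can be loaded with the ultralytics Python package (assumed installed; "example.jpg" below is a placeholder filename, not a file in this dataset):

```python
from ultralytics import YOLO

# Load a checkpoint trained on 10% of the data (see folder layout above).
model = YOLO("pretrained_yolo_models/different_train_sizes/train10.pt")

# Run inference on one image from location 1.
results = model.predict("images/1/example.jpg")

for result in results:
    if result.masks is not None:        # segmentation masks, if predicted
        print(result.masks.data.shape)  # (n_instances, height, width)
    print(result.boxes.cls)             # predicted class ids per instance
```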

ER -