Dataset of trash and water segmentations in riverine environments

doi:10.4121/90d13261-b0fe-444a-b408-c5a63db3d887.v1
The doi above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
doi: 10.4121/90d13261-b0fe-444a-b408-c5a63db3d887
Datacite citation style:
Don, Marga; Pinson, Stijn; Guillen Cebrian, Blanca (2024): Dataset of trash and water segmentations in riverine environments. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/90d13261-b0fe-444a-b408-c5a63db3d887.v1


***General Introduction***

This dataset contains images of trash patches in riverine environments and corresponding segmentations of the trash, any barriers present, and the water.

It is being made public both as supplementary data for M. Don's conference paper 'Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution' and so that other researchers can use the data in their own work.

The data was collected as part of The Ocean Cleanup's operations between 2020 and 2023. It comes from multiple locations around the world and consists of 300 images together with their annotations.


***Purpose of the data***

The data was collected and annotated to investigate whether trash loads can be estimated from segmentations of trash at river locations, with the goal of assessing debris loads and their dynamics in these rivers, as well as the efficacy of the barriers and extraction operations.


***Description of the data in this data set***

The data in the dataset has been organized in three folders:


-images. This folder contains 6 subfolders, one for each of the 6 locations where the images were collected. Each subfolder contains roughly 50 images in JPG format. The images come from a mix of 5 MP and 12 MP cameras, so resolutions differ, and they span a range of timestamps per location.

   - The folder name identifies the location, using identifiers 1-6.

   - the name of the image is set as: <unix timestamp>_<iso timestamp>_<device serial>.jpg

-annotations. This folder contains the annotations in COCO format. It contains two files:

   - annotation.json: all annotations in COCO format.

   - split_mapping.json: a file denoting how the dataset is split amongst the different train/test splits used in the paper by M. Don. The GitHub repository corresponding to the paper contains instructions on generating these splits.

-pretrained_yolo_models. This folder contains the resulting Ultralytics YOLO models used in the conference paper, in three subfolders:

   - different_train_sizes: all models trained with the different training/validation splits. Models are named train<training %>.pt; for example, train10.pt means 10% of the dataset was used for training.

   - generalization_loc6: models trained on locations 1-5 and fine-tuned to location 6, named train<nr_images>_epochs<nr_epochs>.pt.

   - trained_one_location: models each trained only on data from their respective location, using an 80/20 split.
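The naming conventions above can be read programmatically. The Python sketch below is a minimal, hypothetical set of helpers, not part of the dataset or of the paper's repository. It assumes that neither timestamp in an image name contains an underscore, that the Unix timestamp is an integer, and that annotation.json follows the standard COCO layout (`images` entries with `id`/`file_name`, `annotations` entries with `image_id`). All function names are illustrative, and the model-name parser only covers the two patterns documented for different_train_sizes and generalization_loc6 (the naming in trained_one_location is not specified).

```python
import json
import re
from collections import defaultdict
from pathlib import Path
from typing import NamedTuple


class ImageName(NamedTuple):
    """Fields encoded in an image file name (hypothetical helper type)."""
    unix_ts: int
    iso_ts: str
    device_serial: str


def parse_image_name(filename: str) -> ImageName:
    """Split '<unix timestamp>_<iso timestamp>_<device serial>.jpg'.

    Assumption: neither timestamp contains '_', so the first two
    underscores are separators; any further '_' stays in the serial.
    """
    stem = Path(filename).stem  # drops only the final '.jpg' suffix
    unix_ts, iso_ts, serial = stem.split("_", 2)
    return ImageName(int(unix_ts), iso_ts, serial)


# Patterns for the two documented model-name conventions.
TRAIN_SIZE = re.compile(r"train(\d+)\.pt")          # e.g. train10.pt
FINETUNE = re.compile(r"train(\d+)_epochs(\d+)\.pt")  # train<imgs>_epochs<n>.pt


def parse_model_name(filename: str) -> dict:
    """Return the training setup encoded in a model file name."""
    name = Path(filename).name
    if m := TRAIN_SIZE.fullmatch(name):
        return {"train_percent": int(m.group(1))}
    if m := FINETUNE.fullmatch(name):
        return {"train_images": int(m.group(1)), "epochs": int(m.group(2))}
    raise ValueError(f"unrecognised model file name: {name}")


def load_coco(path: str) -> dict:
    """Read a COCO-format JSON file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def annotations_by_file(coco: dict) -> dict[str, list[dict]]:
    """Group COCO annotations under the file_name of their image."""
    id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped: dict[str, list[dict]] = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[id_to_file[ann["image_id"]]].append(ann)
    return dict(grouped)
```

For example, `parse_model_name("train10.pt")` returns `{"train_percent": 10}`, and `annotations_by_file(load_coco("annotations/annotation.json"))` would map each image's `file_name` to its list of annotations. Grouping by `file_name` rather than by numeric id makes it easy to join the annotations with the files in the images folder; since the exact structure of split_mapping.json is not documented here, the splits are best generated with the instructions in the paper's GitHub repository.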

history
  • 2024-09-25 first online, published, posted
publisher
4TU.ResearchData
format
image/jpg, annotations/json, models/.pt
organizations
The Ocean Cleanup Foundation
