Supplementary data for the paper 'Putting ChatGPT Vision (GPT-4V) to the test: Risk perception in traffic images'
doi: 10.4121/dfbe6de4-d559-49cd-a7c6-9bebe5d43d50
Vision-language models are of interest in various domains, including automated driving, where computer vision techniques can accurately detect road users but where the vehicle sometimes fails to understand context. This study examined the effectiveness of GPT-4V in predicting the level of ‘risk’ in traffic images as assessed by humans. We used 210 static images taken from a moving vehicle, each previously rated by approximately 650 people. Based on psychometric construct theory and on insights from the self-consistency prompting method, we formulated three hypotheses: (1) repeating the prompt under effectively identical conditions increases validity, (2) varying the prompt text and extracting a total score increases validity compared with using a single prompt, and (3) in a multiple regression analysis, incorporating object-detection features alongside the GPT-4V-based risk rating significantly improves the model’s validity. Validity was quantified as the correlation coefficient with human risk scores across the 210 images. The results confirmed all three hypotheses. The final validity coefficient was r = 0.83, indicating that population-level human risk perception can be predicted by AI with a high degree of accuracy. The findings suggest that GPT-4V must be prompted in a way equivalent to how humans fill out a multi-item questionnaire.
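
For illustration, the aggregation and regression steps described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: the array shapes, the placeholder data, and the four-column object-detection feature set are assumptions; in practice, gpt4v_ratings would come from repeated GPT-4V queries with varied prompt texts, and detector_features from an object detector.

import numpy as np

rng = np.random.default_rng(0)
n_images, n_variants, n_reps = 210, 3, 5  # 210 images, as in this dataset

# Placeholder data standing in for the collected ratings and features.
gpt4v_ratings = rng.uniform(1, 10, size=(n_images, n_variants, n_reps))
detector_features = rng.poisson(2, size=(n_images, 4)).astype(float)
human_risk = rng.uniform(1, 10, size=n_images)  # mean rating of ~650 raters

# Hypotheses 1 and 2: average over repetitions and prompt variants to form
# a single total score per image, analogous to a multi-item questionnaire.
total_score = gpt4v_ratings.mean(axis=(1, 2))

# Validity = Pearson correlation with the human risk scores.
r = np.corrcoef(total_score, human_risk)[0, 1]
print(f"Validity of the aggregated GPT-4V score: r = {r:.2f}")

# Hypothesis 3: multiple regression (ordinary least squares with intercept)
# combining the GPT-4V total score with object-detection features.
X = np.column_stack([np.ones(n_images), total_score, detector_features])
beta, *_ = np.linalg.lstsq(X, human_risk, rcond=None)
r_model = np.corrcoef(X @ beta, human_risk)[0, 1]
print(f"Validity of the regression model: r = {r_model:.2f}")

With the actual GPT-4V ratings and detector features in place of the placeholders, r_model would correspond to the validity coefficient reported above.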
- 2024-04-15 first online
- 2024-04-16 published, posted
Eindhoven University of Technology, Department of Industrial Design
DATA
- readme.txt (4,557 bytes) MD5: 1d6942ad0858b1713b3043c006c45e3b
- Composite image creation.zip (8,127,900,020 bytes) MD5: 5fec1cbdb9a98ded1627b9699f490917
- Supplementary material.zip (1,109,437,113 bytes) MD5: f98972e7076536d90534498b1d67da34
- Total size (unzipped): 9,237,341,690 bytes
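
A minimal sketch for verifying the integrity of the downloaded files against the checksums listed above. The file names and MD5 hashes are taken from this page; the verification script itself is an illustration and assumes the files sit in the current working directory.

import hashlib

EXPECTED_MD5 = {
    "readme.txt": "1d6942ad0858b1713b3043c006c45e3b",
    "Composite image creation.zip": "5fec1cbdb9a98ded1627b9699f490917",
    "Supplementary material.zip": "f98972e7076536d90534498b1d67da34",
}

for name, expected in EXPECTED_MD5.items():
    md5 = hashlib.md5()
    with open(name, "rb") as f:
        # Read in 1 MiB chunks so the multi-GB archives need not fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    status = "OK" if md5.hexdigest() == expected else "MISMATCH"
    print(f"{name}: {status}")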