TY - DATA
T1 - Code associated to MSc thesis: Using AI/ML to identify propagation characteristics of European heatwaves
PY - 2025/01/13
AU - J.S.T. van der Geest
UR -
DO - 10.4121/466aa0d2-9685-438c-a3fe-317b35eb23ff.v1
KW - complex networks
KW - European heatwaves
KW - AI/ML
KW - Artificial Intelligence (AI)
KW - Machine Learning (ML)
N2 -
The aim of this MSc thesis is to propose proper definitions of heatwave sources and sinks and to explore their influence on Europe, using complex network variables and AI/ML respectively. These complex network variables describe relations among geographical locations that represent the movement of heat during a heatwave. The thesis was split into two parts: part one investigated the data to arrive at proposed definitions for sources and sinks, and part two took these definitions and proposed machine learning methods to generalize them to the available climatological period of 1990 to 2020. It was found that the input degree (ID) and output degree (OD) best define sources and sinks, as they describe the movement of heat among the network nodes (geographical locations). Sources and sinks are therefore defined daily: sources are defined by the maximum OD and minimum ID on a given day, whereas sinks are defined by the minimum OD and maximum ID on a given day. Under these definitions, multiple nodes can be classified as a source or sink at the same time, resulting in cloud-like structures. The definitions were further investigated using climatological expertise and unsupervised learning (i.e. one-class SVM) to explore whether a clear distinction can be made between the two. Next, the proposed definitions were used to identify optimal models for generalizing to the available climatological period of 1990 to 2020. After extensive training and testing of six machine learning models (decision trees, random forest, adaptive boosting, gradient boosting, support vector machine, and Gaussian naïve Bayes), Gaussian naïve Bayes (GNB), gradient boosting (GB), and random forest (RF) outperformed the remaining three models, recording higher F1-scores.
ER -