Data underlying the publication: "GONNECT: Coupling Biological Systems to Neural Networks for Improved Model Interpretability"
DOI: 10.4121/0d78788b-6bd7-4941-a942-245309107b6d
Dataset
Licence CC BY 4.0
Interoperability
This dataset contains all processed data required to reproduce the results of the GONNECT paper. In this work, we couple the structure of a neural network model to biological prior information to obtain interpretable activations in the network's hidden layers (see the preprint/publication for more information). The data presented here includes processed gene expression data from The Cancer Genome Atlas (TCGA), which is used as input to the model, and both the raw and processed Gene Ontology (GO, https://geneontology.org/) knowledge base, from which the structure of the neural networks in this study is derived. A few additional files are used to link genes to proteins. All data used in this study is publicly available.
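As a rough illustration of how a network structure can be read out of the GO files in this dataset, the sketch below parses `is_a` links between non-obsolete biological_process terms from `go-basic.obo`, collects biological-process annotations per gene symbol from `goa_human.gaf` (skipping `NOT` qualifiers), and turns them into a gene-by-term 0/1 connectivity mask. This is not GONNECT's own code (that lives in the linked repository): the function names `parse_obo_bp`, `parse_gaf_bp` and `build_mask` are hypothetical, `part_of` and other relationships are ignored, and annotations are not propagated to ancestor terms. Whether GONNECT makes the same choices is an assumption to check against the paper and code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set


def parse_obo_bp(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each non-obsolete biological_process term to its is_a parents."""
    parents: Dict[str, Set[str]] = {}
    term: Optional[dict] = None  # fields of the [Term] stanza being read

    def flush(t: Optional[dict]) -> None:
        # Keep only finished, non-obsolete terms in the BP namespace.
        if t and t.get("namespace") == "biological_process" and not t["obsolete"]:
            parents[t["id"]] = set(t["is_a"])

    for raw in lines:
        line = raw.strip()
        if line.startswith("["):  # a new stanza begins; close the previous one
            flush(term)
            term = {"is_a": [], "obsolete": False} if line == "[Term]" else None
            continue
        if term is None or ":" not in line:
            continue  # header lines, [Typedef] stanzas, blank lines
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "id":
            term["id"] = value
        elif key == "namespace":
            term["namespace"] = value
        elif key == "is_a":
            term["is_a"].append(value.split("!")[0].strip())  # drop "! name"
        elif key == "is_obsolete":
            term["obsolete"] = value == "true"
    flush(term)
    return parents


def parse_gaf_bp(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect biological-process GO IDs per gene symbol from GAF rows."""
    gene_terms: Dict[str, Set[str]] = defaultdict(set)
    for raw in lines:
        if raw.startswith("!") or not raw.strip():
            continue  # GAF header/comment lines start with "!"
        cols = raw.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # malformed row
        # Columns (0-based): 2 = symbol, 3 = qualifier, 4 = GO ID, 8 = aspect.
        symbol, qualifier, go_id, aspect = cols[2], cols[3], cols[4], cols[8]
        if aspect != "P" or "NOT" in qualifier.split("|"):
            continue  # keep positive biological-process annotations only
        gene_terms[symbol].add(go_id)
    return dict(gene_terms)


def build_mask(genes: List[str], terms: List[str],
               gene_terms: Dict[str, Set[str]]) -> List[List[int]]:
    """Hypothetical helper: mask[i][j] = 1 if gene i is annotated to term j."""
    col = {t: j for j, t in enumerate(terms)}
    mask = [[0] * len(terms) for _ in genes]
    for i, gene in enumerate(genes):
        for t in gene_terms.get(gene, ()):
            if t in col:
                mask[i][col[t]] = 1
    return mask
```

Both parsers take any iterable of lines, so an open file handle such as `open("go-basic.obo", encoding="utf-8")` can be streamed directly instead of reading the whole file into memory first.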
Code corresponding to the paper can be found here: https://github.com/DelftBioinformaticsLab/GONNECT
History
- 2025-11-07 first online, published, posted
Publisher
4TU.ResearchData
Format
csv, csv.gz, txt, tsv, pkl, gaf
Organizations
TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science, Department of Intelligent Systems, Pattern Recognition and Bioinformatics
DATA
Files (13)
- README.md: 5,646 bytes, MD5: 77b32ad5c3eb526400165a16bbdbb00e
- clinical.csv: 4,269,355 bytes, MD5: bbc11ce7fbbb1593e16e415a485df0ea
- expression.pkl: 1,248,774,035 bytes, MD5: b6acf503f02a033e8e66e3669cfd3ec5
- gene_go_bp_id.txt: 495,790 bytes, MD5: 006fcd2ceca85c1ad444f9b92020477c
- go-basic.obo: 31,751,046 bytes, MD5: 3fc61b6459f38fecea3e0dd0c38d575b
- GO_TCGA_info.txt: 220 bytes, MD5: 1161a2d006cada764f51532605377d7f
- goa_human.gaf: 149,778,374 bytes, MD5: ce37e422721f86b580f93720461a6aa4
- TCGA_complete_bp.csv.gz: 320,659,551 bytes, MD5: 4b768302b89e69e3caef3fded1b020fe
- TCGA_complete_bp_norm.csv.gz: 420,189,164 bytes, MD5: 152539e38447a1a0672fdee2766a0a9d
- TCGA_complete_bp_top1k.csv.gz: 23,852,309 bytes, MD5: 5f642f4d14bcfbfde83d4bd226401039
- TCGA_complete_full_gene_list.txt: 583,811 bytes, MD5: 2e5ba065b4f38fbed46dd8e75d913013
- TCGA_complete_gene_id_pairs_in_go_bp.txt: 253,345 bytes, MD5: 36477e61b7318e06a39b23f7cfa5701d
- TCGA_complete_name2uniprotkb.tsv: 2,138,140 bytes, MD5: b3721893eb5d1e93eabfc6cf8514c056
All files (zip archive): 2,202,750,786 bytes unzipped
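Each file above is listed with an MD5 checksum. A minimal sketch for checking downloaded copies with Python's standard `hashlib`, assuming all 13 files sit together in one local directory under their listed names; `md5sum`, `verify` and `EXPECTED_MD5` are illustrative names, not part of the dataset:

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Union

# MD5 checksums as listed on this dataset page.
EXPECTED_MD5: Dict[str, str] = {
    "README.md": "77b32ad5c3eb526400165a16bbdbb00e",
    "clinical.csv": "bbc11ce7fbbb1593e16e415a485df0ea",
    "expression.pkl": "b6acf503f02a033e8e66e3669cfd3ec5",
    "gene_go_bp_id.txt": "006fcd2ceca85c1ad444f9b92020477c",
    "go-basic.obo": "3fc61b6459f38fecea3e0dd0c38d575b",
    "GO_TCGA_info.txt": "1161a2d006cada764f51532605377d7f",
    "goa_human.gaf": "ce37e422721f86b580f93720461a6aa4",
    "TCGA_complete_bp.csv.gz": "4b768302b89e69e3caef3fded1b020fe",
    "TCGA_complete_bp_norm.csv.gz": "152539e38447a1a0672fdee2766a0a9d",
    "TCGA_complete_bp_top1k.csv.gz": "5f642f4d14bcfbfde83d4bd226401039",
    "TCGA_complete_full_gene_list.txt": "2e5ba065b4f38fbed46dd8e75d913013",
    "TCGA_complete_gene_id_pairs_in_go_bp.txt": "36477e61b7318e06a39b23f7cfa5701d",
    "TCGA_complete_name2uniprotkb.tsv": "b3721893eb5d1e93eabfc6cf8514c056",
}


def md5sum(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Union[str, Path],
           expected: Dict[str, str] = EXPECTED_MD5) -> List[str]:
    """Return names of files that are missing or whose checksum differs."""
    bad = []
    for name, md5 in expected.items():
        path = Path(directory) / name
        if not path.is_file() or md5sum(path) != md5:
            bad.append(name)
    return bad
```

An empty list from `verify("<download directory>")` means every file matches its listed checksum. Chunked reading matters here because `expression.pkl` alone is about 1.2 GB.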