Data and code underlying the publication: Emergence of novel SARS-CoV-2 variants in the Netherlands
DOI: 10.4121/11bff1ea-4784-463e-90d0-eb2e2b64fe96
Dataset
Licence MIT
SARS-CoV-2 genome dataset accompanying our publication "Emergence of novel SARS-CoV-2 variants in the Netherlands", published in Scientific Reports [1].
Complete, high-quality SARS-CoV-2 genome sequences (fewer than 1% undetermined bases across the whole sequence), isolated from human hosts only, were obtained from GISAID, NCBI and China’s National Genomics Data Center (NGDC) on June 13th 2020. The dataset contained 29,503 sequences with unique identifiers in total, including the Wuhan-Hu-1 reference sequence (accession ID NC_045512.2). The “Collection date” field was also extracted for all sequences, and it is referred to as “date” throughout our work.
The acknowledgement table for GISAID sequences can be found in Supplementary file 2, and the full list of sequence identifiers for NCBI and NGDC records is provided in Supplementary file 3 of the corresponding publication [1].
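The quality criterion described above (less than 1% undetermined bases) can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: it assumes that undetermined bases are encoded as `N`, and the helper names `read_fasta`, `fraction_undetermined` and `is_high_quality` are hypothetical.

```python
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fraction_undetermined(sequence: str) -> float:
    """Fraction of bases equal to 'N' (assumed encoding of undetermined bases)."""
    if not sequence:
        return 1.0  # treat an empty sequence as fully undetermined
    return sequence.upper().count("N") / len(sequence)


def is_high_quality(sequence: str, max_fraction: float = 0.01) -> bool:
    """True if strictly less than max_fraction of the bases are undetermined."""
    return fraction_undetermined(sequence) < max_fraction
```

With the sequence file from this dataset, one could count passing records with something like `sum(is_high_quality(seq) for _, seq in read_fasta(open("sarscov2_sequences.fasta")))`. Other IUPAC ambiguity codes are not counted as undetermined in this sketch.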
[1]: Urhan, A., Abeel, T. Emergence of novel SARS-CoV-2 variants in the Netherlands. Sci Rep 11, 6625 (2021). https://doi.org/10.1038/s41598-021-85363-7
History
- 2025-04-07 first online, published, posted
Publisher
4TU.ResearchData
Format
fasta
Associated peer-reviewed publication
Emergence of novel SARS-CoV-2 variants in the Netherlands
Organizations
TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science, The Delft Bioinformatics Lab
DATA
Files (4)
- README.md (3,515 bytes, MD5: 788207a3238f9b11a8e8f94b54e2fde3)
- sarscov2_metadata.tsv (6,887,897 bytes, MD5: ce8108121e4fd92dafcc69a0c42dea3c)
- sarscov2_mutations_coronapp.tsv (24,874,189 bytes, MD5: ec17e69ba9faeeccd7e8f615741418c6)
- sarscov2_sequences.fasta (896,338,972 bytes, MD5: 6fdb926a766b7c24a10f7a7374d694f4)
All files (zip): 928,104,573 bytes unzipped
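After downloading, the MD5 checksums listed above can be used to check that the files arrived intact. The following is a minimal sketch using Python's standard `hashlib` module. It assumes that each checksum in the listing belongs to the file name that follows it and that the files sit in one local directory; the names `EXPECTED_MD5`, `md5_of_file` and `verify_downloads` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing; the pairing of checksum and
# file name is an assumption based on the order of the listing.
EXPECTED_MD5 = {
    "README.md": "788207a3238f9b11a8e8f94b54e2fde3",
    "sarscov2_metadata.tsv": "ce8108121e4fd92dafcc69a0c42dea3c",
    "sarscov2_mutations_coronapp.tsv": "ec17e69ba9faeeccd7e8f615741418c6",
    "sarscov2_sequences.fasta": "6fdb926a766b7c24a10f7a7374d694f4",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == checksum
    return results
```

Calling `verify_downloads("path/to/download/folder")` returns a dictionary with one boolean per file. Reading in chunks matters here because the sequence file is close to 900 MB.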