SAFPredDB: Bacterial synteny database
DOI: 10.4121/ac84802e-853f-46f1-9786-b9d29c0f7557
Dataset
SAFPredDB is a bacterial synteny database built for the gene function prediction tool SAFPred (Synteny Aware Function Predictor). The database is a collection of conserved synteny and operons found across the bacterial kingdom. First, we formulated a synteny model based on experimentally known operons and the genomic features common in bacteria. We then designed a bottom-up, purely computational approach that builds the database from this synteny model, using complete bacterial genome assemblies from the Genome Taxonomy Database (GTDB).
Although we initially built SAFPredDB only for our prediction tool, it can be used for other purposes where such a catalog is needed. As a standalone database, SAFPredDB can be queried to mine information about conserved genomic patterns in bacteria. In addition, it can be updated as newer assemblies are added to GTDB.
History
- 2024-04-05 first online
- 2024-11-28 published, posted
Publisher
4TU.ResearchData
Format
Gzipped pickle file
Associated peer-reviewed publication
SAFPred: synteny-aware gene function prediction for bacteria using protein embeddings
Organizations
TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science, Department of Intelligent Systems, Delft Bioinformatics Lab; Broad Institute of MIT and Harvard, Infectious Disease and Microbiome Program
DATA
Files (4)
- README.md, 3,575 bytes, MD5: 25eea7001dd366b4ab7f492de20c8900
- safpreddb_cluster_dict.pkl, 325,656,953 bytes, MD5: be2a05217809ad2d2339957dd1bcf385
- safpreddb_full_emb.pkl, 1,929,538,045 bytes, MD5: fd1ed5a32279aa628e273533f0fc0f1c
- safpreddb_full_nr.pkl.gz, 6,111,583 bytes, MD5: f022a0fc61d66d304c203f5343b45e0a
Total size: 2,261,310,156 bytes unzipped
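Because the files are distributed as pickles with published MD5 checksums, it may help to verify a download before opening it. The following is a minimal sketch, not part of the dataset or of SAFPred: the helper names `md5sum`, `load_pickle`, and `load_verified` are hypothetical, and the sketch assumes the files sit in the local working directory under the names listed above. It makes no assumption about what the loaded objects contain.

```python
import gzip
import hashlib
import pickle
from pathlib import Path

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "README.md": "25eea7001dd366b4ab7f492de20c8900",
    "safpreddb_cluster_dict.pkl": "be2a05217809ad2d2339957dd1bcf385",
    "safpreddb_full_emb.pkl": "fd1ed5a32279aa628e273533f0fc0f1c",
    "safpreddb_full_nr.pkl.gz": "f022a0fc61d66d304c203f5343b45e0a",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_pickle(path):
    """Load a pickle, using gzip only if the file starts with the gzip magic bytes."""
    path = Path(path)
    with open(path, "rb") as handle:
        magic = handle.read(2)
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rb") as handle:
        return pickle.load(handle)


def load_verified(path):
    """Check the file against its listed MD5, then unpickle it (hypothetical helper)."""
    path = Path(path)
    expected = EXPECTED_MD5[path.name]
    actual = md5sum(path)
    if actual != expected:
        raise ValueError(f"{path.name}: MD5 {actual} does not match {expected}")
    return load_pickle(path)
```

For example, `load_verified("safpreddb_full_nr.pkl.gz")` would return whatever object was pickled into that file. Unpickling can execute arbitrary code, so checking the checksum first is a reasonable precaution. Loading may also require the Python packages used when the objects were created; README.md is the place to check for the expected structure and dependencies.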





