TY  - DATA
T1  - SAFPredDB: Bacterial synteny database
PY  - 2024/11/28
AU  - Aysun Urhan
AU  - Bianca-Maria Cosma
AU  - Ashlee M. Earl
AU  - Abigail L. Manson
AU  - Thomas Abeel
UR  - 
DO  - 10.4121/ac84802e-853f-46f1-9786-b9d29c0f7557.v2
KW  - bioinformatics
KW  - microbial genomics
KW  - genomics
KW  - protein language model
KW  - bacterial genomics
KW  - comparative genomics
KW  - protein embeddings
KW  - sequence analysis
KW  - bacterial synteny
N2  - SAFPredDB is a bacterial synteny database built for the gene function prediction tool SAFPred (Synteny Aware Function Predictor). The database is a collection of conserved synteny and operons found across the bacterial kingdom. First, we formulated a synteny model based on experimentally known operons and genomic features common in bacteria. We then designed a bottom-up, purely computational approach to build the database from this synteny model, using complete bacterial genome assemblies from the Genome Taxonomy Database (GTDB). Although we initially built SAFPredDB solely for our prediction tool, it can be used for other purposes where such a catalog is needed. As a standalone database, it can be queried to mine information about conserved genomic patterns in bacteria. In addition, it can be updated as newer assemblies are added to GTDB.
ER  - 