cff-version: 1.2.0
abstract: "

This dataset belongs to the PhD thesis of Céline Cleij titled \"Building the genome of a minimal synthetic cell\".

Specifically, the dataset belongs to Chapter 4 titled \"De novo design and assembly of minimal genomes for the synthetic cell\".


Authors: Céline Cleij, Pascale Daran-Lapujade, Christophe Danelon

Corresponding authors: Pascale Daran-Lapujade and Christophe Danelon

Contact information: p.a.s.daran-lapujade@tudelft.nl and danelon@insa-toulouse.fr


This dataset contains data collected during experiments as part of Céline Cleij's PhD project. The data was collected between 2023 and 2025.


All data processing and analysis steps are described in detail in the Methods section of thesis chapter 4.

Designed SynMG sequences (GenBank) were prepared with the SnapGene software, using the plasmid maps of the sequenced template plasmids and the designed primer sequences.

Raw Nanopore sequencing reads (fastq) were obtained by Plasmidsaurus (Eugene, OR, USA) using Nanopore sequencing technology.

Consensus SynMG sequences (GenBank) were obtained by Plasmidsaurus after processing of the raw reads, and were manually annotated in SnapGene.

The overview of relevant mutations (\"Relevant mutations in SynMG variants\") was prepared in Excel, based on mutations in consensus sequences and raw reads obtained from sequencing.

LC-MS data was obtained after processing in the Mascot software.



The data is grouped into seven files:

i) Designed SynMG sequences. Files are named after the SynMG version (SynMG1 or SynMG2).

ii) S. cerevisiae - Raw Nanopore sequencing reads. Files are named after the yeast strain from which total DNA was extracted, and after the SynMG variant which was assembled in this strain.

iii) S. cerevisiae - Consensus SynMG sequences. Files are named after the yeast strain from which total DNA was extracted, and after the SynMG variant which was assembled in this strain.

iv) E. coli - Raw Nanopore sequencing reads. Files are named after the E. coli strain from which SynChr DNA was extracted, and after the SynMG variant which was amplified in this strain.

v) E. coli - Consensus SynMG sequences. Files are named after the E. coli strain from which SynChr DNA was extracted, and after the SynMG variant which was amplified in this strain.

vi) Relevant mutations in SynMG variants. This Excel file contains an overview of all relevant mutations in SynMG1.1, SynMG1.2, SynMG1.3, SynMG2.1 and SynMG2.2 compared to the designed maps.

vii) LC-MS data. This Excel file contains LC-MS data used for making Figure 4A, B & D.


"
authors:
  - family-names: Cleij
    given-names: Céline
    orcid: "https://orcid.org/0000-0001-6580-1106"
  - family-names: Daran-Lapujade
    given-names: Pascale
    orcid: "https://orcid.org/0000-0002-4097-7831"
  - family-names: Danelon
    given-names: Christophe
title: "Data underlying chapter 4 of PhD thesis: Building the genome of a minimal synthetic cell"
keywords:
version: 1
identifiers:
  - type: doi
    value: 10.4121/ad21c652-ad75-4a99-a09a-46c7d8f383d6.v1
license: CC BY-NC 4.0
date-released: 2025-03-27