Metagenomic analysis was conducted on our oral biofilm model to assess the viability of the model as a proxy for dental calculus.
A total of 35 biofilm model samples were taken throughout the course of the experiment, including saliva inoculate, media from the wells, and the biofilm end-product. DNA extraction was performed at the archaeogenetic facility at the Max Planck Institute for the Science of Human History (Jena, Germany).
Extractions were performed in duplicate using the DNeasy PowerSoil Kit (QIAGEN). The C2 inhibitor removal step was skipped, going directly to the C3 step.
The DNA extracts were paired-end sequenced on a NextSeq (two-colour chemistry) to 150 bp.
Processing of the raw DNA reads was conducted using the nf-core/eager pipeline, v2.4.4 [@yatesEAGER2020]. Adapter removal and read merging were performed using AdapterRemoval, v2.3.2 [@AdapterRemovalv2]. Merged reads were mapped to the human reference genome, GRCh38, using BWA, v0.7.17-r1188 [@BWA] with default settings (-n 0.01; -l 32), and unmapped reads were extracted using Samtools, v1.12.
Metagenomic classification was conducted in Kraken, v2.1.2 using the Standard database (https://benlangmead.github.io/aws-indexes/k2).
Kraken output reports were combined and converted to OTU tables, and only species-level assignments were selected for downstream analysis (02-scripts/00-comb-kraken-reports.R). The OTU tables were further filtered by removing species with a relative abundance lower than 0.001%.
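The relative-abundance filter can be sketched as follows. This is a minimal illustration rather than the code in the project scripts: the function name and the dictionary layout are hypothetical, and it assumes that a species is kept if it reaches the threshold in at least one sample, which the text does not specify.

```python
def filter_low_abundance(otu_table, threshold_pct=0.001):
    """Drop species whose relative abundance is below threshold_pct in every sample.

    otu_table maps species name -> {sample name: read count}.
    Assumption: a species is kept if it reaches the threshold in any one sample.
    """
    # Total reads per sample, used as the denominator for relative abundance.
    totals = {}
    for counts in otu_table.values():
        for sample, n in counts.items():
            totals[sample] = totals.get(sample, 0) + n

    kept = {}
    for species, counts in otu_table.items():
        # Relative abundance (in percent) of this species in each sample.
        rel = [100 * n / totals[s] for s, n in counts.items() if totals[s] > 0]
        if rel and max(rel) >= threshold_pct:
            kept[species] = counts
    return kept


# Toy example with hypothetical species labels and counts.
toy = {
    "species_A": {"s1": 999_999, "s2": 500_000},
    "species_B": {"s1": 1, "s2": 0},
    "species_C": {"s1": 0, "s2": 500_000},
}
filtered = filter_low_abundance(toy)
```

If the filter was instead applied to abundance pooled across all samples, the per-sample maximum would be replaced by a single ratio of summed counts.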
Library_Id | Quantification_post-Indexing_total |
---|---|
LIB030.A0117 | 7.10400e+09 |
SYN001.A0101 | 5.32800e+11 |
SYN002.A0101 | 1.29120e+12 |
SYN003.A0101 | 6.40320e+11 |
SYN005.I0101 | 8.57280e+13 |
SYN006.I0101 | 6.51360e+13 |
SYN008.I0101 | 4.37280e+13 |
SYN009.I0101 | 2.46384e+12 |
SYN012.I0101 | 5.73120e+12 |
SYN013.I0101 | 3.45888e+12 |
SYN014.A0101 | 5.50560e+12 |
SYN014.C0101 | 2.37072e+14 |
SYN015.B0101 | 6.16320e+12 |
SYN015.D0101 | 2.31120e+12 |
SYN015.E0101 | 1.02528e+13 |
SYN015.F0101 | 1.62336e+12 |
SYN015.G0101 | 8.04480e+12 |
SYN015.H0101 | 2.35344e+12 |
SYN015.I0101 | 7.27200e+12 |
SYN016.I0101 | 1.37520e+13 |
SYN017.A0101 | 6.05280e+12 |
SYN017.B0101 | 1.05456e+13 |
SYN017.D0101 | 6.79200e+11 |
SYN017.E0101 | 1.51536e+12 |
SYN017.F0101 | 8.56800e+11 |
SYN017.G0101 | 7.26240e+11 |
SYN018.C0101 | 1.67664e+12 |
SYN018.H0101 | 6.95040e+11 |
SYN018.I0101 | 2.67840e+11 |
SYN019.I0101 | 1.23312e+12 |
SYN021.I0101 | 2.58144e+12 |
SYN022.I0101 | 1.57920e+12 |
SYN023.I0101 | 1.15008e+13 |
SYN025.I0101 | 3.73584e+12 |
SYN026.I0101 | 4.87680e+12 |
SYN028.I0101 | 1.47888e+14 |
SourceTracker [@knightsSourceTracker2011] was used to estimate source composition of the oral biofilm model samples using a Bayesian framework. Samples were compared with oral and environmental controls to detect potential external contamination.
Potential contaminants were identified using the frequency and prevalence methods in the decontam v1.16.0 [@R-decontam] R package (02-scripts/02-authentication.R). Samples from indoor_air, skin, and sediment were used as negative controls for the prevalence method with a probability threshold of 0.01. DNA concentrations and negative controls were used for the ‘frequency’ method with a probability threshold of 0.99.
Putative contaminants were filtered out of the OTU tables for all downstream analyses.
See 06-reports/metagen-authentication.Rmd for more details.
Genus- and species-level OTU tables were prepared from the decontaminated OTU table, and relative abundance of species in a sample was calculated as recommended for compositional data [@gloorMicrobiomeDatasets2017].
Information on the oxygen tolerance of bacterial species was downloaded from BacDive on 2022-05-27, 2022-06-29, and 2022-08-26. A list of amylase-binding streptococci (ABS) was created based on [@nikitkovaStarchBiofilms2013].
Alpha-diversity, specifically the Shannon Index, was calculated to compare species richness and diversity across experimental and comparative oral samples. Shannon Index was calculated using the R package vegan v2.6.2 [@Rvegan].
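For reference, the Shannon Index is H = -Σ p_i ln(p_i), where p_i is the proportion of counts belonging to species i. The sketch below is a plain-Python illustration of that formula, not the vegan implementation; it uses the natural logarithm, which is also the default in vegan.

```python
import math


def shannon_index(counts):
    """Shannon Index (natural log) for a list of per-species counts in one sample."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Species with zero counts contribute nothing to the sum.
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


# Four equally abundant species give the maximum value for four species, ln(4).
even_sample = shannon_index([10, 10, 10, 10])
```

The index increases with both the number of species and the evenness of their abundances, so a sample dominated by a few taxa scores lower than an even one with the same species count.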
Sparse principal components analysis (sPCA) was conducted on centered-log-ratio (CLR) transformed species-level counts using the R package mixOmics v6.20.0 [@RmixOmics]. The mixOmics implementation of sPCA uses a LASSO penalisation to eliminate unimportant variables. Two separate analyses were conducted: 1) on experiment samples to assess the difference in sample types and biofilm age; and 2) on oral comparative samples and biofilm model end-products to explore community differences between in vitro and in vivo biofilms.
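The CLR transformation subtracts the mean of the log counts in a sample from each log count. The sketch below illustrates this for a single sample; it is not the mixOmics code, and the pseudocount of 1 added to avoid taking the log of zero is an assumption, since the offset used in the analysis is not stated here.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's species counts.

    pseudocount is an assumed offset that keeps log() defined for zero counts.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is equivalent to dividing by the geometric mean.
    return [x - mean_log for x in logs]


transformed = clr([120, 0, 30, 850])
```

CLR values within a sample sum to zero, so each value expresses a species' abundance relative to the sample's geometric mean rather than as an absolute count.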
Loadings obtained from the sPCA analyses were used in combination with CLR-transformed counts to create heatmaps of species- and genus-level counts within the experiment and across comparative samples.
Beta-diversity was assessed using Bray-Curtis dissimilarity.
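Bray-Curtis dissimilarity between two samples is the sum of absolute differences in species counts divided by the sum of all counts in both samples. A minimal sketch of the formula, assuming both samples are given as count lists in the same species order (the function name is illustrative):

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples with matching species order."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Two empty samples are treated as identical.
    return numerator / denominator if denominator else 0.0


identical = bray_curtis([1, 2, 3], [1, 2, 3])  # no differences
disjoint = bray_curtis([1, 0], [0, 1])         # no shared species
```

The value ranges from 0 for identical samples to 1 for samples with no species in common.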
Differential abundance was calculated using the ANCOM-BC method from the ANCOMBC v1.6.1 R package [@linANCOMBC2020]. P-values were adjusted using the false discovery rate (FDR) method. Samples were grouped by sample type (i.e. saliva, plaque, modern calculus, model calculus).
First, differential abundance of experimental samples was calculated, comparing abundance between donated saliva, medium, and the model calculus. Then, differential abundance of oral reference samples and model calculus was calculated.
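The FDR adjustment can be illustrated with the Benjamini-Hochberg procedure. This sketch assumes that "FDR" here refers to Benjamini-Hochberg, as it does in R's `p.adjust`, where "fdr" is an alias for "BH"; it is not taken from the ANCOMBC source code.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a smaller raw p-value never receives a larger adjusted value than a bigger one.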
Results from SourceTracker indicated that a large proportion of species in most samples are of oral origin (plaque, saliva, or calculus). Some samples contained a large proportion of species from potential contaminants (indoor air) and of unknown origin. Species attributed to potential contaminant sources were compared to a database of oral bacteria to determine what proportion of them were known oral species. Many of the species assigned to indoor_air are known oral species (Figure @ref(fig:st-plot)).
Plot of estimated contributions of various sources to the model calculus and model saliva samples. Samples are arranged from top to bottom by how late in the experiment they were sampled, with bottom being the earliest samples. Sample names in red indicate samples that were removed from further analysis due to contamination.
Based on these results, samples SYN015.F0101, SYN015.G0101, SYN015.H0101, SYN017.F0101, SYN017.G0101, SYN018.H0101, SYN013.I0101, SYN016.I0101 were removed from further analysis. The removed samples were all sampled late in the experiment (day 18+).
Cumulative percent decay plots.
A total of 5424 potential contaminants were removed. Remaining counts of species in the biofilm samples ranged between 88 and 284 with a mean of 181.59.
Remaining number of species per sample following removal of contaminants.
Alpha-diversity. Days are grouped to increase sample sizes:

- inoc = days 0, 3, 5
- treatm = days 7, 9, 12, 15
- final = day 24

Plot of alpha-diversity indices across experiment samples grouped by sampling time.
Alpha-diversity sees a slight decrease over the course of the experiment. The model calculus is less variable than the initial and middle biofilm samples. Compared to other oral samples, there is lower species diversity and richness in the medium and model calculus.
Information on the oxygen tolerance of bacterial species was downloaded from BacDive on 2022-05-27, 2022-06-29, and 2022-08-26. A total of 67 out of 1129 (5.9%) species did not have a match on BacDive. These were assigned an oxygen tolerance based on the most common category within the genus to which they belong.
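The genus-level assignment for unmatched species can be sketched as a majority vote. This is an illustration under stated assumptions, not the project's code: the function name is hypothetical, the genus is taken as the first word of the species name, ties are resolved by whichever category was seen first, and the species names in the example are placeholders rather than real taxa.

```python
from collections import Counter


def impute_oxygen_tolerance(known, species_list):
    """Assign each species its known category, or the most common one in its genus.

    known maps species name -> oxygen tolerance category.
    Species from a genus with no known members are assigned None.
    """
    # Tally the categories per genus from species that do have a record.
    by_genus = {}
    for species, category in known.items():
        genus = species.split()[0]
        by_genus.setdefault(genus, Counter())[category] += 1

    result = {}
    for species in species_list:
        if species in known:
            result[species] = known[species]
        else:
            tally = by_genus.get(species.split()[0])
            result[species] = tally.most_common(1)[0][0] if tally else None
    return result


# Placeholder names and categories, for illustration only.
known = {
    "Genusa alpha": "aerobe",
    "Genusa beta": "aerobe",
    "Genusa gamma": "anaerobe",
}
imputed = impute_oxygen_tolerance(known, ["Genusa alpha", "Genusa delta", "Genusb epsilon"])
```

A species whose genus has no matched members cannot be resolved this way and is left unassigned in the sketch.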
A list of amylase-binding streptococci (ABS) was created based on @nikitkovaStarchBiofilms2013.
The distribution of oxygen tolerance (left) and ABS (right) in experimental samples.
Scree plot of principal components from the sPCA on experiment samples.
sPCA on species-level counts from experiment samples only.
PC1 separates most of the sample types, i.e., saliva, medium, and model calculus, and also tracks increasing sample age (from left to right). PC2 separates medium from saliva and model calculus, as well as early medium samples from later medium samples. Negative loadings consist of aerobes, anaerobes, and facultative anaerobes, while positive loadings are mostly anaerobes. The main driver of differences between early and late samples seems to be that earlier samples are enriched with aerobes and facultative anaerobes, while later samples are dominated by anaerobes.
The core microbiome of plaque and calculus samples.
Mean relative abundances at the genus level between sample types. NA = other genera.
The main overlap between the model calculus and oral comparative samples is the high relative abundance of Streptococcus. Model calculus consists mostly of Enterococcus and Veillonella spp., while oral comparative samples are more diverse (consistent with results from alpha diversity).
The most notable differences between model calculus and oral samples are the lower proportion of ABS and the higher proportion of anaerobes in model calculus, as well as the absence of aerobes.
Scree plot of principal components from the sPCA on comparative samples and model calculus.
sPCA on species-level counts from comparative oral samples and model calculus.
PC1 separates in vivo (negative) from in vitro (positive) oral samples. PC2 separates the model calculus (positive) from the comparative in vitro biofilm (negative). The positive PC1 loadings are dominated by Enterococcus spp., Lactobacillus spp., and aerobes. Negative PC1 loadings are of mixed oxygen tolerance and dominated by Capnocytophaga and Neisseria spp. Model calculus has a unique signature on PC1 and PC2, driven largely by Enterococcus spp.
Heat maps of the top 100 negative loadings (top) and top 100 positive loadings (bottom) from the sPCA. Colour represents centered log-ratio transformed species counts. Lighter colours indicate higher abundance.
Enterococcus faecalis, Enterococcus casseliflavus, and Enterococcus durans are more abundant in the in vitro biofilm samples than in vivo samples of plaque and calculus. Conversely, Neisseria, Actinomyces, and Capnocytophaga spp. are deficient in the in vitro samples.
Log-fold changes between sample types. Circles are species enriched in the model calculus, squares are enriched in saliva, and triangles in medium. Plot shows the top 20 absolute log-fold changes between model calculus and other sample types.
Species enriched in saliva compared to model calculus are largely aerobic, while species enriched in model calculus compared to saliva are mainly anaerobes.
Log-fold changes between sample types. Circles are species enriched in the model calculus, squares are enriched in saliva, and triangles in medium. Plot shows the top 30 species in PC1 (A) and PC2 (B) between model calculus and other sample types.
Log-fold changes between sample types. Circles are species enriched in the model calculus, triangles are enriched in modern calculus, diamonds in subgingival plaque, and squares in supragingival plaque. Plot shows the top 30 absolute log-fold changes between model calculus and other sample types. Bars within shapes are standard error.
None of the species enriched in model calculus are aerobes. The top two species (Corynebacterium matruchotii and Rothia dentocariosa) enriched in plaque and calculus comparative samples are aerobes, while the rest are more balanced between the various types of oxygen tolerance. The abundance of anaerobes in model calculus compared to oral comparative samples is consistent with the sPCA analysis.
Log-fold changes between sample types. Circles are species enriched in the model calculus, triangles in modern calculus, diamonds are enriched in subgingival plaque, and squares in supragingival plaque. Plot shows the top 30 species in PC1 (A) and PC2 (B) between model calculus and other sample types. Bars represent standard error.
The main takeaway is the loss of diversity from donated saliva to model calculus, and the lower diversity of model calculus relative to oral reference samples. The donated saliva for the experiment had a lower diversity than the reference saliva samples, which may have contributed to the lower diversity in experiment samples. The community profile of model calculus differs from the oral reference samples, while modern calculus mostly resembles oral reference samples. The main difference between model and reference samples seems to be a lack of aerobes in model samples and a dominance by Enterococcus spp.
Enterococcus faecalis has the highest log-fold change, with a higher abundance in the model calculus samples than in the reference samples. It may represent one of the main differences between model calculus and the reference samples, especially modern calculus, which is consistent with the results of the sPCA. The high relative abundance of E. faecalis may represent contamination, despite it being commonly found in the oral cavity. It was also abundant in the comparative in vitro biofilm study. Enterococcus spp. out-competing other species may be a problem in model biofilm studies, but more comparative studies are needed to confirm this (and few WGS studies are available). Capnocytophaga, Actinomyces, and Neisseria spp. are deficient in model samples (both in this study and in the comparative study). Apart from the donated saliva, very few aerobes were detected in any of the experimental samples. Rothia spp. disappeared between saliva and medium samples. We may need to reduce the frequency of medium replacement (currently every three days) to help promote the growth of slower-growing organisms.
Capnocytophaga and Actinomyces spp. are predominantly (facultative) anaerobes, so their deficiency must be attributed to different reasons. Both Capnocytophaga and Neisseria spp. are fastidious and require an atmosphere with at least 5% carbon dioxide to thrive, so the model may have a low carbon dioxide atmosphere [@tonjumNeisseria2017].
ABS are also underrepresented in the model samples compared to the modern calculus and plaque reference samples. This may be attributed to the presence of sucrose in the treatment solutions, potentially eliminating the niche for ABS.