
        If you use plots from MultiQC in a publication or presentation, please cite:

        MultiQC: Summarize analysis results for multiple tools and samples in a single report
        Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
        Bioinformatics (2016)
        doi: 10.1093/bioinformatics/btw354
        PMID: 27312411

        About MultiQC

        This report was generated using MultiQC, version 1.11

        You can see a YouTube video describing how to use MultiQC reports here: https://youtu.be/qPbIlO_KWN0

        For more information about MultiQC, including other videos and extensive documentation, please visit http://multiqc.info

        You can report bugs, suggest improvements and find the source code for MultiQC on GitHub: https://github.com/ewels/MultiQC


        A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.

        This report has been generated by the nf-core/eager analysis pipeline. For information about how to interpret these results, please see the documentation.

        Report generated on 2022-04-04, 15:21 based on data in:


        General Statistics

        Showing 225/225 rows and 16/27 columns.
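        Sample Name | Seqs | Length | % GC | GC content | % Trimmed | Seqs | Length | % GC | % Unclassified | Reads | Reads Mapped | Endogenous DNA (%) | Reads Mapped | Endogenous DNA Post (%) | % Dups | % Cutibacterium acnes
        ERR1883419 | | | | | | | | | 83.1% | 872,779 | 556 | 0.06 | 556 | 0.06 | 0.2% | 0.0%
        ERR1883419_1 | 444,894 | 76 bp | 42% | | 25.0% | 872,779 | 77 bp | 42% | | | | | | | |
        ERR1883419_2 | 444,894 | 76 bp | 42% | | | | | | | | | | | | |
        ERR1883419_L1_polyg | | | | 42.8% | | | | | | | | | | | |
        ERR1883420 | | | | | | | | | 93.4% | 683,589 | 317 | 0.05 | 317 | 0.05 | 1.3% | 0.0%
        ERR1883420_1 | 385,285 | 76 bp | 49% | | 33.9% | 683,589 | 76 bp | 49% | | | | | | | |
        ERR1883420_2 | 385,285 | 76 bp | 49% | | | | | | | | | | | | |
        ERR1883420_L1_polyg | | | | 49.5% | | | | | | | | | | | |
        ERR1883421 | | | | | | | | | 95.1% | 626,804 | 220 | 0.04 | 220 | 0.04 | 1.8% | 0.0%
        ERR1883421_1 | 348,506 | 76 bp | 52% | | 29.1% | 626,804 | 76 bp | 54% | | | | | | | |
        ERR1883421_2 | 348,506 | 76 bp | 53% | | | | | | | | | | | | |
        ERR1883421_L1_polyg | | | | 53.1% | | | | | | | | | | | |
        ERR1883422 | | | | | | | | | 93.5% | 759,304 | 1,857 | 0.24 | 1,857 | 0.24 | 1.8% | 0.0%
        ERR1883422_1 | 394,316 | 76 bp | 51% | | 12.9% | 759,304 | 76 bp | 52% | | | | | | | |
        ERR1883422_2 | 394,316 | 76 bp | 52% | | | | | | | | | | | | |
        ERR1883422_L1_polyg | | | | 52.2% | | | | | | | | | | | |
        ERR1883423 | | | | | | | | | 86.0% | 1,126,641 | 125 | 0.01 | 125 | 0.01 | 2.4% | 0.0%
        ERR1883423_1 | 663,459 | 76 bp | 57% | | 37.0% | 1,126,641 | 76 bp | 60% | | | | | | | |
        ERR1883423_2 | 663,459 | 76 bp | 58% | | | | | | | | | | | | |
        ERR1883423_L1_polyg | | | | 58.0% | | | | | | | | | | | |
        ERR1883424 | | | | | | | | | 93.7% | 872,039 | 79 | 0.01 | 79 | 0.01 | 2.5% | 0.0%
        ERR1883424_1 | 500,367 | 76 bp | 57% | | 35.8% | 872,039 | 77 bp | 59% | | | | | | | |
        ERR1883424_2 | 500,367 | 76 bp | 57% | | | | | | | | | | | | |
        ERR1883424_L1_polyg | | | | 57.8% | | | | | | | | | | | |
        ERR1883430 | | | | | | | | | 79.4% | 804,258 | 157 | 0.02 | 157 | 0.02 | 1.3% | 0.0%
        ERR1883430_1 | 421,758 | 76 bp | 53% | | 26.5% | 804,258 | 77 bp | 53% | | | | | | | |
        ERR1883430_2 | 421,758 | 76 bp | 53% | | | | | | | | | | | | |
        ERR1883430_L1_polyg | | | | 53.3% | | | | | | | | | | | |
        ERR1883436 | | | | | | | | | 81.1% | 1,823,124 | 266 | 0.01 | 266 | 0.01 | 0.4% | 0.0%
        ERR1883436_1 | 947,337 | 76 bp | 51% | | 27.9% | 1,823,124 | 77 bp | 52% | | | | | | | |
        ERR1883436_2 | 947,337 | 76 bp | 52% | | | | | | | | | | | | |
        ERR1883436_L1_polyg | | | | 52.1% | | | | | | | | | | | |
        ERR1883438 | | | | | | | | | 92.9% | 2,681,239 | 137 | 0.01 | 137 | 0.01 | 0.0% | 0.0%
        ERR1883438_1 | 1,412,620 | 76 bp | 57% | | 33.3% | 2,681,239 | 77 bp | 57% | | | | | | | |
        ERR1883438_2 | 1,412,620 | 76 bp | 56% | | | | | | | | | | | | |
        ERR1883438_L1_polyg | | | | 56.9% | | | | | | | | | | | |
        SRR059389 | | | | | | | | | 11.1% | 45,100,307 | 1,234 | 0.00 | 1,234 | 0.00 | 1.4% | 0.0%
        SRR059389_1 | 29,204,547 | 101 bp | 44% | | 42.3% | 45,100,307 | 98 bp | 43% | | | | | | | |
        SRR059389_2 | 29,204,547 | 101 bp | 44% | | | | | | | | | | | | |
        SRR059389_L1_polyg | | | | 43.8% | | | | | | | | | | | |
        SRR059425 | | | | | | | | | 57.2% | 36,490,883 | 377 | 0.00 | 377 | 0.00 | 1.3% | 0.0%
        SRR059425_1 | 29,369,275 | 101 bp | 49% | | 72.1% | 36,490,883 | 114 bp | 48% | | | | | | | |
        SRR059425_2 | 29,369,275 | 101 bp | 49% | | | | | | | | | | | | |
        SRR059425_L1_polyg | | | | 48.1% | | | | | | | | | | | |
        SRR059455 | | | | | | | | | 49.8% | 36,362,591 | 1,003 | 0.00 | 1,003 | 0.00 | 30.7% | 0.0%
        SRR059455_1 | 29,582,771 | 101 bp | 51% | | 75.4% | 36,362,591 | 114 bp | 50% | | | | | | | |
        SRR059455_2 | 29,582,771 | 101 bp | 51% | | | | | | | | | | | | |
        SRR059455_L1_polyg | | | | 50.7% | | | | | | | | | | | |
        SRR059917 | | | | | | | | | 36.2% | 41,461,840 | 661 | 0.00 | 661 | 0.00 | 0.9% | 0.0%
        SRR059917_1 | 28,609,803 | 101 bp | 46% | | 52.6% | 41,461,840 | 105 bp | 45% | | | | | | | |
        SRR059917_2 | 28,609,803 | 101 bp | 46% | | | | | | | | | | | | |
        SRR059917_L1_polyg | | | | 44.8% | | | | | | | | | | | |
        SRR060358 | | | | | | | | | 20.0% | 37,420,368 | 1,892 | 0.01 | 1,892 | 0.01 | 13.3% | 0.0%
        SRR060358_1 | 27,612,846 | 101 bp | 42% | | 61.2% | 37,420,368 | 114 bp | 42% | | | | | | | |
        SRR060358_2 | 27,612,846 | 101 bp | 42% | | | | | | | | | | | | |
        SRR060358_L1_polyg | | | | 41.1% | | | | | | | | | | | |
        SRR1631060 | | | | | | | | | 38.5% | 30,740,837 | 764 | 0.00 | 764 | 0.00 | 21.7% | 15.0%
        SRR1631060_1 | 24,178,312 | 101 bp | 53% | | 74.4% | 30,740,837 | 110 bp | 54% | | | | | | | |
        SRR1631060_2 | 24,178,312 | 101 bp | 53% | | | | | | | | | | | | |
        SRR1631060_L1_polyg | | | | 53.5% | | | | | | | | | | | |
        SRR1631061 | | | | | | | | | 38.4% | 31,571,159 | 738 | 0.00 | 738 | 0.00 | 21.1% | 15.1%
        SRR1631061_1 | 24,812,687 | 101 bp | 53% | | 74.3% | 31,571,159 | 110 bp | 54% | | | | | | | |
        SRR1631061_2 | 24,812,687 | 101 bp | 53% | | | | | | | | | | | | |
        SRR1631061_L1_polyg | | | | 53.5% | | | | | | | | | | | |
        SRR1631063 | | | | | | | | | 38.4% | 34,647,388 | 844 | 0.00 | 844 | 0.00 | 21.4% | 15.0%
        SRR1631063_1 | 27,215,176 | 101 bp | 53% | | 74.3% | 34,647,388 | 110 bp | 54% | | | | | | | |
        SRR1631063_2 | 27,215,176 | 101 bp | 53% | | | | | | | | | | | | |
        SRR1631063_L1_polyg | | | | 53.6% | | | | | | | | | | | |
        SRR1631064 | | | | | | | | | 38.3% | 34,889,565 | 823 | 0.00 | 823 | 0.00 | 23.3% | 15.0%
        SRR1631064_1 | 27,378,198 | 101 bp | 53% | | 74.2% | 34,889,565 | 110 bp | 54% | | | | | | | |
        SRR1631064_2 | 27,378,198 | 101 bp | 54% | | | | | | | | | | | | |
        SRR1631064_L1_polyg | | | | 53.6% | | | | | | | | | | | |
        SRR1633008 | | | | | | | | | 25.6% | 19,880,442 | 99 | 0.00 | 99 | 0.00 | 1.0% | 12.4%
        SRR1633008_1 | 13,631,124 | 101 bp | 55% | | 62.4% | 19,880,442 | 111 bp | 55% | | | | | | | |
        SRR1633008_2 | 13,631,124 | 101 bp | 55% | | | | | | | | | | | | |
        SRR1633008_L1_polyg | | | | 55.2% | | | | | | | | | | | |
        SRR1761677 | | | | | | | | | 65.0% | 37,323,292 | 2,420 | 0.01 | 2,420 | 0.01 | 5.0% | 0.0%
        SRR1761677_1 | 25,925,960 | 93 bp | 45% | | 66.2% | 37,323,292 | 109 bp | 45% | | | | | | | |
        SRR1761677_2 | 25,925,960 | 91 bp | 45% | | | | | | | | | | | | |
        SRR1761677_L1_polyg | | | | 45.7% | | | | | | | | | | | |
        SRR1761682 | | | | | | | | | 51.2% | 28,774,040 | 1,401 | 0.00 | 1,401 | 0.00 | 4.4% | 0.0%
        SRR1761682_1 | 21,352,814 | 94 bp | 46% | | 73.9% | 28,774,040 | 112 bp | 47% | | | | | | | |
        SRR1761682_2 | 21,352,814 | 92 bp | 46% | | | | | | | | | | | | |
        SRR1761682_L1_polyg | | | | 46.8% | | | | | | | | | | | |
        SRR1761688 | | | | | | | | | 50.0% | 29,901,324 | 1,445 | 0.00 | 1,445 | 0.00 | 3.3% | 0.0%
        SRR1761688_1 | 23,124,993 | 93 bp | 48% | | 78.1% | 29,901,324 | 112 bp | 48% | | | | | | | |
        SRR1761688_2 | 23,124,993 | 91 bp | 48% | | | | | | | | | | | | |
        SRR1761688_L1_polyg | | | | 48.5% | | | | | | | | | | | |
        SRR1761692 | | | | | | | | | 65.6% | 31,441,675 | 1,809 | 0.01 | 1,809 | 0.01 | 8.0% | 0.0%
        SRR1761692_1 | 24,479,585 | 93 bp | 45% | | 78.8% | 31,441,675 | 112 bp | 45% | | | | | | | |
        SRR1761692_2 | 24,479,585 | 91 bp | 45% | | | | | | | | | | | | |
        SRR1761692_L1_polyg | | | | 45.5% | | | | | | | | | | | |
        SRR1761697 | | | | | | | | | 63.3% | 35,930,380 | 1,797 | 0.01 | 1,797 | 0.01 | 3.2% | 0.0%
        SRR1761697_1 | 26,569,854 | 93 bp | 47% | | 73.3% | 35,930,380 | 110 bp | 47% | | | | | | | |
        SRR1761697_2 | 26,569,854 | 90 bp | 47% | | | | | | | | | | | | |
        SRR1761697_L1_polyg | | | | 47.2% | | | | | | | | | | | |
        SRR1761698 | | | | | | | | | 66.8% | 50,125,497 | 2,526 | 0.01 | 2,526 | 0.01 | 2.8% | 0.0%
        SRR1761698_1 | 35,399,494 | 72 bp | 48% | | 69.1% | 50,125,497 | 83 bp | 48% | | | | | | | |
        SRR1761698_2 | 35,399,494 | 70 bp | 48% | | | | | | | | | | | | |
        SRR1761698_L1_polyg | | | | 48.4% | | | | | | | | | | | |
        SRR1761705 | | | | | | | | | 83.0% | 43,135,297 | 3,340 | 0.01 | 3,340 | 0.01 | 0.7% | 0.0%
        SRR1761705_1 | 29,226,654 | 73 bp | 43% | | 64.4% | 43,135,297 | 83 bp | 43% | | | | | | | |
        SRR1761705_2 | 29,226,654 | 72 bp | 43% | | | | | | | | | | | | |
        SRR1761705_L1_polyg | | | | 43.5% | | | | | | | | | | | |
        SRR1761710 | | | | | | | | | 76.5% | 40,041,094 | 1,813 | 0.00 | 1,813 | 0.00 | 2.5% | 0.0%
        SRR1761710_1 | 27,397,584 | 73 bp | 45% | | 66.0% | 40,041,094 | 84 bp | 44% | | | | | | | |
        SRR1761710_2 | 27,397,584 | 72 bp | 44% | | | | | | | | | | | | |
        SRR1761710_L1_polyg | | | | 44.9% | | | | | | | | | | | |
        SRR1761718 | | | | | | | | | 78.4% | 43,879,976 | 4,807 | 0.01 | 4,807 | 0.01 | 1.0% | 0.0%
        SRR1761718_1 | 29,990,550 | 69 bp | 45% | | 64.8% | 43,879,976 | 80 bp | 45% | | | | | | | |
        SRR1761718_2 | 29,990,550 | 69 bp | 44% | | | | | | | | | | | | |
        SRR1761718_L1_polyg | | | | 45.0% | | | | | | | | | | | |
        SRR1761721 | | | | | | | | | 82.4% | 43,668,363 | 4,573 | 0.01 | 4,573 | 0.01 | 2.2% | 0.0%
        SRR1761721_1 | 29,053,211 | 69 bp | 45% | | 61.5% | 43,668,363 | 79 bp | 45% | | | | | | | |
        SRR1761721_2 | 29,053,211 | 69 bp | 45% | | | | | | | | | | | | |
        SRR1761721_L1_polyg | | | | 45.8% | | | | | | | | | | | |
        SRR1929408 | | | | | | | | | 74.1% | 59,782,669 | 478 | 0.00 | 478 | 0.00 | 0.0% | 0.0%
        SRR1929408_1 | 31,843,136 | 100 bp | 47% | | 29.4% | 59,782,669 | 101 bp | 47% | | | | | | | |
        SRR1929408_2 | 31,843,136 | 93 bp | 47% | | | | | | | | | | | | |
        SRR1929408_L1_polyg | | | | 47.6% | | | | | | | | | | | |
        SRR1930121 | | | | | | | | | 82.6% | 68,362,831 | 761 | 0.00 | 761 | 0.00 | 1.3% | 0.0%
        SRR1930121_1 | 35,645,325 | 100 bp | 50% | | 25.8% | 68,362,831 | 99 bp | 50% | | | | | | | |
        SRR1930121_2 | 35,645,325 | 91 bp | 50% | | | | | | | | | | | | |
        SRR1930121_L1_polyg | | | | 50.5% | | | | | | | | | | | |
        SRR1930123 | | | | | | | | | 83.5% | 73,941,266 | 2,272 | 0.00 | 2,272 | 0.00 | 0.5% | 0.0%
        SRR1930123_1 | 38,920,714 | 100 bp | 48% | | 27.5% | 73,941,266 | 100 bp | 48% | | | | | | | |
        SRR1930123_2 | 38,920,714 | 92 bp | 48% | | | | | | | | | | | | |
        SRR1930123_L1_polyg | | | | 48.7% | | | | | | | | | | | |
        SRR1930141 | | | | | | | | | 79.2% | 61,586,457 | 259 | 0.00 | 259 | 0.00 | 0.8% | 0.0%
        SRR1930141_1 | 32,369,669 | 100 bp | 49% | | 27.1% | 61,586,457 | 100 bp | 49% | | | | | | | |
        SRR1930141_2 | 32,369,669 | 93 bp | 49% | | | | | | | | | | | | |
        SRR1930141_L1_polyg | | | | 49.3% | | | | | | | | | | | |
        SRR1930145 | | | | | | | | | 83.0% | 32,302,251 | 3,716 | 0.01 | 3,716 | 0.01 | 1.6% | 0.0%
        SRR1930145_1 | 16,695,141 | 100 bp | 46% | | 24.1% | 32,302,251 | 97 bp | 46% | | | | | | | |
        SRR1930145_2 | 16,695,141 | 88 bp | 46% | | | | | | | | | | | | |
        SRR1930145_L1_polyg | | | | 46.5% | | | | | | | | | | | |
        SRR3184100 | | | | | | | | | 31.0% | 26,261,858 | 83 | 0.00 | 83 | 0.00 | 1.2% | 14.6%
        SRR3184100_1 | 15,338,847 | 101 bp | 58% | | 43.3% | 26,261,858 | 108 bp | 59% | | | | | | | |
        SRR3184100_2 | 15,338,847 | 101 bp | 58% | | | | | | | | | | | | |
        SRR3184100_L1_polyg | | | | 58.8% | | | | | | | | | | | |
        SRR3184876 | | | | | | | | | 13.7% | 40,471,753 | 435 | 0.00 | 435 | 0.00 | 12.0% | 54.9%
        SRR3184876_1 | 28,640,653 | 101 bp | 56% | | 60.8% | 40,471,753 | 104 bp | 56% | | | | | | | |
        SRR3184876_2 | 28,640,653 | 101 bp | 56% | | | | | | | | | | | | |
        SRR3184876_L1_polyg | | | | 56.6% | | | | | | | | | | | |
        SRR3189411 | | | | | | | | | 18.2% | 19,815,447 | 142 | 0.00 | 142 | 0.00 | 0.0% | 57.5%
        SRR3189411_1 | 13,826,170 | 101 bp | 57% | | 64.9% | 19,815,447 | 112 bp | 57% | | | | | | | |
        SRR3189411_2 | 13,826,170 | 101 bp | 57% | | | | | | | | | | | | |
        SRR3189411_L1_polyg | | | | 57.3% | | | | | | | | | | | |
        SRR3189416 | | | | | | | | | 17.0% | 18,295,641 | 121 | 0.00 | 121 | 0.00 | 0.8% | 58.0%
        SRR3189416_1 | 12,690,457 | 101 bp | 57% | | 64.1% | 18,295,641 | 112 bp | 57% | | | | | | | |
        SRR3189416_2 | 12,690,457 | 101 bp | 57% | | | | | | | | | | | | |
        SRR3189416_L1_polyg | | | | 57.4% | | | | | | | | | | | |
        SRR3189418 | | | | | | | | | 16.9% | 16,666,339 | 111 | 0.00 | 111 | 0.00 | 0.0% | 58.2%
        SRR3189418_1 | 11,536,508 | 101 bp | 57% | | 63.8% | 16,666,339 | 111 bp | 57% | | | | | | | |
        SRR3189418_2 | 11,536,508 | 101 bp | 57% | | | | | | | | | | | | |
        SRR3189418_L1_polyg | | | | 57.4% | | | | | | | | | | | |
        SRS013942 | | | | | | | | | 32.8% | 4,526,403 | 25,891 | 0.57 | 25,891 | 0.57 | 24.9% | 0.0%
        SRS013942.denovo_duplicates_marked | 3,774,293 | 93 bp | 41% | | 87.7% | 4,526,403 | 121 bp | 41% | | | | | | | |
        SRS013942_L1_polyg | | | | 41.7% | | | | | | | | | | | |
        SRS013950 | | | | | | | | | 49.6% | 20,772,039 | 54,991 | 0.26 | 54,991 | 0.26 | 33.1% | 0.0%
        SRS013950.denovo_duplicates_marked | 12,183,857 | 97 bp | 44% | | 43.0% | 20,772,039 | 99 bp | 43% | | | | | | | |
        SRS013950_L1_polyg | | | | 44.5% | | | | | | | | | | | |
        SRS014107 | | | | | | | | | 54.7% | 7,095,841 | 7,547 | 0.11 | 7,547 | 0.11 | 12.6% | 0.0%
        SRS014107.denovo_duplicates_marked | 4,258,293 | 91 bp | 52% | | 47.2% | 7,095,841 | 101 bp | 52% | | | | | | | |
        SRS014107_L1_polyg | | | | 52.7% | | | | | | | | | | | |
        SRS014468 | | | | | | | | | 45.0% | 1,521,100 | 27,913 | 1.84 | 27,913 | 1.84 | 18.0% | 0.1%
        SRS014468.denovo_duplicates_marked | 1,159,503 | 96 bp | 41% | | 79.2% | 1,521,100 | 120 bp | 41% | | | | | | | |
        SRS014468_L1_polyg | | | | 41.2% | | | | | | | | | | | |
        SRS014477 | | | | | | | | | 49.4% | 6,855,931 | 28,478 | 0.42 | 28,478 | 0.42 | 19.3% | 0.0%
        SRS014477.denovo_duplicates_marked | 5,363,955 | 94 bp | 46% | | 81.7% | 6,855,931 | 117 bp | 46% | | | | | | | |
        SRS014477_L1_polyg | | | | 46.0% | | | | | | | | | | | |
        SRS014691 | | | | | | | | | 51.6% | 21,848,723 | 20,706 | 0.09 | 20,706 | 0.09 | 25.6% | 0.0%
        SRS014691.denovo_duplicates_marked | 13,122,051 | 96 bp | 39% | | 50.5% | 21,848,723 | 108 bp | 39% | | | | | | | |
        SRS014691_L1_polyg | | | | 39.7% | | | | | | | | | | | |
        SRS014692 | | | | | | | | | 47.4% | 10,245,169 | 10,370 | 0.10 | 10,370 | 0.10 | 16.3% | 0.0%
        SRS014692.denovo_duplicates_marked | 7,091,094 | 97 bp | 40% | | 65.3% | 10,245,169 | 116 bp | 40% | | | | | | | |
        SRS014692_L1_polyg | | | | 40.6% | | | | | | | | | | | |
        SRS015055 | | | | | | | | | 47.8% | 4,767,073 | 49,878 | 1.05 | 49,878 | 1.05 | 17.8% | 0.1%
        SRS015055.denovo_duplicates_marked | 4,269,735 | 98 bp | 45% | | 88.4% | 4,767,073 | 123 bp | 45% | | | | | | | |
        SRS015055_L1_polyg | | | | 45.7% | | | | | | | | | | | |
        SRS015064 | | | | | | | | | 31.9% | 18,636,084 | 32,722 | 0.18 | 32,722 | 0.18 | 11.4% | 0.0%
        SRS015064.denovo_duplicates_marked | 14,758,418 | 100 bp | 60% | | 76.9% | 18,636,084 | 115 bp | 58% | | | | | | | |
        SRS015064_L1_polyg | | | | 60.5% | | | | | | | | | | | |
        SRS015378 | | | | | | | | | 16.6% | 16,153,060 | 15,124 | 0.09 | 15,124 | 0.09 | 9.2% | 0.0%
        SRS015378.denovo_duplicates_marked | 11,291,060 | 90 bp | 53% | | 69.8% | 16,153,060 | 109 bp | 53% | | | | | | | |
        SRS015378_L1_polyg | | | | 53.6% | | | | | | | | | | | |
        SRS015755 | | | | | | | | | 37.1% | 17,532,943 | 15,249 | 0.09 | 15,249 | 0.09 | 11.4% | 0.0%
        SRS015755.denovo_duplicates_marked | 9,599,719 | 89 bp | 52% | | 33.7% | 17,532,943 | 94 bp | 52% | | | | | | | |
        SRS015755_L1_polyg | | | | 52.0% | | | | | | | | | | | |
        SRS019029 | | | | | | | | | 52.5% | 23,056,072 | 27,300 | 0.12 | 27,300 | 0.12 | 26.1% | 0.0%
        SRS019029.denovo_duplicates_marked | 13,485,730 | 94 bp | 42% | | 45.1% | 23,056,072 | 103 bp | 42% | | | | | | | |
        SRS019029_L1_polyg | | | | 42.8% | | | | | | | | | | | |
        SRS019120 | | | | | | | | | 39.0% | 5,117,750 | 21,122 | 0.41 | 21,122 | 0.41 | 17.6% | 0.1%
        SRS019120.denovo_duplicates_marked | 3,896,233 | 96 bp | 43% | | 79.0% | 5,117,750 | 121 bp | 43% | | | | | | | |
        SRS019120_L1_polyg | | | | 43.2% | | | | | | | | | | | |
        SRS019129 | | | | | | | | | 42.0% | 13,958,225 | 18,253 | 0.13 | 18,253 | 0.13 | 10.4% | 0.0%
        SRS019129.denovo_duplicates_marked | 8,285,470 | 96 bp | 50% | | 45.3% | 13,958,225 | 100 bp | 49% | | | | | | | |
        SRS019129_L1_polyg | | | | 50.4% | | | | | | | | | | | |
        SRS019333 | | | | | | | | | 23.5% | 15,509,567 | 21,585 | 0.14 | 21,585 | 0.14 | 11.8% | 0.0%
        SRS019333.denovo_duplicates_marked | 10,670,162 | 88 bp | 52% | | 67.7% | 15,509,567 | 105 bp | 52% | | | | | | | |
        SRS019333_L1_polyg | | | | 52.2% | | | | | | | | | | | |
        SRS021960 | | | | | | | | | 30.6% | 17,183,094 | 20,951 | 0.12 | 20,951 | 0.12 | 17.1% | 0.0%
        SRS021960.denovo_duplicates_marked | 12,162,734 | 94 bp | 49% | | 69.5% | 17,183,094 | 108 bp | 49% | | | | | | | |
        SRS021960_L1_polyg | | | | 49.3% | | | | | | | | | | | |
        SRS022149 | | | | | | | | | 50.5% | 19,675,959 | 43,450 | 0.22 | 43,450 | 0.22 | 33.6% | 0.0%
        SRS022149.denovo_duplicates_marked | 13,578,187 | 95 bp | 49% | | 65.1% | 19,675,959 | 109 bp | 48% | | | | | | | |
        SRS022149_L1_polyg | | | | 48.8% | | | | | | | | | | | |
        SRS045313 | | | | | | | | | 32.9% | 10,741,642 | 31,354 | 0.29 | 31,354 | 0.29 | 12.1% | 0.0%
        SRS045313.denovo_duplicates_marked | 7,689,999 | 98 bp | 41% | | 58.9% | 10,741,642 | 103 bp | 40% | | | | | | | |
        SRS045313_L1_polyg | | | | 40.2% | | | | | | | | | | | |
        SRS047265 | | | | | | | | | 27.3% | 10,472,975 | 39,268 | 0.37 | 39,268 | 0.37 | 31.6% | 0.0%
        SRS047265.denovo_duplicates_marked | 6,152,995 | 94 bp | 43% | | 46.0% | 10,472,975 | 100 bp | 43% | | | | | | | |
        SRS047265_L1_polyg | | | | 43.7% | | | | | | | | | | | |
        SRS051378 | | | | | | | | | 27.6% | 14,870,044 | 18,449 | 0.12 | 18,449 | 0.12 | 12.1% | 0.0%
        SRS051378.denovo_duplicates_marked | 10,177,880 | 94 bp | 46% | | 66.1% | 14,870,044 | 106 bp | 46% | | | | | | | |
        SRS051378_L1_polyg | | | | 46.8% | | | | | | | | | | | |
        SRS052604 | | | | | | | | | 33.0% | 11,723,517 | 15,688 | 0.13 | 15,688 | 0.13 | 10.6% | 0.0%
        SRS052604.denovo_duplicates_marked | 9,051,062 | 94 bp | 47% | | 80.8% | 11,723,517 | 115 bp | 47% | | | | | | | |
        SRS052604_L1_polyg | | | | 47.3% | | | | | | | | | | | |
        SRS063215 | | | | | | | | | 37.3% | 17,375,807 | 35,439 | 0.20 | 35,439 | 0.20 | 23.2% | 0.0%
        SRS063215.denovo_duplicates_marked | 11,946,828 | 95 bp | 46% | | 66.4% | 17,375,807 | 108 bp | 46% | | | | | | | |
        SRS063215_L1_polyg | | | | 46.7% | | | | | | | | | | | |
        SRS064493 | | | | | | | | | 29.4% | 9,054,557 | 7,477 | 0.08 | 7,477 | 0.08 | 17.0% | 0.0%
        SRS064493.denovo_duplicates_marked | 4,990,994 | 90 bp | 56% | | 36.5% | 9,054,557 | 97 bp | 56% | | | | | | | |
        SRS064493_L1_polyg | | | | 56.3% | | | | | | | | | | | |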
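
        The Endogenous DNA (%) values above appear consistent with the mapped-read count expressed as a percentage of the total read count; for ERR1883419, 556 mapped reads out of 872,779 gives roughly 0.06. The following is a minimal sketch under that assumption. The function name is hypothetical and is not part of MultiQC or nf-core/eager.

```python
def endogenous_dna_percent(mapped_reads: int, total_reads: int) -> float:
    """Mapped reads as a percentage of all reads (assumed definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


# Read counts copied from the ERR1883419 and SRS014468 rows of the table;
# compare the printed values with the Endogenous DNA (%) entries for those rows.
print(round(endogenous_dna_percent(556, 872_779), 2))
print(round(endogenous_dna_percent(27_913, 1_521_100), 2))
```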

        FastQC (pre-Trimming)

        FastQC (pre-Trimming) is a quality control tool for high throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.

        Sequence Counts

        Sequence counts for each sample. Duplicate read counts are an estimate only.

        This plot shows the total number of reads, broken down into unique and duplicate if possible (only more recent versions of FastQC give duplicate info).

        You can read more about duplicate calculation in the FastQC documentation. A small part has been copied here for convenience:

        Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

        The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.
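
        The sampling rule quoted above can be sketched in Python. This is a simplified illustration, not FastQC's actual implementation: it tracks only sequences first seen within the first `track_limit` reads, truncates reads longer than 75 bp to 50 bp, and reports the share of tracked reads that are repeats. The function name and the final percentage formula are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable


def estimate_duplication(reads: Iterable[str], track_limit: int = 100_000,
                         long_read: int = 75, truncate_to: int = 50) -> float:
    """Estimated percentage of duplicate reads among the tracked sequences."""
    counts: Counter = Counter()
    for index, read in enumerate(reads):
        # Reads over `long_read` bases are compared on their first `truncate_to` bases.
        key = read[:truncate_to] if len(read) > long_read else read
        if key in counts:
            counts[key] += 1          # already tracked: follow it to the end of the file
        elif index < track_limit:
            counts[key] = 1           # new sequences are only added early in the file
    tracked_total = sum(counts.values())
    if tracked_total == 0:
        return 0.0
    # Every occurrence beyond the first of a tracked sequence counts as a duplicate.
    return 100.0 * (tracked_total - len(counts)) / tracked_total


print(estimate_duplication(["ACGT", "ACGT", "TTTT", "ACGT"]))
```

        Requiring an exact match on the (possibly truncated) sequence keeps the sketch in line with the quoted description; two long reads that differ only after base 50 are treated as duplicates.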

        Flat image plot. Toolbox functions such as highlighting / hiding samples will not work (see the docs).


        Sequence Quality Histograms

        The mean quality value across each base position in the read.

        To enable multiple samples to be plotted on the same graph, only the mean quality scores are plotted (unlike the box plots seen in FastQC reports).

        Taken from the FastQC help:

        The y-axis on the graph shows the quality scores. The higher the score, the better the base call. The background of the graph divides the y axis into very good quality calls (green), calls of reasonable quality (orange), and calls of poor quality (red). The quality of calls on most platforms will degrade as the run progresses, so it is common to see base calls falling into the orange area towards the end of a read.
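
        As an illustration of the per-position mean described above, the sketch below decodes FASTQ quality strings and averages the scores at each read position. It assumes Phred+33 encoding, and the function name is hypothetical.

```python
from typing import List


def mean_quality_per_position(quality_strings: List[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (Phred+33 encoding assumed)."""
    totals: List[int] = []
    counts: List[int] = []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(totals):    # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


print(mean_quality_per_position(["II", "5I#"]))
```

        Reads of different lengths are handled by averaging only over the reads that actually cover a given position.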



        Per Sequence Quality Scores

        The number of reads at each average quality score. Shows whether a subset of reads has poor quality.

        From the FastQC help:

        The per sequence quality score report allows you to see if a subset of your sequences have universally low quality values. It is often the case that a subset of sequences will have universally poor quality, however these should represent only a small percentage of the total sequences.
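
        A rough sketch of how such a per-read distribution could be tallied follows. It assumes Phred+33 encoding and bins each read by its mean score truncated to an integer; the binning choice and the function name are assumptions, not details confirmed by the report.

```python
from collections import Counter
from typing import Dict, Iterable


def per_sequence_quality_counts(quality_strings: Iterable[str],
                                offset: int = 33) -> Dict[int, int]:
    """Number of reads per mean quality score (mean truncated to an integer)."""
    counts: Counter = Counter()
    for qual in quality_strings:
        if not qual:
            continue                  # skip empty records rather than divide by zero
        mean_score = sum(ord(char) - offset for char in qual) / len(qual)
        counts[int(mean_score)] += 1
    return dict(counts)


print(per_sequence_quality_counts(["IIII", "II55", "####"]))
```

        A small cluster of reads at low scores in the resulting counts corresponds to the "subset of sequences with universally poor quality" mentioned in the FastQC help.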



        Per Base Sequence Content

        The proportion of each base position for which each of the four normal DNA bases has been called.

        To enable multiple samples to be shown in a single plot, the base composition data is shown as a heatmap. The colours represent the balance between the four bases: an even distribution should give an even muddy brown colour. Hover over the plot to see the percentage of the four bases under the cursor.

        To see the data as a line plot, as in the original FastQC graph, click on a sample track.

        From the FastQC help:

        Per Base Sequence Content plots out the proportion of each base position in a file for which each of the four normal DNA bases has been called.

        In a random library you would expect that there would be little to no difference between the different bases of a sequence run, so the lines in this plot should run parallel with each other. The relative amount of each base should reflect the overall amount of these bases in your genome, but in any case they should not be hugely imbalanced from each other.

        It's worth noting that some types of library will always produce biased sequence composition, normally at the start of the read. Libraries produced by priming using random hexamers (including nearly all RNA-Seq libraries) and those which were fragmented using transposases inherit an intrinsic bias in the positions at which reads start. This bias does not concern an absolute sequence, but instead provides enrichment of a number of different K-mers at the 5' end of the reads. Whilst this is a true technical bias, it isn't something which can be corrected by trimming and in most cases doesn't seem to adversely affect the downstream analysis.
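
        The per-position base proportions described above can be sketched as follows. This is an illustrative calculation only: it assumes that ambiguous calls such as N are left out of the denominator, and the function name is hypothetical.

```python
from typing import Dict, List


def base_content_per_position(reads: List[str]) -> List[Dict[str, float]]:
    """Percentage of A, C, G and T among called bases at each read position."""
    counts: List[Dict[str, int]] = []
    for read in reads:
        for pos, base in enumerate(read.upper()):
            if pos == len(counts):
                counts.append({"A": 0, "C": 0, "G": 0, "T": 0})
            if base in counts[pos]:   # assumption: N and other symbols are ignored
                counts[pos][base] += 1
    result: List[Dict[str, float]] = []
    for column in counts:
        called = sum(column.values())
        result.append({base: (100.0 * n / called if called else 0.0)
                       for base, n in column.items()})
    return result


for position, percentages in enumerate(base_content_per_position(["AC", "AG", "TC", "AN"]), start=1):
    print(position, percentages)
```

        In an unbiased library the four percentages should stay roughly constant from one position to the next, which is what the parallel lines in the FastQC plot express.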


        Per Sequence GC Content

        The average GC content of reads. Normal random libraries typically have a roughly normal distribution of GC content.

        From the FastQC help:

        This module measures the GC content across the whole length of each sequence in a file and compares it to a modelled normal distribution of GC content.

        In a normal random library you would expect to see a roughly normal distribution of GC content where the central peak corresponds to the overall GC content of the underlying genome. Since we don't know the GC content of the genome, the modal GC content is calculated from the observed data and used to build a reference distribution.

        An unusually shaped distribution could indicate a contaminated library or some other kinds of biased subset. A normal distribution which is shifted indicates some systematic bias which is independent of base position. If there is a systematic bias which creates a shifted normal distribution then this won't be flagged as an error by the module since it doesn't know what your genome's GC content should be.
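
        The per-read GC measurement and the modal value mentioned above can be sketched like this. The sketch only tallies the observed distribution and picks its mode; it does not fit the reference normal curve that FastQC builds from it. The function name, the rounding to whole percentages, and the tie-breaking rule are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def gc_distribution(reads: Iterable[str]) -> Tuple[Dict[int, int], int]:
    """Count reads per rounded GC percentage and return the modal percentage."""
    counts: Counter = Counter()
    for read in reads:
        if not read:
            continue
        seq = read.upper()
        gc_bases = seq.count("G") + seq.count("C")
        counts[round(100 * gc_bases / len(seq))] += 1
    if not counts:
        raise ValueError("no non-empty reads supplied")
    # Ties are broken in favour of the lower GC percentage (arbitrary choice).
    modal = max(counts, key=lambda pct: (counts[pct], -pct))
    return dict(counts), modal


distribution, modal_gc = gc_distribution(["GGCC", "GCAT", "ATGC", "AAAA"])
print(distribution, modal_gc)
```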



        Per Base N Content

        The percentage of base calls at each position for which an N was called.

        From the FastQC help:

        If a sequencer is unable to make a base call with sufficient confidence then it will normally substitute an N rather than a conventional base call. This graph shows the percentage of base calls at each position for which an N was called.

        It's not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. However, if this proportion rises above a few percent it suggests that the analysis pipeline was unable to interpret the data well enough to make valid base calls.
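
        For illustration, the percentage of N calls at each position could be computed as below. The function name is hypothetical, and no warning threshold is applied because the text above only says "a few percent".

```python
from typing import List


def n_content_per_position(reads: List[str]) -> List[float]:
    """Percentage of base calls at each read position that are N."""
    n_counts: List[int] = []
    totals: List[int] = []
    for read in reads:
        for pos, base in enumerate(read.upper()):
            if pos == len(totals):
                totals.append(0)
                n_counts.append(0)
            totals[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    return [100.0 * n / total for n, total in zip(n_counts, totals)]


print(n_content_per_position(["ACGN", "ACNN", "ACGT", "ACG"]))
```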



        Sequence Length Distribution

        The distribution of fragment sizes (read lengths) found. See the FastQC help.



        Sequence Duplication Levels

        The relative level of duplication found for every sequence.

        From the FastQC Help:

        In a diverse library most sequences will occur only once in the final set. A low level of duplication may indicate a very high level of coverage of the target sequence, but a high level of duplication is more likely to indicate some kind of enrichment bias (eg PCR over amplification). This graph shows the degree of duplication for every sequence in a library: the relative number of sequences with different degrees of duplication.

        Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

        The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.

        In a properly diverse library most sequences should fall into the far left of the plot in both the red and blue lines. A general level of enrichment, indicating broad oversequencing in the library will tend to flatten the lines, lowering the low end and generally raising other categories. More specific enrichments of subsets, or the presence of low complexity contaminants will tend to produce spikes towards the right of the plot.
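
        The counting rules quoted above can be sketched as follows. This is a simplified illustration, not FastQC's implementation: it reports the share of tracked distinct sequences at each exact duplication level and does not reproduce FastQC's binning or the second line of the plot. The function and parameter names are hypothetical; the defaults (100,000, 75 and 50) are taken from the help text.

```python
from collections import Counter
from typing import Dict, Iterable


def duplication_levels(reads: Iterable[str],
                       track_limit: int = 100_000,
                       long_read: int = 75,
                       truncate_to: int = 50) -> Dict[int, float]:
    """Percentage of tracked distinct sequences seen 1x, 2x, 3x, ... times."""
    counts: Counter = Counter()
    for i, read in enumerate(reads):
        # Reads longer than `long_read` are compared on their first `truncate_to` bases.
        key = read[:truncate_to] if len(read) > long_read else read
        if key in counts:
            counts[key] += 1      # already tracked: keep counting to the end of the file
        elif i < track_limit:
            counts[key] = 1       # only start tracking within the first `track_limit` reads
    if not counts:
        return {}
    level_hist = Counter(counts.values())
    n_distinct = len(counts)
    return {level: 100.0 * n / n_distinct
            for level, n in sorted(level_hist.items())}


# Tiny made-up example with a tracking window of 2 reads.
reads = ["AAAA", "CCCC", "AAAA", "GGGG", "AAAA", "CCCC"]
print(duplication_levels(reads, track_limit=2))
```

        In the example, GGGG first appears after the tracking window and is therefore ignored, which mirrors the sampling limitation described in the help text.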

        Flat image plot. Toolbox functions such as highlighting / hiding samples will not work (see the docs).


        Overrepresented sequences

        The total amount of overrepresented sequences found in each library.

        FastQC calculates and lists overrepresented sequences in FastQ files. It would not be possible to show this for all samples in a MultiQC report, so instead this plot shows the number of sequences categorized as overrepresented.

        Sometimes, a single sequence may account for a large number of reads in a dataset. To show this, the bars are split into two: the first shows the overrepresented reads that come from the single most common sequence. The second shows the total count from all remaining overrepresented sequences.

        From the FastQC Help:

        A normal high-throughput library will contain a diverse set of sequences, with no individual sequence making up more than a tiny fraction of the whole. Finding that a single sequence is very overrepresented in the set either means that it is highly biologically significant, or indicates that the library is contaminated, or not as diverse as you expected.

        FastQC lists all of the sequences which make up more than 0.1% of the total. To conserve memory only sequences which appear in the first 100,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but doesn't appear at the start of the file for some reason could be missed by this module.
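
        A minimal sketch of this rule, assuming exact string matching on whole reads: sequences are only tracked if they first appear within the first 100,000 reads, and those making up more than 0.1% of all reads are reported. The function and parameter names are hypothetical and the example data are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def overrepresented(reads: Iterable[str],
                    track_limit: int = 100_000,
                    threshold_pct: float = 0.1) -> Dict[str, float]:
    """Map each overrepresented sequence to its percentage of all reads."""
    counts: Counter = Counter()
    total = 0
    for read in reads:
        total += 1
        if read in counts:
            counts[read] += 1
        elif total <= track_limit:
            counts[read] = 1  # new sequences are only tracked early in the file
    result = {}
    for seq, n in counts.items():
        pct = 100.0 * n / total
        if pct > threshold_pct:
            result[seq] = pct
    return result


# GGGG makes up 40% of the reads but first appears after the tracking window,
# so it is missed, as the help text warns.
reads = ["AAAA"] * 5 + ["CCCC"] + ["GGGG"] * 4
print(overrepresented(reads, track_limit=6, threshold_pct=20.0))
```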

        Flat image plot. Toolbox functions such as highlighting / hiding samples will not work (see the docs).


        Adapter Content

        The cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position.

        Note that only samples with ≥ 0.1% adapter contamination are shown.

        There may be several lines per sample, as one is shown for each adapter detected in the file.

        From the FastQC Help:

        The plot shows a cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position. Once a sequence has been seen in a read it is counted as being present right through to the end of the read so the percentages you see will only increase as the read length goes on.
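
        The cumulative counting described above can be sketched like this. It is a simplification that assumes an exact match of the full adapter string and counts a read as adapter-containing from the first match position up to the longest read length; the function name and the example adapter are hypothetical.

```python
from typing import List


def cumulative_adapter_percent(reads: List[str], adapter: str) -> List[float]:
    """Percentage of reads in which the adapter has been seen by each position."""
    max_len = max((len(r) for r in reads), default=0)
    seen = [0] * max_len
    for read in reads:
        start = read.find(adapter)
        if start == -1:
            continue  # adapter never seen in this read
        # Once seen, the adapter counts as present through to the end.
        for pos in range(start, max_len):
            seen[pos] += 1
    n_reads = len(reads)
    return [100.0 * s / n_reads for s in seen] if n_reads else []


# Made-up reads with a made-up 4 bp "adapter".
reads = ["AAAAGATC", "AAGATCAA", "AAAAAAAA", "GATCAAAA"]
print(cumulative_adapter_percent(reads, "GATC"))
```

        Because a read is never "un-counted", the returned percentages can only stay level or increase along the read.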

        Status Checks

        Status for each FastQC section showing whether results seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

        FastQC assigns a status for each section of the report. These give a quick evaluation of whether the results of the analysis seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

        It is important to stress that although the analysis results appear to give a pass/fail result, these evaluations must be taken in the context of what you expect from your library. A 'normal' sample as far as FastQC is concerned is random and diverse. Some experiments may be expected to produce libraries which are biased in particular ways. You should treat the summary evaluations therefore as pointers to where you should concentrate your attention and understand why your library may not look random and diverse.

        Specific guidance on how to interpret the output of each module can be found in the relevant report section, or in the FastQC help.

        Here, we summarise all of these into a single heatmap for a quick overview. Note that not all FastQC sections have plots in MultiQC reports, but all status checks are shown in this heatmap.

        fastp

        fastp is an ultra-fast all-in-one FASTQ preprocessor (QC, adapters, trimming, filtering, splitting...).

        Filtered Reads

        Filtering statistics of sampled reads.

        Duplication Rates

        Duplication rates of sampled reads.

        Insert Sizes

        Insert size estimation of sampled reads.

        Sequence Quality

        Average sequencing quality over each base of all reads.

        GC Content

        Average GC content over each base of all reads.

        N content

        Average N content over each base of all reads.

        Adapter Removal

        Adapter Removal performs rapid adapter trimming, identification, and read merging.

        Retained and Discarded Paired-End Collapsed

        The number of retained and discarded reads.

        Length Distribution Paired End Collapsed

        The length distribution of reads after processing adapter alignment.

        FastQC (post-Trimming)

        FastQC (post-Trimming) is a quality control tool for high throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.

        Sequence Counts

        Sequence counts for each sample. Duplicate read counts are an estimate only.

        This plot shows the total number of reads, broken down into unique and duplicate if possible (only more recent versions of FastQC give duplicate info).

        You can read more about duplicate calculation in the FastQC documentation. A small part has been copied here for convenience:

        Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

        The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.

        Sequence Quality Histograms

        The mean quality value across each base position in the read.

        To enable multiple samples to be plotted on the same graph, only the mean quality scores are plotted (unlike the box plots seen in FastQC reports).

        Taken from the FastQC help:

        The y-axis on the graph shows the quality scores. The higher the score, the better the base call. The background of the graph divides the y axis into very good quality calls (green), calls of reasonable quality (orange), and calls of poor quality (red). The quality of calls on most platforms will degrade as the run progresses, so it is common to see base calls falling into the orange area towards the end of a read.
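
        For illustration, the mean quality at each position could be computed from FASTQ quality strings roughly as below. The sketch assumes Phred+33 encoding (an assumption about the input, not something stated in this report), and the function name is hypothetical.

```python
from typing import List


def mean_quality_per_position(quality_strings: List[str], offset: int = 33) -> List[float]:
    """Mean Phred quality score at each read position.

    `offset` is the ASCII offset of the quality encoding (33 is assumed here).
    """
    max_len = max((len(q) for q in quality_strings), default=0)
    sums = [0] * max_len
    counts = [0] * max_len
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            sums[pos] += ord(char) - offset  # decode one quality character
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


# 'I' decodes to 40 and '5' decodes to 20 under Phred+33.
print(mean_quality_per_position(["II", "5I"]))
```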

        Per Sequence Quality Scores

        The number of reads at each average quality score. Shows if a subset of reads has poor quality.

        From the FastQC help:

        The per sequence quality score report allows you to see if a subset of your sequences have universally low quality values. It is often the case that a subset of sequences will have universally poor quality, however these should represent only a small percentage of the total sequences.

        Per Base Sequence Content

        The proportion of each base position for which each of the four normal DNA bases has been called.

        To enable multiple samples to be shown in a single plot, the base composition data is shown as a heatmap. The colours represent the balance between the four bases: an even distribution should give an even muddy brown colour. Hover over the plot to see the percentage of the four bases under the cursor.

        To see the data as a line plot, as in the original FastQC graph, click on a sample track.

        From the FastQC help:

        Per Base Sequence Content plots out the proportion of each base position in a file for which each of the four normal DNA bases has been called.

        In a random library you would expect that there would be little to no difference between the different bases of a sequence run, so the lines in this plot should run parallel with each other. The relative amount of each base should reflect the overall amount of these bases in your genome, but in any case they should not be hugely imbalanced from each other.

        It's worth noting that some types of library will always produce biased sequence composition, normally at the start of the read. Libraries produced by priming using random hexamers (including nearly all RNA-Seq libraries) and those which were fragmented using transposases inherit an intrinsic bias in the positions at which reads start. This bias does not concern an absolute sequence, but instead provides enrichment of a number of different K-mers at the 5' end of the reads. Whilst this is a true technical bias, it isn't something which can be corrected by trimming and in most cases doesn't seem to adversely affect the downstream analysis.

        Per Sequence GC Content

        The average GC content of reads. Normal random libraries typically have a roughly normal distribution of GC content.

        From the FastQC help:

        This module measures the GC content across the whole length of each sequence in a file and compares it to a modelled normal distribution of GC content.

        In a normal random library you would expect to see a roughly normal distribution of GC content where the central peak corresponds to the overall GC content of the underlying genome. Since we don't know the GC content of the genome, the modal GC content is calculated from the observed data and used to build a reference distribution.

        An unusually shaped distribution could indicate a contaminated library or some other kinds of biased subset. A normal distribution which is shifted indicates some systematic bias which is independent of base position. If there is a systematic bias which creates a shifted normal distribution then this won't be flagged as an error by the module since it doesn't know what your genome's GC content should be.
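
        The idea of building a reference distribution from the observed mode can be sketched as follows. This is an assumption-laden illustration rather than FastQC's code: GC percentages are rounded to whole numbers, the spread is measured around the mode, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def gc_percent(read: str) -> float:
    """Percentage of G and C bases in one read."""
    read = read.upper()
    return 100.0 * (read.count("G") + read.count("C")) / len(read)


def gc_reference(reads: Iterable[str]) -> Tuple[int, float, Dict[int, float]]:
    """Return (modal GC %, spread around the mode, expected read count per GC %)."""
    observed = Counter(round(gc_percent(r)) for r in reads if r)
    total = sum(observed.values())
    if total == 0:
        raise ValueError("no reads given")
    mode = max(observed, key=observed.get)
    # Standard deviation measured around the mode rather than the mean.
    sd = math.sqrt(sum(n * (gc - mode) ** 2 for gc, n in observed.items()) / total)
    if sd == 0:
        return mode, sd, {mode: float(total)}
    curve = {}
    for gc in range(101):
        density = math.exp(-((gc - mode) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))
        curve[gc] = total * density  # expected number of reads at this GC %
    return mode, sd, curve


mode, sd, curve = gc_reference(["GGCC", "ATGC", "ATGC", "AATT"])
print(mode, round(sd, 2))
```

        Because the reference curve is centred on the observed mode, a distribution that is shifted as a whole still matches its own reference, which is why such a shift is not flagged.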

        Per Base N Content

        The percentage of base calls at each position for which an N was called.

        From the FastQC help:

        If a sequencer is unable to make a base call with sufficient confidence then it will normally substitute an N rather than a conventional base call. This graph shows the percentage of base calls at each position for which an N was called.

        It's not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. However, if this proportion rises above a few percent it suggests that the analysis pipeline was unable to interpret the data well enough to make valid base calls.

        Sequence Length Distribution

        The distribution of fragment sizes (read lengths) found. See the FastQC help.

        Sequence Duplication Levels

        The relative level of duplication found for every sequence.

        From the FastQC Help:

        In a diverse library most sequences will occur only once in the final set. A low level of duplication may indicate a very high level of coverage of the target sequence, but a high level of duplication is more likely to indicate some kind of enrichment bias (eg PCR over amplification). This graph shows the degree of duplication for every sequence in a library: the relative number of sequences with different degrees of duplication.

        Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

        The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.

        In a properly diverse library most sequences should fall into the far left of the plot in both the red and blue lines. A general level of enrichment, indicating broad oversequencing in the library will tend to flatten the lines, lowering the low end and generally raising other categories. More specific enrichments of subsets, or the presence of low complexity contaminants will tend to produce spikes towards the right of the plot.

        Overrepresented sequences

        The total amount of overrepresented sequences found in each library.

        FastQC calculates and lists overrepresented sequences in FastQ files. It would not be possible to show this for all samples in a MultiQC report, so instead this plot shows the number of sequences categorized as overrepresented.

        Sometimes, a single sequence may account for a large number of reads in a dataset. To show this, the bars are split into two: the first shows the overrepresented reads that come from the single most common sequence. The second shows the total count from all remaining overrepresented sequences.

        From the FastQC Help:

        A normal high-throughput library will contain a diverse set of sequences, with no individual sequence making up more than a tiny fraction of the whole. Finding that a single sequence is very overrepresented in the set either means that it is highly biologically significant, or indicates that the library is contaminated, or not as diverse as you expected.

        FastQC lists all of the sequences which make up more than 0.1% of the total. To conserve memory only sequences which appear in the first 100,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but doesn't appear at the start of the file for some reason could be missed by this module.

        Adapter Content

        The cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position.

        Note that only samples with ≥ 0.1% adapter contamination are shown.

        There may be several lines per sample, as one is shown for each adapter detected in the file.

        From the FastQC Help:

        The plot shows a cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position. Once a sequence has been seen in a read it is counted as being present right through to the end of the read so the percentages you see will only increase as the read length goes on.

        Status Checks

        Status for each FastQC section showing whether results seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

        FastQC assigns a status for each section of the report. These give a quick evaluation of whether the results of the analysis seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

        It is important to stress that although the analysis results appear to give a pass/fail result, these evaluations must be taken in the context of what you expect from your library. A 'normal' sample as far as FastQC is concerned is random and diverse. Some experiments may be expected to produce libraries which are biased in particular ways. You should treat the summary evaluations therefore as pointers to where you should concentrate your attention and understand why your library may not look random and diverse.

        Specific guidance on how to interpret the output of each module can be found in the relevant report section, or in the FastQC help.

        Here, we summarise all of these into a single heatmap for a quick overview. Note that not all FastQC sections have plots in MultiQC reports, but all status checks are shown in this heatmap.

        Kraken

        Kraken is a taxonomic classification tool that uses exact k-mer matches to find the lowest common ancestor (LCA) of a given sequence.

        Top taxa

        The number of reads falling into the top 5 taxa across different ranks.

        To make this plot, the percentage of each sample assigned to a given taxon is summed across all samples. The counts for the top five taxa are then plotted for each of the 9 different taxonomic ranks. The unclassified count is always shown across all taxonomic ranks.

        The total number of reads is approximated by dividing the number of unclassified reads by the percentage of the library that they account for. Note that this is only an approximation, and that kraken percentages don't always add to exactly 100%.

        The category "Other" shows the difference between the above total read count and the sum of the read counts in the top 5 taxa shown + unclassified. This should cover all taxa not in the top 5, +/- any rounding errors.

        Note that any taxon that does not exactly fit a taxon rank (eg. - or G2) is ignored.
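
        The arithmetic behind the approximate total and the "Other" category can be written out as a small sketch. The function names and the numbers in the example are hypothetical; they are not taken from this run.

```python
from typing import Sequence


def approx_total_reads(unclassified_reads: int, unclassified_pct: float) -> float:
    """Approximate the library size from the unclassified count and its percentage."""
    if unclassified_pct <= 0:
        raise ValueError("unclassified percentage must be positive")
    return unclassified_reads / (unclassified_pct / 100.0)


def other_count(total_reads: float,
                top_taxa_counts: Sequence[float],
                unclassified_reads: float) -> float:
    """Reads not accounted for by the top taxa shown or by 'unclassified'."""
    return total_reads - sum(top_taxa_counts) - unclassified_reads


# Made-up example: 2,500 unclassified reads reported as 25% of the library.
total = approx_total_reads(2500, 25.0)
print(total, other_count(total, [4000, 2000, 500, 300, 200], 2500))
```

        Since the percentages are rounded, the resulting "Other" value should be read as an estimate.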

           
        Duplication rate of top species

        The duplication rate of minimizers falling into the top 5 species.

        To make this plot, the minimizer duplication rate is computed for the top 5 most abundant species in all samples.

        The minimizer duplication rate is defined as: duplication rate = (total number of minimizers / number of distinct minimizers)

        A low coverage combined with a high duplication rate (>> 1) is often a sign of read stacking, which probably indicates a false positive hit.
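
        The definition above amounts to a single division, sketched here with a hypothetical function name and made-up counts.

```python
def minimizer_duplication_rate(total_minimizers: int, distinct_minimizers: int) -> float:
    """Duplication rate = total number of minimizers / number of distinct minimizers."""
    if distinct_minimizers <= 0:
        raise ValueError("number of distinct minimizers must be positive")
    return total_minimizers / distinct_minimizers


# Made-up counts: a rate near 1 means most minimizers are distinct,
# while a rate far above 1 means the same few minimizers were hit repeatedly.
print(minimizer_duplication_rate(1000, 950))
print(minimizer_duplication_rate(1000, 10))
```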

        Samtools Flagstat (pre-samtools filter)

        Samtools is a suite of programs for interacting with high-throughput sequencing data.

        Samtools Flagstat

        This module parses the output from samtools flagstat. All numbers in millions.

        Samtools Flagstat (post-samtools filter)

        Samtools is a suite of programs for interacting with high-throughput sequencing data.

        Samtools Flagstat

        This module parses the output from samtools flagstat. All numbers in millions.

        Picard

        Picard is a set of Java command line tools for manipulating high-throughput sequencing data.

        Mark Duplicates

        Number of reads, categorised by duplication state. Pair counts are doubled - see help text for details.

        The table in the Picard metrics file contains some columns referring to read pairs and some referring to single reads.

        To make the numbers in this plot sum correctly, values referring to pairs are doubled according to the scheme below:

        • READS_IN_DUPLICATE_PAIRS = 2 * READ_PAIR_DUPLICATES
        • READS_IN_UNIQUE_PAIRS = 2 * (READ_PAIRS_EXAMINED - READ_PAIR_DUPLICATES)
        • READS_IN_UNIQUE_UNPAIRED = UNPAIRED_READS_EXAMINED - UNPAIRED_READ_DUPLICATES
        • READS_IN_DUPLICATE_PAIRS_OPTICAL = 2 * READ_PAIR_OPTICAL_DUPLICATES
        • READS_IN_DUPLICATE_PAIRS_NONOPTICAL = READS_IN_DUPLICATE_PAIRS - READS_IN_DUPLICATE_PAIRS_OPTICAL
        • READS_IN_DUPLICATE_UNPAIRED = UNPAIRED_READ_DUPLICATES
        • READS_UNMAPPED = UNMAPPED_READS
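
        The scheme above can be expressed as a short sketch. The function name is hypothetical and the metric values in the example are made up; only the column names and formulas come from the list above.

```python
from typing import Dict


def mark_duplicates_read_counts(metrics: Dict[str, int]) -> Dict[str, int]:
    """Convert Picard MarkDuplicates metrics to per-read counts, doubling pair values."""
    dup_pairs = 2 * metrics["READ_PAIR_DUPLICATES"]
    optical = 2 * metrics["READ_PAIR_OPTICAL_DUPLICATES"]
    return {
        "READS_IN_DUPLICATE_PAIRS": dup_pairs,
        "READS_IN_UNIQUE_PAIRS": 2 * (metrics["READ_PAIRS_EXAMINED"]
                                      - metrics["READ_PAIR_DUPLICATES"]),
        "READS_IN_UNIQUE_UNPAIRED": (metrics["UNPAIRED_READS_EXAMINED"]
                                     - metrics["UNPAIRED_READ_DUPLICATES"]),
        "READS_IN_DUPLICATE_PAIRS_OPTICAL": optical,
        "READS_IN_DUPLICATE_PAIRS_NONOPTICAL": dup_pairs - optical,
        "READS_IN_DUPLICATE_UNPAIRED": metrics["UNPAIRED_READ_DUPLICATES"],
        "READS_UNMAPPED": metrics["UNMAPPED_READS"],
    }


# Made-up metrics for illustration only.
example = {
    "READ_PAIRS_EXAMINED": 100,
    "READ_PAIR_DUPLICATES": 10,
    "READ_PAIR_OPTICAL_DUPLICATES": 2,
    "UNPAIRED_READS_EXAMINED": 20,
    "UNPAIRED_READ_DUPLICATES": 5,
    "UNMAPPED_READS": 7,
}
print(mark_duplicates_read_counts(example))
```

        With these example values, the unique and duplicate categories (excluding the combined READS_IN_DUPLICATE_PAIRS entry and unmapped reads) add up to 2 * READ_PAIRS_EXAMINED + UNPAIRED_READS_EXAMINED, which is the reason for doubling the pair-based values.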

        Preseq

        Preseq estimates the complexity of a library, showing how many additional unique reads are sequenced for increasing total read count. A shallow curve indicates complexity saturation. The dashed line shows a perfectly complex library where total reads = unique reads.

        Complexity curve

        Note that the x axis is trimmed at the point where all the datasets show 80% of their maximum y-value, to avoid ridiculous scales.

        nf-core/eager Software Versions

        Software versions are collected at run time from the software output.

        nf-core/eager            v2.4.1
        Nextflow                 v21.03.0.edge
        FastQC                   v0.11.9
        MultiQC                  v1.11
        AdapterRemoval           v2.3.2
        fastP                    v0.20.1
        BWA                      v0.7.17-r1188
        Bowtie2                  v2.4.4
        circulargenerator        v1.0
        Samtools                 v1.12
        endorS.py                v0.4
        DeDup                    v0.12.8
        Picard MarkDuplicates    v2.26.0
        Qualimap                 v2.2.2-dev
        Preseq                   v3.1.1
        GATK HaplotypeCaller     N/A
        GATK UnifiedGenotyper    v3.5-0-g36282e4
        freebayes                v1.3.5
        sequenceTools            v1.4.0.5
        VCF2genome               v0.91
        MTNucRatioCalculator     v0.7
        bedtools                 v2.30.0
        DamageProfiler           v0.4.9
        bamUtil                  v1.0.15
        pmdtools                 v0.50
        angsd                    v0.935
        sexdeterrmine            v1.1.2
        multivcfanalyzer         v0.85.2
        malt                     v0.4.1
        kraken                   v2.1.2
        maltextract              v1.7
        eigenstrat_snp_coverage  v1.0.2
        mapDamage2               v2.2.1
        bbduk                    vJanuary 26, 2021
        bcftools                 v1.12

        nf-core/eager Workflow Summary

        This information is collected when the pipeline is started.

        Pipeline Release            2.4.1
        Run Name                    source_kraken
        Input                       /home/bartholdybp/byoc_analysis/eager/source_input.tsv
        Fasta Ref                   /home/bartholdybp/data/databases/refseq/human/GRCh38_latest_genomic.fna
        Max Resources               240 GB memory, 24 cpus, 7d time per job
        Container                   singularity - nfcore/eager:2.4.1
        Output dir                  /home/bartholdybp/data/analysis/205784//results
        Launch dir                  /scratchdata/bartholdybp/205784
        Working dir                 /home/bartholdybp/data/analysis/205784/work
        Script dir                  /home/bartholdybp/.nextflow/assets/nf-core/eager
        User                        bartholdybp
        Config Profile              alice,byoc
        Config Profile Description  BYOC profile for use on Leiden University ALICE cluster.
        Config Profile Contact      Bjorn Peare Bartholdy (@osteobjorn)
        Config Profile URL          https://wiki.alice.universiteitleiden.nl/
        Config Files                /home/bartholdybp/.nextflow/assets/nf-core/eager/nextflow.config, /scratchdata/bartholdybp/205784/byoc.config