Hello,
I am new here and new to next-gen metagenomic sequencing, and I registered specifically to ask a question.
I recently sequenced a metagenomic sample and ran an initial FastQC check on the reads.
I found the GC distribution plot interesting. My sequenced data show two peaks, which is something I expected and can explain. What I don't understand is how the software determines the theoretical GC distribution.
In the FAQ of FastQC, the following is written:
In a normal random library you would expect to see a roughly normal distribution of GC content where the central peak corresponds to the overall GC content of the underlying genome. Since we don't know the GC content of the genome the modal GC content is calculated from the observed data and used to build a reference distribution.
I have no idea how they get to the reference distribution.
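The only reading I can come up with is sketched below in Python. It is a guess, not something I have confirmed against the FastQC source: I assume the curve is a normal distribution centred on the modal GC percentage, with a width taken from the spread of the observed data around that mode, and scaled to the total number of reads. The function name `reference_gc_curve` and the toy counts are made up for illustration.

```python
import math


def reference_gc_curve(observed_counts):
    """Guess at a 'theoretical' GC curve from observed per-read GC counts.

    observed_counts[i] is the number of reads with i% GC (i = 0..100).
    """
    total = sum(observed_counts)
    if total < 2:
        raise ValueError("need at least two reads")

    # Modal GC content: the GC percentage seen in the most reads.
    mode = max(range(len(observed_counts)), key=lambda i: observed_counts[i])

    # Assumption: the width is the spread of the observed data around the mode.
    variance = sum(
        c * (i - mode) ** 2 for i, c in enumerate(observed_counts)
    ) / (total - 1)
    if variance == 0:
        raise ValueError("all reads have the same GC content")
    stdev = math.sqrt(variance)

    # Normal density centred on the mode, scaled to the total read count.
    norm = total / (stdev * math.sqrt(2 * math.pi))
    curve = [
        norm * math.exp(-((i - mode) ** 2) / (2 * variance))
        for i in range(len(observed_counts))
    ]
    return mode, stdev, curve


# Toy data with two peaks (40% and 65% GC), made up for illustration.
observed = [0] * 101
observed[39:42] = [50, 100, 50]
observed[64:67] = [30, 60, 30]

mode, stdev, curve = reference_gc_curve(observed)
print(mode, round(stdev, 1))
```

If this guess is right, a two-peaked sample like mine would give a single broad curve centred on the taller peak, because the second peak inflates the spread around the mode.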
Can anybody help?
Thanks in advance.