Dear Sequencing Community,
I am new to this, so please be gentle.
The ChIP-seq was done by a company on an Illumina sequencer: ~60 Mb per sample, 4 samples. Only 17-22% of the reads could be mapped to the mouse genome. There is some known salmon sperm contamination, but there should still be enough reads for the sample.
I checked the base quality scores using FastQC and was shocked:
http://www.freeimagehosting.net/t4wx8
But the company replied that FastQC only samples the first ~200,000 reads. I do know that this is done for e.g. base overrepresentation, but I thought the base quality score was calculated from all the reads.
So I checked with FastX from Galaxy:
http://www.freeimagehosting.net/713i5
Why do the base quality score outputs from FastQC and FastX differ so much? If FastQC only samples a fraction of the reads, wouldn't it become useless for bigger runs?
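To test the sampling explanation myself, I think something like the rough sketch below could work: it computes the mean quality per read position once over all reads and once over only the first N reads, so the two can be compared directly. This is not how FastQC or FastX do it internally; the function name `per_base_mean_quality` is my own, and I am assuming a plain 4-line-per-record FASTQ with Phred+33 encoding (older Illumina pipelines used an offset of 64, so that parameter may need changing).

```python
import io
from itertools import islice


def per_base_mean_quality(handle, max_reads=None, offset=33):
    """Mean Phred score per read position.

    Assumptions (not taken from FastQC or FastX): 4 lines per FASTQ
    record, and quality characters encoded with the given ASCII offset.
    """
    sums, counts = [], []
    # The 4th line of every record is the quality string.
    qual_lines = islice(handle, 3, None, 4)
    if max_reads is not None:
        # Restrict to the first max_reads records to mimic sampling.
        qual_lines = islice(qual_lines, max_reads)
    for line in qual_lines:
        for i, ch in enumerate(line.rstrip("\r\n")):
            if i == len(sums):
                # First time a read reaches this position.
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


# Toy data: one high-quality read followed by one low-quality read.
toy = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n"
all_reads = per_base_mean_quality(io.StringIO(toy))
first_only = per_base_mean_quality(io.StringIO(toy), max_reads=1)
print(all_reads)
print(first_only)
```

On a real file I would pass an opened FASTQ handle (hypothetical path, e.g. `open("sample.fastq")`) with `max_reads=200000` and without it. If both profiles look alike, sampling alone would not explain the difference between the two plots.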
Thanks for your help.