#1 |
Member
Location: Canada Join Date: Sep 2012
Posts: 21
Hi Everyone:
I would be grateful if someone could take a quick look at these FastQC results. This is 100 bp paired-end exome-seq data. For the most part the results look normal, except for a strange distribution in the per sequence GC content graph. According to the FastQC manual, an unusually shaped distribution suggests contamination, while a shifted curve suggests a systematic bias. Could this systematic bias be due to exome enrichment? There also seem to be homopolymer AAAA and TTTT repeats, as indicated by the Kmer Content warning. Can anyone speculate on the source of these homopolymers? Thanks, MC
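For anyone who wants to look at the per sequence GC distribution outside of the FastQC plot, here is a minimal sketch. It is not FastQC's own implementation. The function names `gc_percent` and `gc_histogram` are made up for illustration, and `reads` is assumed to be an iterable of read sequences already pulled out of the FASTQ file.

```python
from collections import Counter


def gc_percent(seq):
    """Return the GC percentage of one read, ignoring N calls."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        # Read consists only of N (or is empty): report 0 rather than divide by zero.
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def gc_histogram(reads):
    """Count reads per integer GC percentage (0..100)."""
    return Counter(round(gc_percent(read)) for read in reads)
```

In a standard four-line FASTQ record the sequence is the second line, so feeding every fourth line (starting from line 2) into `gc_histogram` gives counts that can be plotted and compared between samples or against an unenriched library.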
#2 |
Senior Member
Location: USA Join Date: Jul 2012
Posts: 184
How long were your inserts? When reads go polyA or polyT, it may be a sign that the sequencer has read through the entire length of the insert.
#3 |
Member
Location: Canada Join Date: Sep 2012
Posts: 21
The average insert size was ~360 bp. Why do reads go polyA or polyT when they've read through the entire length of the sequence?
#4 |
Senior Member
Location: USA Join Date: Jul 2012
Posts: 184
If you're using an Illumina instrument (which I'm assuming for now), a poly-A stretch is caused by a zero-intensity signal, in which case the base is called as an A with a quality of 0. Another way to tell whether this is the problem is to see if your adapters show up a lot in your overrepresented sequences. That would indicate a substantial amount of adapter/primer dimers, which may also cause this problem of reading through the end of the sequence.
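A rough way to run both checks directly on the reads is sketched below. The helper names are hypothetical. The default adapter string is the common Illumina TruSeq adapter prefix and is only an assumption here, so substitute the adapter for your own library prep. The `min_run` threshold of 10 is also an arbitrary choice.

```python
def ends_in_homopolymer(seq, base="A", min_run=10):
    """True if the read ends in at least min_run copies of base."""
    return seq.upper().endswith(base * min_run)


def contains_adapter(seq, adapter="AGATCGGAAGAGC"):
    """True if the (assumed) adapter prefix occurs anywhere in the read."""
    return adapter in seq.upper()


def summarize(reads, adapter="AGATCGGAAGAGC", min_run=10):
    """Count reads with adapter hits and reads ending in a poly-A run."""
    total = adapter_hits = poly_a = 0
    for read in reads:
        total += 1
        if contains_adapter(read, adapter):
            adapter_hits += 1
        if ends_in_homopolymer(read, "A", min_run):
            poly_a += 1
    return {"total": total, "adapter": adapter_hits, "poly_a": poly_a}
```

If the `poly_a` count is high while the `adapter` count is near zero, adapter dimers are an unlikely explanation and the homopolymers probably have another source.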
#5 |
Member
Location: Canada Join Date: Sep 2012
Posts: 21
Hi Kcchan,
Thanks for the explanation, but I don't see any of my adapter sequences in the overrepresented sequences section of the FastQC results. I only see "AAAAA" and "TTTTT" in this section, with counts of 16919295 and 16616150 and overall Obs/Exp values of 3.44 and 3.09, respectively. FastQC does not give me a warning for overrepresented sequences and shows these levels as normal, despite giving me a Kmer Content warning for these same homopolymers.
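To make the Obs/Exp figure more concrete, here is one simple way such a ratio can be defined: the observed count of a k-mer divided by the count expected if bases were drawn independently at their overall frequencies. This is a sketch under that assumption and may not match FastQC's exact calculation. `kmer_obs_exp` is a hypothetical name.

```python
from collections import Counter


def kmer_obs_exp(reads, kmer):
    """Observed/expected ratio for one k-mer across all reads.

    Expected count assumes independent bases at their overall frequencies.
    """
    kmer = kmer.upper()
    k = len(kmer)
    base_counts = Counter()
    windows = 0   # number of k-length windows examined
    observed = 0  # windows that exactly match the k-mer
    for read in reads:
        read = read.upper()
        base_counts.update(b for b in read if b in "ACGT")
        for i in range(len(read) - k + 1):
            windows += 1
            if read[i:i + k] == kmer:
                observed += 1
    total = sum(base_counts.values())
    if total == 0 or windows == 0:
        return 0.0
    prob = 1.0
    for base in kmer:
        prob *= base_counts[base] / total
    expected = windows * prob
    return observed / expected if expected else 0.0
```

Under this definition a ratio around 3, as reported for AAAAA and TTTTT, would mean the homopolymer occurs roughly three times more often than the base composition alone predicts.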
#6 |
Senior Member
Location: UK Join Date: Jan 2010
Posts: 390
I have to say I see this homopolymer signal from FastQC in HiSeq exome runs *a lot*, but I've never had a satisfactory explanation for it. It hasn't seemed to impact downstream applications.
Your GC content will almost always get flagged by FastQC; I've always assumed that part of this comes from looking only at the exome (rather than unbiased sequence from the genome). However, your peak does look unusually distorted compared with what I would normally expect.
#7 |
Member
Location: Canada Join Date: Sep 2012
Posts: 21
Thanks for this insight, Bukowski. I'll consider the homopolymer signal benign, then. But yes, the hump is very strange.