SEQanswers

12-15-2010, 07:50 PM   #1
luxmare
Member
 
Location: Japan

Join Date: Feb 2009
Posts: 10
Periodic Illumina read length distribution after trimming of low-quality bases

In my NGS data analysis, before mapping, I trimmed low-quality bases (<Q20) from the 3' ends of the reads until the first high-quality (≥Q20) base was reached. I then plotted the read length distribution and found that it is strangely periodic. Please see the attached figure.
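The trimming step described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used for the attached figure: the function names, the toy reads, and the Phred offset of 33 are all assumptions (Illumina pipelines of that period typically wrote Phred+64 qualities, so the offset would need adjusting for such files).

```python
from collections import Counter

# Assumption: Phred+33 quality encoding. Older Illumina pipelines used 64.
PHRED_OFFSET = 33


def trim_3prime(seq, qual, min_q=20, offset=PHRED_OFFSET):
    """Drop bases from the 3' end until a base with quality >= min_q is found."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def length_distribution(records, min_q=20, offset=PHRED_OFFSET):
    """Count trimmed read lengths for an iterable of (seq, qual) pairs."""
    return Counter(len(trim_3prime(s, q, min_q, offset)[0]) for s, q in records)


# Hypothetical toy reads: 'I' is Q40 and '#' is Q2 under offset 33.
reads = [
    ("ACGTACGTAC", "IIIIIIII##"),
    ("ACGTACGTAC", "IIIII#I###"),
    ("ACGTACGTAC", "IIIIIIIIII"),
    ("ACGTACGTAC", "##########"),
]
dist = length_distribution(reads)
for length in sorted(dist):
    print(length, dist[length])
```

Note that a low-quality base inside the read (as in the second toy read) is kept, because trimming stops at the first base from the 3' end that reaches the threshold.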

In the graph, length distributions from different lanes or tiles are drawn in different colors. The read frequencies oscillate with a period of 5 bp.

I also saw this kind of odd length distribution in our other RNA-seq and genome sequencing datasets, and in RNA-seq data from SRA as well.

Does anyone know why such a periodic length distribution appears after trimming?

Thanks in advance.
Attached Images
File Type: jpg readlength_after_trimming.jpg (60.4 KB, 114 views)
12-16-2010, 05:03 AM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,169

I can't remember any details, but I do recall hearing once that something in the Illumina quality scoring algorithms creates these 5 bp cycles.
12-17-2010, 07:34 PM   #3
luxmare
Member
 
Location: Japan

Join Date: Feb 2009
Posts: 10

Thank you, kmcarr.
Do you mean that this odd distribution is caused by the base-calling algorithm in the Illumina pipeline?

Can we just ignore the length distribution after trimming low-quality bases? Is it nothing to worry about?
12-18-2010, 01:32 AM   #4
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 620

Funny that you mention this; I did something quite similar recently.

I wanted to find out whether the increase in sequencing errors towards later sequencing cycles (which is equivalent to a drop in Phred quality) can be described by some kind of mathematical formula. I used a couple of sequence files to determine the position at which poor quality starts. A read counted as poor quality if it contained at least a certain number of low-quality basecalls in total (in the attached figure, at least 8 quality values below 30). I tried several thresholds (qualities 10, 15, 20, 30), but the graph does not change much.
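One possible reading of this procedure is sketched below. The post does not say exactly how the starting position was determined, so taking the cycle of the first low-quality basecall is an assumption, as are the function names and the toy quality values.

```python
from collections import Counter


def poor_quality_start(quals, min_q=30, min_low_calls=8):
    """Return the 1-based cycle of the first low-quality call, or None.

    A read only counts as poor quality if it has at least `min_low_calls`
    calls below `min_q`. Using the first such call as the "starting
    position" is an assumed interpretation of the post.
    """
    low = [cycle for cycle, q in enumerate(quals, start=1) if q < min_q]
    if len(low) < min_low_calls:
        return None
    return low[0]


def start_position_histogram(reads, min_q=30, min_low_calls=8):
    """Histogram of poor-quality starting cycles over many reads."""
    starts = (poor_quality_start(q, min_q, min_low_calls) for q in reads)
    return Counter(s for s in starts if s is not None)


# Hypothetical reads given as lists of already decoded Phred scores.
read_a = [35] * 10 + [20] * 8   # 8 low calls, first at cycle 11
read_b = [35] * 15 + [20] * 3   # only 3 low calls, not counted as poor
read_c = [35] * 5 + [10] * 13   # 13 low calls, first at cycle 6
hist = start_position_histogram([read_a, read_b, read_c])
for cycle in sorted(hist):
    print(cycle, hist[cycle])
```

Rerunning the histogram with `min_q` set to 10, 15, 20 and 30 would correspond to the threshold comparison mentioned above.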

Interestingly, the pattern I got did not increase steadily towards later cycles (as I had expected), and I also saw a periodicity of (you might have guessed) 5 bp for the poor-quality starting positions. This does indeed seem to be a feature of the Illumina pipeline algorithms. Even though it looks artefactual and I found it slightly worrying, I don't think one can do much about it, as it is present in all samples irrespective of their origin.
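A quick way to check a per-position histogram for this kind of periodicity, rather than judging it by eye, is to look at its autocorrelation at different lags. The sketch below uses made-up counts with a spike at every fifth position; the function name and data are hypothetical, and a real histogram with a strong overall trend would probably need detrending first.

```python
def autocorrelation(counts, lag):
    """Normalised autocorrelation of a list of counts at the given lag."""
    n = len(counts)
    mean = sum(counts) / n
    centred = [c - mean for c in counts]
    denom = sum(x * x for x in centred)
    if denom == 0 or lag >= n:
        return 0.0
    num = sum(centred[i] * centred[i + lag] for i in range(n - lag))
    return num / denom


# Toy histogram for positions 1..50: spikes at every fifth position
# on a flat background (invented numbers, not real data).
counts = [100 if pos % 5 == 0 else 10 for pos in range(1, 51)]
scores = {lag: autocorrelation(counts, lag) for lag in range(1, 11)}
best_lag = max(scores, key=scores.get)
print(best_lag, round(scores[best_lag], 3))
```

On this toy input the strongest lag is 5. On real data, a clear peak at lag 5 (and a weaker one at lag 10) would be consistent with the 5 bp cycles discussed in this thread.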

This led me to the conclusion that the increased error rate one sees towards the end of longer reads is not related to chemistry or run time, but seems to be largely the cumulative effect of these spikes of low-quality basecalls, which are introduced into the reads with a periodicity of 5 bp. Quite odd, isn't it?
Attached Images
File Type: jpg poor quality.jpg (85.5 KB, 51 views)
12-20-2010, 03:18 PM   #5
luxmare
Member
 
Location: Japan

Join Date: Feb 2009
Posts: 10

Thank you fkrueger.
I agree, it really is weird. I hope this hidden bias will be fixed in the near future.
All times are GMT -8.