SEQanswers

Old 05-12-2011, 11:51 AM   #1
Kotoro
Member
 
Location: Farmington CT

Join Date: May 2011
Posts: 31
trimming unreliable ends of reads

What methods are normally used to trim off the unreliable poor-scoring ends of reads? Is there a tool that can statistically assess read scores and make this decision on a per-read basis, or is a cut position globally decided on?

Are there any special considerations for paired-end reads? (they've already been split and de-convoluted by barcode by the pipeline in our university's sequencing core.)
Old 05-13-2011, 02:39 AM   #2
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156

Quote:
Originally Posted by Kotoro View Post
What methods are normally used to trim off the unreliable poor-scoring ends of reads? Is there a tool that can statistically assess read scores and make this decision on a per-read basis, or is a cut position globally decided on?
Something like a sliding window is good - so if you get a single bad cycle you don't cut a high quality region, but persistent rubbish is trimmed.

Generally you want to do it on a per-read basis - 'local' factors often influence a particular read, and there's no advantage to 'one size fits all'.
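To make the sliding-window idea concrete, here is a minimal Python sketch. It is not Trimmomatic's implementation; the function name `sliding_window_trim` and the defaults `window=4` and `min_mean=15` are assumptions chosen for illustration, and qualities are taken as a list of already-decoded Phred integers.

```python
def sliding_window_trim(quals, window=4, min_mean=15):
    """Return how many leading bases of a read to keep.

    quals    : list of integer Phred scores, one per base (5' to 3')
    window   : number of consecutive bases averaged together (assumed default)
    min_mean : minimum acceptable mean quality in a window (assumed default)
    """
    n = len(quals)
    if n < window:
        # Read shorter than one window: judge it as a single window.
        return n if n and sum(quals) / n >= min_mean else 0
    for start in range(n - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < min_mean:
            # Cut at the start of the first window that falls below the threshold.
            return start
    return n


# One isolated bad cycle (index 3) followed by a persistently bad tail.
example = [30, 30, 30, 2, 30, 30, 30, 30, 5, 4, 3, 2]
keep = sliding_window_trim(example)
trimmed = example[:keep]
```

The lone low score at index 3 is averaged out by its neighbours, so the read is only cut once the windows reach the bad tail. Each read gets its own cut position, which is the per-read behaviour described above.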

Quote:
Originally Posted by Kotoro View Post
Are there any special considerations for paired-end reads? (they've already been split and de-convoluted by barcode by the pipeline in our university's sequencing core.)
Normally you might want to handle 'unpaired' reads separately - reads which survive QC but whose partners didn't.
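A rough sketch of that bookkeeping in Python, assuming the two mates arrive together as tuples and that some QC test is available as a function. The names `split_survivors` and `passes_qc` are hypothetical and not taken from any particular tool.

```python
def split_survivors(pairs, passes_qc):
    """Sort read pairs into intact pairs and orphaned single reads.

    pairs     : iterable of (read1, read2) tuples, mates in the same order
    passes_qc : function taking one read and returning True if it survives QC
    """
    paired, unpaired_1, unpaired_2 = [], [], []
    for r1, r2 in pairs:
        ok1, ok2 = passes_qc(r1), passes_qc(r2)
        if ok1 and ok2:
            paired.append((r1, r2))   # both mates survive: keep them together
        elif ok1:
            unpaired_1.append(r1)     # only the forward read survives
        elif ok2:
            unpaired_2.append(r2)     # only the reverse read survives
        # if neither survives, the pair is dropped entirely
    return paired, unpaired_1, unpaired_2


# Toy QC rule for illustration only: a read must keep at least 4 bases.
long_enough = lambda read: len(read) >= 4
pairs = [("ACGTAC", "ACGT"), ("ACGTAC", "AC"), ("A", "ACGTT"), ("A", "C")]
paired, only_1, only_2 = split_survivors(pairs, long_enough)
```

Keeping the intact pairs in one output and the orphans in others means the paired files stay in sync for tools that expect mates in matching order.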

Blatant ad: if you're working with illumina data, i've released a tool, Trimmomatic, found here, which does what you need
Old 05-13-2011, 04:37 AM   #3
francois.sabot
Member
 
Location: France

Join Date: Dec 2009
Posts: 41

Hi
Have a look at the latest version of Cutadapt, which uses a nice system for trimming and cuts only if the later bases are of better quality than the previous ones... See their explanation: http://code.google.com/p/cutadapt/
__________________
Francois Sabot, PhD

Be realistic. Demand the Impossible.
www.wikiposon.org
Old 05-13-2011, 04:45 AM   #4
gaffa
Member
 
Location: Gothenburg/Uppsala, Sweden

Join Date: Oct 2010
Posts: 82

A third alternative is SolexaQA (http://solexaqa.sourceforge.net/), which can trim either "to the longest contiguous read segment for which the quality score at each base is greater than a user-supplied quality cutoff" or using the BWA trimming algorithm (which BWA can also perform in conjunction with read mapping).
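The "longest contiguous segment" rule quoted above can be sketched in a few lines of Python. This is an illustration of the rule as worded, not SolexaQA's code; the function name `longest_good_segment` is an assumption, and qualities are taken as decoded Phred integers.

```python
def longest_good_segment(quals, cutoff):
    """Return (start, end) of the longest run where every score is > cutoff.

    The slice quals[start:end] is the segment to keep. If no base exceeds
    the cutoff, (0, 0) is returned, i.e. an empty read. On ties the
    earliest run wins.
    """
    best_start, best_len = 0, 0
    run_start = None
    for i, q in enumerate(quals):
        if q > cutoff:
            if run_start is None:
                run_start = i          # a new good run begins here
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None           # a bad base ends the current run
    return best_start, best_start + best_len


example = [10, 25, 30, 5, 30, 30, 30, 28, 3]
start, end = longest_good_segment(example, cutoff=20)
segment = example[start:end]
```

Unlike a sliding window, a single base at or below the cutoff always splits the read here, so this rule is stricter about isolated bad cycles.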
Old 07-20-2011, 07:13 AM   #5
mmartin
Member
 
Location: Stockholm

Join Date: Aug 2009
Posts: 75

Quote:
Originally Posted by gaffa View Post
... or using the BWA trimming algorithm ...
I'd like to point out that the quality trimming in cutadapt is simply a reimplementation of BWA's algorithm. You could use the quality-trimming part of cutadapt without trimming adapters by providing an adapter sequence that's certain to not occur -- just use -a XXXXXXXX or something like this (these are literal "X" characters).

See also file lib/cutadapt/qualtrim.py in the cutadapt distribution which also shows the algorithm and contains an explanation of it.
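For readers without the source at hand, here is a short Python sketch of the BWA-style 3' quality trimming as I understand it: walk in from the 3' end, accumulate (cutoff - quality), and cut where that running sum peaks. The function name `bwa_quality_trim_index` is an assumption, qualities are taken as decoded Phred integers, and qualtrim.py remains the authoritative version.

```python
def bwa_quality_trim_index(quals, cutoff):
    """Return the index at which to cut the read (keep quals[:index]).

    Scans from the 3' end, summing (cutoff - q) for each base. The cut
    position is where that running sum is largest; the scan stops early
    once the sum goes negative, because enough good bases have been seen.
    """
    running = 0
    best = 0
    cut = len(quals)                 # default: trim nothing
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break                    # good bases outweigh the bad tail
        if running > best:
            best = running
            cut = i                  # best cut position seen so far
    return cut


example = [30, 30, 30, 30, 10, 25, 8, 5]
cut = bwa_quality_trim_index(example, cutoff=20)
kept = example[:cut]
```

Note that the base with quality 25 at index 5 is trimmed along with its poor neighbours: the algorithm tolerates a good base inside a bad tail rather than stopping at the first score above the cutoff.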