SEQanswers





Old 08-19-2011, 10:30 AM   #1
byou678
Member
 
Location: Maryland

Join Date: Aug 2011
Posts: 52
Please Help: What is the difference between standard trimming and adaptive trimming?

Hi All,

When I do RNA-seq quality trimming using Perl scripts in the terminal, these options appear:

--type <num> 0=standard trimming, 1=adaptive trimming, 2=windowed adaptive trimming. Default 0

--qual-threshold <num> quality threshold for trimming, default 20
--length-threshold <num> length threshold for trimming, default 20
... ...

Could anybody explain the differences between 0=standard trimming, 1=adaptive trimming, and 2=windowed adaptive trimming, and the criteria for setting length-threshold?

Thanks a lot in advance.

Last edited by byou678; 08-19-2011 at 11:34 AM.
Old 08-19-2011, 11:25 AM   #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Is 'RNAsq' a program? If so (and I cannot find it on the web), what does the program's documentation say? I am sure that we could hazard a guess, but the program itself is your best bet.

Oh ... I just found what you are probably using. 'Trim.pl' by Nik Joshi. That would have been nice to know. Anyway, yeah, there isn't much documentation to that program, is there? I suspect that you don't read "Perl" and Nik obviously believes that "good code is self-documenting" (e.g., his lack of comments about the basics is appalling although, unfortunately, I've seen worse) so it might take someone to dig into the code to give a definitive answer.
Old 08-19-2011, 11:28 AM   #3
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

For anyone who wants to dig:

http://wiki.bioinformatics.ucdavis.e...ex.php/Trim.pl

Or you could write to Nik Joshi.
Old 08-19-2011, 11:32 AM   #4
byou678
Member
 
Location: Maryland

Join Date: Aug 2011
Posts: 52
Default

Sorry for the confusion. Actually, I am using RNA-seq technology here. The data come from an Illumina Genome Analyzer II. Yes, I am using this script: 'Trim.pl' http://wiki.bioinformatics.ucdavis.e...ex.php/Trim.pl

westerman, Thanks for your nice reply!!!

Last edited by byou678; 08-19-2011 at 11:44 AM.
Old 08-19-2011, 01:22 PM   #5
gaffa
Member
 
Location: Gothenburg/Uppsala, Sweden

Join Date: Oct 2010
Posts: 82
Default

So from reading the code, "standard trimming" means that it will trim off a defined number of bases (as given by the "length-threshold" flag) from all reads, regardless of quality. In "adaptive trimming" mode it will use the quality scores to assess each read individually, by finding the first position which has a quality below cutoff (as given by the "qual-threshold" flag) and then trimming away this base and all following bases (unless the remaining read is shorter than the length threshold, in which case it will discard the whole read).

So the adaptive method is slightly more sophisticated than the standard, though it might not always do what you'd want: if a read has a single poor-quality base early on but is otherwise high-quality, this method will throw away the good part of the read (possibly the whole read). The script has a third method which is slightly more sophisticated still, the "windowed adaptive trimming", which tries to combat this problem by running a sliding window over the read and looking at the average quality in this window, rather than at a single base.
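
A minimal Python sketch of the three modes as described above; it is not a port of Trim.pl. Assumptions: standard trimming removes bases from the 3' end (the description above does not say which end), the window size of 5 is arbitrary, and all function names are made up for illustration.

```python
from typing import List, Optional


def standard_trim(seq: str, n_bases: int) -> str:
    """Remove a fixed number of bases, ignoring quality.

    Assumption: bases come off the 3' end; the thread does not confirm this.
    """
    return seq[: max(len(seq) - n_bases, 0)]


def adaptive_trim(seq: str, quals: List[int],
                  qual_threshold: int = 20,
                  length_threshold: int = 20) -> Optional[str]:
    """Cut at the first base whose quality is below the threshold."""
    cut = len(seq)
    for i, q in enumerate(quals):
        if q < qual_threshold:
            cut = i  # drop this base and everything after it
            break
    if cut < length_threshold:
        return None  # remaining read too short: discard it
    return seq[:cut]


def windowed_adaptive_trim(seq: str, quals: List[int],
                           window: int = 5,  # hypothetical window size
                           qual_threshold: int = 20,
                           length_threshold: int = 20) -> Optional[str]:
    """Cut at the start of the first window whose mean quality is too low."""
    cut = len(seq)
    for start in range(0, len(quals) - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < qual_threshold:
            cut = start
            break
    if cut < length_threshold:
        return None
    return seq[:cut]
```

With a 30-base read that has one bad base at position 5 and good quality elsewhere, adaptive_trim discards the read, while windowed_adaptive_trim keeps it whole, because no 5-base window average falls below 20.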
Old 08-19-2011, 02:05 PM   #6
byou678
Member
 
Location: Maryland

Join Date: Aug 2011
Posts: 52
Default Thanks for the reply

Hi gaffa, thank you very much for the reply. For "standard trimming", from which end of the reads will the 20 bases (if I use the default number) be trimmed off? And if "standard trimming" ignores quality scores, it is probably not used often, am I right?

In addition, could you point me to related papers or resources about my question? I need to take a deeper look because this project is really important to me.

Thanks again! Have a great weekend!


Old 08-22-2011, 07:17 AM   #7
byou678
Member
 
Location: Maryland

Join Date: Aug 2011
Posts: 52
Default

Can anybody offer me related papers or resources about my urgent question? Thanks!
Old 08-22-2011, 09:44 AM   #8
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

I doubt if there are any papers. As far as I can tell the terms used and the algorithm used by the program are internal to the program. In other words if the author of the program got his idea from somewhere he did not cite those sources. The ideas behind his code are not that unique and have probably been implemented many times.
Old 08-22-2011, 12:05 PM   #9
byou678
Member
 
Location: Maryland

Join Date: Aug 2011
Posts: 52
Default

I think the two adaptive trimming modes check the quality scores of the bases from the 5' end to the 3' end, and trim once a poor-quality base or window is found. Standard trimming directly trims off the defined number of bases (like 10 or 15) at the 3' end regardless of whether the quality scores are good or bad (because most modern sequencing technologies produce reads that have deteriorating quality towards the 3'-end).

Please correct me if I am wrong. Below is a related resource; all other ideas and help will be greatly appreciated!!

Most modern sequencing technologies produce reads that have deteriorating quality towards the 3'-end. Incorrectly called bases here negatively impact assemblies, mapping, and downstream bioinformatics analyses.

Sickle is a tool that uses sliding windows along with quality and length thresholds to determine when quality is sufficiently low to trim the 3'-end of reads. It will also discard reads based upon the length threshold. It takes the quality values and slides a window across them whose length is 0.1 times the length of the read. If this length is less than 1, then the window is set to be equal to the length of the read. Otherwise, the window slides along the quality values until the average quality in the window drops below the threshold. At that point the algorithm determines where in the window the drop occurs and cuts both the read and quality strings there. However, if the cut point is less than the minimum length threshold, then the read is discarded entirely.
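
A rough Python sketch of the sliding-window procedure in the Sickle description above; it is not Sickle's actual source. Assumptions: "where in the window the drop occurs" is taken to mean the first base in the failing window whose quality is below the threshold, the default thresholds of 20 are placeholders, and the function name is made up.

```python
from typing import List, Optional, Tuple


def sickle_like_trim(seq: str, quals: List[int],
                     qual_threshold: int = 20,
                     length_threshold: int = 20
                     ) -> Optional[Tuple[str, List[int]]]:
    """Trim the 3' end using a window of 0.1 x read length."""
    n = len(quals)
    if n == 0:
        return None  # nothing to trim
    window = int(0.1 * n)
    if window < 1:
        window = n  # per the description: fall back to the whole read
    cut = n
    for start in range(0, n - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < qual_threshold:
            # Assumption: cut at the first low-quality base in this window.
            # A window with a mean below the threshold always contains one.
            for offset, q in enumerate(win):
                if q < qual_threshold:
                    cut = start + offset
                    break
            break
    if cut < length_threshold:
        return None  # cut point below the minimum length: discard the read
    return seq[:cut], quals[:cut]
```

For a 50-base read with 40 good bases followed by 10 poor ones, this keeps the first 40 bases; a 20-base read with the same pattern (10 good, 10 poor) is discarded because the cut point falls below the length threshold.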

Thanks westerman.
