**westerman** · 08-19-2011, 11:25 AM

Is 'RNAsq' a program? If so (and I can not find it on the web) what does the program's documentation say? I am sure that we could hazard a guess but the program itself is your best bet.

Oh ... I just found what you are probably using. 'Trim.pl' by Nik Joshi. That would have been nice to know. Anyway, yeah, there isn't much documentation to that program, is there? I suspect that you don't read "Perl" and Nik obviously believes that "good code is self-documenting" (e.g., his lack of comments about the basics is appalling although, unfortunately, I've seen worse) so it might take someone to dig into the code to give a definitive answer.

**westerman** · 08-19-2011, 11:28 AM

For anyone who wants to dig:

http://wiki.bioinformatics.ucdavis.edu/index.php/Trim.pl

Or you could write to Nik Joshi.

**byou678** · 08-19-2011, 11:32 AM

Sorry for the confusion. Actually, I am using RNA-seq technology here. The data come from an Illumina Genome Analyzer II. Yes, I am using this script: 'Trim.pl' http://wiki.bioinformatics.ucdavis.e...ex.php/Trim.pl

westerman, Thanks for your nice reply!!!

**gaffa** · 08-19-2011, 01:22 PM

So from reading the code, "standard trimming" means that it will trim off a defined number of bases (as given by the "length-threshold" flag) from all reads, regardless of quality. In "adaptive trimming" mode it will use the quality scores to assess each read individually, by finding the first position which has a quality below cutoff (as given by the "qual-threshold" flag) and then trimming away this base and all following bases (unless the remaining read is shorter than the length threshold, in which case it will discard the whole read).

So the adaptive method is slightly more sophisticated than the standard, though it might not always do what you'd want: if a read has a single poor-quality base early on but is otherwise high-quality, this method will throw away the good part of the read (possibly the whole read). The script has a third method which is slightly more sophisticated still, the "windowed adaptive trimming", which tries to combat this problem by running a sliding window over the read and looking at the average quality in this window, rather than at a single base.
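To make the three modes concrete, here is a minimal Python sketch of the behaviour as described in this post. It is not a port of Trim.pl: the function names, the `window` size, and the choice to remove bases from the 3' end in standard mode are all assumptions made for illustration, and the real script may differ in those details.

```python
from typing import List, Optional, Tuple

Read = Tuple[str, List[int]]  # (bases, per-base quality scores)


def standard_trim(seq: str, quals: List[int], n_bases: int) -> Read:
    """Remove a fixed number of bases, ignoring quality.

    Assumption: bases are removed from the 3' end; the post does not say which end.
    """
    keep = max(len(seq) - max(n_bases, 0), 0)
    return seq[:keep], quals[:keep]


def adaptive_trim(seq: str, quals: List[int],
                  qual_threshold: int, length_threshold: int) -> Optional[Read]:
    """Cut at the first base below qual_threshold; discard reads that end up too short."""
    cut = len(seq)
    for i, q in enumerate(quals):
        if q < qual_threshold:
            cut = i  # this base and everything after it is removed
            break
    if cut < length_threshold:
        return None  # whole read discarded
    return seq[:cut], quals[:cut]


def windowed_adaptive_trim(seq: str, quals: List[int],
                           qual_threshold: int, length_threshold: int,
                           window: int = 5) -> Optional[Read]:
    """Cut where the mean quality of a sliding window first drops below qual_threshold.

    Assumption: the window size and the cut position (start of the failing
    window) are illustrative choices, not taken from Trim.pl.
    """
    cut = len(seq)
    for start in range(0, max(len(quals) - window + 1, 1)):
        win = quals[start:start + window]
        if not win:
            break
        if sum(win) / len(win) < qual_threshold:
            cut = start
            break
    if cut < length_threshold:
        return None
    return seq[:cut], quals[:cut]


# A read with one poor base early on but good quality everywhere else.
seq = "ACGTACGTAC"
quals = [30, 30, 5, 30, 30, 30, 30, 30, 30, 30]

standard_result = standard_trim(seq, quals, 3)
adaptive_result = adaptive_trim(seq, quals, 20, 5)
windowed_result = windowed_adaptive_trim(seq, quals, 20, 5)
```

With this example read, the single-base adaptive mode discards the whole read because the cut lands at position 2, which is shorter than the length threshold of 5. The windowed mode keeps the read intact, since no 5-base window has a mean quality below 20.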

**byou678** · 08-19-2011, 02:05 PM

Thanks for the reply

Hi gaffa, thank you very much for the reply. For "standard trimming", from which end of the reads will the 20 bases (if I use the default number) be trimmed off? And if "standard trimming" ignores quality scores, it is probably not used often, am I right?

In addition, could you send me any related papers or resources about my question? I need to take a deeper look because this project is really important to me.

Thanks again! Have a great weekend!

**byou678** · 08-22-2011, 07:17 AM

Can anybody offer me related papers or resources about my urgent question? Thanks!

**westerman** · 08-22-2011, 09:44 AM

I doubt if there are any papers. As far as I can tell the terms used and the algorithm used by the program are internal to the program. In other words if the author of the program got his idea from somewhere he did not cite those sources. The ideas behind his code are not that unique and have probably been implemented many times.

**byou678** · 08-22-2011, 12:05 PM

I think the two adaptive trimming modes check the base quality scores from the 5' end to the 3' end, and then trim when a poor-quality base or window is found. Standard trimming directly trims off the defined number of bases (like 10 or 15) from the 3' end, regardless of whether the quality scores are good or bad (because most modern sequencing technologies produce reads that have deteriorating quality towards the 3'-end).

Please correct me if I am wrong. Below is a related resource; all other ideas and help will be greatly appreciated!!

Most modern sequencing technologies produce reads that have deteriorating quality towards the 3'-end. Incorrectly called bases here negatively impact assemblies, mapping, and downstream bioinformatics analyses.

Sickle is a tool that uses sliding windows along with quality and length thresholds to determine when quality is sufficiently low to trim the 3'-end of reads. It will also discard reads based upon the length threshold. It takes the quality values and slides a window across them whose length is 0.1 times the length of the read. If this length is less than 1, then the window is set to be equal to the length of the read. Otherwise, the window slides along the quality values until the average quality in the window drops below the threshold. At that point the algorithm determines where in the window the drop occurs and cuts both the read and quality strings there. However, if the cut point is less than the minimum length threshold, then the read is discarded entirely.
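The Sickle description above can be sketched roughly as follows. This is only an illustration of the quoted paragraph, not Sickle's actual source: the function name `sickle_like_trim` is made up, and the rule used to locate the drop inside the window (the first base in the failing window that is itself below the threshold) is an assumption, since the description does not spell it out.

```python
from typing import List, Optional, Tuple


def sickle_like_trim(seq: str, quals: List[int],
                     qual_threshold: int,
                     length_threshold: int) -> Optional[Tuple[str, List[int]]]:
    """Trim the 3' end of a read using a sliding-window average quality."""
    n = len(quals)
    if n == 0:
        return None

    # Window is 0.1 x read length; if that is less than 1, use the whole read.
    window = int(0.1 * n)
    if window < 1:
        window = n

    cut = n  # default: keep the whole read
    for start in range(0, n - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < qual_threshold:
            # Assumption: cut at the first base in this window that is below
            # the threshold. A failing average guarantees at least one exists.
            cut = start
            for offset, q in enumerate(win):
                if q < qual_threshold:
                    cut = start + offset
                    break
            break

    # Discard the read entirely if what remains is too short.
    if cut < length_threshold:
        return None
    return seq[:cut], quals[:cut]


# 30-base read: 20 good bases followed by 10 poor ones (window size is 3).
example_seq = "A" * 30
example_quals = [35] * 20 + [10] * 10

kept = sickle_like_trim(example_seq, example_quals, 20, 15)
discarded = sickle_like_trim(example_seq, example_quals, 20, 25)
```

Here `kept` holds the first 20 bases, while `discarded` is `None` because the 20 bases that survive are fewer than a length threshold of 25.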

Thanks westerman.

Please Help: What is the differences between standard trimming and adaptive trimming
