  • Please Help: What are the differences between standard trimming and adaptive trimming?

    Hi All,

    When I run RNA-seq quality trimming with a Perl script in the terminal, these options appear:

    --type <num> 0=standard trimming, 1=adaptive trimming, 2=windowed adaptive trimming. Default 0

    --qual-threshold <num> quality threshold for trimming, default 20
    --length-threshold <num> length threshold for trimming, default 20
    ... ...

    Could anybody explain the differences between 0=standard trimming, 1=adaptive trimming, and 2=windowed adaptive trimming? And what are the criteria for setting length-threshold?

    Thanks a lot in advance.
    Last edited by byou678; 08-19-2011, 11:34 AM.

  • #2
    Is 'RNAseq' a program? If so (and I cannot find it on the web), what does the program's documentation say? I am sure that we could hazard a guess, but the program itself is your best bet.

    Oh ... I just found what you are probably using. 'Trim.pl' by Nik Joshi. That would have been nice to know. Anyway, yeah, there isn't much documentation to that program, is there? I suspect that you don't read "Perl" and Nik obviously believes that "good code is self-documenting" (e.g., his lack of comments about the basics is appalling although, unfortunately, I've seen worse) so it might take someone to dig into the code to give a definitive answer.



    • #3
      For anyone who wants to dig:



      Or you could write to Nik Joshi.



      • #4
        Sorry for the confusion. Actually, I am working with RNA-seq data here. The data come from an Illumina Genome Analyzer II. Yes, I use this script: 'Trim.pl' http://wiki.bioinformatics.ucdavis.e...ex.php/Trim.pl

        westerman, Thanks for your nice reply!!!
        Last edited by byou678; 08-19-2011, 11:44 AM.



        • #5
          So from reading the code, "standard trimming" means that it will trim off a defined number of bases (as given by the "length-threshold" flag) from all reads, regardless of quality. In "adaptive trimming" mode it will use the quality scores to assess each read individually, by finding the first position which has a quality below cutoff (as given by the "qual-threshold" flag) and then trimming away this base and all following bases (unless the remaining read is shorter than the length threshold, in which case it will discard the whole read).

          So the adaptive method is slightly more sophisticated than the standard, though it might not always do what you'd want: if a read has a single poor-quality base early on but is otherwise high-quality, this method will throw away the good part of the read (possibly the whole read). The script has a third method which is slightly more sophisticated still, the "windowed adaptive trimming", which tries to combat this problem by running a sliding window over the read and looking at the average quality in this window, rather than at a single base.
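To make the first two modes concrete, here is a minimal Python sketch of the behaviour as described above. It is not a port of Trim.pl: the function names, the `n_bases` and `from_3prime` parameters, and the choice to pass qualities as a list of integers are all assumptions made for illustration. In particular, the description does not say which end standard trimming cuts from, so that is left as a parameter.

```python
def standard_trim(seq, quals, n_bases, from_3prime=True):
    """Remove a fixed number of bases, ignoring quality.

    Which end is trimmed is an assumption exposed via from_3prime.
    """
    if n_bases <= 0:
        return seq, quals
    if from_3prime:
        keep = max(len(seq) - n_bases, 0)
        return seq[:keep], quals[:keep]
    return seq[n_bases:], quals[n_bases:]


def adaptive_trim(seq, quals, qual_threshold=20, length_threshold=20):
    """Cut at the first base whose quality is below qual_threshold.

    That base and everything after it are removed. Returns None when
    the remaining read is shorter than length_threshold (read discarded).
    """
    cut = len(seq)
    for i, q in enumerate(quals):
        if q < qual_threshold:
            cut = i
            break
    if cut < length_threshold:
        return None
    return seq[:cut], quals[:cut]


if __name__ == "__main__":
    seq = "A" * 30
    late_drop = [30] * 25 + [10] + [30] * 4   # one bad base at position 25
    early_drop = [30] * 5 + [10] + [30] * 24  # one bad base at position 5
    print(adaptive_trim(seq, late_drop))   # keeps the first 25 bases
    print(adaptive_trim(seq, early_drop))  # None: only 5 bases would remain
```

The second call shows the weakness mentioned above: a single poor base early in an otherwise good read causes the whole read to be discarded.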



          • #6
            Thanks for the reply

            Hi gaffa, thank you very much for the reply. For "standard trimming", from which end of the reads will the 20 bases (if I use the default number) be trimmed off? And if "standard trimming" ignores quality scores, it is probably not used often, am I right?

            In addition, could you send me any related papers or resources on this question? I need to take a deeper look because this project is really important to me.

            Thanks again! Have a great weekend!




            • #7
              Can anybody point me to related papers or resources on my urgent question? Thanks!



              • #8
                Originally posted by byou678
                Is there anybody can offer me the related papers or resources about my urgent question? Thanks!
                I doubt if there are any papers. As far as I can tell the terms used and the algorithm used by the program are internal to the program. In other words if the author of the program got his idea from somewhere he did not cite those sources. The ideas behind his code are not that unique and have probably been implemented many times.



                • #9
                  I think the two adaptive trimming modes scan the quality scores from the 5' end to the 3' end and trim once a poor-quality base or window is found. Standard trimming simply removes a defined number of bases (e.g., 10 or 15) from the 3' end, regardless of whether their quality scores are good or bad (because most modern sequencing technologies produce reads whose quality deteriorates towards the 3' end).

                  Please correct me if I am wrong. Below is a related resource; any other ideas and help will be greatly appreciated!

                  Most modern sequencing technologies produce reads that have deteriorating quality towards the 3'-end. Incorrectly called bases here negatively impact assemblies, mapping, and downstream bioinformatics analyses.

                  Sickle is a tool that uses sliding windows along with quality and length thresholds to determine when quality is sufficiently low to trim the 3'-end of reads. It will also discard reads based upon the length threshold. It takes the quality values and slides a window across them whose length is 0.1 times the length of the read. If this length is less than 1, then the window is set to be equal to the length of the read. Otherwise, the window slides along the quality values until the average quality in the window drops below the threshold. At that point the algorithm determines where in the window the drop occurs and cuts both the read and quality strings there. However, if the cut point is less than the minimum length threshold, then the read is discarded entirely.

                  Thanks westerman.
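The Sickle description quoted above can be sketched roughly as follows. This is a hedged Python illustration of that paragraph only, not Sickle's actual source: the function name `window_trim`, the default thresholds, and the rule for locating the drop inside the window (the first base in the failing window that is below the threshold) are assumptions, since the description does not spell that step out.

```python
def window_trim(seq, quals, qual_threshold=20, length_threshold=20):
    """Sliding-window 3' trimming, following the Sickle description.

    quals is a list of integer quality scores, one per base.
    Returns (seq, quals) trimmed, or None if the read is discarded.
    """
    n = len(quals)
    if n == 0:
        return None

    # Window is 0.1 x read length; fall back to the whole read if < 1.
    window = int(0.1 * n)
    if window < 1:
        window = n

    cut = n  # default: keep the whole read
    for start in range(n - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < qual_threshold:
            # Assumption: cut at the first base in this window that is
            # below the threshold. One must exist because the mean is.
            offset = next(i for i, q in enumerate(win) if q < qual_threshold)
            cut = start + offset
            break

    # Discard reads whose cut point falls below the minimum length.
    if cut < length_threshold:
        return None
    return seq[:cut], quals[:cut]


if __name__ == "__main__":
    seq = "A" * 50
    tail_drop = [35] * 30 + [5] * 20          # quality collapses at base 30
    single_bad = [35] * 5 + [2] + [35] * 44   # one isolated poor base
    print(len(window_trim(seq, tail_drop)[0]))    # trimmed read length
    print(len(window_trim(seq, single_bad)[0]))   # isolated dip is tolerated
```

Because the decision is based on the window average, a single poor base surrounded by good ones does not trigger trimming, which is the difference from the single-base adaptive mode discussed earlier in the thread.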

