
  • Looking for a trimming software that does these things

    Hello,

    I'm looking for a trimming/filtering software that can do the following:

    1) Trim both ends until there's at least a certain number of consecutive bases higher than a specific quality score.

    2) Remove 3' regions of a certain length if they contain a certain percentage of bases below a specific quality score. For example, remove the 200 bp 3' end if more than 10% of its bases are below a Phred score of 20.

    3) Filter out reads with a certain percentage of bp below a specific quality score.

    4) Remove reads with a certain number of consecutive Ns.

    5) Be paired-end-aware, i.e. if one read was removed, remove its pair (there're several of these available, but without the other features).

    6) If a read was identical to the reverse complement of its pair, remove it.

    I'd really appreciate your help.

  • #2
    BBDuk.sh (part of BBMap), Trimmomatic, and Cutadapt (and perhaps others that I am missing) should fit the bill. Though they may not check every box you have up there, they should get the job done.



    • #3
      Thanks. I tried Trimmomatic but not the other two. BBDuk.sh seems promising (so does the BBMap package), but it will take me a while to understand its syntax. I'll post back if it does what I want.



      • #4
        Originally posted by antifolate
        Hello,

        I'm looking for a trimming/filtering software that can do the following:

        1) Trim both ends until there's at least a certain number of consecutive bases higher than a specific quality score.
        BBDuk used to use this strategy, but it's not optimal, and I was able to demonstrate that empirically, so I don't really recommend it. BBDuk currently uses the Phred algorithm for quality trimming, which is optimal, though it's technically possible to disable that with a flag and use the old method instead. BBDuk also supports windowed trimming (trim until the average in a sliding window exceeds some threshold).
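To make the difference between the two strategies concrete, here is a minimal Python sketch. It is not BBDuk's code: `trim_consecutive` follows requirement 1 as worded in the question, and `trim_max_segment` is one common formulation of Phred-style trimming (a maximum-scoring-segment search, sometimes called the modified Mott algorithm). The function names, the `>=` comparison, and the half-open `(start, end)` return convention are all assumptions made for illustration.

```python
def phred_to_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def trim_consecutive(quals, min_q, run_len):
    """Trim each end until `run_len` consecutive bases have quality >= min_q.

    Returns (start, end) of the kept region, end exclusive.
    Returns (0, 0) if no such run exists anywhere in the read.
    """
    n = len(quals)
    start, end = None, n
    run = 0
    for i, q in enumerate(quals):            # scan from the 5' end
        run = run + 1 if q >= min_q else 0
        if run == run_len:
            start = i - run_len + 1
            break
    if start is None:
        return (0, 0)
    run = 0
    for i in range(n - 1, -1, -1):           # scan from the 3' end
        run = run + 1 if quals[i] >= min_q else 0
        if run == run_len:
            end = i + run_len
            break
    return (start, end)


def trim_max_segment(quals, trim_q):
    """Keep the contiguous segment maximizing sum(cutoff - p_error).

    Bases better than trim_q add to the score, worse bases subtract,
    so a single bad base inside a long good stretch is tolerated.
    """
    cutoff = phred_to_prob(trim_q)
    best_sum, best = 0.0, (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, q in enumerate(quals):
        cur_sum += cutoff - phred_to_prob(q)
        if cur_sum <= 0:                     # segment is no longer worth keeping
            cur_sum, cur_start = 0.0, i + 1
        elif cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i + 1)
    return best
```

With the qualities `[5, 30, 5, 30, 30, 30, 5, 30]`, `trim_consecutive(quals, 20, 3)` keeps only positions 3 to 5, while the segment search decides how much to keep based on the summed error probabilities rather than on an arbitrary run length.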

        3) Filter out reads with a certain percentage of bp below a specific quality score.
        The "maq" flag filters by average quality, where average quality is calculated by transforming the quality scores into probabilities, so basically if you set "maq=20" it removes reads with an expected error rate greater than 1%. I don't recommend setting it that high, though.
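The probability-based average described here can be sketched in a few lines of Python. This only illustrates the arithmetic (Q = -10 log10 p); the function names and the `>=` comparison in `passes_maq` are assumptions, and BBDuk's exact handling may differ.

```python
import math


def mean_quality(quals):
    """Average quality computed in probability space, reported as Phred."""
    if not quals:
        return 0.0
    probs = [10 ** (-q / 10) for q in quals]   # Phred -> error probability
    mean_p = sum(probs) / len(probs)           # expected error rate of the read
    return -10 * math.log10(mean_p)            # back to the Phred scale


def passes_maq(quals, maq):
    """Hypothetical filter: keep the read if its average quality is >= maq."""
    return mean_quality(quals) >= maq
```

Averaging probabilities rather than raw scores means one very bad base dominates: `[40, 40, 40, 2]` has an arithmetic mean of 30.5 but a probability-based average of about Q8, so it would fail a threshold of 20.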

        4) Remove reads with a certain number of consecutive Ns.
        The "maxns=X" flag will filter reads with at least X Ns, but it doesn't care whether they are consecutive.
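The distinction between total and consecutive Ns is easy to check outside BBDuk. The following is a small Python sketch; the function names are hypothetical and it does not reproduce any BBDuk behavior.

```python
import re


def count_ns(seq):
    """Total number of N bases in the read, consecutive or not."""
    return seq.upper().count("N")


def longest_n_run(seq):
    """Length of the longest stretch of consecutive N bases (0 if none)."""
    runs = re.findall(r"N+", seq.upper())
    return max((len(r) for r in runs), default=0)
```

For `"ACNGNTNA"` the total count is 3 but the longest run is 1, whereas `"ACNNNGT"` gives 3 for both, so a filter on consecutive Ns (requirement 4) would treat the two reads differently.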

        5) Be paired-end-aware, i.e. if one read was removed, remove its pair (there're several of these available, but without the other features).
        Check.

        6) If a read was identical to the reverse complement of its pair, remove it.
        You can do this with BBMerge, by running it but telling it not to join overlapping reads (using the "join=f" flag), and using the "maxlength" flag plus the "out" and "outu" streams. "maxlength=X" will send reads with insert sizes longer than X to outu rather than out. So:

        bbmerge.sh in=reads.fq out=short.fq outu=long.fq join=f maxlen=150

        (this command assumes pairs are interleaved in one file)
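As an alternative to the insert-size approach, requirement 6 can also be tested directly on the sequences. This is a minimal Python sketch with hypothetical function names, assuming only the bases A, C, G, T, and N; it is an exact string comparison, so a single mismatch between mates means the pair is not flagged.

```python
# Translation table for complementing bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_revcomp_pair(read1, read2):
    """True if read1 is exactly the reverse complement of read2."""
    return read1.upper() == reverse_complement(read2).upper()
```

For example, `is_revcomp_pair("AACG", "CGTT")` is true, which corresponds to a pair whose insert size equals the read length.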



        • #5
          I just got around to trying these commands and, although they're not exactly what I'm trying to do, they worked pretty well. BBMerge would merge my reads, so I avoided it.

          Thank you!



          • #6
            try skewer

            Another option is skewer. Good luck!

            Originally posted by antifolate
            I just got around to trying these commands and, although they're not exactly what I'm trying to do, they worked pretty well. BBMerge would merge my reads, so I avoided it.

            Thank you!



            • #7
              @Brian

              "... though it's technically possible to disable that with a flag and use the old method instead."

              How can I do this?



              • #8
                Originally posted by antifolate
                @Brian

                "... though it's technically possible to disable that with a flag and use the old method instead."

                How can I do this?
                Add the flag "otm=f" (otm stands for "optimal trimming mode").



                • #9
                  otm=f (outputtrimmedtomatch) Output reads trimmed to shorter
                  than minlength to outm rather than discarding.


                  Which BBDuk are you talking about?



                  • #10
                    Oops, looks like I have an overloaded flag. Thanks for spotting that! I'll rename that one to "ottm" in the next release. Currently, "otm" acts on the quality trimming, so "outputtrimmedtomatch" has to be fully spelled out in order to function according to that description. To be more specific for now, use the flag "optitrim=f" to turn off optimal trimming, and "outputtrimmedtomatch" to dictate whether trimmed reads shorter than minlen go to outm.



                    • #11
                      I didn't know bbduk was your work. Thanks for the help and the tool!
