  • Trim off variable length 'N' strings at the end of the read

    Hi,

    I need to remove all runs of 'N' from a FASTQ file. I have paired-end files, and some reads end in a run of Ns (both the reads and the N runs vary in length). I can't find any tool to do this: Trimmomatic removes bases based on their quality score, and fastx_trimmer keeps only the first 'x' bases.

    Does anyone have a script for this or know of a tool? It is important that the tool handles paired files and keeps the pairs 'alive' in both files after trimming.

    PS: I tried to install nesoni, but I am unable to do so on the server without root permissions and with an older Python version.

    Thanx
    Illinu

  • #2
    Aren't the quality scores of the Ns very low? Trimming by quality will normally remove stretches of them (at least unless they're in the middle, which happens).


    • #3
      As dpryan said, quality-trimming should suffice; Ns should have a quality of zero, so you can just set the quality-trim threshold at 1. For example, with BBTools:

      reformat.sh in1=read1.fq in2=read2.fq out1=trimmed1.fq out2=trimmed2.fq qtrim=rl trimq=1

      That will keep the pairs together. That program will also automatically convert the quality of Ns to zero, if they happen to be non-zero.
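
For anyone curious what "set N quality to zero, then quality-trim at 1" amounts to, here is a minimal Python sketch of the idea. It is not BBTools' actual implementation; the function name `trim_by_quality` and the Phred+33 default are assumptions made for illustration, and it simply strips bases below the threshold from both ends (like `qtrim=rl`) while leaving interior Ns alone.

```python
def trim_by_quality(seq, qual, trimq=1, offset=33):
    """Trim both read ends while the base quality is below trimq.

    Hypothetical helper: N bases are treated as quality 0 regardless of
    the score recorded in the file, so runs of N at either end are removed.
    Assumes Phred+33 encoded quality strings.
    """
    # Decode qualities, forcing every N (or n) to quality 0.
    quals = [0 if base in "Nn" else ord(q) - offset
             for base, q in zip(seq, qual)]

    # Walk in from the right end, then from the left end.
    end = len(seq)
    while end > 0 and quals[end - 1] < trimq:
        end -= 1
    start = 0
    while start < end and quals[start] < trimq:
        start += 1

    return seq[start:end], qual[start:end]
```

With `trimq=1`, only bases of quality 0 (which now includes every N) are removed, so ordinary low-quality bases are left untouched.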


      • #4
        You can trim poly-Ns with PRINSEQ. There are (at least) three options to control the trimming, one to specify the minimum length of Ns at the 3-prime end, another option to specify the maximum N percentage to allow, and one option to specify the max number of Ns to allow.

        Code:
        -trim_ns_right <integer>
                    Trim poly-N tail with a minimum length of trim_ns_right at the
                    3'-end.
        
        -ns_max_p <integer>
                    Filter sequence with more than ns_max_p percentage of Ns.
        
        -ns_max_n <integer>
                    Filter sequence with more than ns_max_n Ns.
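
If installing another tool is a problem, the poly-N tail trimming described above can also be sketched in a few lines of Python. This is only an illustration modeled on the option description, not PRINSEQ's code: the names `trim_n_tail`, `read_fastq` and `trim_pairs` are made up here, and it assumes plain 4-line FASTQ records with both files listing the mates in the same order. A pair is written only if both mates still have bases left, so the two output files stay in sync.

```python
import re

def trim_n_tail(seq, qual, min_len=1):
    """Remove a trailing run of Ns if it is at least min_len long."""
    match = re.search(r"[Nn]+$", seq)
    if match and len(match.group()) >= min_len:
        keep = match.start()
        return seq[:keep], qual[:keep]
    return seq, qual

def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def trim_pairs(in1, in2, out1, out2, min_len=1):
    """Trim N tails in both mates; drop the pair if either mate is empty.

    Returns the number of pairs written.
    """
    kept = 0
    for (h1, s1, q1), (h2, s2, q2) in zip(read_fastq(in1), read_fastq(in2)):
        s1, q1 = trim_n_tail(s1, q1, min_len)
        s2, q2 = trim_n_tail(s2, q2, min_len)
        if s1 and s2:
            out1.write(f"{h1}\n{s1}\n+\n{q1}\n")
            out2.write(f"{h2}\n{s2}\n+\n{q2}\n")
            kept += 1
    return kept
```

The four arguments are ordinary open file handles, e.g. `open("read1.fq")` for input and `open("trimmed1.fq", "w")` for output. Unlike the quality-based approach, this ignores the quality scores entirely, so it also works when the Ns carry high scores.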


        • #5
          SES, thank you for this tip. I think it's the approach I was looking for.
          To answer the previous posts: I checked exactly this and, surprisingly, the quality scores of the Ns are high. I don't understand why; I was expecting them to be zero, if anything.


          • #6
            Originally posted by SES View Post
            You can trim poly-Ns with PRINSEQ. There are (at least) three options to control the trimming, one to specify the minimum length of Ns at the 3-prime end, another option to specify the maximum N percentage to allow, and one option to specify the max number of Ns to allow.

            Code:
            -trim_ns_right <integer>
                        Trim poly-N tail with a minimum length of trim_ns_right at the
                        3'-end.
            
            -ns_max_p <integer>
                        Filter sequence with more than ns_max_p percentage of Ns.
            
            -ns_max_n <integer>
                        Filter sequence with more than ns_max_n Ns.

            SES, I am thinking now... this will not handle paired files, right?


            • #7
              Originally posted by illinu View Post
              To answer the previous posts: I checked exactly this and, surprisingly, the quality scores of the Ns are high. I don't understand why; I was expecting them to be zero, if anything.
              But like I said, BBTools will automatically change the quality of N bases to 0, because it makes no sense for them to have any other quality. So they will be trimmed anyway.


              • #8
                Originally posted by illinu View Post
                SES, I am thinking now... this will not handle paired files, right?
                Yes, I believe recent versions of PRINSEQ will handle paired-end files correctly. If you run into issues, you could use Pairfq to re-pair your reads and separate the singletons after trimming.
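
To make the re-pairing step concrete, here is a rough Python sketch of matching mates by read ID and setting the singletons aside. It is not how Pairfq is implemented; `read_id` and `repair` are hypothetical names, reads are assumed to be `(header, sequence, quality)` tuples already loaded in memory, and mates are assumed to share an ID that differs at most by a trailing `/1` or `/2`.

```python
def read_id(header):
    """Return the read name without '@', description, or /1 and /2 suffix."""
    name = header.lstrip("@").split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def repair(reads1, reads2):
    """Split two read lists into matched pairs and unpaired singletons.

    Pairs are returned in the order of reads1; singletons from reads1
    come first, followed by those from reads2.
    """
    mates = {read_id(r[0]): r for r in reads2}
    pairs, singletons, paired_ids = [], [], set()

    for read in reads1:
        rid = read_id(read[0])
        mate = mates.get(rid)
        if mate is not None:
            pairs.append((read, mate))
            paired_ids.add(rid)
        else:
            singletons.append(read)

    # Anything in reads2 that never found a mate is also a singleton.
    singletons.extend(r for r in reads2 if read_id(r[0]) not in paired_ids)
    return pairs, singletons
```

Holding every read in memory is fine for a small test but not for a full sequencing run, which is one reason to prefer a dedicated tool for real data.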


                • #9
                  I tried the bbmap option and it worked beautifully! It took only 5 minutes of wall-clock time. The program needs no installation; it runs with Java.
