  • Do I need to trim the sequences like this?

    When I checked my Solexa sequencing reads, I found that some of them look like this:


    NNNNNNNNAGGNNNNNGGAGNGNNGNNNCAGNGNTGNNNNNNNNNNNNNANNNNNNGNNNNNNNTGGNGGNNNNNNNN
    +
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


    First, there are runs of "N" in the middle of the sequence as well as at the end.
    Second, all of the base calls are low quality (I guess that % is the lowest quality score in this format, right?).
    Third, in some other reads, I see a poly-A stretch at the end of the sequence.

    How should I deal with reads that have these features? Should I just discard them, or trim them? If trimming is recommended in some cases, what software is suitable for Solexa reads?

  • #2
    I count about 21 bases that are not N's in that sequence. You may not have enough bases for a unique match to your genome (it depends on the genome size). A score of '%' is the fifth-from-lowest score possible (on Phred+33), which makes it either a 4 (if the lowest score is 0) or a -1 (if the lowest score is -5, as on the Solexa scale).
    Personally, I would throw this read out, because most of the bases aren't called and none of them have reasonable scores.
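To check this kind of arithmetic on your own reads, here is a minimal Python sketch (my own illustration, not a tool mentioned in this thread). It assumes the quality string uses a Phred+33 offset; `phred_scores` and `called_bases` are hypothetical helper names. Under that assumption, '%' (ASCII 37) decodes to 37 - 33 = 4.

```python
def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode a quality string into integer scores, assuming a fixed ASCII offset."""
    return [ord(ch) - offset for ch in quality]


def called_bases(seq: str) -> int:
    """Count positions that carry an actual base call (anything other than N)."""
    return sum(1 for base in seq.upper() if base != "N")


read = "NNNNNNNNAGGNNNNNGGAGNGNNGNNNCAGNGNTGNNNNNNNNNNNNNANNNNNNGNNNNNNNTGGNGGNNNNNNNN"
print(called_bases(read))    # number of non-N positions in the read from the question
print(phred_scores("%"))     # -> [4]
```

If your files turn out to use a different offset (older Solexa/Illumina pipelines used 64), only the `offset` argument needs to change.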



    • #3
      Hi mrawlins,

      Thanks for answering. Do you know what quality-score cutoff people commonly use when trimming sequences?



      • #4
        I don't know what scores people use to trim or reject reads. We use SOLiD machines, so base calling is done differently than on Solexa, and the scores are different. For one thing, we never see N's. I would probably throw out any read that doesn't have at least 20 contiguous base calls and 25 base calls total (though I might require at least 25 contiguous base calls to be safe). That makes it unlikely for a read to match the genome by random chance, so if low-quality reads are mis-called, they will most likely not map to the genome.
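That rule of thumb could be written as a simple filter. This is a minimal sketch: `keep_read` and `longest_called_run` are hypothetical names, and the defaults (20 contiguous, 25 total) just encode the numbers from this post; setting `min_contiguous=25` gives the stricter variant.

```python
import re


def longest_called_run(seq: str) -> int:
    """Length of the longest stretch of consecutive non-N base calls."""
    runs = re.findall(r"[^N]+", seq.upper())
    return max((len(run) for run in runs), default=0)


def keep_read(seq: str, min_contiguous: int = 20, min_total: int = 25) -> bool:
    """Keep a read only if it has enough contiguous and total base calls."""
    total = sum(1 for base in seq.upper() if base != "N")
    return longest_called_run(seq) >= min_contiguous and total >= min_total


print(keep_read("ACGT" * 10))           # 40 contiguous calls -> True
print(keep_read("ACGTN" * 5 + "ACGT"))  # runs of only 4 calls -> False
```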



        • #5
          Originally posted by mrawlins
          I would probably throw out any read that doesn't have at least 20 contiguous base calls and 25 base calls total (though I might require at least 25 contiguous base calls to be safe). That makes it unlikely for a read to match the genome by random chance, so if low-quality reads are mis-called, they will most likely not map to the genome.
          This is reasonable, BUT you have to make sure your software can actually handle ambiguous/unknown bases like 'N'. For example, some fast read aligners will NOT align a read if it contains an 'N', and some assembly software ignores them or converts them to 'A'.

          After trimming from the 3' end, we throw away every read that still contains any N at all. This usually rejects only about 1% to 5% of the total.
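A rough Python sketch of that trim-then-reject workflow follows. The post does not say what criterion is used to trim the 3' end, so the quality cutoff `min_q=20` and the Phred+33 offset are assumptions for illustration only; `trim_3prime` and `clean_read` are hypothetical names, not functions from any particular package.

```python
from typing import Optional, Tuple


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Drop bases from the 3' end while they are N or score below min_q.

    min_q and offset are illustrative assumptions, not values from the thread.
    """
    end = len(seq)
    while end > 0 and (seq[end - 1].upper() == "N" or ord(qual[end - 1]) - offset < min_q):
        end -= 1
    return seq[:end], qual[:end]


def clean_read(seq: str, qual: str, min_q: int = 20) -> Optional[Tuple[str, str]]:
    """Trim the 3' end, then reject the read if any N remains or nothing is left."""
    seq, qual = trim_3prime(seq, qual, min_q)
    if not seq or "N" in seq.upper():
        return None
    return seq, qual


print(clean_read("ACGTAC", "IIII##"))  # -> ('ACGT', 'IIII')
print(clean_read("ACNTAC", "IIIIII"))  # internal N survives trimming -> None
```

Trimming first means a trailing run of N's (as in the read from the question) does not by itself cause a rejection; only N's that remain inside the trimmed read do.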
