  • milesgr
    Member
    • Jun 2010
    • 34

    Trimming adapter sequence in the middle of a read

    I am analyzing microRNA sequencing data (50 bp/read, single end, Illumina) and I have a sequence like this:

    TAGTAGGTTGCATAGTTTGGAATTCTCGGGTGCCAAGGAACTCCAG

    The middle portion, TGGAATTCTCGGGTGCCAAGG, is standard Illumina adapter sequence. I am pretty sure the rest of the 3' end is artifact, but the standard adapter trimming tool that I was using doesn't remove adapters that occur in the middle of a read. Are there any tools that can trim the adapter sequence and the junk after it? In this case, I only want to keep TAGTAGGTTGCATAGTT. I tried BLASTing the read sequences, and the 5' regions are indeed microRNA sequences, but the reads will not align properly because the adapter and the 3' region are not microRNA sequence. Any help would be greatly appreciated. Thank you very much.
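As a rough illustration of the trimming being asked for, here is a minimal Python sketch that assumes the adapter appears as an exact copy somewhere in the read. It finds the first occurrence and drops it along with everything 3' of it. The helper name `trim_exact` is made up for this example; dedicated trimmers also tolerate mismatches and truncated adapters at the read end, which this does not.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # adapter sequence used later in this thread

def trim_exact(read: str, adapter: str = ADAPTER) -> str:
    """Return the read up to (not including) the first exact adapter hit."""
    pos = read.find(adapter)
    # No hit: leave the read untouched.
    return read if pos == -1 else read[:pos]

read = "TAGTAGGTTGCATAGTTTGGAATTCTCGGGTGCCAAGGAACTCCAG"
print(trim_exact(read))  # TAGTAGGTTGCATAGTT
```

Exact matching like this is only a starting point: a single sequencing error inside the adapter would make `find` miss it entirely.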
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Cutadapt (or trim_galore, which I find to be a nice wrapper) can do that.


    • mastal
      Senior Member
      • Mar 2009
      • 666

      #3
      Originally posted by milesgr

      I am pretty sure the rest of the 3' end is artifact,
      I have not worked with Illumina miRNA reads, but in general the Illumina adapters are close to 60 bp long. After that you do indeed get unpredictable sequences from the flow cell or other 'artifacts', but what you are seeing is probably just more of the adapter, possibly including a multiplex barcode if one was used on your samples.

      See this webpage from U Texas at Austin:


      • milesgr
        Member
        • Jun 2010
        • 34

        #4
        Thanks for the info - it was very helpful. As a follow-up, I used the following command:
        cutadapt -e 0.05 -a TGGAATTCTCGGGTGCCAAGG 001.fastq > 001_CLIPPED.fastq

        I found the output was still retaining some adapters. For instance, this sequence was left with the first 15 bases of the adapter (TGGAATTCTCGGGTG) still at its 3' end:
        GAGACCGCCTGGGAATACCGGGTGCTGTAGGCTTTGGAATTCTCGGGTG

        This sequence is 49 bases (remember, read lengths were 50 bp), which makes me think the trimmer removed the last base and missed the big picture. Here is another one, where a single-base deletion (the T missing from TTCTCGG, which reads TTCCGG here) seems to have ruined the trimming procedure:

        CCCCCCACTGCTAACTTTGACTGGCTTTGGAATTCCGGGTGCAAGGAAC

        I wanted to allow for some error (-e 0.05 allows one error across the 21 bases of the adapter), but cutadapt seems to be missing a lot of adapters, which reduces my miRNA coverage significantly. Any suggestions would be greatly appreciated. Thanks in advance.
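The arithmetic behind the -e setting can be checked quickly. This sketch assumes the tolerance is the error rate multiplied by the length of the matched adapter portion, rounded down, which is how the post describes it for the full 21-base adapter. `allowed_errors` is a hypothetical helper for illustration, not part of cutadapt.

```python
import math

ADAPTER = "TGGAATTCTCGGGTGCCAAGG"

def allowed_errors(matched_length: int, error_rate: float) -> int:
    # Assumption: tolerance = floor(error rate * matched adapter length).
    return math.floor(error_rate * matched_length)

full = allowed_errors(len(ADAPTER), 0.05)  # whole 21-base adapter inside the read
partial = allowed_errors(15, 0.05)         # only 15 adapter bases fit before the read ends
print(full, partial)  # 1 0
```

Under this assumption a truncated adapter at the 3' end of a read has to match exactly at 0.05, and a full-length hit with two errors is already over the limit.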


        • mastal
          Senior Member
          • Mar 2009
          • 666

          #5
          I use trimmomatic, but I find that it doesn't recognize adapters with indels either.

          If the -e value you are using allows only 1 mismatch in 21 bases, it's possible that the adapter sequence you are giving cutadapt is too short for the match to score high enough to be recognized in your reads. Maybe you should try allowing a higher error level.
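To get a feel for how high the error level would need to go, one can count the edits between the adapter and the adapter-like stretch of the read with the deletion. The sketch below uses a plain Levenshtein distance. Which bases count as the adapter stretch (TGGAATTCCGGGTGCAAGG) is an assumption based on reading the sequence, not something stated in the thread.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j - 1] + (ca != cb),  # match or substitution
                           prev[j] + 1,               # base of a missing from b
                           cur[j - 1] + 1))           # extra base in b
        prev = cur
    return prev[-1]

adapter = "TGGAATTCTCGGGTGCCAAGG"
observed = "TGGAATTCCGGGTGCAAGG"  # assumed adapter stretch of the read ending ...GCAAGGAAC
errors = edit_distance(adapter, observed)
print(errors, round(errors / len(adapter), 3))  # 2 0.095
```

By this count the stretch is two deletions away from the adapter (the missing T plus one C of GCC), which is about 0.095 of the adapter length. A 0.05 limit could not accept that, while something around 0.1 might.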


          • tonybolger
            Senior Member
            • Feb 2010
            • 156

            #6
            Originally posted by mastal
            I use trimmomatic, but I find that it doesn't recognize adapters with indels either.
            You are correct - right now, trimmomatic doesn't perform matching with indels, since it is relatively rare to find them in standard Illumina datasets, and trimmomatic was very much designed to meet our own requirements rather than cover all possible tasks.

            That said, we are currently evaluating what additional alignment (or other) features are needed for more special case applications, so if anyone has any suggestions, please let us know (email on the trimmomatic web page).

            Thanks,

            Tony.


            • mastal
              Senior Member
              • Mar 2009
              • 666

              #7
              Hi Tony,

              I am finding occasional 1-base insertions or deletions in the Illumina adapter sequences. In the case of the insertions, it looks like a homopolymer effect: the inserted base is almost always the same as the previous base (on the 5' side) in the sequence.

              By the way, I think trimmomatic is great, even if it took me a while to understand how palindrome clipping works.

              Best wishes,
              Maria


              • relipmoc
                Member
                • Jul 2011
                • 58

                #8
                Try skewer.

                If you put all the sequences in test.fasta as below:
                >1
                TAGTAGGTTGCATAGTTTGGAATTCTCGGGTGCCAAGGAACTCCAG
                >2
                GAGACCGCCTGGGAATACCGGGTGCTGTAGGCTTTGGAATTCTCGGGTG
                >3
                CCCCCCACTGCTAACTTTGACTGGCTTTGGAATTCCGGGTGCAAGGAAC

                and use the following command:
                $ skewer -r 0.2 -d 0.06 -x TGGAATTCTCGGGTGCCAAGG test.fasta -1 -l 16 2>/dev/null

                you may get the following output:
                >1
                TAGTAGGTTGCATAGTT
                >2
                GAGACCGCCTGGGAATACCGGGTGCTGTAGGCTT
                >3
                CCCCCCACTGCTAACTTTGACTGGCTT

                In your case, the error rate is higher than usual, so I chose a higher maximum error rate (-r 0.2) and a higher indel error rate (-d 0.06).

                BTW: indel errors do occur in Illumina reads, though they are pretty rare.
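To make the idea of indel-tolerant adapter matching concrete, here is a rough Python sketch. It is not skewer's algorithm, just a small edit-distance alignment in which the adapter may start anywhere in the read and may be cut off by the 3' end of the read. The function names, the 0.1 error rate, the minimum overlap of 3 and the tie-breaking rule (fewest errors, then longest adapter overlap) are all assumptions chosen for illustration.

```python
def find_adapter_start(read, adapter, max_error_rate=0.1, min_overlap=3):
    """Return the read index where the best adapter hit starts, or None."""
    n, m = len(read), len(adapter)
    # Each cell holds (errors, start): the fewest edits needed to align
    # adapter[:i] to a stretch of the read ending at j, and where it starts.
    prev = [(0, j) for j in range(n + 1)]  # empty adapter prefix may start anywhere
    hits = []  # candidates as (errors, -adapter_bases_used, start)
    for i in range(1, m + 1):
        cur = [(i, 0)]
        for j in range(1, n + 1):
            mismatch = adapter[i - 1] != read[j - 1]
            cur.append(min(
                (prev[j - 1][0] + mismatch, prev[j - 1][1]),  # match / substitution
                (prev[j][0] + 1, prev[j][1]),                 # adapter base missing from read
                (cur[j - 1][0] + 1, cur[j - 1][1]),           # extra base in read
            ))
        limit = int(max_error_rate * i)
        if i == m:
            # Full adapter: the hit may end anywhere in the read.
            hits += [(e, -i, s) for e, s in cur if e <= limit]
        elif i >= min_overlap and cur[n][0] <= limit:
            # Truncated adapter: the hit must run up to the 3' end of the read.
            hits.append((cur[n][0], -i, cur[n][1]))
        prev = cur
    return min(hits)[2] if hits else None

def trim(read, adapter, **kwargs):
    start = find_adapter_start(read, adapter, **kwargs)
    return read if start is None else read[:start]

ADAPTER = "TGGAATTCTCGGGTGCCAAGG"
reads = [
    "TAGTAGGTTGCATAGTTTGGAATTCTCGGGTGCCAAGGAACTCCAG",
    "GAGACCGCCTGGGAATACCGGGTGCTGTAGGCTTTGGAATTCTCGGGTG",
    "CCCCCCACTGCTAACTTTGACTGGCTTTGGAATTCCGGGTGCAAGGAAC",
]
for r in reads:
    print(trim(r, ADAPTER))
```

With these settings the three reads should come out trimmed to the same sequences as in the skewer output above. A real trimmer will score candidates more carefully and run far faster, so treat this only as a way to see why allowing indels rescues the third read.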
