  • Trimming adapter sequence in the middle of a read

    I am analyzing microRNA sequencing data (50 bp reads, single-end, Illumina) and I have a sequence like this:

    TAGTAGGTTGCATAGTTTGGAATTCTCGGGTGCCAAGGAACTCCAG

    The TGGAATTCTCGGGTGCCAAGG portion is standard Illumina adapter sequence. I am pretty sure the rest of the 3' end is an artifact, but the standard adapter-trimming tool I was using doesn't remove adapters that occur in the middle of a read. Are there any tools that can trim the adapter sequence and the junk after it? In this case, I only want to keep TAGTAGGTTGCATAGTT. I tried BLASTing the reads, and the 5' regions are indeed microRNA sequences, but they will not align properly because the adapter AND the 3' region are not microRNA sequence. Any help would be greatly appreciated. Thank you very much.
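The basic operation being asked for, which is what a 3' adapter option such as cutadapt's -a does, is to find the adapter wherever it occurs in the read and discard it together with everything downstream. A minimal Python sketch of that idea, assuming exact matching only and assuming the adapter is TGGAATTCTCGGGTGCCAAGG (the sequence used with cutadapt later in this thread); `clip_at_adapter` is a hypothetical helper name:

```python
def clip_at_adapter(read: str, adapter: str) -> str:
    """Return the part of `read` before the first exact occurrence of `adapter`.

    The adapter and any sequence downstream of it are discarded.
    Reads without an exact adapter hit are returned unchanged.
    Exact matching only: this is an illustration, not a real trimmer.
    """
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


if __name__ == "__main__":
    read = "TAGTAGGTTGCATAGTTTGGAATTCTCGGGTGCCAAGGAACTCCAG"
    adapter = "TGGAATTCTCGGGTGCCAAGG"  # assumed adapter, as passed to cutadapt later in the thread
    print(clip_at_adapter(read, adapter))  # TAGTAGGTTGCATAGTT
```

Real trimmers additionally tolerate sequencing errors and detect adapters that are cut off by the end of the read, which this sketch does not.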

  • #2
    Cutadapt (or trim_galore, which I find to be a nice wrapper) can do that.


    • #3
      Originally posted by milesgr

      I am pretty sure the rest of the 3' end is artifact,
      I have not worked with Illumina miRNA reads, but in general the Illumina adapters are close to 60 bp long. After that you do indeed get unpredictable sequences from the flow cell or other 'artifacts', but what you are seeing is probably just more of the adapter, possibly including a multiplex barcode if one was used on your samples.

      See this webpage from U Texas at Austin:


      • #4
        Thanks for the info - it was very helpful. As a follow-up, I used the following command:
        cutadapt -e 0.05 -a TGGAATTCTCGGGTGCCAAGG 001.fastq > 001_CLIPPED.fastq

        I found the output still retained some adapters. For instance, one read still ends in part of the adapter (TGGAATTCTCGGGTG):
        GAGACCGCCTGGGAATACCGGGTGCTGTAGGCTTTGGAATTCTCGGGTG

        This sequence is 49 bases (remember, the reads were 50 bp), which makes me think the trimmer removed only the last base and missed the rest of the adapter. Here is another example, where a single-base deletion (the T between TGGAATTC and CGGGTG is missing) seems to have prevented trimming:

        CCCCCCACTGCTAACTTTGACTGGCTTTGGAATTCCGGGTGCAAGGAAC

        I wanted to allow some errors (an error rate of 0.05 allows one error across the 21-base adapter), but cutadapt seems to be missing a lot of adapters, which significantly reduces my miRNA coverage. Any suggestions would be greatly appreciated. Thanks in advance.
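A rough way to see why the second example slipped through: suppose the number of errors tolerated is the error rate times the adapter length, rounded down (the interpretation given in the post; cutadapt's real alignment, which also handles partial adapters, is more involved). The adapter-like stretch of that read, TGGAATTCCGGGTGCAAGG, lacks the T noted above and also appears to lack one of the two Cs before AAGG, so it is two edits away from the adapter while -e 0.05 allows only one. The helper names `edit_distance` and `allowed_errors` are hypothetical; the distance is plain Levenshtein distance.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # skip a character of a
                           cur[j - 1] + 1,             # skip a character of b
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def allowed_errors(error_rate: float, length: int) -> int:
    """Errors tolerated over `length` bases, rounded down (simplified model)."""
    return math.floor(error_rate * length)


adapter = "TGGAATTCTCGGGTGCCAAGG"
observed = "TGGAATTCCGGGTGCAAGG"  # adapter-like segment of the second example read

print(allowed_errors(0.05, len(adapter)))  # 1
print(edit_distance(adapter, observed))    # 2
```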


        • #5
          I use trimmomatic, but I find that it doesn't recognize adapters with indels either.

          If the value of -e that you are using allows only 1 mismatch out of 21 bases, it's possible that the adapter sequence you are giving cutadapt is too short for the score to be high enough to recognize adapters in your reads. Maybe you should try allowing a higher error rate.


          • #6
            Originally posted by mastal
            I use trimmomatic, but I find that it doesn't recognize adapters with indels either.
            You are correct: right now, trimmomatic doesn't perform matching with indels, since they are relatively rare in standard Illumina datasets, and trimmomatic was very much designed to meet our own requirements rather than cover all possible tasks.

            That said, we are currently evaluating what additional alignment (or other) features are needed for more special-case applications, so if anyone has any suggestions, please let us know (email on the trimmomatic web page).

            Thanks,

            Tony.
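Why indels are harder than mismatches for an ungapped matcher can be shown with the third read from post #4. A simple position-by-position comparison (a stand-in for mismatch-only matching in general, not Trimmomatic's actual scoring) of the adapter against the 21-base window starting where the adapter begins counts 11 mismatches, because each missing base shifts all downstream bases out of register, even though only two deleted bases separate the sequences. `mismatches` is a hypothetical helper.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ (no gaps allowed)."""
    if len(a) != len(b):
        raise ValueError("mismatch-only comparison needs equal lengths")
    return sum(x != y for x, y in zip(a, b))


adapter = "TGGAATTCTCGGGTGCCAAGG"
read = "CCCCCCACTGCTAACTTTGACTGGCTTTGGAATTCCGGGTGCAAGGAAC"
start = read.find("TGGAATTC")  # where the adapter begins in this read (index 27)
window = read[start:start + len(adapter)]

print(mismatches(adapter, window))  # 11
```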


            • #7
              Hi Tony,

              I am finding the occasional 1-base insertions or deletions in the Illumina adapter sequences. In the case of the insertions, it is sort of a homopolymer effect, and the inserted base is almost always the same as the previous base (on the 5' side) in the sequence.

              By the way, I think trimmomatic is great, even if it took me a while to understand how palindrome clipping works.

              Best wishes,
              Maria


              • #8
                Try skewer.

                If you put all the sequences in test.fasta as below:
                >1
                TAGTAGGTTGCATAGTTTGGAATTCTCGGGTGCCAAGGAACTCCAG
                >2
                GAGACCGCCTGGGAATACCGGGTGCTGTAGGCTTTGGAATTCTCGGGTG
                >3
                CCCCCCACTGCTAACTTTGACTGGCTTTGGAATTCCGGGTGCAAGGAAC

                and use the following command:
                $ skewer -r 0.2 -d 0.06 -x TGGAATTCTCGGGTGCCAAGG test.fasta -1 -l 16 2>/dev/null

                you may get the following output:
                >1
                TAGTAGGTTGCATAGTT
                >2
                GAGACCGCCTGGGAATACCGGGTGCTGTAGGCTT
                >3
                CCCCCCACTGCTAACTTTGACTGGCTT

                In your case, the error rate is higher than usual, so a higher error rate (-r 0.2) and a higher indel error rate (-d 0.06) are chosen.

                BTW: indel errors do occur in Illumina reads, though they are pretty rare.
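skewer's own scoring is not reproduced here. As a rough illustration of the kind of matching the skewer answer relies on, the sketch below scans every start position, aligns the adapter with edit distance (so indels count like mismatches), and accepts either the whole adapter followed by any junk, or an adapter prefix of at least 10 bases that runs off the 3' end of the read. The names `trim_adapter`, `max_rate` and `min_overlap` are hypothetical; `max_rate=0.2` loosely mirrors -r 0.2 but does not model skewer's separate indel limit (-d).

```python
def trim_adapter(read: str, adapter: str, max_rate: float = 0.2,
                 min_overlap: int = 10) -> str:
    """Cut `read` before the best approximate adapter hit (illustrative sketch).

    A hit is either the whole adapter (with mismatches/indels) followed by any
    amount of junk, or an adapter prefix of at least `min_overlap` bases that
    runs off the 3' end of the read. A hit is accepted if its edit distance is
    at most int(max_rate * adapter bases involved); among accepted hits the
    lowest distance wins, and ties go to the leftmost start.
    """
    m = len(adapter)
    best = None  # (distance, start)
    for start in range(len(read)):
        tail = read[start:]
        n = len(tail)
        row = list(range(n + 1))  # D[0][j]: empty adapter vs tail[:j]
        last_col = [n]            # D[k][n] for k = 0..m
        for k in range(1, m + 1):
            cur = [k] + [0] * n
            for j in range(1, n + 1):
                cost = 0 if adapter[k - 1] == tail[j - 1] else 1
                cur[j] = min(row[j] + 1,         # adapter base missing in read
                             cur[j - 1] + 1,     # extra base in read
                             row[j - 1] + cost)  # match or mismatch
            row = cur
            last_col.append(row[n])
        # Whole adapter aligned to some prefix of the tail; the rest is discarded.
        hits = [(min(row), m)]
        # Adapter prefix of length k aligned to the entire tail (read ends early).
        hits += [(last_col[k], k) for k in range(min_overlap, m + 1)]
        for dist, k in hits:
            if dist <= int(max_rate * k) and (best is None or dist < best[0]):
                best = (dist, start)
    return read if best is None else read[:best[1]]


if __name__ == "__main__":
    adapter = "TGGAATTCTCGGGTGCCAAGG"
    reads = [
        "TAGTAGGTTGCATAGTTTGGAATTCTCGGGTGCCAAGGAACTCCAG",
        "GAGACCGCCTGGGAATACCGGGTGCTGTAGGCTTTGGAATTCTCGGGTG",
        "CCCCCCACTGCTAACTTTGACTGGCTTTGGAATTCCGGGTGCAAGGAAC",
    ]
    for r in reads:
        print(trim_adapter(r, adapter))
```

On these three reads the sketch gives the same trimmed sequences as the skewer output shown above; on larger datasets its brute-force alignment would be far slower than a dedicated trimmer.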
