  • MiSeq: Trimming, and sequencing primers at the beginning of a read

    I noticed that when I trim my MiSeq reads for adapter contamination (removing the 3' portion of the read), I can still grep the trimmed reads for ACACTCTTTCCCTACACGAC (the sequencing primer/adapter sequence) and find several thousand matches at the 5' end of Read1 reads. These shouldn't be there, right? What am I missing?

    I used fastq-mcf to trim the 13bp common TruSeq sequence AGATCGGAAGAGC.

    Primer sequences do not appear at the beginning of Read2 reads. In the sample sheet, I did not request that the MiSeq do any onboard trimming. For library prep, I used NEBNext Ultra, whose adapters, sequencing primers, and indices are the same as the TruSeq ones.

    So, my questions are: 1) Why am I getting primer sequences in Read1? 2) Is the 13bp sequence sufficient for trimming Illumina reads, or should I be doing this differently? The reads are used for de novo assembly and BLAST-based binning, so aggressively getting rid of adapter sequences is important to me.
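For anyone wanting to reproduce the check described above without grep, here is a minimal Python sketch that counts reads whose sequence begins with the primer. It assumes a plain, uncompressed FASTQ with exactly four lines per record; the function name `count_five_prime_hits` and the two example reads are made up for illustration and are not from the original data.

```python
import io

PRIMER = "ACACTCTTTCCCTACACGAC"  # sequence quoted in the post above

def count_five_prime_hits(fastq_handle, primer=PRIMER):
    """Return (reads_starting_with_primer, total_reads).

    Assumes a 4-line-per-record FASTQ (no wrapped sequence lines).
    """
    hits = 0
    total = 0
    for line_number, line in enumerate(fastq_handle):
        # In a 4-line record the sequence is the second line (index 1).
        if line_number % 4 == 1:
            total += 1
            if line.strip().upper().startswith(primer):
                hits += 1
    return hits, total

# Tiny made-up example: read1 starts with the primer, read2 does not.
example = io.StringIO(
    "@read1\nACACTCTTTCCCTACACGACGCTCTTCCGATCT\n+\n" + "I" * 33 + "\n"
    "@read2\nGGGTTTAAACCCGGGTTTAAACCC\n+\n" + "I" * 24 + "\n"
)
hits, total = count_five_prime_hits(example)
print(f"{hits} of {total} reads start with the primer")
```

On a real file, the same function could be called with `open("p1r1_TT.fastq")` in place of the in-memory example. Anchoring the match with `startswith` is the difference from a plain grep, which would also report hits in the middle of a read.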

  • #2
    The first part could be explained by adapter/primer dimers without any insert.

    As for trimming, give Trimmomatic (http://www.usadellab.org/cms/?page=trimmomatic), cutadapt (http://code.google.com/p/cutadapt/), or Trim Galore (http://www.bioinformatics.babraham.a...s/trim_galore/) a try. A recent comparison of available trimmers: http://www.plosone.org/article/info:...l.pone.0085024.



    • #3
      I like the idea of trimmomatic, but I can't seem to make it trim the adapters--they still show up after the following:

      Code:
      java -classpath /opt/Trimmomatic-0.32/trimmomatic-0.32.jar org.usadellab.trimmomatic.TrimmomaticPE -threads 8 -trimlog TT.log Pool1_S1_L001_R1_001.fastq Pool1_S1_L001_R2_001.fastq p1r1_TT.fastq p1r1_To.fastq p1r2_TT.fastq p1r2_To.fastq LEADING:3 TRAILING:3 ILLUMINACLIP:adapter_13.fa:2:30:10 SLIDINGWINDOW:4:15 MINLEN:16
      I may have the parameters set funny, but I don't know the best way to set them. My adapter sequence is the 13bp common Illumina sequence--is 13bp not scoring high enough to get trimmed?



      • #4
        Are you using the raw data for trimming? Why not use the TruSeq3 (PE) adapters that Trimmomatic includes (you will find those files in "Trimmomatic-0.30/adapters/") for the ILLUMINACLIP input?



        • #5
          @clintp

          I think the threshold values you are using for the ILLUMINACLIP step (2:30:10) are too high for Trimmomatic to recognize a match to a 13-base adapter sequence.

          You need to either change the values or use a longer adapter sequence.

          See the Trimmomatic web page, particularly the discussion under the heading 'Adapter Fasta', from which I have extracted this quote:

          "The thresholds used are a simplified log-likelihood approach. Each matching base adds just over 0.6, while each mismatch reduces the alignment score by Q/10. Therefore, a perfect match of a 12 base sequence will score just over 7"
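To make the arithmetic in that quote concrete, here is a rough Python sketch. It is only an approximation, not Trimmomatic's actual scoring code: it uses 0.6 per matching base (the quote says "just over 0.6"), and it assumes the last number in 2:30:10 is the threshold applied to a simple adapter match. The function names are made up for illustration.

```python
import math

# Per-base value taken from the quote ("just over 0.6"), so results are slightly low.
MATCH_SCORE = 0.6

def perfect_match_score(n_bases, match_score=MATCH_SCORE):
    """Approximate score of a perfect match of n_bases."""
    return n_bases * match_score

def min_bases_to_reach(threshold, match_score=MATCH_SCORE):
    """Approximate number of perfectly matching bases needed to reach threshold."""
    return math.ceil(threshold / match_score)

adapter = "AGATCGGAAGAGC"      # the 13bp sequence from the first post
simple_clip_threshold = 10     # assumed: the last value in 2:30:10

score = perfect_match_score(len(adapter))
print(f"{len(adapter)} bases score about {score:.1f}; threshold is {simple_clip_threshold}")
print(f"roughly {min_bases_to_reach(simple_clip_threshold)} matching bases needed")
```

Under these assumptions a perfect 13-base match scores about 7.8, below a threshold of 10, which fits the symptom reported earlier in the thread: either a lower threshold or a longer adapter sequence would be needed for the match to count.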



          • #6
            @mastal
            Yep, understanding the cutoff scores helped a lot (durrr). Somehow I missed that discussion on the trimmomatic page.

            @GenoMax
            Thanks for that reference--very useful. It's too bad they didn't include ea-utils/FastqMcf in that analysis, though.
