  • MiSeq gDNA reads still fail "Kmer content" and "per base seq content" after trimming

    I have genomic DNA that was PE sequenced on the MiSeq platform. I understand there must have been some adapter read-through due to the large read sizes. Even after trimming, I still get some enriched kmers and skewed GC content at both ends of both reads in each pair. I attached several Kmer content graphs (images not preserved).

    I also attached examples of per base GC content (images not preserved).


    I ran trimmomatic with
    PE -phred33 ILLUMINACLIP:TruSeq2-PE.fa:2:20:7:2 LEADING:13 TRAILING:13 SLIDINGWINDOW:4:15 MINLEN:36
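
For readers less familiar with the step syntax, here is a minimal sketch that splits the command above into named fields. The field names follow the Trimmomatic manual's description of each step; the helper `describe_steps` and the `STEP_FIELDS` table are hypothetical names made up for this illustration and are not part of Trimmomatic.

```python
# Hypothetical helper: label the colon-separated fields of each Trimmomatic step.
# Field order is taken from the Trimmomatic manual; nothing here runs Trimmomatic.
STEP_FIELDS = {
    "ILLUMINACLIP": [
        "adapter_fasta",
        "seed_mismatches",
        "palindrome_clip_threshold",
        "simple_clip_threshold",
        "min_adapter_length",
        "keep_both_reads",
    ],
    "LEADING": ["quality"],
    "TRAILING": ["quality"],
    "SLIDINGWINDOW": ["window_size", "required_quality"],
    "MINLEN": ["length"],
}


def describe_steps(step_string):
    """Return a list of (step_name, {field: value}) for the recognised steps."""
    described = []
    for token in step_string.split():
        name, _, rest = token.partition(":")
        if name not in STEP_FIELDS:
            continue  # skip the mode and flags such as PE or -phred33
        values = rest.split(":")
        # zip stops at the shorter list, so omitted optional fields are left out
        described.append((name, dict(zip(STEP_FIELDS[name], values))))
    return described


command = (
    "PE -phred33 ILLUMINACLIP:TruSeq2-PE.fa:2:20:7:2 "
    "LEADING:13 TRAILING:13 SLIDINGWINDOW:4:15 MINLEN:36"
)
steps = describe_steps(command)
for name, fields in steps:
    print(name, fields)
```

Read this way, the four numbers after the adapter file are the seed mismatches, the palindrome clip threshold, the simple clip threshold, and the minimum adapter length.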

    My adapter file
    $ cat TruSeq2-PE.fa
    >PrefixPE/1
    AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
    >PrefixPE/2
    CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
    >PCR_Primer1
    AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
    >PCR_Primer1_rc
    AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
    >PCR_Primer2
    CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
    >PCR_Primer2_rc
    AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
    >FlowCell1
    TTTTTTTTTTAATGATACGGCGACCACCGAGATCTACAC
    >FlowCell2
    TTTTTTTTTTCAAGCAGAAGACGGCATACGA
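
As a quick sanity check on an adapter file like the one above, the sketch below confirms that each `_rc` record really is the reverse complement of its partner. Only the four PCR primer records are copied in to keep it short; `parse_fasta` and `reverse_complement` are small helpers written for this illustration, not functions from any sequencing toolkit.

```python
# The four PCR primer records copied from the adapter file in this post.
ADAPTER_FASTA = """\
>PCR_Primer1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PCR_Primer1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
>PCR_Primer2
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
>PCR_Primer2_rc
AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
"""


def parse_fasta(text):
    """Parse single-line FASTA records into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


records = parse_fasta(ADAPTER_FASTA)
rc_checks = {}
for name, seq in records.items():
    rc_name = name + "_rc"
    if rc_name in records:
        rc_checks[name] = reverse_complement(seq) == records[rc_name]

for name, ok in rc_checks.items():
    print(name, "matches its _rc record:", ok)
```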

  • #2
    First, do you know what kind of library prep was used? If it was Nextera, that would explain the biased sequence near the beginning, and also why some adapters are not being removed, since you're trimming for TruSeq sequences. But if it was in fact TruSeq, then I'm not really sure about the biased composition near the beginning.

    Unfortunately, because of the way FastQC compresses the base positions after base 9, it's impossible to get a good idea of what's going on at the end of the read from those graphs. But note that typical adapter-trimming will not remove adapters shorter than X bp at the very end, because it becomes too short to match the sequence confidently (X is usually a parameter). However, BBDuk can still remove those very short adapter sequences from PE reads by overlapping them to determine the insert size, so you might give that a try; just use the "tbo" flag.
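
To make the overlap idea concrete, here is a minimal sketch, assuming a pair where the insert is shorter than the read length. It is not BBDuk's actual implementation; `infer_insert_size`, `min_overlap`, and `max_mismatch_rate` are hypothetical names and defaults chosen for illustration. If read 1 and the reverse complement of read 2 agree over their first N bases, N is taken as the insert size and everything past it is treated as adapter, no matter how short that tail is.

```python
def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def infer_insert_size(read1, read2, min_overlap=8, max_mismatch_rate=0.1):
    """Return the inferred insert length for a read-through pair, else None.

    Only inserts no longer than the shorter read are considered, because
    that is the case in which adapter sequence appears at the 3' end.
    """
    longest = min(len(read1), len(read2))
    for insert_len in range(longest, min_overlap - 1, -1):
        a = read1[:insert_len]
        b = reverse_complement(read2[:insert_len])
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_rate * insert_len:
            return insert_len
    return None


def trim_by_overlap(read1, read2):
    """Trim both reads to the inferred insert length, if one is found."""
    insert_len = infer_insert_size(read1, read2)
    if insert_len is None:
        return read1, read2
    return read1[:insert_len], read2[:insert_len]


# Toy pair: a 20 bp insert sequenced with 25 bp reads, so each read ends in
# only 5 adapter bases, too few to match against an adapter file confidently.
insert = "ACGTTGCAAGGCTTACCGAT"
adapter_start = "AGATCGGAAGAGC"  # shared start of the _rc adapter records above
read_len = 25
read1 = (insert + adapter_start)[:read_len]
read2 = (reverse_complement(insert) + adapter_start)[:read_len]

inferred = infer_insert_size(read1, read2)
trimmed1, trimmed2 = trim_by_overlap(read1, read2)
print(inferred, trimmed1, trimmed2)
```

A real tool also has to handle quality scores, low-complexity sequence, and pairs that do not overlap at all, which this sketch ignores.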



    • #3
      Just trim off the ends.
      It is probably less of a headache than trying to figure out the problem.

      For the high GC at the end: it seems that, in general, longer reads are more likely to end in GC than in AT.
      So if your reads are of unequal length, you'll just see an increase in GC content at the end, because the AT-rich ends are more likely to have been trimmed off.
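
One way to see how unequal read lengths affect a per-position plot is to report, for each position, how many reads still cover it alongside the GC fraction. The sketch below uses made-up toy reads and a hypothetical helper `gc_by_position`; it shows the bookkeeping only and is not evidence for or against the explanation above.

```python
def gc_by_position(reads):
    """Return a list of (reads_covering_position, gc_fraction), one per position."""
    max_len = max(len(r) for r in reads)
    profile = []
    for pos in range(max_len):
        # Only reads long enough to reach this position contribute to it
        bases = [r[pos] for r in reads if len(r) > pos]
        gc = sum(b in "GC" for b in bases)
        profile.append((len(bases), gc / len(bases)))
    return profile


# Toy reads of unequal length, as left behind by quality trimming
toy_reads = ["ACGT", "ACGTGC", "AT"]
profile = gc_by_position(toy_reads)
for pos, (n_reads, gc_fraction) in enumerate(profile, start=1):
    print(pos, n_reads, round(gc_fraction, 2))
```

The last positions are computed from only the longest surviving reads, so the tail of the curve describes a subset of the library.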



      • #4
        Originally posted by ysnapus:
        I ran trimmomatic with
        PE -phred33 ILLUMINACLIP:TruSeq2-PE.fa:2:20:7:2 LEADING:13 TRAILING:13 SLIDINGWINDOW:4:15 MINLEN:36
        I agree with Brian. Are you sure it is a TruSeq2 library? We often see this kind of sequence content plot for Nextera libraries. In that case you should just use the NexteraPE-PE.fa adapter file.



        • #5
          Originally posted by avo:
          I agree with Brian. Are you sure it is a TruSeq2 library? We often see this kind of sequence content plot for Nextera libraries. In that case you should just use the NexteraPE-PE.fa adapter file.
          It definitely looks like a TruSeq (or other mechanically fragmented) library to me. Nextera (tagmentase-fragmented) libraries have a very distinct and more exaggerated base composition bias at the 5' end. TruSeq and other libraries in which the input DNA is fragmented in a Covaris still show a slight bias in their 5' base composition, because base composition influences fragmentation sensitivity.
