
  • #16
    A couple of points:
    (1) Transposases commonly have target site preferences. Already said, but apparently needs to be repeated. There is nothing surprising about a transposase retaining those site preferences as it inserts into the DNA of a variety of different species. DNA is DNA, right?
    (2) I think this preference makes it non-ideal for the construction of genomic shotgun libraries. But let's not exaggerate the situation. The deviations from perfect randomness look to be in the 10-20% range. Most assemblers probably work better with less biased end points. But there are lots of departures from the ideal in our data sets. You assess the pros and cons and move on.

    --
    Phillip
    Last edited by pmiguel; 05-05-2014, 04:33 AM.



    • #17
      Originally posted by roliwilhelm
      Hello All,

      I summarized all of the information in a blog post.

      Thanks!
      By the way, the image from your blog:


      shows an increase in A composition towards the end of your reads. I think this usually means that there is a high frequency of very short amplicon reads in your data set. That is, many of them have read through the insert, the right adapter and into the polyA (or polyT, depending on your strand of reference) attachment of the flow cell oligos to the surface of the flowcell.

      Did you run FastQC on the clipped reads? If so, my guess is that your clipper is missing lots of adapters.
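One rough way to check whether a clipper is leaving adapters behind is to count the reads that still contain the start of the adapter. The sketch below is only an illustration: the function name `adapter_fraction`, the `min_match` seed length and the toy reads are made up for the example, and the adapter string is assumed to be the commonly quoted Nextera read-through sequence, so substitute whatever your kit documents.

```python
# Assumed adapter: the commonly quoted Nextera read-through sequence.
# Replace it with the sequence documented for your own library prep.
ADAPTER = "CTGTCTCTTATACACATCT"


def adapter_fraction(reads, adapter=ADAPTER, min_match=12):
    """Fraction of reads that contain the first `min_match` bases of the adapter.

    Exact substring matching only: no mismatches, and no detection of
    partial adapters that are cut off at the 3' end of a read.
    """
    if not reads:
        return 0.0
    seed = adapter[:min_match]
    hits = sum(1 for seq in reads if seed in seq)
    return hits / len(reads)


# Toy reads (hypothetical): two of the four run into the adapter.
toy_reads = [
    "ACGT" * 5 + ADAPTER + "A" * 10,   # short insert, adapter, then polyA
    "ACGT" * 10,                       # no adapter
    "GG" + ADAPTER[:12] + "TTT",       # adapter seed present
    "CCCC",                            # no adapter
]
fraction = adapter_fraction(toy_reads)
print(f"{fraction:.2f} of reads contain the adapter seed")
```

A high fraction on reads that have already been clipped would point to the clipper missing adapters. Because this only finds exact matches of the seed, it will tend to undercount.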

      By the way, one factor that makes the default settings for FastQC a poor choice for this sort of analysis is the unequal bin widths it uses. Yeah, I know it isn't convenient to scroll right really far in your browser to see the whole image, but given the distortion the binning causes I prefer to have to do that.

      --
      Phillip



      • #18
        @kmcarr: That paper was very useful; thanks for sharing it. It is also the same paper the Illumina representative referenced. It enabled me to match some of the recurring sequences in the first 14bp of my reads to the Tn5 recognition site they cite.

        I also realized that the proportion of reads with this bias is quite small (0.3%), though I initially thought the effect was far greater. That misconception came from a miscalculation on my part: I summed the "counts" column for the top 7 over-represented k-mers in the FastQC report, divided by the total number of sequences in my library, and came up with > 95% of reads containing "over-represented" sequences. In reality, the "counts" column is the total observed frequency, not the number of occurrences at the start of the read, so this was a vast overestimate.
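The difference between the two numbers can be illustrated with a small sketch. This is not how FastQC computes its table; the function name `kmer_counts`, the `window` parameter and the toy reads are made up for the example. It only shows why a total occurrence count can be much larger than the number of reads carrying the k-mer near their start.

```python
def kmer_counts(reads, kmer, window=14):
    """Return (total_occurrences, reads_with_kmer_in_window).

    total_occurrences counts every (possibly overlapping) occurrence of
    `kmer` anywhere in any read. reads_with_kmer_in_window counts each
    read at most once, and only if `kmer` lies entirely within the first
    `window` bases.
    """
    total = 0
    at_start = 0
    for seq in reads:
        pos = seq.find(kmer)
        while pos != -1:
            total += 1
            pos = seq.find(kmer, pos + 1)  # step by 1 to allow overlaps
        if kmer in seq[:window]:
            at_start += 1
    return total, at_start


# Toy reads (hypothetical), with a 6 bp window to keep the example small.
toy_reads = ["ACGTACGTAA", "TTTTACGTTT", "GGGGGGGGGG", "ACGTTTTTTT"]
total, at_start = kmer_counts(toy_reads, "ACGT", window=6)
print(total, at_start, at_start / len(toy_reads))
```

In the toy data the k-mer occurs 4 times in total, but only 2 of the 4 reads carry it within the window. Dividing the total count by the number of reads would therefore overstate the fraction of affected reads.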

        Thank you all for your thoughtful responses.



        • #19
          Kmers in mid part of sequence

          Is there an explanation for over-represented k-mers in the middle part of the reads?
          The capture is Nextera whole exome, sequenced on an Illumina HiSeq, paired-end 100 bp.
          The k-mers persist after Trimmomatic, although the quality of the data reported by FastQC is better after trimming. This pattern occurs in multiple samples. I asked Illumina 2 weeks ago but am still waiting for an answer.

          Thanks


          • #20
            Hi,

            We are seeing a similar issue using the Agilent QXT kit, in both capture and whole-genome experiments. This kit also uses transposases.

            HTH

            Dave



            • #21
              The capture is Nextera whole exome, sequenced on an Illumina HiSeq, paired-end 100 bp.
              I wonder whether the reads with over-represented k-mers map to the genome or to the target exons.
