  • High duplicates in mRNA-seq data

    I extracted total RNA from drug and vehicle treated primary neurons (mouse) and used Kapa Stranded mRNA-Seq kit to generate libraries.

    Goal is differential expression analysis - primarily looking at roughly 60 neuronal genes and also a more general effect of our drugs on transcriptional output of neuronal genes.

    Input RNA: 1.5 µg; PCR cycles: 8. RNA RIN was always above 8, with a good electropherogram trace.

    Sequencing info: Illumina HiSeq2100 - 5 libraries multiplexed into 1 lane.

    So the problem: a 55-60% duplication rate for all libraries, very consistent across the board. According to QC data from the sequencing core, the largest share of duplicates comes from poly-A and poly-T tracts.

    I could really use some advice here. Is this rate of duplication a problem for a DE experiment such as this? What rate of duplication would be more acceptable?

    Thanks so much for any input, I'm really worried that my whole PhD project is toast...

  • #2
    This sounds fairly typical; one expects a high level of apparent duplication in RNA-seq. Note that I wrote "apparent duplications", since these are likely not real PCR or optical duplicates. A bias toward the 3' end is also not that uncommon, at least if you did any polyA enrichment (I'm not familiar with the Kapa kit).
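To illustrate why "apparent" duplicates arise without any PCR artifact, here is a minimal sketch under a deliberately simple assumption: single-end reads from one transcript start uniformly at random among a fixed number of possible start sites. The function name and the numbers in the usage note are hypothetical and are not taken from this thread's data.

```python
def expected_duplicate_fraction(n_reads: int, n_start_sites: int) -> float:
    """Expected fraction of reads sharing a start site with an earlier read.

    Assumption (illustrative only): reads are drawn independently and
    uniformly from n_start_sites possible start positions.
    """
    if n_reads <= 0 or n_start_sites <= 0:
        raise ValueError("n_reads and n_start_sites must be positive")
    # Expected number of start sites hit at least once.
    p_site_missed = (1.0 - 1.0 / n_start_sites) ** n_reads
    expected_distinct = n_start_sites * (1.0 - p_site_missed)
    # Every read beyond the first at a site counts as a positional duplicate.
    return 1.0 - expected_distinct / n_reads
```

For example, `expected_duplicate_fraction(10_000, 1_000)` is about 0.9: a highly expressed short transcript is flagged as mostly duplicates simply because there are far more reads than distinct start positions.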

    BTW, it's a bit premature to worry that your PhD is toast after one experiment (hint, most experiments don't work).


    • #3
      Originally posted by dpryan:
      Note that I wrote "apparent duplications", since these are likely not real PCR or optical duplicates
      Slightly off-topic... I've been wondering why Illumina or any other company hasn't commercialized a library prep kit where each read gets its own random barcode. In principle it shouldn't be that difficult to generate adapters with a random k-mer long enough to distinguish millions of reads. I'm not saying that it's going to be easy in practice, but this issue of what to do with positional duplicates recurs so often, and it seems to me that any workaround is not ideal.
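As a rough back-of-the-envelope sketch of the "long enough" question: a random k-mer over A/C/G/T gives 4^k possible barcodes. The code below assumes barcodes are drawn uniformly at random, and that duplicates are only called among reads sharing a mapping position, so the relevant count is molecules per position rather than reads per library. It uses the standard birthday-problem approximation; the function names are hypothetical.

```python
import math


def n_barcodes(k: int) -> int:
    """Number of distinct random k-mers over the alphabet A, C, G, T."""
    return 4 ** k


def collision_probability(n_molecules: int, k: int) -> float:
    """Approximate chance that at least two molecules share a barcode.

    Birthday-problem approximation, assuming uniformly random barcodes:
    P ~= 1 - exp(-m * (m - 1) / (2 * 4**k)) for m molecules.
    """
    pairs = n_molecules * (n_molecules - 1) / 2.0
    return -math.expm1(-pairs / n_barcodes(k))
```

Under these assumptions, `collision_probability(100, 10)` is roughly 0.005, i.e. a 10-mer rarely collides among 100 distinct molecules at one position.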


      • #4
        In a sense that's what 10x is doing, but for whole genome sequencing, so presumably it's possible.


        • #5
          Originally posted by dariober:
          Slightly off-topic... I've been wondering why Illumina or any other company hasn't commercialized a library prep kit where each read gets its own random barcode. In principle it shouldn't be that difficult to generate adapters with a random k-mer long enough to distinguish millions of reads. I'm not saying that it's going to be easy in practice, but this issue of what to do with positional duplicates recurs so often, and it seems to me that any workaround is not ideal.
          At least one kit has implemented molecular tagging, but I can think of a few reasons why this approach has not been more widely adopted:
          1- With the majority of current kits, the adapter ends that ligate to the insert are double-stranded, so using random sequences would result in less complementary ends and low ligation efficiency.
          2- It seems a logical approach at first glance, but its practical value is questionable. For more info, look at these: http://journals.plos.org/plosone/art...l.pone.0119123 and http://www.pnas.org/content/109/21/E1330.full


          • #6
            Back on topic plz

            Originally posted by dpryan:
            This sounds fairly typical; one expects a high level of apparent duplication in RNA-seq. Note that I wrote "apparent duplications", since these are likely not real PCR or optical duplicates. A bias toward the 3' end is also not that uncommon, at least if you did any polyA enrichment (I'm not familiar with the Kapa kit).

            I realize that, especially with SE RNA-seq, high duplication rates are very common. However, when I spoke to Kapa, they told me the duplication rate they expect to see is in the 25-32% range, nowhere near the 55-60% duplication that I'm seeing... I'm kind of stumped.


            • #7
              % of duplicates per gene

              One thing I've looked at is the % of duplicates per gene. If you have a high number of duplicates in only a few genes, you should be fine; but if you have low-expression genes with high duplication, you should look into this more closely, since you might have PCR amplification biases. This is all relative to PE and coverage, but calculating the % of duplicates per gene (as opposed to the library total) should help show whether you have a problem or not.

              Check this out:


              By the way, here they use the "random" barcode method mentioned above (better known as a UMI, or unique molecular identifier).
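The per-gene check described in this post can be sketched roughly as follows. This is a minimal illustration, assuming single-end reads that have already been assigned to genes and are supplied as hypothetical `(gene, chrom, start, strand)` tuples; a read counts as a positional duplicate if an earlier read in the same gene has the same chromosome, start, and strand. It is not the method of any particular tool.

```python
from collections import Counter, defaultdict


def duplicate_fraction_per_gene(reads):
    """Return {gene: (n_reads, duplicate_fraction)} for positional duplicates.

    `reads` is an iterable of (gene, chrom, start, strand) tuples; this
    input layout is an assumption made for the sketch.
    """
    positions = defaultdict(Counter)
    for gene, chrom, start, strand in reads:
        positions[gene][(chrom, start, strand)] += 1

    result = {}
    for gene, counts in positions.items():
        total = sum(counts.values())
        distinct = len(counts)
        # Reads beyond the first at each position are counted as duplicates.
        result[gene] = (total, (total - distinct) / total)
    return result


# Hypothetical toy input: gene A has 4 reads stacked on one position.
example_reads = (
    [("A", "chr1", 100, "+")] * 4
    + [("A", "chr1", 150, "+")]
    + [("B", "chr2", 10, "-"), ("B", "chr2", 20, "-")]
)
per_gene = duplicate_fraction_per_gene(example_reads)
```

Sorting the result by read count and looking at the duplicate fraction of the low-count genes is one way to apply the rule of thumb above: high fractions among weakly expressed genes would be the pattern to investigate.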
              Last edited by aleferna; 11-28-2016, 02:05 AM.
