
  • High duplication levels in FASTQC

    Hi,

    I was using FASTQC to QC my directional mRNA-Seq data obtained from suspension culture cells. I have about 30 million reads.

    Although most of my QC stats are fine, the "Duplicate sequences" section shows a big uptick for sequences with duplication levels > 10 (see below). The reported Sequence Duplication Level is >84.56%.
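    To make that number concrete, here is a minimal sketch of how a duplication percentage and a per-level breakdown can be computed from a set of read sequences. It is an illustration only: the function name `duplication_summary` is made up for this example, it counts exact-match duplicates over every read it is given, and FastQC's own estimate is computed differently in detail, so the two figures will not match exactly.

```python
from collections import Counter


def duplication_summary(reads):
    """Return (percent_duplicated, level_histogram) for an iterable of read sequences.

    percent_duplicated: share of reads that are extra copies of a sequence
    already seen, i.e. 100 * (total - distinct) / total.
    level_histogram: duplication level -> number of reads at that level,
    with every level of 10 or more pooled into the key 10.
    NOTE: hypothetical helper for illustration, not FastQC's algorithm.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return 0.0, {}
    distinct = len(counts)
    percent_duplicated = 100.0 * (total - distinct) / total

    levels = Counter()
    for copies in counts.values():
        level = copies if copies < 10 else 10  # pool the tail into "10+"
        levels[level] += copies
    return percent_duplicated, dict(levels)


# Toy input: one sequence seen 3 times, one seen twice, one seen once.
reads = ["ACGT", "ACGT", "ACGT", "TTGA", "GGCC", "GGCC"]
pct, hist = duplication_summary(reads)
print(f"{pct:.1f}% duplicated; reads per duplication level: {hist}")
```

    A high percentage on its own does not say whether the copies come from PCR or from deep sequencing of abundant transcripts; it only says how many reads are not unique.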

    I was wondering what could be wrong. There were 2 possibilities I could think of:
    1) Some amplification bias in PCR and/or
    2) Since the RNA is not very diverse (it's from suspension cells, all the same cell type) and was sequenced to high coverage, many sequences got sequenced multiple times.

    I wonder if the second reason makes sense? If it is true, by extension it also means that we have successfully sequenced even the very low abundance transcripts. However, if it was PCR bias, that wouldn't be true. Is there a way to distinguish between these two possibilities?

    I'd appreciate any suggestions.

    Thanks

  • #2
    Have a look at the alignments and you will know. Generally, I wouldn't trust FastQC duplication levels for mRNA-Seq too much.



    • #3
      The overall duplication level reported by FastQC needs to be taken in context with the shape of the profile you're seeing and also the results of the overrepresented sequence plot. There's a big difference between having a generally oversequenced sample (which often happens with RNA-Seq so you can see low expressed transcripts), and having a small number of sequences accounting for large chunks of your library.

      What FastQC can't do is put the duplication in any kind of context. For libraries with expected uneven coverage (such as RNA-Seq) you'd need to look at the positions of the mapped data to see whether you were getting even coverage over highly duplicated regions, which would suggest you simply have really high coverage, or duplicated patchy coverage, which would indicate a technical problem.
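      As a rough sketch of that positional check, the snippet below summarises how read start positions are spread over one highly duplicated region. It assumes you have already pulled the start coordinates of reads on one strand out of your alignments; the function name `start_site_profile`, the two summary values, and the toy coordinates are all made up for illustration, and no cutoff is implied for what counts as "patchy".

```python
from collections import Counter


def start_site_profile(starts, region_start, region_end):
    """Summarise read start positions inside [region_start, region_end).

    occupancy: fraction of positions in the region used as a read start.
    top_stack_share: fraction of the region's reads starting at the single
    most used position.
    NOTE: hypothetical helper for illustration; inputs are plain integers,
    so extracting them from a BAM/SAM file is left to the reader.
    """
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region_end must be greater than region_start")
    in_region = [s for s in starts if region_start <= s < region_end]
    stacks = Counter(in_region)
    n_reads = len(in_region)
    occupancy = len(stacks) / length
    top_stack_share = max(stacks.values()) / n_reads if n_reads else 0.0
    return {"reads": n_reads, "occupancy": occupancy,
            "top_stack_share": top_stack_share}


# Same read count, two very different pictures (toy data):
even = [pos for pos in range(100, 200) for _ in range(5)]  # 5 reads at every position
patchy = [120] * 300 + [160] * 200                         # everything piled on two positions

even_profile = start_site_profile(even, 100, 200)
patchy_profile = start_site_profile(patchy, 100, 200)
print("even  :", even_profile)
print("patchy:", patchy_profile)
```

      Both toy regions hold 500 reads and would look heavily duplicated by sequence alone. In the first, starts cover the whole region and no single position dominates, which is what plain high coverage looks like; in the second, almost all positions are empty and one stack holds most of the reads, which is the patchy pattern to worry about.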

      If you haven't seen it already I wrote up a more detailed explanation of this on my blog since this is such a common thing to come up (the duplicate sequence plot is probably the least intuitive module to interpret in the FastQC output).



      • #4
        FastQC Duplication

        Hello.

        I read your blog http://proteo.me.uk/2011/05/interpre...lot-in-fastqc/

        and find it helpful. I have the same problem.

        At the end of the blog you suggest considering the per base quality plot to get a realistic assessment of the duplication.

        In my case, my per base sequence quality is great, but I have the same image as posted above. What does this imply?

        If my per base sequence quality passes, and I have high sequence duplication levels caused by the overrepresented TruSeq adapter sequence, can I then conclude that the quality is okay?

        Thank you
