  • Per Base Quality scores in FastQC

    Hello,

    I'm very new to analysing next gen data, and I was hoping someone might be able to help me interpret a failed diagnostic plot I've just obtained in FastQC.

    Has anyone come across anything similar to the per base sequence quality plot I just got in FastQC (see attached picture)? It doesn't look anything like the other 'failed' example plots, and I'm at a bit of a loss as to why base 11 in particular is so poor. Is this indicative of contamination, sequencing reaction failure, or short sequences?

    Secondly, is it actually possible to assemble such 'failed' data? And is there anything I should specifically be doing in terms of assembly, quality filtering, or trimming? I'd be very grateful for any suggestions!

    This is single-end, ancient DNA sequence data, and this is the plot for all the sequences with the correct tag sequence.

    Many thanks for your help.

  • #2
    In reply to mittymat:
    Was this sample run on a HiSeq with a v3 flow cell? I ask because I suspect that this run experienced the "Bottom Middle Swath" problem. This manifests itself as a complete lack of useful image data for the bottom middle swath of a HS v3 flow cell and can occur randomly. As a result, every cluster from that swath gets an N in its sequence at that cycle. Normally this would affect 1/6 of the reads (16.7%), since there are six swaths per lane, but your data shows 33% Ns at cycle 11, which suggests to me that two swaths produced unusable data.

    Now, as to whether you can use these reads, the answer is, "That depends." What fraction of your total does this lane represent? What is your experimental objective (e.g. de novo assembly, SNP detection, etc.)?
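
    For anyone who wants to check the arithmetic above on their own data, here is a minimal Python sketch. It assumes an uncompressed FASTQ with exactly four lines per record, reads spread roughly evenly across the six swaths, and a negligible background N rate. The function names are made up for illustration and are not part of FastQC or any Illumina tool.

```python
import io
from collections import Counter


def n_fraction_per_cycle(fastq_handle):
    """Return {cycle (1-based): fraction of reads with an N at that cycle}.

    Assumes 4-line FASTQ records (sequence on the 2nd line of each record).
    """
    n_counts = Counter()
    totals = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # keep only the sequence line of each record
            continue
        for pos, base in enumerate(line.strip().upper(), start=1):
            totals[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    return {pos: n_counts[pos] / totals[pos] for pos in sorted(totals)}


def estimated_bad_swaths(n_fraction, swaths_per_lane=6):
    """Rough number of swaths that would explain a given N fraction.

    Assumption: each swath contributes an equal share of the reads.
    """
    return round(n_fraction * swaths_per_lane)


if __name__ == "__main__":
    # Toy data: 6 reads, 2 of them with an N at cycle 11.
    reads = ["ACGTACGTACNCGT"] * 2 + ["ACGTACGTACGCGT"] * 4
    fastq = "".join(f"@r{i}\n{s}\n+\n{'I' * len(s)}\n" for i, s in enumerate(reads))
    fractions = n_fraction_per_cycle(io.StringIO(fastq))
    print(fractions[11], estimated_bad_swaths(fractions[11]))
```

    On a real file you would pass an open file handle instead of the `io.StringIO` toy input. A single cycle with an N fraction near a multiple of 1/6 would be consistent with the swath explanation, though it does not prove it.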



    • #3
      Kmcarr's suggestion above is probably right, but we've seen a couple of other things cause similar profiles in the past. You might have had a transient problem with the flowcell (bubbles stuck in the lane, for example) which later cleared. We've also seen a loss of quality where the run hits a fixed, biased position in the data. For example, if base 5 were a T in every read, the qualities for that position would suddenly fall (and the N count would rise), but things would return to normal in later cycles.
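
      One way to test the fixed-position idea is to tabulate base composition per cycle and flag any cycle where a single base dominates. A minimal sketch follows. The helper names and the 0.9 threshold are arbitrary choices for illustration, and Ns are ignored when computing the proportions.

```python
from collections import Counter, defaultdict


def base_composition(seqs):
    """Return {position (1-based): Counter of bases seen at that position}."""
    counts = defaultdict(Counter)
    for seq in seqs:
        for pos, base in enumerate(seq.upper(), start=1):
            counts[pos][base] += 1
    return counts


def fixed_positions(seqs, threshold=0.9):
    """List (position, base, fraction) where one called base reaches `threshold`.

    Ns are excluded from the denominator, so a position made up of only
    T and N calls still counts as fixed on T.
    """
    flagged = []
    for pos, counter in sorted(base_composition(seqs).items()):
        called = {b: n for b, n in counter.items() if b != "N"}
        total = sum(called.values())
        if total == 0:
            continue  # nothing but Ns at this position
        base, n = max(called.items(), key=lambda kv: kv[1])
        if n / total >= threshold:
            flagged.append((pos, base, n / total))
    return flagged


if __name__ == "__main__":
    # Toy reads in which base 5 is a T in every read.
    toy = ["ACGTTACG", "CGTATGCA", "GTACTTAC", "TACGTCGT"]
    print(fixed_positions(toy))
```

      If the low-quality cycle shows up in this list, the biased-position explanation becomes more plausible than an imaging failure. If the composition at that cycle looks like its neighbours apart from the Ns, a swath or bubble problem is the more likely candidate.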



      • #4
        Hello,
        Thanks very much for your responses! (Sorry for the delay in replying). It seems that yes, the data was generated on a HiSeq, probably with a v3 flow cell, and thus it could be the Bottom Middle Swath problem occurring. I hadn't heard of it before, nor had the people doing the sequencing, as far as I'm aware.
        However, with regard to my second question, it doesn't seem to prevent assembly: it has still been possible to assemble the data, although I'm not sure how much the result has been affected. We're waiting for other analyses from the same machine, so it will be interesting to compare the quality information for those analyses.
        Thanks again!
