
  • Strange per base sequence content from FastQC report

    Hi all,

    I am new to bioinformatics and have run into some problems with my data analysis (human RNA-seq, paired-end, 101 cycles). The per-base sequence content and k-mer content in the FastQC report look quite different from what I expected.
    Here are my questions (please use the attached file for reference)

    1. There is a sudden rise in %A around 50 bp. Could it be adapter contamination? The adapter content stays low throughout the whole run, though.

    2. What is the possible cause of the A/T imbalance?

    3. What is the possible cause of the peaks around 40-49 bp in the k-mer content?

    4. Why does the base quality drop after 50 bp?

    Can anyone give me some clues on these questions? They have been puzzling me for a week.

    Thank you

  • #2
    For question 1, the increase in %A is not too bad. I think it can be explained by some reads running into the polyA tail, which you might want to trim off.
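A minimal sketch of what trimming a polyA tail could look like, assuming reads are held as plain sequence and quality strings. The function name `trim_polya_tail` and the `min_run` threshold are hypothetical choices for illustration, not values from this thread, and a dedicated trimmer would also tolerate sequencing errors inside the tail.

```python
import re


def trim_polya_tail(seq, qual, min_run=10):
    """Remove a 3' run of at least `min_run` A's from a read.

    `min_run` is an assumed threshold; tune it for your data.
    Returns the (possibly shortened) sequence and quality strings.
    """
    # Match a run of A's that extends to the very end of the read.
    match = re.search(r"A{%d,}$" % min_run, seq)
    if match is None:
        return seq, qual
    cut = match.start()
    # Trim the quality string at the same position to keep them in sync.
    return seq[:cut], qual[:cut]


# Toy read: 8 bases of "transcript" followed by 12 A's.
trimmed_seq, trimmed_qual = trim_polya_tail("ACGTACGT" + "A" * 12, "I" * 20)
print(trimmed_seq, len(trimmed_qual))
```

Short A runs below the threshold are left alone, so genuine A-rich transcript sequence is not clipped by accident.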



    • #3
      Do you know which kit was used for library prep? Some kits use adapters that differ from the ones FastQC can detect.



      • #4
        Hi nucacidhunter and wdecoster,

        The TruSeq RNA Library Prep Kit v2 was used. The link below shows the over-represented adapter sequences.

        [Imgur image link]



        This is the Bioanalyzer result; the peak lies around 250-300 bp. Subtracting the length of the adapter (60 bp), the insert should be around 120-130 bp. In my opinion, it is unlikely that adapter sequence would be read as early as cycle 50.

        [Imgur image link]



        Thank you
        Last edited by windnature03; 03-01-2017, 01:51 AM. Reason: correcting image link



        • #5
          Originally posted by windnature03
          Hi all,

          I am new to bioinformatics and have run into some problems with my data analysis (human RNA-seq, paired-end, 101 cycles). The per-base sequence content and k-mer content in the FastQC report look quite different from what I expected.
          Here are my questions (please use the attached file for reference)

          1. There is a sudden rise in %A around 50 bp. Could it be adapter contamination? The adapter content stays low throughout the whole run, though.

          2. What is the possible cause of the A/T imbalance?

          3. What is the possible cause of the peaks around 40-49 bp in the k-mer content?

          4. Why does the base quality drop after 50 bp?
          1 and 2- One possible explanation is 3' bias caused by low-quality input RNA, which increases polyA representation.

          3- Sequences TATGCCG and CGTATGC are over-represented k-mers that overlap on TATGC. You might check whether they come from a particular highly expressed gene, or from spike-in RNA if any was used.
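One way to run that check is sketched below, assuming the candidate sequences (gene, spike-in, or adapter sequences you want to test) are already loaded as strings. The two k-mers are the ones named above; the helper names and the toy candidate sequences are hypothetical placeholders, not real transcripts.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_kmers(left, right, min_overlap=3):
    """Join two k-mers when a suffix of `left` equals a prefix of `right`.

    Returns the merged string, or None if no overlap of at least
    `min_overlap` bases exists.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


def find_seed(seed, candidates):
    """Return names of candidates containing `seed` on either strand."""
    rc = reverse_complement(seed)
    return [name for name, seq in candidates.items()
            if seed in seq or rc in seq]


# The two over-represented k-mers reported in this thread.
seed = merge_kmers("CGTATGC", "TATGCCG")

# Placeholder sequences for illustration only; substitute real candidates.
toy_candidates = {
    "toy_seq_a": "GGGACGTATGCCGTTT",
    "toy_seq_b": "ACGTACGTACGT",
}
print(seed, find_seed(seed, toy_candidates))
```

Merging the overlapping k-mers first gives a longer, more specific search string than either 7-mer alone.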

          4- It does not seem to be library related. You can ask the sequencing centre for an explanation. They can look at other lanes in the same flow cell to see if sequencing reagent or sequencer had any issues.



          • #6
            Originally posted by windnature03

            4. Why does the base quality drop after 50 bp?
            It is possible that inserts in your library are smaller than what you had expected. This generally causes adapter read-through and results in Q-score drops.
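The relationship between insert length and read-through can be sketched with simple arithmetic, assuming the 101-cycle read length from the original post. The function names and the example insert lengths are illustrative assumptions, not measurements from this library.

```python
def first_adapter_cycle(insert_len, read_len=101):
    """1-based cycle at which the read leaves the insert and enters adapter.

    Returns None when the insert is at least as long as the read,
    in which case no adapter sequence is read.
    """
    if insert_len >= read_len:
        return None
    return insert_len + 1


def adapter_bases_in_read(insert_len, read_len=101):
    """Number of cycles that sequence adapter (or beyond) instead of insert."""
    return max(0, read_len - insert_len)


# Illustrative insert lengths only.
for insert_len in (50, 80, 125):
    print(insert_len,
          first_adapter_cycle(insert_len),
          adapter_bases_in_read(insert_len))
```

Under this model, only inserts shorter than the read length produce read-through, so a quality drop starting near cycle 50 would point to a subpopulation of inserts of roughly that length.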

            Have you scanned or trimmed this data for adapters? I recommend trying bbduk.sh from the BBMap suite for that purpose. There are threads on SeqAnswers that explain how to use bbduk. You can also use bbmerge.sh, or bbmap.sh if you have a reference genome, from the same suite to estimate your library insert size.

