  • Strange fastqc per base sequence content 3'end

    Hi

    I have been processing some 150 bp paired-end Nextera XT reads from viral cDNA. The first 20 or so bases at the 5' end are due to Nextera, which I just trim off, but I also see unusual base content at the 3' end of all sequences, consistently across all samples. I have attached the FastQC images for the forward reads of one sample, which is representative of all my samples. Does anyone know what this might be?

  • #2
    Are the inserts smaller than the read length (150 cycles in this case)? If this is a paired-end experiment, you can easily check whether the two reads overlap to a large degree by using a tool such as FLASH.

    • #3
      Thanks for your reply. The average size of my libraries is usually >350 bp, which I assumed was OK for 150 bp PE reads? Looking in Tablet, there is a large degree of overlap between paired reads. Is this indicative of the library being too short? I am mapping the reads to a reference sequence and looking for SNPs, so I am not worried about overlap in the read pairs. How would this affect the 3' end of my reads?

      • #4
        Originally posted by kirstyn
        Thanks for your reply. The average size of my libraries is usually >350 bp, which I assumed was OK for 150 bp PE reads? Looking in Tablet, there is a large degree of overlap between paired reads. Is this indicative of the library being too short? I am mapping the reads to a reference sequence and looking for SNPs, so I am not worried about overlap in the read pairs. How would this affect the 3' end of my reads?
        You can easily determine how big the inserts are by looking at the extent of overlap (it sounds like a fraction of your library is nowhere near the expected 350 bp size). If some of the inserts are smaller than 150 bp, then you will start reading into the adapter at the other end and beyond. If these reads are not aligning well at the 3' end, then you may need to trim them.
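
To make the overlap reasoning concrete, here is a minimal Python sketch that estimates the insert size of one read pair from the overlap between read 1 and the reverse complement of read 2. It is an illustration only, not what FLASH does: it requires an exact match and ignores sequencing errors and quality scores, and the names `estimate_insert_size` and `min_overlap` are made up for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def estimate_insert_size(read1, read2, min_overlap=10):
    """Estimate insert size from an exact overlap, or return None if none is found.

    Assumption for this sketch: reads are error-free, so exact string
    comparison is enough. Real overlap tools tolerate mismatches.
    """
    r2 = reverse_complement(read2)
    n = min(len(read1), len(r2))

    # Insert at least as long as the reads: the end of read 1 matches
    # the start of the reverse-complemented read 2.
    for k in range(n, min_overlap - 1, -1):
        if read1[-k:] == r2[:k]:
            return len(read1) + len(r2) - k

    # Insert shorter than the reads: both reads run through the insert
    # into adapter, so the start of read 1 matches the end of rc(read 2).
    for size in range(n - 1, min_overlap - 1, -1):
        if read1[:size] == r2[-size:]:
            return size

    return None
```

If the estimate comes back smaller than the read length, the difference is the number of 3' bases that come from adapter rather than from the sample.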

        • #5
          Originally posted by kirstyn
          The first 20 or so bases at the 5'end are due to Nextera which I just trim off
          I know this wasn't the question you were asking but it's not really necessary to trim off those 5' bases. The sequence is not incorrect. It simply represents the slight bias that the Nextera tagmentase has for certain sequence composition.

          • #6
            Originally posted by kmcarr
            I know this wasn't the question you were asking but it's not really necessary to trim off those 5' bases. The sequence is not incorrect. It simply represents the slight bias that the Nextera tagmentase has for certain sequence composition.
            Yes, thanks for that comment. I wasn't quite sure if I should trim off the 5' bases, especially since I am mapping my reads, but I had read about both random-hexamer and Nextera transposome bias, so I decided to trim! I think I will try it without trimming too!

            • #7
              I observe the same 3' characteristic as previously reported in this thread. A representative plot is provided. The plot shown is a DNA library with insert size >350bp. However, we see this in all our FastQC plots. It is independent of library type, instrument (NextSeq or HiSeq), or read length (50, 75, or 100 bases). Therefore, I'm not inclined to see this as a library prep/chemistry issue. Has anyone also encountered this characteristic and identified a reason? Thank you in advance for comments.

              • #8
                It is either a library issue or possibly a demultiplexing issue. Could you post plots from other runs with similar pattern(s), along with the library electropherograms?

                • #9
                  Do you also see this pattern with the "--nogroup" option?
                  Without that option the bases are binned, which can make the distribution look smoother than it is. The last base shown in your figure is just a single bin.
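
A small Python sketch of why binning matters: it counts bases per position and then pools adjacent positions into bins, so a sharp effect at one position gets diluted by its neighbours. This is only an illustration with uniform bins; it is not FastQC's actual grouping scheme, and the function names are invented for the example.

```python
from collections import Counter


def base_counts_by_position(reads):
    """Count bases at each read position; returns one Counter per position."""
    length = max(len(read) for read in reads)
    counts = [Counter() for _ in range(length)]
    for read in reads:
        for i, base in enumerate(read):
            counts[i][base] += 1
    return counts


def group_positions(counts, bin_size):
    """Pool per-position counts into uniform bins of bin_size positions."""
    grouped = []
    for start in range(0, len(counts), bin_size):
        total = Counter()
        for position_counts in counts[start:start + bin_size]:
            total.update(position_counts)
        grouped.append(total)
    return grouped


def fraction(counter, base):
    """Fraction of a given base within one position or bin."""
    n = sum(counter.values())
    return counter[base] / n if n else 0.0


# Toy data: the last position is always G, the one before it never is.
reads = ["ACAG", "CAAG", "AACG", "CCAG"]
per_position = base_counts_by_position(reads)
binned = group_positions(per_position, bin_size=2)

print(fraction(per_position[3], "G"))  # ungrouped last position
print(fraction(binned[1], "G"))        # last two positions pooled
```

In the toy data the last position is 100% G when plotted on its own, but only 50% G once it is pooled with the position before it.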

                  • #10
                    Originally posted by MU Core
                    I observe the same 3' characteristic as previously reported in this thread. A representative plot is provided. The plot shown is a DNA library with insert size >350bp. However, we see this in all our FastQC plots. It is independent of library type, instrument (NextSeq or HiSeq), or read length (50, 75, or 100 bases). Therefore, I'm not inclined to see this as a library prep/chemistry issue. Has anyone also encountered this characteristic and identified a reason? Thank you in advance for comments.
                    Are you using Nextera for fragmentation? That's been identified as the cause of severe bias on the 5' (left) end. However, yours looks very sedate, so if I were to guess, I would say this is NOT a Nextera library. Have you looked into the empirical error rates (from mapping) of the left end to see if there is a corresponding increase? That will indicate whether this is bias, or an actual base-calling/non-genomic sequence issue.

                    The 3' end is just showing the normal Illumina biased/low-quality last base due to a lack of a subsequent base call needed for calibration; I always trim the last base in 76/101/151/etc. runs.
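
For illustration, a minimal Python sketch of that last-base trim on a single read, assuming the extra cycle shows up as reads of exactly 76, 101, or 151 bases. The names `trim_last_base` and `run_cycles` are hypothetical and not part of any tool.

```python
def trim_last_base(seq, qual, run_cycles=(76, 101, 151)):
    """Drop the final base and its quality score when the read length
    matches one of the given cycle counts; otherwise return the read as is."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) in run_cycles:
        return seq[:-1], qual[:-1]
    return seq, qual


# Example: a 76-base read becomes 75 bases; a 75-base read is left alone.
seq, qual = trim_last_base("A" * 76, "I" * 76)
print(len(seq), len(qual))
```

Reads that were already shortened upstream (for example by adapter trimming) no longer match the cycle counts and are left untouched in this sketch.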

                    • #11
                      Here are a couple more examples. Sample M is that of a DNA PCR-free library sequenced on a HiSeq. Sample W is a TruSeq mRNA library sequenced on a NextSeq.

                      Brian and Michael's suggestions both offer an explanation that I think accounts for these observations. They also suggest that if the reads are trimmed before the FastQC report is generated, the bias at the 3' end will be removed. I'll give this a try and share the results.

                      Thank you again for your comments.

                      • #12
                        Sample W looks very typical of Nextera. Trimming the 3' end is not recommended in these cases because the bases are correct. It will not change the bias, just hide the bias so that your FastQC report looks better.

                        • #13
                          Brian, you were correct though that these libraries were not Nextera.

                          • #14
                            Oh, that's odd, then. There are some other things like random-hexamer-primed libraries that also have similar issues. I think it would be worthwhile generating an error-rate histogram to verify whether the mismatch rate is increased in that region. You can do so with BBMap like this:

                            bbmap.sh in=reads.fq ref=ref.fa mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt

                            If the error rate is not increased, I recommend against trimming.
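
To show what an error-rate-by-position check computes, here is a minimal Python sketch that takes gap-free read/reference pairs and returns the mismatch rate at each read position. It does not parse BBMap's mhist output and it ignores indels; `mismatch_rate_by_position` is a made-up name for this example.

```python
def mismatch_rate_by_position(pairs):
    """Compute the mismatch rate per read position.

    pairs: iterable of (read, reference_segment) strings, assumed to be
    aligned without gaps. Positions where either base is N are skipped.
    """
    mismatches = []
    totals = []
    for read, ref in pairs:
        for i, (read_base, ref_base) in enumerate(zip(read, ref)):
            if i == len(totals):
                totals.append(0)
                mismatches.append(0)
            if read_base == "N" or ref_base == "N":
                continue
            totals[i] += 1
            if read_base != ref_base:
                mismatches[i] += 1
    return [m / t if t else 0.0 for m, t in zip(mismatches, totals)]


# Toy example: mismatches concentrated at the first and last positions.
pairs = [
    ("ACGT", "ACGT"),
    ("ACGA", "ACGT"),
    ("ACGT", "ACGT"),
    ("TCGA", "ACGT"),
]
rates = mismatch_rate_by_position(pairs)
print(rates)
```

If the rate at the positions with skewed base composition is no higher than elsewhere in the read, the skew is composition bias rather than miscalled bases.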

                            • #15
                              I have seen this pattern only in low-diversity amplicons, where the FastQC pattern matches the Data By Cycle (%Base) plot in SAV for the run.
