  • kirstyn
    Junior Member
    • May 2012
    • 8

    Strange fastqc per base sequence content 3'end

    Hi

    I have been processing some 150bp paired end Nextera XT reads from viral cDNA. The first 20 or so bases at the 5'end are due to Nextera which I just trim off, but I also have an unusual base content at the 3'end of all sequences, consistently in all samples. I have attached the fastqc images for the forward reads of one sample, which is representative of all my samples. Does anyone know what this might be?
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Are the inserts smaller than the read length (150 cycles in this case)? If this is a paired-end experiment, you can easily check whether the two reads overlap to a large degree by using a tool such as FLASH.
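To illustrate the overlap check suggested above, here is a minimal Python sketch. It is not how FLASH works internally: it only looks for an exact match between the 3' end of read 1 and the start of the reverse-complemented read 2, whereas real read mergers tolerate mismatches and use quality scores. The function names and the `min_overlap` threshold are assumptions made for this example, and the case where the insert is shorter than the read itself is not handled.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def estimate_insert_size(read1, read2, min_overlap=10):
    """Estimate insert size from the longest exact overlap of a read pair.

    Returns None when no overlap of at least min_overlap bases is found,
    which is what you would expect if the insert is longer than the two
    reads combined.
    """
    rc2 = reverse_complement(read2)
    longest_possible = min(len(read1), len(rc2))
    # Try the longest candidate overlap first, then shrink.
    for k in range(longest_possible, min_overlap - 1, -1):
        if read1[-k:] == rc2[:k]:
            return len(read1) + len(rc2) - k
    return None


# Hypothetical 30 bp fragment sequenced with 20-cycle paired-end reads.
fragment = "ACGTTGCAAGCTTAGGCTAACCGTATCGGA"
r1 = fragment[:20]
r2 = reverse_complement(fragment)[:20]
print(estimate_insert_size(r1, r2))
```

If a large fraction of pairs returns a size well below the expected library size, that points to short inserts rather than a chemistry problem.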


    • kirstyn
      Junior Member
      • May 2012
      • 8

      #3
      Thanks for your reply. The average size of my libraries is usually >350bp, which I assumed was ok for 150bp PE reads? Looking in Tablet, there is a large degree of overlap between paired reads. Is this indicative of the library being too short? I am mapping the reads to a reference sequence and looking for SNPs, so I am not worried about overlap in the read pairs. How would this affect the 3' end of my reads?


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Originally posted by kirstyn
        Thanks for your reply. The average size of my libraries is usually >350bp, which I assumed was ok for 150bp PE reads? Looking in Tablet, there is a large degree of overlap between paired reads. Is this indicative of the library being too short? I am mapping the reads to a reference sequence and looking for SNPs, so I am not worried about overlap in the read pairs. How would this affect the 3' end of my reads?
        You can easily determine how big the inserts are by looking at the extent of overlap (it sounds like a fraction of your library is nowhere near the expected 350 bp size). If some of the inserts are smaller than 150 bp, then you will start reading into the adapter at the other end and beyond. If these reads are not aligning well at the 3' end, then you may need to trim them.
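A rough Python sketch of what "reading into the adapter" looks like in practice: when the insert is shorter than the read, the 3' end of the read contains the start of the adapter. The sequence below is the commonly cited Nextera transposase adapter (check your kit documentation before relying on it). The function names and the `min_match` threshold are assumptions for this example, and only exact matches are detected, unlike dedicated trimmers, which allow mismatches.

```python
# Commonly cited Nextera transposase adapter sequence (assumption: verify for your kit).
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"


def find_adapter_start(read, adapter=NEXTERA_ADAPTER, min_match=5):
    """Return the position where adapter sequence begins in the read, or None.

    The returned position equals the insert length for that read.
    """
    # Case 1: the whole adapter is present somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return pos
    # Case 2: only the first k bases of the adapter fit at the 3' end.
    for k in range(min(len(adapter), len(read)) - 1, min_match - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return None


def trim_adapter(read, adapter=NEXTERA_ADAPTER, min_match=5):
    """Remove adapter read-through from the 3' end, if any is detected."""
    pos = find_adapter_start(read, adapter, min_match)
    return read if pos is None else read[:pos]


# Hypothetical 20 bp insert followed by the first 10 adapter bases.
insert = "ACGTTGCAAGCTTAGGCTAA"
print(trim_adapter(insert + NEXTERA_ADAPTER[:10]))
```

Counting how many reads return a position, and at which positions, gives a quick picture of how much of the library is shorter than the read length.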


        • kmcarr
          Senior Member
          • May 2008
          • 1181

          #5
          Originally posted by kirstyn
          The first 20 or so bases at the 5'end are due to Nextera which I just trim off
          I know this wasn't the question you were asking but it's not really necessary to trim off those 5' bases. The sequence is not incorrect. It simply represents the slight bias that the Nextera tagmentase has for certain sequence composition.


          • kirstyn
            Junior Member
            • May 2012
            • 8

            #6
            Originally posted by kmcarr
            I know this wasn't the question you were asking but it's not really necessary to trim off those 5' bases. The sequence is not incorrect. It simply represents the slight bias that the Nextera tagmentase has for certain sequence composition.
            Yes, thanks for that comment. I wasn't quite sure if I should trim off the 5' bases, especially since I am mapping my reads, but I had read about both random hexamer and Nextera transposome bias, so I decided to trim! I think I will try it without trimming too!


            • MU Core
              Member
              • Apr 2008
              • 60

              #7
              I observe the same 3' characteristic as previously reported in this thread. A representative plot is provided. The plot shown is a DNA library with insert size >350bp. However, we see this in all our FastQC plots. It is independent of library type, instrument (NextSeq or HiSeq), or read length (50, 75, or 100 bases). Therefore, I'm not inclined to see this as a library prep/chemistry issue. Has anyone also encountered this characteristic and identified a reason? Thank you in advance for comments.


              • nucacidhunter
                Jafar Jabbari
                • Jan 2013
                • 1250

                #8
                It is either a library issue or possibly a demultiplexing issue. Could you post plots from other runs with similar patterns, along with the library electropherogram?


                • Michael.Ante
                  Senior Member
                  • Oct 2011
                  • 127

                  #9
                  Do you also see this pattern with the "--nogroup" option?
                  Without that option the bases are binned, which may make the distribution look smoother than it is. The last base shown in your figure is just a single bin.
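The smoothing effect of binning can be shown with a small Python sketch. It computes per-position base percentages and then averages consecutive positions into bins. The fixed `bin_size` is an assumption for illustration only; FastQC uses its own grouping scheme, and the function names here are hypothetical.

```python
from collections import Counter


def per_base_content(reads):
    """Return one dict per read position with the percentage of A, C, G, T."""
    length = max(len(r) for r in reads)
    rows = []
    for i in range(length):
        counts = Counter(r[i] for r in reads if len(r) > i)
        total = sum(counts.values())
        rows.append({b: 100.0 * counts.get(b, 0) / total for b in "ACGT"})
    return rows


def group_positions(rows, bin_size):
    """Average consecutive positions into bins of bin_size positions."""
    binned = []
    for start in range(0, len(rows), bin_size):
        chunk = rows[start:start + bin_size]
        binned.append({b: sum(r[b] for r in chunk) / len(chunk) for b in "ACGT"})
    return binned


# Toy reads: the last position deviates from the others.
reads = ["ACGTA", "ACGTA", "ACGTC", "ACGTA"]
rows = per_base_content(reads)
for label, table in (("ungrouped", rows), ("binned by 2", group_positions(rows, 2))):
    print(label, [round(row["A"], 1) for row in table])
```

With five positions and a bin size of two, the last position ends up alone in its own bin, so its values are not averaged with neighbours while the earlier positions are. That alone can make the final point stand out from an otherwise smoothed curve.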


                  • Brian Bushnell
                    Super Moderator
                    • Jan 2014
                    • 2709

                    #10
                    Originally posted by MU Core
                    I observe the same 3' characteristic as previously reported in this thread. A representative plot is provided. The plot shown is a DNA library with insert size >350bp. However, we see this in all our FastQC plots. It is independent of library type, instrument (NextSeq or HiSeq), or read length (50, 75, or 100 bases). Therefore, I'm not inclined to see this as a library prep/chemistry issue. Has anyone also encountered this characteristic and identified a reason? Thank you in advance for comments.
                    Are you using Nextera for fragmentation? That's been identified as the cause of severe bias on the 5' (left) end. However, yours looks very sedate, so if I were to guess, I would say this is NOT a Nextera library. Have you looked into the empirical error rates (from mapping) of the left end to see if there is a corresponding increase? That will indicate whether this is bias, or an actual base-calling/non-genomic sequence issue.

                    The 3' end is just showing the normal Illumina biased/low-quality last base due to a lack of a subsequent base call needed for calibration; I always trim the last base in 76/101/151/etc. runs.
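A minimal Python sketch of the last-base trim described above, assuming the number of cycles in the run is known and passed in as `run_cycles` (a hypothetical parameter name). Only full-length reads lose their final base; reads that were already shortened are left alone.

```python
def trim_extra_cycle(seq, qual, run_cycles):
    """Drop the final base of a read that spans every cycle of the run.

    seq and qual are the sequence and quality strings of one FASTQ record.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) == run_cycles:
        return seq[:-1], qual[:-1]
    return seq, qual


# Toy example standing in for a 76/101/151-cycle run.
print(trim_extra_cycle("ACGTACG", "IIIIII#", run_cycles=7))
```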


                    • MU Core
                      Member
                      • Apr 2008
                      • 60

                      #11
                      Here are a couple more examples. Sample M is that of a DNA PCR-free library sequenced on a HiSeq. Sample W is a TruSeq mRNA library sequenced on a NextSeq.

                      Brian's and Michael's suggestions both offer what I think is a plausible explanation for these observations. They also suggest that trimming the reads before generating the FastQC report would remove the bias at the 3' end. I'll give this a try and share the results.

                      Thank you again for your comments.


                      • Brian Bushnell
                        Super Moderator
                        • Jan 2014
                        • 2709

                        #12
                        Sample W looks very typical of Nextera. Trimming the 3' end is not recommended in these cases because the bases are correct. It will not change the bias, just hide the bias so that your FastQC report looks better.


                        • MU Core
                          Member
                          • Apr 2008
                          • 60

                          #13
                          Brian, you were correct though that these libraries were not Nextera.


                          • Brian Bushnell
                            Super Moderator
                            • Jan 2014
                            • 2709

                            #14
                            Oh, that's odd, then. There are some other things like random-hexamer-primed libraries that also have similar issues. I think it would be worthwhile generating an error-rate histogram to verify whether the mismatch rate is increased in that region. You can do so with BBMap like this:

                            bbmap.sh in=reads.fq ref=ref.fa mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt

                            If the error rate is not increased, I recommend against trimming.
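To make the idea of an error-rate histogram concrete, here is a conceptual Python sketch. It does not parse BBMap output and is not BBMap's implementation: it assumes you already have ungapped (read, reference segment) pairs of equal orientation, and the function name is hypothetical. Positions where either base is N are skipped.

```python
def mismatch_rate_by_position(alignments):
    """Return the fraction of mismatching bases at each read position.

    alignments is an iterable of (read, reference_segment) string pairs
    from ungapped alignments, both given in read orientation.
    """
    mismatches, totals = [], []
    for read, ref in alignments:
        for i, (r, g) in enumerate(zip(read, ref)):
            if i == len(totals):
                # First time this position is seen: extend both counters.
                totals.append(0)
                mismatches.append(0)
            if r == "N" or g == "N":
                continue
            totals[i] += 1
            if r != g:
                mismatches[i] += 1
    return [m / t if t else 0.0 for m, t in zip(mismatches, totals)]


# Toy alignments: mismatches cluster at the first and last positions.
pairs = [
    ("ACGTA", "ACGTA"),
    ("ACGTC", "ACGTA"),
    ("ACGTA", "ACGTA"),
    ("TCGTC", "ACGTA"),
]
print(mismatch_rate_by_position(pairs))
```

If the rate at the positions with skewed base composition is no higher than elsewhere, the skew reflects bias in which fragments were sequenced rather than wrong base calls, which is the case where trimming is not warranted.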


                            • nucacidhunter
                              Jafar Jabbari
                              • Jan 2013
                              • 1250

                              #15
                              I have seen this pattern only in low-diversity amplicons, and their FastQC pattern matches the Data By Cycle (% Base) plot in SAV for the run.
