  • Base distribution plot split issue

    Hi, all. I am new to Illumina machines.
    We have a HiSeq 2500 and did PE125 runs, but the results look odd to me.
    Please see the uploaded images. The base distribution plots are split at the very beginning of read 1 and in the late cycles of read 2 (technically, read 3). Is this caused by the libraries themselves, by operator error, or by bad reagent lots? How can we solve it?
    Last edited by acdan; 11-18-2015, 05:24 PM.

  • #2
    Edit your post and go to "manage attachments" - the images did not show up.

    But anyway - base composition divergence near the end can mean a couple things, either adapter contamination or a problem with the sequencer. Anomalies at the beginning of the read are often due to nonrandom shearing.

    I suggest you try adapter-trimming and/or examining the insert size distribution to see if this is caused by adapter sequence.
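A quick way to see whether adapter read-through is plausible before running a dedicated trimmer is to count how many reads contain the adapter and where it starts. The following is only a minimal sketch: the adapter prefix `AGATCGGAAGAGC` is an assumption (the prefix commonly shared by TruSeq-style adapters) and should be replaced with whatever your kit uses, and the function names are hypothetical. It looks for exact matches only, so it will undercount adapters that contain sequencing errors.

```python
from collections import Counter

# Assumption: common prefix of TruSeq-style adapters; replace with your kit's adapter.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_start_positions(reads, adapter=ADAPTER_PREFIX):
    """Count, per 0-based read position, how many reads have the adapter starting there.

    Only the first exact occurrence in each read is counted.
    """
    starts = Counter()
    for seq in reads:
        pos = seq.find(adapter)
        if pos != -1:
            starts[pos] += 1
    return starts


def fraction_with_adapter(reads, adapter=ADAPTER_PREFIX):
    """Return the fraction of reads that contain the adapter anywhere."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for seq in reads if adapter in seq)
    return hits / len(reads)


# Toy reads for illustration only (not real data).
example_reads = [
    "ACGTACGTAC" + ADAPTER_PREFIX + "AAAA",
    "ACGTACGTACGTACGTACGTACGTACG",
    "TTGCA" + ADAPTER_PREFIX + "CACGTCTGAA",
]

if __name__ == "__main__":
    print(adapter_start_positions(example_reads))
    print(fraction_with_adapter(example_reads))
```

If a large share of reads contains the adapter, and the start positions pile up well before the end of the read, short inserts are a likely cause of the late-cycle divergence.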



    • #3
      Thank you very much! I am not yet familiar with the tools.
      The images are showing now.



      • #4
        If I had a penny every time someone asked this question, I'd be rich.

        This pattern is caused by a not so random hexamer priming.
        It is normal and expected.
        The first bases are biased towards sequences that prime more efficiently.

        Do not trim off the first 13 bases.
        You will just be cutting off good quality bases.

        Every single one of my runs for the past 4 years has had this bias.

        You can find more details about this bias in this widely cited article.
        Note that they do propose a correction that no one that I know uses.

        Biases in Illumina transcriptome sequencing caused by random hexamer priming


        This bias should really be documented more clearly by Illumina, to avoid people wasting too much time searching for the cause of a very well-known bias.
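To look at this bias in your own data without relying on a QC report, you can tabulate the base composition at each read position. The sketch below is an illustration under stated assumptions, not a replacement for a QC tool: the function names are hypothetical, and the FASTQ reader assumes plain, uncompressed, four-line records.

```python
from collections import Counter


def base_composition(reads, n_positions=20):
    """Return, for each of the first n_positions, the fraction of A, C, G and T.

    Fractions are taken over all calls at that position, so they can sum to
    less than 1 when N calls are present.
    """
    counts = [Counter() for _ in range(n_positions)]
    for seq in reads:
        for i, base in enumerate(seq[:n_positions]):
            counts[i][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return fractions


def read_fastq_sequences(path):
    """Yield sequence lines from an uncompressed four-line-per-record FASTQ file."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip()


if __name__ == "__main__":
    # Toy reads for illustration only (not real data).
    toy_reads = ["ACGT", "AAGT", "ACTT"]
    for position, fracs in enumerate(base_composition(toy_reads, n_positions=4), start=1):
        print(position, fracs)
```

On a randomly primed RNA-Seq library you would expect the fractions to fluctuate over roughly the first 12 to 13 positions and then flatten out, which is the pattern described above.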



        • #5
          You'll be rich, dear. Thank you for the reference.
          And in your experience, is the slight split in the late cycles of read 2, shown in the first image, usual?



          • #6
            Short answer: yes. It will be library/sample dependent.

            In addition to random priming, the Nextera transposase also shows a similar bias.
            Last edited by GenoMax; 11-18-2015, 06:30 PM.



            • #7
              There is one thing I am still confused about: why does random 6-mer priming generate a bias over 13 bases?



              • #8
                It's a good question, and no one seems to have been able to come up with an entirely satisfactory answer.
                Here is the answer from the Illumina FAQ, which states that twelve is "the length of two hexamers". That is not very helpful, since I can't see how there could be 2 hexamers binding.
                This document is no longer available on Illumina's website.
                Luckily, the FAQ was archived on an older seqanswers thread.

                Q482. Why is GC high in the first few bases?
                It is perfectly normal to observe both a slight GC bias and a distinctly non-random base composition over the first 12 bases of the data. This is observed when looking, for instance, at the IVC (intensity versus cycle number) plots which are part of the output of the Pipeline. In genomic DNA sequencing, the base composition is usually quite uniform across all bases; but in mRNA-Seq, the base composition is noticeably uneven across the first 10 to 12 bases. Illumina believes this effect is caused by the "not so random" nature of the random priming process used in the protocol. This may explain why there is a slight overall G/C bias in the starting positions of each read. The first 12 bases probably represent the sites that were being primed by the hexamers used in the random priming process. The first twelve bases in the random priming full-length cDNA sequencing protocol (mRNA-seq) always have IVC plots that look like what has been described. This is because the random priming is not truly random and the first twelve bases (the length of two hexamers) are biased towards sequences that prime more efficiently. This is entirely normal and expected.
                Application of sequencing to RNA analysis (RNA-Seq, whole transcriptome, SAGE, expression analysis, novel organism mining, splice variants)
                The Hansen paper makes an attempt at answering your question more directly.
                It is surprising that the pattern extends well beyond the hexamer primer, out to 13 bases. The length of the pattern could potentially be explained by a strong bias in the first 6 bases of the reads, coupled with dependencies between adjacent nucleotides in the transcriptome. Two observations contradict this explanation. First, the pattern in the nucleotide frequencies ends immediately upstream of the first base of the reads, indicating that the dependence between adjacent nucleotides in the transcriptome is weak (Figure 1a). Note that it is possible for a pattern to extend upstream of the reads, as seen with DNase I fragmentation (Figure 1c). Second, dinucleotide transition probabilities appear biased throughout all 13 initial bases (Supplementary Figure S5). The fact that the 5′ bias extends over 13 bases could be explained by the sequence specificity of the polymerase. Alternately, due to the end repair performed as part of the standard DNA sequencing protocol, the first sequenced base of a read may not be where the primer binds.
                The author of this blog also makes a more amateurish attempt to explain the bias more clearly, but abandons his efforts in frustration.


                So, none of the explanations are entirely satisfactory.
                What is certain is that the overall results remain valid, despite this bias.
                Otherwise, one would have to question the entire body of literature on RNA-Seq.
                Trimming the bases is also clearly the wrong approach.

                I suppose there might be material for another paper for anyone who can come up with a sound demonstration of why the bias extends all the way through the first 12 (or 13) bases.
                Last edited by blancha; 11-19-2015, 06:57 AM.



                • #9
                  blancha's post needs to be made a sticky, and every time there is a new post with "RNA-Seq bias" anywhere in the text, someone can simply post a reply with a link to it.



                  • #10
                    Originally posted by kmcarr
                    blancha's post needs to be made a sticky, and every time there is a new post with "RNA-Seq bias" anywhere in the text, someone can simply post a reply with a link to it.
                    Done - "Illumina/solexa" sub-forum.

                    @blancha: I picked a title to describe your post when I made it sticky. If you want alternate wording then send me a PM (or you may be able to edit it yourself).
                    Last edited by GenoMax; 11-19-2015, 05:51 AM.



                    • #11
                      Has anyone counted the distribution of hexamer sequences in the transcriptome?
                      My guess is that a sequence such as AAAAAA is much more abundant than others, so it would exhaust the complementary hexamer primer "TTTTTT" the fastest, while extension would be blocked by another "TTTTTT" primed nearby. Such cDNA would be much shorter than the rest and would get lost in the following steps.
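Counting hexamers in a set of transcript sequences is straightforward to try. Here is a minimal sketch, assuming the transcripts are already loaded as plain strings; the function names are hypothetical, and windows containing anything other than A, C, G or T are skipped. It counts occurrences on the given strand only and says nothing by itself about priming efficiency.

```python
from collections import Counter


def kmer_counts(sequences, k=6):
    """Count every overlapping k-mer (default: hexamer) across the sequences.

    Windows containing characters other than A, C, G, T (for example N) are skipped.
    """
    counts = Counter()
    valid = set("ACGT")
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts


def kmer_frequencies(sequences, k=6):
    """Return each observed k-mer's share of all counted k-mers."""
    counts = kmer_counts(sequences, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    # Toy sequences for illustration only (not real transcripts).
    toy_transcripts = ["AAAAAAAT", "ACGTNACGTAC"]
    for kmer, n in kmer_counts(toy_transcripts).most_common():
        print(kmer, n)
```

Comparing these frequencies with the hexamers observed at read starts would show whether abundant hexamers such as poly-A stretches are over- or under-represented among priming sites.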



                      • #12
                        Hi, just to be sure:

                        It is NOT necessary to clip the first 13 bases when doing de novo transcriptome assembly either?



                        • #13
                          Originally posted by Andres_Ribone
                          Hi, just to be sure:

                          It is NOT necessary to clip the first 13 bases when doing de novo transcriptome assembly either?
                          Very likely no. See this blog post for more.



                          • #14
                            Hi! Thanks for the quick answer!
                            I checked the link, but it doesn't state explicitly whether clipping is necessary for de novo transcriptome assembly, nor could I find any paper that states it.

                            Right now I'm comparing clipped and unclipped versions of my data, but of course that won't be enough for a general answer.

                            Have a nice day!
