  • blindtiger454
    Member
    • Oct 2010
    • 30

    Trimming left end (5') of reads??

    Can anyone explain why there is a sequence bias in the first 15bp of Illumina reads? I am pretty sure this is not adapter leftover. The researchers who did the lettuce transcriptome identified the same issue, with results at:

    We see the same bias in the first 15bp of our reads. I think I read somewhere that it is caused by GC content. Even after removing low- and medium-quality reads, we still see the bias in the first 10-15nt. Can anyone explain?
  • kmcarr
    Senior Member
    • May 2008
    • 1181

    #2
    Short answer, the random hexamer priming is "not so random". Illumina has acknowledged this in one of their FAQs:

    Q482. Why is GC high in the first few bases?
    It is perfectly normal to observe both a slight GC bias and a distinctly non-random base composition over the first 12 bases of the data. This is observed when looking, for instance, at the IVC (intensity versus cycle number) plots which are part of the output of the Pipeline. In genomic DNA sequencing, the base composition is usually quite uniform across all bases; but in mRNA-Seq, the base composition is noticeably uneven across the first 10 to 12 bases. Illumina believes this effect is caused by the "not so random" nature of the random priming process used in the protocol. This may explain why there is a slight overall G/C bias in the starting positions of each read. The first 12 bases probably represent the sites that were being primed by the hexamers used in the random priming process. The first twelve bases in the random priming full-length cDNA sequencing protocol (mRNA-seq) always have IVC plots that look like what has been described. This is because the random priming is not truly random and the first twelve bases (the length of two hexamers) are biased towards sequences that prime more efficiently. This is entirely normal and expected.
    There was also a publication which investigated this:

    Hansen KD, Brenner SE, Dudoit S. Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res 2010 Apr.;
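
To check for the bias described in the FAQ on your own data, you can tabulate the base composition at each read position, which is roughly what a "per base sequence content" plot shows. The following is a minimal sketch, assuming the reads are already loaded as plain strings; the function name `per_position_composition` and the toy reads are made up for illustration.

```python
from collections import Counter


def per_position_composition(reads, n_positions=20):
    """Return, for each of the first n_positions, the fraction of A/C/G/T.

    Fractions are relative to all bases seen at that position (including N),
    so they may sum to less than 1 when ambiguous calls are present.
    """
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions].upper()):
            counts[i][base] += 1

    fractions = []
    for position_counts in counts:
        total = sum(position_counts.values())
        fractions.append(
            {b: (position_counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return fractions


if __name__ == "__main__":
    # Toy reads, invented for the example; real input would come from a FASTQ file.
    toy_reads = ["GGCATT", "GCCAAT", "GGTACT", "CGCATA"]
    for pos, frac in enumerate(per_position_composition(toy_reads, 6), start=1):
        print(pos, frac)
```

With random-hexamer-primed RNA-Seq data you would expect the fractions to fluctuate over roughly the first 10 to 12 positions and then level off, as the FAQ describes.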
    Last edited by kmcarr; 04-12-2013, 12:35 PM. Reason: Hyperlink reference


    • blindtiger454
      Member
      • Oct 2010
      • 30

      #3
      Is it recommended to trim these first bases, then? They sound like valid mRNA sequence, even though the "random" priming favors certain reads. The researchers who did the lettuce transcriptome got better assemblies when they trimmed this region, and I don't understand why. Maybe trimming also removed some poor-quality regions at the 5' end?


      • kmcarr
        Senior Member
        • May 2008
        • 1181

        #4
        I have studied the UC Davis poster carefully in the past, and what strikes me is that the effect of trimming the 5' end appears nearly identical to that of trimming the 3' end, so I'm not convinced by their conclusion that it is important to trim the initial 15nt. However, I have heard from other researchers that these bases do present a particular problem for de novo assembly with de Bruijn graph assemblers (which covers just about all of the most popular short-read assemblers, including Velvet). The thinking is that the k-mer diversity of the first 15nt is significantly lower than that of the remainder of the read, which seems to cause problems for the assembler.

        If you are doing a de novo assembly, why not try it both ways and see what your results are?

        On the other hand, if I am mapping the reads to a genome (vs de novo), I never trim the 5' ends of RNA-Seq reads, and I find they map perfectly well.
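
The k-mer diversity argument can be checked directly before deciding whether to trim. The sketch below counts distinct k-mers in the first 15 bases and in the next 15 bases of each read, so the two windows are the same size. The function names, the default k of 5 and the window choice are assumptions for illustration, not something taken from the poster or from any assembler.

```python
def distinct_kmers(reads, k, start, end):
    """Count distinct k-mers lying fully inside read[start:end] over all reads."""
    seen = set()
    for read in reads:
        window = read[start:end]
        for i in range(len(window) - k + 1):
            seen.add(window[i:i + k])
    return len(seen)


def compare_head_to_body(reads, k=5, head=15):
    """Return (distinct k-mers in the first `head` bases,
    distinct k-mers in the following `head` bases)."""
    head_count = distinct_kmers(reads, k, 0, head)
    body_count = distinct_kmers(reads, k, head, 2 * head)
    return head_count, body_count
```

If the first number is much smaller than the second on a large sample of reads, that supports the low-diversity explanation; if the two are similar, trimming for this reason is harder to justify.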


        • blindtiger454
          Member
          • Oct 2010
          • 30

          #5
          Thanks for the information. Our reads are 55bp, and they come from a tetraploid plant. Given the large number of paralogues and the allelic diversity in plants, I want to do minimal trimming for the assembly. It's bad enough having 55bp; the UC Davis folks had 80bp reads. If I trimmed my reads down to 40bp, I'm afraid the assembler would incorrectly assemble paralogues. Sometimes 15 nucleotides is all the difference between two closely related transcripts/genes.


          • IBseq
            Member
            • Jul 2012
            • 56

            #6
            FASTQ Trimmer tool

            Hi guys,
            I'm new to this forum. Can anyone tell me how to decide how many bases to trim with FASTQ Trimmer? What is the ideal score, and which values should I look at (Q1, median or Q3)?

            Thanks!
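
The thread never answers this question, so the following is only one possible heuristic, not a recommendation from the posters: compute Q1, median and Q3 of the Phred scores at each position and keep bases up to the last position where Q1 stays at or above a threshold. The Phred+33 offset, the threshold of 20 and all function names are assumptions; adjust them to your data.

```python
import statistics


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def per_position_quartiles(quality_strings, offset=33):
    """Return a list of (Q1, median, Q3) tuples, one per read position."""
    columns = {}
    for quality in quality_strings:
        for i, score in enumerate(phred_scores(quality, offset)):
            columns.setdefault(i, []).append(score)

    quartiles = []
    for i in sorted(columns):
        scores = columns[i]
        if len(scores) >= 2:
            q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        else:
            # A single observation has no spread; use it for all three values.
            q1 = median = q3 = scores[0]
        quartiles.append((q1, median, q3))
    return quartiles


def last_good_position(quartiles, min_q1=20):
    """1-based index of the last position before Q1 first drops below min_q1."""
    last = 0
    for position, (q1, _median, _q3) in enumerate(quartiles, start=1):
        if q1 < min_q1:
            break
        last = position
    return last
```

Using Q1 rather than the median is the more conservative choice, since it requires three quarters of the reads to meet the threshold at a position before that position is kept.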


            • carmeyeii
              Senior Member
              • Mar 2011
              • 137

              #7
              bump


              • IBseq
                Member
                • Jul 2012
                • 56

                #8
                I sorted that out. If anyone needs info, I'm glad to help.


                • blanco
                  Member
                  • Apr 2012
                  • 28

                  #9
                  Hi folks - hope some of you can help me clarify something about adapter contamination and adapter trimming.

                  I made TruSeq Illumina libraries and sequenced them as 100bp paired-end reads.

                  When I view the 'per base sequence content' with FastQC I get something that looks like adapter contamination. I then used cutadapt to remove the adapter sequence. The 'per base sequence content' before and after cutadapt is shown in the attached pdf.

                  Now this is all fine and dandy but what I find a bit confusing is why the adapter sequence is at the beginning of the read. My understanding was that adapter contamination mainly arises when the read is too short so at the end of the read the sequencer starts to sequence the adapter.

                  So why does the adapter appear at the beginning of the read and not at the end?

                  Am I misunderstanding something? I would love to have a clarification of this.

                  Thanks,
                  blanco

                  • TonyBrooks
                    Senior Member
                    • Jun 2009
                    • 303

                    #10
                    You can get adapter dimers (where the DNA insert size is effectively 0), meaning that you sequence only adapter (hence it appears at the 5' end). If this is the case, I believe cutadapt will just remove those reads from your fastq file (maybe someone can confirm).
                    Those peaks don't look like dimer to me; they look more like the random priming issue. When you have a bad adapter problem, you can actually read the adapter sequence in your %base graph (see the attached plot of a run that had 10% adapter dimer).
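
One way to tell adapter dimer apart from priming bias is to count how many reads begin with the adapter itself. The sketch below assumes the TruSeq adapter prefix `AGATCGGAAGAGC` and allows one mismatch; both are assumptions, so substitute the sequence for your own kit. The function names are invented for the example.

```python
# Assumed adapter start for TruSeq libraries; replace with your kit's sequence.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def starts_with_adapter(read, adapter=ADAPTER_PREFIX, max_mismatches=1):
    """True if the read begins with the adapter, allowing a few mismatches."""
    if len(read) < len(adapter):
        return False
    # zip stops at the end of the adapter, so only the read's prefix is compared.
    mismatches = sum(1 for a, b in zip(read, adapter) if a != b)
    return mismatches <= max_mismatches


def adapter_dimer_fraction(reads, adapter=ADAPTER_PREFIX, max_mismatches=1):
    """Fraction of reads that look like adapter dimer (adapter at position 1)."""
    if not reads:
        return 0.0
    hits = sum(starts_with_adapter(r, adapter, max_mismatches) for r in reads)
    return hits / len(reads)
```

A fraction near zero alongside a skewed base composition in the first 10 to 15 cycles points to priming bias rather than dimer. On the open question of what cutadapt does with such reads: as far as I know it trims rather than discards by default, so dimer reads end up very short or empty unless a minimum length is set with `-m`/`--minimum-length`; check this against the documentation for your version.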

                    • rmred
                      Junior Member
                      • Mar 2013
                      • 1

                      #11
                      I have the same problem too, with exactly the same ACGT bias over the first 15bp/cycles. I asked an Illumina representative, who said that it is due to the random hexamer priming, as mentioned above.


                      • isett
                        Junior Member
                        • Nov 2012
                        • 1

                        #12
                        What if it's WGS and not RNA-Seq? I see the same thing with the Nextera XT kit on the MiSeq. Is it caused by a non-random recognition site for the tagmentation enzyme?


                        • nareshvasani
                          Member
                          • Apr 2013
                          • 57

                          #13
                          Hi IBseq

                          Originally posted by IBseq View Post
                          I sorted that out...if anyone needs info glad to help
                          I need help. Can you please help me to trim both ends 5' and 3'?

                          Thanks in advance.


                          • Tengfei Liu
                            Junior Member
                            • Aug 2013
                            • 1

                            #14
                            Originally posted by nareshvasani View Post
                            I need help. Can you please help me to trim both ends 5' and 3'?

                            Thanks in advance.

                            You can use cutadapt to trim both the 5' and 3' ends. The fastx_clipper can only trim the 3' end. With cutadapt, you must first run cutadapt -g, then run cutadapt -a on the processed reads. If you use -g and -a at the same time, it will only cut one end.
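
The two-pass order described above (5' adapter first, then 3' adapter on the result) can be sketched in plain Python. This is an exact-match illustration only and does not reproduce cutadapt's error-tolerant alignment; the adapter sequences in the usage example are hypothetical placeholders.

```python
def trim_5prime(read, adapter):
    """Remove a 5' adapter and everything before it, if the adapter is found."""
    index = read.find(adapter)
    return read[index + len(adapter):] if index != -1 else read


def trim_3prime(read, adapter):
    """Remove a 3' adapter and everything after it, if the adapter is found."""
    index = read.find(adapter)
    return read[:index] if index != -1 else read


def trim_both_ends(read, five_prime_adapter, three_prime_adapter):
    """First pass trims the 5' adapter, second pass trims the 3' adapter."""
    return trim_3prime(trim_5prime(read, five_prime_adapter), three_prime_adapter)


if __name__ == "__main__":
    # Hypothetical adapters, invented for the example.
    five, three = "ACGTAC", "TTGGCC"
    read = five + "GATTACAGATTACA" + three + "AA"
    print(trim_both_ends(read, five, three))
```

Running the passes separately makes it explicit that each adapter gets its own chance to match, which is the point of the advice above.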


                            • Michael.Ante
                              Senior Member
                              • Oct 2011
                              • 127

                              #15
                              Originally posted by nareshvasani View Post
                              I need help. Can you please help me to trim both ends 5' and 3'?

                              Thanks in advance.
                              I always use the fastx_trimmer; you can use the -f and -l options to set the first and the last base to be kept.
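
For anyone who wants the same fixed-position trimming inside a script, here is a minimal sketch that follows the first-base/last-base convention described above (1-based, inclusive). It is not fastx_trimmer itself, and the function name `trim_fixed` is made up for the example.

```python
def trim_fixed(seq, qual, first=1, last=None):
    """Keep bases first..last (1-based, inclusive) of a read and its qualities.

    `last=None` keeps everything through the end of the read.
    """
    if first < 1:
        raise ValueError("first must be >= 1")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq) if last is None else last
    return seq[first - 1:end], qual[first - 1:end]


if __name__ == "__main__":
    # Dropping the first 15 bases of a 55bp read leaves 40bp,
    # the case discussed earlier in the thread.
    seq, qual = trim_fixed("A" * 55, "I" * 55, first=16)
    print(len(seq))
```

Trimming the sequence and quality strings together keeps the FASTQ record consistent, which downstream tools generally require.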
                              Last edited by Michael.Ante; 09-25-2013, 07:23 AM. Reason: typo
