  • chkuo
    Member
    • May 2010
    • 11

    Puzzling result from Illumina 150bp PE reads

    Hi all,

    We are using Illumina 150bp paired-end reads to perform de novo assembly for a bacterial genome (~5Mb). Our procedure goes like this:
    1. merge the paired-end reads into a single file
    2. trim the reads using Q20 as the cutoff (i.e., remove all positions following the first low quality base)
    3. discard reads that are <70bp after trimming
    4. separate the reads into two files, one for paired-end reads and one for single-end reads (i.e., one of the PE reads was removed in the previous step)
    5. feed the two files to Velvet (v1.1.02), test all possible k-mer values, and pick the one that produces the best N50/max
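
A minimal sketch of steps 2-4 above, for readers who want to reproduce the filtering. It assumes Phred+33 quality encoding (older Illumina pipelines used Phred+64, so check your files), and the function names are made up for illustration; this is not the script actually used in the post.

```python
def trim_at_first_low_quality(seq, qual, min_q=20, offset=33):
    """Cut the read just before the first base scoring below min_q.

    offset=33 assumes Phred+33 encoding (an assumption, see above).
    """
    for i, ch in enumerate(qual):
        if ord(ch) - offset < min_q:
            return seq[:i], qual[:i]
    return seq, qual


def classify_pair(read1, read2, min_len=70):
    """Trim both mates, then sort the pair into paired/single/dropped.

    Each read is a (sequence, quality) tuple. Returns a label and the
    list of trimmed reads that survived the length filter.
    """
    trimmed = [trim_at_first_low_quality(s, q) for s, q in (read1, read2)]
    survivors = [r for r in trimmed if len(r[0]) >= min_len]
    if len(survivors) == 2:
        return "paired", survivors
    if len(survivors) == 1:
        return "single", survivors
    return "dropped", []


# Toy example: 'I' is Q40 and '#' is Q2 under Phred+33.
good = ("A" * 80, "I" * 80)
bad = ("A" * 80, "I" * 10 + "#" + "I" * 69)
label, reads = classify_pair(good, bad)
```

In the toy example the second mate is cut to 10 bp by the single low-quality base, so the pair ends up in the single-end file.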

    The initial result looks reasonably good. However, when we tried to simulate the effect of using shorter reads by first trimming all reads to 100bp, we found that the assembly actually became much better! The N50 increased from ~175kb to ~341kb and the max contig length increased from ~512kb to ~937kb (the total genome size and the number of reads used didn't change much). Blastn confirmed that the improvement comes from merging of contigs.
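
For reference, a short sketch of how N50 responds when contigs merge while the total assembly size stays the same. It uses the common definition of N50 (the length of the contig at which the running total, counted from the longest contig down, reaches half of the assembly size); the contig lengths below are invented toy numbers, not the assembly from this post.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


# Toy assembly, lengths in kb.
before = [80, 70, 50, 40, 30, 20]
# Same total size, but the 50 kb and 40 kb contigs are now joined.
after = [90, 80, 70, 30, 20]

n50_before = n50(before)
n50_after = n50(after)
```

Joining two mid-sized contigs is enough to move the N50 even though no sequence was added, which matches the pattern of a higher N50 and max with an unchanged total size.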

    I found this really puzzling because I was expecting the opposite result. Can this be due to higher error rates toward the 3' end (even though the quality scores look just fine)?
    Last edited by chkuo; 06-06-2011, 10:54 PM. Reason: typo
  • tonybolger
    Senior Member
    • Feb 2010
    • 156

    #2
    Originally posted by chkuo
    I found this really puzzling because I was expecting the opposite result. Can this be due to higher error rates toward the 3' end (even though the quality scores look just fine)?
    Adapters perhaps? Unless you're very strict with the size separation, it's easy to have some fraction of the library with <150 base fragments (especially if you're making a <300bp library). When you sequence these short fragments, you read into the adapters on the 'other' end of the read.

    You're right that longer reads (if they are correct) should help in general.
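
A rough sketch of the read-through effect described above, and of one quick way to gauge how common it is. The adapter prefix is a placeholder, not a real Illumina sequence, and the function names are hypothetical; substitute the adapter sequences for your own library prep.

```python
# Placeholder only: replace with the real adapter prefix for your library.
ADAPTER_PREFIX = "GATTACAGATTACA"


def adapter_bases_in_read(fragment_len, read_len=150):
    """Number of bases at the 3' end that come from the adapter.

    A read only runs into the adapter when the fragment is shorter
    than the read itself.
    """
    return max(0, read_len - fragment_len)


def fraction_with_adapter(reads, adapter_prefix=ADAPTER_PREFIX):
    """Fraction of reads that contain the adapter prefix anywhere."""
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if adapter_prefix in read)
    return hits / len(reads)


# A 120 bp fragment contributes 30 adapter bases to a 150 bp read,
# but none once the read is cut back to 100 bp.
full_length = adapter_bases_in_read(120, read_len=150)
trimmed = adapter_bases_in_read(120, read_len=100)

toy_reads = ["ACGTACGT" + ADAPTER_PREFIX, "ACGTACGTACGTACGTACGTAC"]
contaminated = fraction_with_adapter(toy_reads)
```

An exact substring match will miss adapters that contain sequencing errors or are cut off at the very end of the read, so treat the fraction as a lower bound.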

    • chkuo
      Member
      • May 2010
      • 11

      #3
      Originally posted by tonybolger
      Adapters perhaps?
      Not sure if this was the problem. Velvet estimated the fragment size to be 325 +/- 41 bp (same for trimmed/untrimmed) and Bioanalyzer result showed average size of ~350bp.

      • tonybolger
        Senior Member
        • Feb 2010
        • 156

        #4
        Originally posted by chkuo
        Not sure if this was the problem. Velvet estimated the fragment size to be 325 +/- 41 bp (same for trimmed/untrimmed) and Bioanalyzer result showed average size of ~350bp.
        Lies, damned lies and library length statistics

        I've always found a fairly significant number of adapters in our data, even with 600bp libraries, but naturally YMMV.

        • chkuo
          Member
          • May 2010
          • 11

          #5
          Originally posted by tonybolger
          I've always found a fairly significant number of adapters in our data, even with 600bp libraries, but naturally YMMV.
          Any pointer on a quick and easy way to check for adapters? Many thanks!

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Originally posted by chkuo
            Any pointer on a quick and easy way to check for adapters? Many thanks!
            Have you tried the FASTX-Toolkit? http://hannonlab.cshl.edu/fastx_toolkit/

            • flxlex
              Moderator
              • Nov 2008
              • 412

              #7
              Please keep us updated on whether adaptor removal solved the problem!

              • chkuo
                Member
                • May 2010
                • 11

                #8
                I will need to talk with the sequencing facility to find out the adapter sequences before I can do the trimming. In the meantime, I've tried different length cutoffs for the trimming, and 100bp performed better than the longer ones.

                • tonybolger
                  Senior Member
                  • Feb 2010
                  • 156

                  #9
                  Originally posted by chkuo
                  Any pointer on a quick and easy way to check for adapters? Many thanks!
                  I've created a tool, Trimmomatic, that handles the various pre-processing steps for Illumina data - you can find it here.

                  You'll need to make a FASTA file of all the adapter sequences though - we're not allowed to distribute them, which is rather annoying. If you're having problems, email me at the link on the Trimmomatic page.

                  BTW, you can find links to the adapter sequences in a sticky on the Illumina board.
