  • Puzzling result from Illumina 150bp PE reads

    Hi all,

    We are using Illumina 150bp paired-end reads to perform de novo assembly for a bacterial genome (~5Mb). Our procedure goes like this:
    1. merge the paired-end reads into a single file
    2. trim the reads using Q20 as the cutoff (i.e., remove all positions following the first low quality base)
    3. discard reads that are <70bp after trimming
    4. separate the reads into two files, one for paired-end reads and one for single-end reads (i.e., one of the PE reads was removed in the previous step)
    5. feed the two files to Velvet (v1.1.02), test all possible k-mer values, and pick the one that produces the best N50/max
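Steps 2 to 4 of the procedure above can be sketched roughly as follows. This is only an illustration, not the poster's actual script: the function names are made up, reads are held in memory as `(sequence, quality)` tuples instead of FASTQ files, and the quality encoding is assumed to be Phred+33 (older Illumina pipelines used Phred+64, in which case `offset` would need to change).

```python
def trim_at_first_low_quality(seq, qual, min_q=20, offset=33):
    """Cut the read just before the first base whose Phred score is below min_q."""
    for i, ch in enumerate(qual):
        if ord(ch) - offset < min_q:
            return seq[:i], qual[:i]
    return seq, qual


def split_pairs(pairs, min_q=20, min_len=70, offset=33):
    """Trim both mates of each pair, then sort survivors into paired and single reads.

    pairs: iterable of ((seq1, qual1), (seq2, qual2)) tuples (assumed layout).
    """
    paired, single = [], []
    for (seq1, qual1), (seq2, qual2) in pairs:
        r1 = trim_at_first_low_quality(seq1, qual1, min_q, offset)
        r2 = trim_at_first_low_quality(seq2, qual2, min_q, offset)
        keep1 = len(r1[0]) >= min_len
        keep2 = len(r2[0]) >= min_len
        if keep1 and keep2:
            paired.append((r1, r2))   # both mates survive: stays paired-end
        elif keep1:
            single.append(r1)         # mate 2 was discarded
        elif keep2:
            single.append(r2)         # mate 1 was discarded
    return paired, single
```

Note that this kind of trimming only looks at quality scores, so a read that runs into adapter sequence with good qualities passes through untouched.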

    The initial result looked reasonably good. However, when we tried to simulate the effect of using shorter reads by first trimming all reads to 100bp, we found that the assembly actually became much better! The N50 increased from ~175kb to ~341kb and the max contig length increased from ~512kb to ~937kb (the total genome size and the number of reads used didn't change much). Blastn confirmed that the improvement comes from merging of contigs.
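For reference, the N50 compared above is the length of the contig at which the running total, counted from the longest contig down, first reaches half of the total assembly length. A minimal sketch (the function name and the toy contig lengths are made up for illustration, and assemblers may report the value in their own units):

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # reached at least half of the total length
            return length
    return 0


# Toy example: merging contigs raises N50 without changing the total length.
before = [100, 100, 100, 100]
after = [200, 200]
print(n50(before), n50(after))
```

This is why merged contigs show up as a higher N50 and max even when the total size stays about the same.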

    I found this really puzzling because I was expecting the opposite result. Can this be due to higher error rates toward the 3' end (even though the quality scores look just fine)?
    Last edited by chkuo; 06-06-2011, 10:54 PM. Reason: typo

  • #2
    Originally posted by chkuo
    I found this really puzzling because I was expecting the opposite result. Can this be due to higher error rates toward the 3' end (even though the quality scores look just fine)?
    Adapters perhaps? Unless you're very strict with the size separation, it's easy to have some fraction of the library with <150 base fragments (especially if you're making a <300bp library). When you sequence these short fragments, you read into the adapters on the 'other' end of the read.

    You're right that longer reads (if they are correct) should help in general.
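The read-through effect described in this post is simple arithmetic: if the fragment is shorter than the read, every base past the end of the fragment comes from the adapter (or whatever lies beyond it). A small sketch, with made-up function names and made-up fragment lengths:

```python
def bases_past_fragment(fragment_len, read_len=150):
    """Number of read bases that fall beyond the end of the fragment."""
    return max(0, read_len - fragment_len)


def fraction_with_readthrough(fragment_lens, read_len=150):
    """Fraction of fragments that are shorter than the read length."""
    if not fragment_lens:
        return 0.0
    hits = sum(1 for f in fragment_lens if f < read_len)
    return hits / len(fragment_lens)


# Hypothetical fragment lengths, for illustration only.
fragments = [120, 140, 325, 350]
print(fraction_with_readthrough(fragments, read_len=150))
print(fraction_with_readthrough(fragments, read_len=100))
```

With these toy numbers, half of the fragments are affected at 150bp and none at 100bp, which is one way that hard-trimming reads to 100bp could hide adapter contamination.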

    • #3
      Originally posted by tonybolger
      Adapters perhaps?
      Not sure if this was the problem. Velvet estimated the fragment size to be 325 +/- 41 bp (same for trimmed/untrimmed) and Bioanalyzer result showed average size of ~350bp.

      • #4
        Originally posted by chkuo
        Not sure if this was the problem. Velvet estimated the fragment size to be 325 +/- 41 bp (same for trimmed/untrimmed) and Bioanalyzer result showed average size of ~350bp.
        Lies, damned lies and library length statistics

        I've always found a fairly significant number of adapters in our data, even with 600bp libraries, but naturally YMMV.

        • #5
          Originally posted by tonybolger
          I've always found a fairly significant number of adapters in our data, even with 600bp libraries, but naturally YMMV.
          Any pointer on a quick and easy way to check for adapters? Many thanks!

          • #6
            Originally posted by chkuo
            Any pointer on a quick and easy way to check for adapters? Many thanks!
            Have you tried the FASTX-toolkit: http://hannonlab.cshl.edu/fastx_toolkit/
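Independently of any particular toolkit, a crude first check is to search the reads for the first few bases of the adapter and see where the matches start. The sketch below is only that: the function name is made up, and `ADAPTER` is a placeholder string, not a real Illumina adapter (the real sequence has to come from the sequencing facility or the library kit documentation). Exact matching also misses adapters with sequencing errors and partial adapters shorter than the seed at the very end of a read.

```python
# Placeholder only: replace with the real adapter sequence for your library.
ADAPTER = "ACGTTGCAACGTTGCA"


def adapter_start_positions(reads, adapter, seed_len=12):
    """Return the start position of the adapter seed in each read that contains it."""
    seed = adapter[:seed_len]
    positions = []
    for seq in reads:
        pos = seq.find(seed)       # exact match only; -1 means no hit
        if pos != -1:
            positions.append(pos)
    return positions


reads = ["TTTT" + ADAPTER[:12] + "GG", "A" * 50, "CC" + ADAPTER]
hits = adapter_start_positions(reads, ADAPTER)
print(len(hits), "of", len(reads), "reads contain the adapter seed")
```

If many hits start between positions 100 and 150, that would be consistent with the adapter read-through explanation suggested earlier in the thread.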

            • #7
              Please keep us updated on whether adapter removal solved the problem!

              • #8
                I will need to talk with the sequencing facility to figure out the adapter sequence before I can do the trimming. In the meantime, I've tried different length cutoffs for the trimming, and 100bp performed better than the longer ones.

                • #9
                  Originally posted by chkuo
                  Any pointer on a quick and easy way to check for adapters? Many thanks!
                  I've created a tool, Trimmomatic, that does the various pre-processing steps for Illumina data. You can find it here.

                  You'll need to make a FASTA file of all the adapter sequences though. We're not allowed to distribute them, which is rather annoying. If you're having problems, email me at the link on the Trimmomatic page.

                  BTW, you can find links to the adapter sequences in a sticky on the Illumina board.
