  • Very high coverage w/Ion Torrent

    Hi, just want to say that I'm a total newb so bear with me!

    I have some bacterial WGS data from Ion Torrent system, and my lab has Geneious software which I'm using to do de novo assembly (among other things). I downloaded the new MIRA plugin for Geneious, and ran it with my reads but it quit on me because it detected very high coverage (>80x). Also the Geneious default assembler is taking muuuuch longer than usual to assemble it. I looked back at the data and this strain had a lot more reads than the other strains, so I've decided to throw out some of the reads.

    I had trimmed the data already, but what I really need is for someone to tell me how to randomly select ~2 million reads to throw out! Can I just delete the first 2 million in the fastq file? For some reason I haven't been able to find any info on how to do this, kinda feel like it's a dumb question haha.

  • #2
    Originally posted by megster
    I had trimmed the data already, but what I really need is for someone to tell me how to randomly select ~2 million reads to throw out! Can I just delete the first 2 million in the fastq file? For some reason I haven't been able to find any info on how to do this, kinda feel like it's a dumb question haha.
    I don't know about Ion Torrent, but on some platforms the low-quality reads are concentrated in one part of the file (for example, there might be a bubble on the Illumina platform), so I would recommend subsampling randomly, or normalizing rather than subsampling.

    To subsample randomly or normalize, you can use BBTools:

    reformat.sh in=reads.fq out=sampled.fq samplebases=100000000

    That will sample exactly 100 megabases (plus at most 1 read length) randomly from the entire file. This requires reading the file twice. Alternatively, you can get an approximate sampling like this:
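
    As a rough guide to choosing a samplebases value, the number of bases to keep is the genome size times the coverage you want. The Python sketch below shows that arithmetic. The 5 Mbp genome size and the function name are assumptions for illustration only; under that assumption, 20x works out to the 100,000,000 bases used in the command above.

```python
def bases_for_coverage(genome_size_bp: int, target_coverage: float) -> int:
    """Total bases to keep so that bases / genome size equals the target coverage."""
    return round(genome_size_bp * target_coverage)


# Hypothetical 5 Mbp bacterial genome, sampled down to 20x coverage.
print(bases_for_coverage(5_000_000, 20))  # 100000000
```

    Substitute the estimated genome size of your own strain before using the result.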

    reformat.sh in=reads.fq out=sampled.fq samplerate=0.25

    ...which will sample 25% of the reads, and only requires reading the file once. Well, either way it's very fast.
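
    To illustrate what rate-based sampling means, and why it differs from deleting the first N reads, here is a minimal Python sketch. It is not how reformat.sh is implemented; the function names are made up for this example, and it assumes plain 4-line FASTQ records with no wrapped sequence lines. Each read is kept independently with probability `rate`, so the kept reads come from the whole file and the final count is only approximately rate times the total.

```python
import io
import random
from typing import Iterator, TextIO


def rate_for_coverage(current_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep to go from the current to the target coverage."""
    return min(1.0, target_coverage / current_coverage)


def read_fastq(handle: TextIO) -> Iterator[list[str]]:
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped records)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # end of file
            return
        yield record


def sample_fastq(src: TextIO, dst: TextIO, rate: float, seed: int = 0) -> int:
    """Keep each record with probability `rate` in one pass; return the number kept."""
    rng = random.Random(seed)
    kept = 0
    for record in read_fastq(src):
        if rng.random() < rate:
            dst.writelines(record)
            kept += 1
    return kept


# Small in-memory demo with made-up reads: 80x down to 20x is a rate of 0.25.
fake = "".join(f"@read{i}\nACGTACGT\n+\nIIIIIIII\n" for i in range(1000))
out = io.StringIO()
n_kept = sample_fastq(io.StringIO(fake), out, rate_for_coverage(80, 20))
print(f"kept {n_kept} of 1000 reads")
```

    For real files, open the input and output with `open(...)` instead of `io.StringIO`; the fixed seed just makes the selection reproducible.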

    To normalize the reads to some target coverage depth:
    bbnorm.sh in=reads.fq out=normalized.fq target=20 min=2

    ...which will normalize to 20x and throw away reads with under 2x depth (assuming them to be errors). This way, high peaks come down but areas with low coverage are not reduced, which is better for assembly. This is a lot slower and requires more memory than sampling but, in my tests, it greatly improves Soap and Velvet assemblies compared with sampling or just using the raw data.
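
    The general idea of depth normalization can be sketched in a few lines of Python. This is a toy version in the style of digital normalization, not BBNorm's actual algorithm: it estimates a read's depth as the median count of its k-mers, drops reads whose depth over the whole dataset is below a minimum, and then keeps a read only while the depth among already-kept reads is still below the target. The function names, the small default k, and the exact in-memory counting are all simplifications chosen for illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of a read (no reverse-complement handling)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads: list[str], k: int = 4, target: int = 20, min_depth: int = 2) -> list[str]:
    """Toy depth normalization; returns the reads that are kept, in input order."""
    # Pass 1: count every k-mer in the full dataset.
    total = Counter()
    for read in reads:
        total.update(kmers(read, k))

    # Pass 2: drop likely-error reads, then cap the depth at `target`.
    kept_counts = Counter()
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read shorter than k
        if median(total[x] for x in ks) < min_depth:
            continue  # too rare overall: presumed to be errors
        if median(kept_counts[x] for x in ks) < target:
            kept.append(read)
            kept_counts.update(ks)
    return kept


# Made-up data: one region at 100x, one at 3x, and a singleton read.
reads = ["ACGTTGCA"] * 100 + ["GGATCCTA"] * 3 + ["CATGAGTC"]
result = normalize(reads)
print(result.count("ACGTTGCA"), result.count("GGATCCTA"), result.count("CATGAGTC"))  # 20 3 0
```

    In the demo the 100x region is cut to 20 reads, the 3x region is left untouched, and the singleton is discarded, which is the behavior described above. Real tools count k-mers far more memory-efficiently than a plain dictionary.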
    Last edited by Brian Bushnell; 03-07-2014, 09:45 PM.


    • #3
      Update your Geneious to r7.1 if you haven't already. The new Geneious de novo assembler handles Ion Torrent reads much, much better than previous versions, and it copes better with homopolymer errors, which reduces the number of contigs. I re-ran a dozen plasmid assemblies just today and the results were incredible.

      And, to answer your question, it has a check box at the top to downsample your reads. The quality trimming is nice too because it's an annotation instead of a clipping, so it's easy to re-run with different stringencies.


      • #4
        Thanks! I missed that option at the top of the box. I ran it with the MIRA plugin and it worked beautifully. I'll also retry it with the new Geneious assembler and see if it works any better for me.
