SEQanswers

03-07-2014, 03:26 PM   #1
megster
Junior Member
 
Location: California

Join Date: Mar 2014
Posts: 2
Very high coverage w/Ion Torrent

Hi, just want to say that I'm a total newb so bear with me!

I have some bacterial WGS data from an Ion Torrent system, and my lab has Geneious software, which I'm using to do de novo assembly (among other things). I downloaded the new MIRA plugin for Geneious and ran it with my reads, but it quit on me because it detected very high coverage (>80x). Also, the Geneious default assembler is taking muuuuch longer than usual to assemble it. I looked back at the data and this strain had a lot more reads than the other strains, so I've decided to throw out some of the reads.

I had trimmed the data already, but what I really need is for someone to tell me how to randomly select ~2 million reads to throw out! Can I just delete the first 2 million in the fastq file? For some reason I haven't been able to find any info on how to do this, kinda feel like it's a dumb question haha.
03-07-2014, 03:50 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Quote:
Originally Posted by megster
I had trimmed the data already, but what I really need is for someone to tell me how to randomly select ~2 million reads to throw out! Can I just delete the first 2 million in the fastq file? For some reason I haven't been able to find any info on how to do this, kinda feel like it's a dumb question haha.

I don't know about Ion Torrent, but some platforms have low-quality reads concentrated in one part of the file (for example, there might be a bubble on the Illumina platform), so rather than deleting the first 2 million reads I would recommend subsampling randomly, or normalizing rather than subsampling.

To subsample randomly or normalize, you can use BBTools:

reformat.sh in=reads.fq out=sampled.fq samplebases=100000000

That will sample exactly 100 megabases (plus at most 1 read length) randomly from the entire file. Requires reading the file twice. You can alternately get an approximate sampling like this:
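
For anyone curious why hitting an exact base target needs two passes, here is a minimal Python sketch of one way to do it. It is only an illustration, not how reformat.sh is implemented; the function name `sample_to_base_target` and the in-memory list of `(name, seq, qual)` tuples are assumptions made for the example.

```python
import random


def sample_to_base_target(reads, target_bases, seed=0):
    """Randomly pick reads until about target_bases bases are selected.

    reads is a list of (name, seq, qual) tuples (hypothetical layout).
    The result overshoots the target by less than one read length and
    keeps the reads in their original file order.
    """
    # Pass 1: learn how many reads there are and how long each one is.
    lengths = [len(seq) for _, seq, _ in reads]

    # Visit the reads in a random order and accept them until the
    # running total reaches the target.
    order = list(range(len(reads)))
    random.Random(seed).shuffle(order)
    chosen = set()
    total = 0
    for i in order:
        if total >= target_bases:
            break
        chosen.add(i)
        total += lengths[i]

    # Pass 2: emit the chosen reads in their original order.
    return [read for i, read in enumerate(reads) if i in chosen]
```

Because the selection is spread over the whole file, a run of bad reads at the start or end does not end up over- or under-represented in the output.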

reformat.sh in=reads.fq out=sampled.fq samplerate=0.25

...which will sample 25% of the reads, and only requires reading the file once. Well, either way it's very fast.
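
As a rough illustration of the single-pass approach, the Python sketch below keeps each read independently with a fixed probability, and also shows the arithmetic for picking a rate from a coverage target. This is a sketch under stated assumptions, not the reformat.sh code: the helper names are made up for the example, and the 5 Mb genome size and 400 Mb of read data in the usage note are illustrative numbers, not values from this thread.

```python
import io
import random


def sampling_rate(total_bases, genome_size, target_coverage):
    """Fraction of reads to keep so that coverage drops to the target."""
    current_coverage = total_bases / genome_size
    return min(1.0, target_coverage / current_coverage)


def read_fastq(handle):
    """Yield (header, seq, plus, qual) from a plain 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield header, seq, plus, qual


def subsample(records, rate, seed=0):
    """Keep each record with probability `rate`; one pass, approximate count."""
    rng = random.Random(seed)
    for record in records:
        if rng.random() < rate:
            yield record


# Tiny in-memory demo; a real run would open the FASTQ file instead.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
kept = list(subsample(read_fastq(demo), rate=1.0))
```

With hypothetical numbers, `sampling_rate(400_000_000, 5_000_000, 20)` gives 0.25: 400 Mb of reads on a 5 Mb genome is 80x, so keeping a quarter of the reads brings it to about 20x.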

To normalize the reads to some target coverage depth:

bbnorm.sh in=reads.fq out=normalized.fq target=20 min=2

...which will normalize to 20x, and throw away reads with under 2x depth (assuming them to be errors). This way, high peaks will go down, but areas with low coverage will not be reduced, which is better for assembly. This is a lot slower and requires more memory than sampling, but in my tests it greatly improves Soap and Velvet assemblies compared with sampling or just using the raw data.
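
To make the idea concrete, here is a toy Python sketch of depth-based normalization. It is a simplified illustration and should not be read as BBNorm's actual implementation: it counts k-mers exactly in memory, estimates each read's depth as the median count of its k-mers, and uses a small `k=5` only so the example runs on short strings. The function names and defaults are assumptions for the example.

```python
import random
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=5, target=20, min_depth=2, seed=0):
    """Toy normalization over a list of read sequences (strings)."""
    # Pass 1: count every k-mer across all reads.
    counts = Counter()
    for seq in reads:
        counts.update(kmers(seq, k))

    # Pass 2: decide per read based on its estimated depth.
    rng = random.Random(seed)
    kept = []
    for seq in reads:
        read_kmers = kmers(seq, k)
        if not read_kmers:
            continue  # read shorter than k, no depth estimate possible
        depth = median(counts[km] for km in read_kmers)
        if depth < min_depth:
            continue  # too rare, treated as probable error
        # Reads at or below the target are always kept; deeper reads
        # are kept with probability target / depth.
        if depth <= target or rng.random() < target / depth:
            kept.append(seq)
    return kept
```

In this sketch a region at 200x is thinned to roughly 20x, a region at 10x is left untouched, and reads seen at depth 1 are dropped, which is the behaviour described above.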

Last edited by Brian Bushnell; 03-07-2014 at 08:45 PM.
03-07-2014, 05:24 PM   #3
andylemire
Member
 
Location: Virginia

Join Date: Jan 2014
Posts: 15

Update your Geneious to r7.1 if you haven't already. The new Geneious de novo assembler handles Ion Torrent reads much, much better than previous versions, and its handling of homopolymer errors really helps reduce the number of contigs. I re-ran a dozen plasmid assemblies just today and the results were incredible.

And, to answer your question, it has a check box at the top to downsample your reads. The quality trimming is nice too because it's an annotation instead of a clipping, so it's easy to re-run with different stringencies.
03-11-2014, 03:43 PM   #4
megster
Junior Member
 
Location: California

Join Date: Mar 2014
Posts: 2

Thanks! I missed that option at the top of the box. I ran it with the MIRA plugin and it worked beautifully. I'll also retry it with the new Geneious assembler and see if it works any better for me.
Tags
coverage, geneious, ion torrent, mira

All times are GMT -8. The time now is 11:25 AM.

