SEQanswers

09-24-2012, 03:16 PM   #1
shurjo
Senior Member
 
Location: Rockville, MD

Join Date: Jan 2009
Posts: 126
Running TopHat with ~1.2 billion reads

Any tips for the quickest way to get TopHat to complete alignment of a very large dataset (~1.2 billion 75 bp single-end HiSeq reads)? I considered splitting it into 10 subsets, but then realized this would risk treating identical reads in different subsets as unique in the final file. I'm looking to keep anything that maps to up to 10 locations, with up to three mismatches and indels of up to three bp:
Code:
 -p 16 -g 10 --read-mismatches 3 --read-gap-length 3 --read-edit-dist 3
.

Thanks,

Shurjo
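
For anyone scripting a run like this, here is a minimal sketch of how the options listed in the post above could be assembled into a TopHat invocation from Python. The function name `build_tophat_command`, the index prefix, the reads file and the output directory are all hypothetical placeholders, not anything from the original post; only the option values come from the post.

```python
from typing import List


def build_tophat_command(
    index_prefix: str,
    reads_path: str,
    output_dir: str,
    threads: int = 16,
    max_multihits: int = 10,
    max_mismatches: int = 3,
    max_gap_length: int = 3,
    max_edit_dist: int = 3,
) -> List[str]:
    """Return the argument list for one TopHat run.

    The defaults mirror the options quoted in the post; the paths are
    placeholders supplied by the caller.
    """
    return [
        "tophat",
        "-p", str(threads),                        # alignment threads
        "-g", str(max_multihits),                  # keep reads with up to this many hits
        "--read-mismatches", str(max_mismatches),
        "--read-gap-length", str(max_gap_length),
        "--read-edit-dist", str(max_edit_dist),
        "-o", output_dir,                          # output directory
        index_prefix,                              # Bowtie index prefix (hypothetical)
        reads_path,                                # FASTQ file (hypothetical)
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be checked first.
    print(" ".join(build_tophat_command("genome_index", "reads.fastq", "tophat_out")))
```

The list form can be handed to `subprocess.run` as-is, which avoids shell quoting problems with unusual file names.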
09-24-2012, 03:54 PM   #2
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Split it, and dedup it later?
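
The splitting step suggested above could look roughly like the sketch below. It assumes the input is plain 4-line-per-record FASTQ and deals records out round-robin to a set of already opened output handles; the function name `split_fastq` and the round-robin scheme are illustrative choices, not something specified in the thread.

```python
import io
from itertools import islice
from typing import Iterable, Sequence, TextIO


def split_fastq(reads: Iterable[str], outputs: Sequence[TextIO]) -> int:
    """Distribute 4-line FASTQ records round-robin across `outputs`.

    Streams the input one record at a time, so memory use stays flat no
    matter how many reads there are. Returns the number of records written.
    """
    lines = iter(reads)
    n_records = 0
    while True:
        record = list(islice(lines, 4))  # header, sequence, '+', quality
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        outputs[n_records % len(outputs)].writelines(record)
        n_records += 1
    return n_records


if __name__ == "__main__":
    # Tiny in-memory demonstration with three reads and two chunks.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n@r3\nGGGG\n+\nIIII\n")
    chunks = [io.StringIO(), io.StringIO()]
    print(split_fastq(demo, chunks), "records split")
```

With real data the handles would be ordinary files opened with `open(...)`, one per chunk. Each chunk could then be aligned as a separate job, with the per-chunk alignments merged and duplicates handled afterwards using whatever merge and duplicate-marking tools you already rely on.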
09-24-2012, 04:15 PM   #3
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224

Do you have to use TopHat?
STAR will do split-read alignment in a small fraction of the time that TopHat takes.
http://gingeraslab.cshl.edu/STAR/
09-24-2012, 05:48 PM   #4
severin
Genome Informatics Facility
 
Location: Iowa @isugif

Join Date: Sep 2009
Posts: 105
GPU option

SOAP3-dp is very fast, as long as you are OK with at most 4 mismatches.
09-24-2012, 06:52 PM   #5
shurjo
Senior Member
 
Location: Rockville, MD

Join Date: Jan 2009
Posts: 126

All of these suggestions sound very logical. I'm going to step out of my TopHat-based comfort zone and try STAR and SOAP3-dp.

ECO, nice to see the forum doing so well. I've been away from this world for a bit but am coming back soon.

Thanks everyone,

Shurjo
09-24-2012, 07:25 PM   #6
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Shurjo: nice to see you back here. Always wished I'd been able to follow our collaboration through. Good luck in your current work.