Old 05-25-2012, 04:14 AM   #2
pbluescript
Quote:
Originally Posted by EBER View Post
Hello, I am analyzing a couple of paired-end datasets (75 bp reads), each containing about 450 million reads.
TopHat 1.4.1 does well, as long as I use the "--no-novel-juncs" flag.
TopHat 2, however, fails during the "merge all BAM files" step, right at the end.
I am using a 12-core server with 64 GB of RAM.

It has been suggested that I partition each dataset, run TopHat on each partition, and then merge the resulting accepted_hits.bam files before running Cufflinks.

I have two questions:
1) Will running TopHat on the partitioned dataset compromise the quality of the alignment, and therefore of the transcript assembly done by Cufflinks?
2) What is the best tool for merging these accepted_hits.bam files? Will the Picard tools do this appropriately? Are there any considerations when doing this?

Many thanks.
EBER
I would recommend trying STAR. In my experience it works much better for large datasets like this, and it is MUCH faster than any TopHat version. It requires a good amount of RAM, but you have enough.
http://gingeraslab.cshl.edu/STAR/
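For reference, a basic STAR run looks roughly like this (file names and paths here are placeholders, not from the original post; check the STAR manual for the options available in your version):

```shell
# Sketch of a minimal STAR workflow; genome.fa and the read files are placeholders.

# 1) Build the genome index once (this is the RAM-hungry step;
#    the human genome needs roughly 30 GB, so a 64 GB server is fine):
STAR --runMode genomeGenerate --genomeDir ./star_index \
     --genomeFastaFiles genome.fa --runThreadN 12

# 2) Align the paired-end reads against the index:
STAR --genomeDir ./star_index \
     --readFilesIn reads_1.fastq reads_2.fastq \
     --runThreadN 12
```

The alignments land in Aligned.out.sam in the working directory, which you can convert to BAM with samtools before Cufflinks.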

If you want to stick with TopHat, you can merge the BAM files in a number of ways: Picard's MergeSamFiles works, as does samtools merge or bamtools merge. You could even convert them to SAM and concatenate them with the UNIX cat command, but then you would need to strip the header lines from all but the first file, so a dedicated merge tool is safer.
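For example, with placeholder directory names (substitute your per-partition TopHat output directories), either of these should give you a single BAM with a properly combined header:

```shell
# samtools merge expects coordinate-sorted inputs, which TopHat's
# accepted_hits.bam already is, and writes one merged, sorted BAM:
samtools merge merged_accepted_hits.bam \
    part1/accepted_hits.bam part2/accepted_hits.bam part3/accepted_hits.bam

# The Picard equivalent (jar-per-tool invocation of that era):
java -jar MergeSamFiles.jar \
    INPUT=part1/accepted_hits.bam INPUT=part2/accepted_hits.bam \
    INPUT=part3/accepted_hits.bam OUTPUT=merged_accepted_hits.bam
```

Either way, run Cufflinks on the merged file exactly as you would on a single TopHat run's accepted_hits.bam.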