Old 05-24-2012, 05:23 PM   #1
Location: New Haven, CT

Join Date: May 2012
Posts: 10
Default TopHat doesn't handle 450 million reads

Hello, I am analyzing a couple of paired-end datasets (75 bp), each containing about 450 million reads.
TopHat 1.4.1 handles them fine, as long as I use the "--no-novel-juncs" flag.
TopHat 2, however, fails during the "merge all bam files" step, right at the end.
I am running on a 12-core server with 64 GB of RAM.

It has been suggested that I partition each dataset, run TopHat on each chunk, and then merge the resulting accepted_hits.bam files before running Cufflinks.

I have two questions:
1) Will running TopHat on the partitioned dataset compromise the quality of the alignment, and therefore of the transcript assembly done by Cufflinks?
2) What's the best tool for merging these accepted_hits.bam files? Will the Picard tools do this appropriately? Any considerations when doing so?

Many thanks.
EBER