SEQanswers

Old 05-24-2012, 05:23 PM   #1
EBER
Member
 
Location: New Haven, CT

Join Date: May 2012
Posts: 10
Default TopHat doesn't handle 450 million reads

Hello, I am analyzing a couple of paired-end datasets (75 bp), each containing about 450 million reads.
TopHat 1.4.1 does well, as long as I use the "--no-novel-juncs" flag.
TopHat 2, however, fails during the "merge all bam files" step, right at the end.
I am using a 12-core server with 64 GB of RAM.

It has been suggested that I partition each dataset, run TopHat on the pieces, and then merge the resulting accepted_hits.bam files before Cufflinks.

I have two questions:
1) Will running TopHat on the partitioned dataset compromise the quality of the alignment, and therefore of the transcript assembly done by Cufflinks?
2) What's the best tool to merge these accepted_hits.bam files? Will the Picard tools do this appropriately? Any considerations when doing this?
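(For reference, the partitioning step can be done with standard UNIX tools, as long as each chunk contains complete FASTQ records — 4 lines per read — and the R1 and R2 files are split with the same line count so mate pairs stay in sync. A minimal toy sketch; file names and chunk sizes are illustrative only:)

```shell
# Toy example: build a FASTQ with 4 reads (4 lines per record = 16 lines),
# then split it into chunks of 2 reads (8 lines) each.
# For real data you would use something like -l 200000000 (50M reads per
# chunk); the count must be a multiple of 4 so no record is cut in half,
# and R1 and R2 must be split with the same count so pairs stay aligned.
for i in 1 2 3 4; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > toy_R1.fastq

split -l 8 toy_R1.fastq chunk_R1_
wc -l chunk_R1_*
```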

Many thanks.
EBER
EBER is offline   Reply With Quote
Old 05-25-2012, 04:14 AM   #2
pbluescript
Senior Member
 
Location: Boston

Join Date: Nov 2009
Posts: 224
Default

Quote:
Originally Posted by EBER View Post
TopHat 2, however, fails during the "merge all bam files" step, right at the end. [...] What's the best tool to merge these accepted_hits.bam files? Will the Picard tools do this appropriately?
I would recommend trying STAR. In my experience it works much better for large datasets like this, and it is MUCH faster than any TopHat version. It requires a good amount of RAM, but you have enough.
http://gingeraslab.cshl.edu/STAR/

If you want to stick with TopHat, you could merge them in a number of ways: Picard works, as would bamtools merge, or even converting them to SAM and concatenating with cat (keeping only one copy of the header).
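(On the cat route: each SAM file carries its own header, so naively concatenating full SAM files leaves headers repeated mid-file. A minimal sketch of header-aware concatenation with standard UNIX tools — the toy SAM records and file names below are made up for illustration; with real BAMs, samtools merge or Picard handles the headers for you:)

```shell
# Two toy SAM files with identical headers (header lines start with @)
printf '@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:1000\nr1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n' > part1.sam
printf '@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:1000\nr2\t0\tchr1\t20\t60\t4M\t*\t0\t0\tACGT\tIIII\n' > part2.sam

# Keep the header from the first file only, then append the
# alignment records (non-@ lines) from every part.
grep '^@' part1.sam > merged.sam
grep -vh '^@' part1.sam part2.sam >> merged.sam
```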
pbluescript is offline   Reply With Quote
Old 05-25-2012, 05:09 AM   #3
EBER
Member
 
Location: New Haven, CT

Join Date: May 2012
Posts: 10
Default

Thanks for your answer!
EBER is offline   Reply With Quote