SEQanswers

Old 10-11-2010, 12:56 AM   #1
hong_sunwoo
Member
 
Location: Suwon, Korea

Join Date: Jan 2010
Posts: 11
Split accepted_hits.bam file after Tophat run?

Hello.

I am working on RNA-seq data from two different conditions.
I am currently using the TopHat-Cufflinks pipeline.

At first, I ran TopHat separately for each sample (with a previous version of TopHat).

Yesterday I learned that two samples can be pooled in a single TopHat run, so I ran TopHat (v1.1.0) as below.
$ tophat /rnaseq/bowtie/indexes/hg19 s_1_sequence.txt,s_2_sequence.txt

After the run, I found only one BAM file (accepted_hits.bam).
Because two RNA-seq data sets were processed, I had guessed that TopHat would report two BAM files.
Does this mean I misunderstood something?
Or is it possible to split the file up again, as in the quote below?
Quote:
If you do pool the reads, you could also rename them to tag them by sample, so you can split the sample alignments up again after the TopHat run if needed.
Old 10-15-2010, 09:43 AM   #2
rocksd
Member
 
Location: Houston, TX

Join Date: Jul 2010
Posts: 14

I believe you need to remove the "," between the two sequencing files if they are from two different samples.
Old 10-15-2010, 10:14 AM   #3
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

Removing the comma between the files is not compatible with the manual; supplying two lists like that means they will be interpreted as the two sides of paired-end reads, and if the IDs don't match up I suspect the program will error out.

Code:
Usage: tophat [options]* <index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]


<reads1_1[,...,readsN_1]>	 A comma-separated list of files containing reads in FASTQ or FASTA format. When running TopHat with paired-end reads, this should be the *_1 ("left") set of files.
<[reads1_2,...readsN_2]>	 A comma-separated list of files containing reads in FASTQ or FASTA format. Only used when running TopHat with paired-end reads, and contains the *_2 ("right") set of files. The *_2 files MUST appear in the same order as the *_1 files.
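
To make the comma-versus-space distinction in that usage line concrete, here is a small Python sketch that assembles the argument list. The helper name tophat_args is made up for illustration and is not part of TopHat; it uses only the positional syntax quoted from the manual above.

```python
def tophat_args(index_base, left_files, right_files=None):
    """Build a TopHat argument list (hypothetical helper, not part of TopHat).

    Files pooled into one run are joined with commas into a single argument.
    A second, space-separated argument is only given for paired-end data and
    holds the "right" mates, in the same order as the "left" files.
    """
    args = ["tophat", index_base, ",".join(left_files)]
    if right_files is not None:
        if len(right_files) != len(left_files):
            raise ValueError("left and right file lists must have the same length")
        args.append(",".join(right_files))
    return args


# Single-end reads from two lanes pooled into one run (the command in post #1).
pooled = tophat_args(
    "/rnaseq/bowtie/indexes/hg19",
    ["s_1_sequence.txt", "s_2_sequence.txt"],
)
print(" ".join(pooled))
```

Dropping the comma would turn s_2_sequence.txt into the second positional argument, which is the paired-end case described above.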
Old 10-15-2010, 11:46 PM   #4
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870

We've had the same problem. We'd like to process multiple files in the same batch, using the combined evidence from all files to do junction detection. At the moment we've run things as you did, and then parsed the SAM output file, using the IDs to decide which original file each hit came from. However, this won't work in all cases, since the IDs aren't always unique between different files.

I think adding this functionality would be really useful, and I did suggest it to the developers, but I haven't heard anything back yet.
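
The ID-based split described above can be sketched in Python. This is only a minimal illustration, not the script that was actually used: it assumes every read name starts with a sample tag followed by "_" (a made-up convention), and that accepted_hits.bam has first been converted to SAM text, for example with samtools view -h. The function name split_sam_by_tag is hypothetical.

```python
import io


def split_sam_by_tag(sam, outputs, sep="_"):
    """Route SAM lines to per-sample handles based on a read-name prefix.

    sam     -- iterable of SAM text lines
    outputs -- dict mapping sample tag -> writable text handle
    sep     -- separator between the tag and the original read ID (assumed)
    """
    for line in sam:
        if line.startswith("@"):
            # Header lines are copied to every output so each file stays valid SAM.
            for out in outputs.values():
                out.write(line)
            continue
        qname = line.split("\t", 1)[0]   # first SAM column is the read name
        tag = qname.split(sep, 1)[0]     # text before the first separator
        if tag not in outputs:
            raise KeyError("read %r carries no known sample tag" % qname)
        outputs[tag].write(line)


# Tiny in-memory demonstration with two tagged reads.
demo = io.StringIO("@HD\tVN:1.0\ns1_r1\t0\tchr1\t1\ns2_r9\t16\tchr2\t5\n")
outs = {"s1": io.StringIO(), "s2": io.StringIO()}
split_sam_by_tag(demo, outs)
print(outs["s1"].getvalue(), end="")
```

This only works if the tags really are unique per input file, which is exactly the limitation noted above for untagged IDs.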
Old 10-17-2010, 01:05 PM   #5
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

Could you ensure uniqueness of ids by prefixing them prior to running through TopHat? (Not that I love one more preprocessing step)
Old 10-17-2010, 01:15 PM   #6
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 609

Yes, prefixing to ensure uniqueness would work.
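
A rough Python sketch of that prefixing step, for anyone who wants to try it. It assumes plain four-line FASTQ records (no wrapped sequence lines), and the tag format "tag_" plus the original ID is an arbitrary choice; prefix_fastq_ids is a hypothetical name, not an existing tool.

```python
import io


def prefix_fastq_ids(src, dst, tag):
    """Copy FASTQ records from src to dst, prefixing each read ID with tag.

    Assumes strict four-line records: header, sequence, separator, qualities.
    """
    for i, line in enumerate(src):
        field = i % 4
        if field == 0:
            if not line.startswith("@"):
                raise ValueError("line %d is not a FASTQ header" % (i + 1))
            # "@ID ..." becomes "@tag_ID ..."
            dst.write("@" + tag + "_" + line[1:])
        elif field == 2:
            # Write a bare "+" so a repeated ID on this line cannot go stale.
            dst.write("+\n")
        else:
            # Sequence and quality lines pass through untouched.
            dst.write(line)


# In-memory demonstration; real use would pass open file handles instead.
src = io.StringIO("@r1\nACGT\n+r1\nIIII\n")
dst = io.StringIO()
prefix_fastq_ids(src, dst, "s1")
print(dst.getvalue(), end="")
```

Each input file would get its own tag, and the tagged copies are then passed to TopHat as the comma-separated list.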
Old 10-18-2010, 12:06 AM   #7
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 870

Quote:
Originally Posted by krobison
Could you ensure uniqueness of ids by prefixing them prior to running through TopHat? (Not that I love one more preprocessing step)
You could, but it's a pain because you'd have to duplicate all of your original files to do this. Also, when we've tried this on larger numbers of samples (more than 8 lanes' worth), TopHat seems to come to a grinding halt when it's trying to do the junction detection. It seems to create enormous temporary files, which it then spends ages trying to sort. We ended up killing it after 24 hours at this step.

If this is to work efficiently I suspect there'd need to be some more structural changes inside the program.