SEQanswers

Old 10-04-2012, 10:09 AM   #21
Emilie
Member
 
Location: Toronto

Join Date: Nov 2010
Posts: 21
Default

Sure, I just sent you a private message with my email address.
Quote:
Originally Posted by tankman View Post
Hi Emilie,

The only thing I can think of now is that my fusions.out file is somehow very different from yours. Could we exchange fusions.out files, so that you try it on mine and I try it on yours, to see if there's any difference in the result?

thanks
tm
Old 11-26-2012, 12:43 PM   #22
yingzhang
Junior Member
 
Location: Minneapolis

Join Date: Feb 2012
Posts: 9
Default extreme memory requirement for Tophat-fusion-post

After 3 failed tophat-fusion-post runs on the sample MCF data, I finally got the program to work.

These are the settings of my successful job:

#PBS -l mem=64gb,nodes=1:ppn=8,walltime=24:00:00

It finished in 21 minutes.

The last failed job used 32 GB of memory.
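
For anyone who wants to reproduce this kind of resource request, here is a minimal sketch of a helper that prints a PBS job script. Only the #PBS resource line comes from the settings above; the function name make_fusion_post_job, the use of $PBS_O_WORKDIR, and the bare tophat-fusion-post call with a placeholder index path are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Print a PBS job script for tophat-fusion-post to stdout.
# $1 = memory request (e.g. 64gb), $2 = cores per node, $3 = Bowtie index path (placeholder)
# Assumption: the job is submitted from the directory that holds the tophat_* folders.
make_fusion_post_job() {
    local mem="$1" cores="$2" index="$3"
    cat <<EOF
#!/bin/bash
#PBS -l mem=${mem},nodes=1:ppn=${cores},walltime=24:00:00
cd "\$PBS_O_WORKDIR"
tophat-fusion-post -p ${cores} ${index}
EOF
}
```

Usage would look something like `make_fusion_post_job 64gb 8 /path/to/bowtie_index > fusion_post.pbs`, followed by `qsub fusion_post.pbs`.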
Old 01-09-2013, 08:59 AM   #23
kokonech
Curious Character
 
Location: Berlin, Germany

Join Date: Sep 2010
Posts: 13
Default

Hi,
has anybody had a response from the authors, or does anyone have a clue how the filtering of fusions.out works?

In my simulation experiments I have a large list of fusion candidates (about 500), and this list contains the simulated fusions. These fusions should pass the basic filtering (number of fusion reads, number of read pairs, etc.), but after running tophat-fusion-post I have 0 fusions.
Old 02-26-2013, 05:19 PM   #24
Charitra
Member
 
Location: Seoul, Korea

Join Date: Feb 2013
Posts: 57
Default tophat-fusion-post result empty

Answer:
In my experience there is only one reason for an empty tophat-fusion-post result: you did not give your output directory a name starting with tophat_ (tophat_ followed by your sample name) when running the first command, the TopHat fusion-search run. tophat-fusion-post then cannot find the output of that first run. If you are not sure, try renaming your output directory and running tophat-fusion-post again (not the fusion-search run); you will get an empty result file with 0 fusion genes.
If there is some other cause, TopHat will give you an error. If there is no error, it means it could not read your output directory.

Let me explain in detail.

Follow these commands (also available online at the Broad Institute, "How to run Tophat-fusion?"): https://confluence.broadinstitute.or...ageId=46531375

Step 1: Install
bowtie i.e. bowtie1
tophat2
samtools
ncbi.blast

Step 2: download
ensGene.txt
refGene.txt

Step 3:
Make a directory for your sample. For example, if your sample is named John, the directory is John. Place your .fastq files there (two files for paired-end data).

Step 4: Copy the downloaded files ensGene.txt and refGene.txt into the John directory, which is your sample directory.

Step 5: Log in to the server (for example with PuTTY) and add the tools to your PATH:
PATH=$PATH:(path to bowtie)
PATH=$PATH:(path to samtools)
PATH=$PATH:(path to tophat2)
PATH=$PATH:(path to blast)
export PATH

Step 6: Change into your sample directory, i.e. John:
cd (path to John)
Your prompt should now show the working directory, like this:
John$

Step 7: Run the TopHat fusion search using the standard options below. The most important one is -o, which names the output directory. Always give the output directory a name starting with tophat_ (tophat_ followed by your sample name). For sample John, that is -o tophat_John. That's all.

The standard command for sample John, which has two .fastq files (John_1.fastq and John_2.fastq), is:

tophat -o tophat_John -p 8 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --max-intron-length 100000 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM (path to the hg19 Bowtie1 index) John_1.fastq John_2.fastq

Step 8: Run the post-processing step:
tophat-fusion-post -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 (path to the hg19 Bowtie1 index)

You should then get the fusion genes.
If there is still a problem, please reply to me.
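
To check the tophat_ prefix before running tophat-fusion-post, something like the following can help. It is a minimal sketch, assuming the post-processing step only considers directories named tophat_* that contain a fusions.out file; the function names list_fusion_dirs and list_misnamed_dirs are made up for this example.

```shell
#!/usr/bin/env bash
# List directories under a working directory that are named tophat_* and
# contain fusions.out, i.e. the ones the post-processing step is assumed to find.
list_fusion_dirs() {
    local top="${1:-.}" d
    for d in "$top"/tophat_*/; do
        [ -f "${d}fusions.out" ] && basename "$d"
    done
    return 0
}

# List directories that contain fusions.out but lack the tophat_ prefix.
# These are the likely candidates for a silent "0 fusions" result.
list_misnamed_dirs() {
    local top="${1:-.}" d name
    for d in "$top"/*/; do
        name="$(basename "$d")"
        case "$name" in
            tophat_*) ;;
            *) [ -f "${d}fusions.out" ] && echo "$name" ;;
        esac
    done
    return 0
}
```

Running `list_misnamed_dirs .` in the working directory prints any fusion-search output directory that would need renaming, for example with `mv John_out tophat_John`.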
Old 06-04-2013, 05:22 AM   #26
samar
Junior Member
 
Location: USA

Join Date: May 2013
Posts: 8
Default tophat fusion post 0 fusions

Hi,
I have a similar problem. When I run tophat-fusion-post on the developers' samples I do get fusions, although they look different.
However, when I tried many of my own samples, all of them gave me 0 fusions.
Any ideas, please?

Thanks
Old 06-04-2013, 05:08 PM   #27
Charitra
Member
 
Location: Seoul, Korea

Join Date: Feb 2013
Posts: 57
Default

Please read my last post and check the output directory name; if it does not start with tophat_, you will get 0 fusions. Also, try ChimeraScan; in my opinion it is the best tool for fusion detection.
Please reply if you get an error again.
Charitra
Old 06-04-2013, 06:34 PM   #28
samar
Junior Member
 
Location: USA

Join Date: May 2013
Posts: 8
Default

Thank you so much for replying!
Yes, what I did is very similar to yours, and I did use the prefix:
tophat -o /path/to/sample/tophat_PRO -p 8 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --max-intron-length 100000 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM /../UCSC/hg19/Sequence/BowtieIndex/genome sample.fastq

The odd thing is that when I run the test sample from the TopHat-Fusion website, I do get fusions.
However, with my 4 samples I did not get anything.
Old 06-04-2013, 09:38 PM   #29
Charitra
Member
 
Location: Seoul, Korea

Join Date: Feb 2013
Posts: 57
Default

Hi there,
Can you compare the two commands:

tophat -o /path/to/sample/tophat_PRO -p 8 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --max-intron-length 100000 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM /../UCSC/hg19/Sequence/BowtieIndex/genome sample.fastq

tophat -o tophat_John -p 8 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --max-intron-length 100000 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM (Path of hg19 which should be in bowtie1) John_1.fastq John_2.fastq

Well, the problem may be sorted out if you address the following:
1. Use -o tophat_PRO
2. Remove spaces from file or directory names, i.e. genome_sample.fastq (or is it "genome sample.fastq"?)
3. Where are your two .fastq files? Is this paired-end?
Old 06-04-2013, 10:17 PM   #30
samar
Junior Member
 
Location: USA

Join Date: May 2013
Posts: 8
Default

Hi Charitra,
1. Yes. That is only the path to the output directory.
2. The space is there because "genome" is the end of the index directory path, and sample.fastq is the sample file.
3. Yes, I think this is an important point: my data is single-end, and it looks as if I am running it as paired-end. However, I am not sure whether I should run it the same way or differently.
The manual says that TopHat-Fusion can run on single-end or paired-end reads.

I really appreciate your help!!
Thank you, Charitra
Old 06-05-2013, 01:05 AM   #31
Charitra
Member
 
Location: Seoul, Korea

Join Date: Feb 2013
Posts: 57
Default

Regarding paired-end, I am not able to say anything.
I would still prefer to use -o tophat_PRO.

Nothing more to say.
Old 06-05-2013, 01:20 AM   #32
samar
Junior Member
 
Location: USA

Join Date: May 2013
Posts: 8
Default

Thank you so much!
No problem, I will try to figure it out!
Old 06-05-2013, 08:41 PM   #33
samar
Junior Member
 
Location: USA

Join Date: May 2013
Posts: 8
Default

Finally I solved the problem. For the general benefit, here is the solution. I had been running:
tophat-fusion-post -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 bowtie

Since I am using single-end reads, I changed
--num-fusion-pairs 2 to --num-fusion-pairs 0

It is working!!

Thanks
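
One way to picture why --num-fusion-pairs 2 throws away every candidate from single-end data: a single-end run has no mate pairs, so the pair count of each fusion is 0. The function below is a simplified model, assuming each threshold must be met on its own and that --num-fusion-both applies to the sum of reads and pairs. It is not the actual tophat-fusion-post code, and passes_thresholds is a hypothetical name.

```shell
#!/usr/bin/env bash
# Simplified model of the three tophat-fusion-post thresholds (an assumption,
# not the tool's real implementation).
# Args: reads pairs min_reads min_pairs min_both
passes_thresholds() {
    local reads="$1" pairs="$2" min_reads="$3" min_pairs="$4" min_both="$5"
    [ "$reads" -ge "$min_reads" ] &&
    [ "$pairs" -ge "$min_pairs" ] &&
    [ $((reads + pairs)) -ge "$min_both" ]
}

# A single-end candidate with 12 spanning reads and 0 supporting pairs:
if passes_thresholds 12 0 1 2 5; then echo "kept with pairs>=2"; else echo "dropped with pairs>=2"; fi
if passes_thresholds 12 0 1 0 5; then echo "kept with pairs>=0"; else echo "dropped with pairs>=0"; fi
```

Under this model the candidate is dropped with --num-fusion-pairs 2 and kept with --num-fusion-pairs 0, which matches the behaviour reported here.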
Old 06-17-2013, 11:21 AM   #34
nbahlis
Member
 
Location: Canada

Join Date: May 2013
Posts: 25
Default

I was getting empty tophat-fusion-post results. I am working with Ion Proton reads, i.e. single-end reads. The problem was in the tophat-fusion-post command:
"tophat-fusion-post -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 /path/to/h_sapiens/bowtie_index".
With single-end reads I had to specify --num-fusion-pairs 0. This solved the problem. I hope this saves a lot of stress for someone else dealing with single-end reads!

Last edited by nbahlis; 06-17-2013 at 11:23 AM. Reason: typos
Old 05-07-2015, 11:01 AM   #35
arun
Junior Member
 
Location: India

Join Date: Nov 2010
Posts: 5
Default

Hi All,

The problem of getting 0 fusions can be overcome with the following method.

Download the known annotations from the following link:
http://tophat-fusion.sourceforge.net...n-0.1.0.tar.gz
Extract the archive and copy the ensGene.txt, ensGtp.txt, mcl and refGene_sorted.txt files to your working directory.

Keep the directory structure (folders and files) suggested on the website:
http://ccb.jhu.edu/software/tophat/fusion_tutorial.html

1. Directory structure

(top_dir), i.e. your working directory, should contain the following:
a) tophat_sample_1 (sample number one), which contains the output of the TopHat fusion search: accepted_hits.bam, align_summary.txt, deletions.bed, fusions.out, insertions.bed, junctions.bed, logs (folder), prep_reads.info and unmapped.bam.
(NOTE: the output directory name must have the form "tophat_sample_name"; you can have fusion-search output directories for any number of samples.)
b) ensGene.txt
c) ensGtp.txt
d) mcl
e) refGene_sorted.txt
f) blast_human (folder) - contains human_genomic*, other_genomic*, and nt* from blast database

2. Running tophat-fusion-post

Use the tophat-fusion-post.py program located in the folder "tophatfusion-0.1.0/src/" to identify potential fusions (http://tophat-fusion.sourceforge.net...n-0.1.0.tar.gz)

Usage: /home/user/Downloads/tophatfusion-0.1.0/src/tophat-fusion-post.py -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 /home/user/Databases/hg19/hg19 (index_files)

After using the Python script, I could get potential fusions from my data. I hope this helps.

Regards,
Arun
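
The directory layout listed above can be checked before launching the post-processing step. This is a minimal sketch: the file names are taken from the list in this post, the BLAST folder name is a parameter because both "blast" and "blast_human" are mentioned in this thread, and the function name check_fusion_post_layout is made up for this example.

```shell
#!/usr/bin/env bash
# Report which expected files and folders are missing from the working directory.
# $1 = working directory (top_dir), $2 = BLAST folder name (default: blast_human)
# Returns 0 when everything is present, 1 otherwise.
check_fusion_post_layout() {
    local top="${1:-.}" blast_dir="${2:-blast_human}" missing=0 f
    for f in ensGene.txt ensGtp.txt mcl refGene_sorted.txt; do
        if [ ! -e "$top/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    if [ ! -d "$top/$blast_dir" ]; then
        echo "missing: $blast_dir/"
        missing=1
    fi
    # At least one tophat_* sample folder with a fusions.out file is needed.
    if ! ls "$top"/tophat_*/fusions.out >/dev/null 2>&1; then
        echo "missing: tophat_*/fusions.out"
        missing=1
    fi
    return "$missing"
}
```

For example, `check_fusion_post_layout /path/to/top_dir blast` prints one "missing:" line per absent item and returns a non-zero status if anything is absent.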
Old 11-25-2015, 05:12 PM   #36
Alex Lee
Junior Member
 
Location: N. Cal

Join Date: Apr 2014
Posts: 10
Default

Hi, so I actually got fusions; however, I am still confused about the BLAST database. According to the instructions, it wants a ./blast_human directory containing

human_genomic* and nt*

What does the wildcard * imply? There are at least 16 .gz files for human_genomic alone (human_genomic.00.tar.gz and so on), and each one expands to 11 files; for example, .00.tar.gz contains .nhd, .nhi, .nhr, .nog, etc., and so on for up to 16 archives. So do we literally download every single file, expand all of them, and put them into the blast subdirectory?

I estimate that there will be at least 539 files in the subdirectory alone. Am I missing something? Thanks.
Old 11-26-2015, 10:59 AM   #37
arun
Junior Member
 
Location: India

Join Date: Nov 2010
Posts: 5
Default BLAST database for TopHat fusion

Hi Alex,

Yes. You need to download all the files and extract them. However, it is simple: just execute the following commands in the Linux terminal.

Create a directory named "blast" inside your TopHat working directory (where you ran the tophat fusion command), then change into "blast".
i.e., (top_dir)/blast

Next, download all the databases with a simple command:

$ wget -c "ftp://ftp.ncbi.nlm.nih.gov/blast/db/human_genomic*"

This command downloads, one after the other, all the files whose names start with "human_genomic" into your folder.

The wildcard '*' matches zero or more characters after human_genomic, so every file whose name begins with human_genomic is downloaded.

Similarly, you can download the other databases (other_genomic* and nt*) into the same folder by replacing "human_genomic" with "other_genomic" and then with "nt".

NOTE: When you index a database, many index files are produced, such as .nhd, .nhi, .nhr, etc. NCBI has already indexed these databases, and the index files are stored in the corresponding compressed archives. You just need to download them (which is what you did in the previous steps) and extract the .tar.gz files. Because tar reads only one archive per invocation, loop over the files to extract them all:

$ for f in *.tar.gz; do tar -xvzf "$f"; done

If you have any further questions, let me know.

Regards,
Arun

Last edited by arun; 11-26-2015 at 11:04 AM.
Old 11-29-2015, 09:56 AM   #38
Alex Lee
Junior Member
 
Location: N. Cal

Join Date: Apr 2014
Posts: 10
Default

Arun, thanks for the clarification and example. In case someone else finds this useful: I also found that a symbolic link works once the BLAST database is prepared. This way I only have to download it once. It works something like this:

ln -s /path/to/blastdb top_dir/blast

Thanks.
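
If the same BLAST database is shared by several projects, the link can be created in each working directory with a small helper. This is a minimal sketch; the function name link_blast_db is hypothetical, and the link is always called "blast" as in the command above.

```shell
#!/usr/bin/env bash
# Link one shared BLAST database directory into several working directories.
# $1 = shared BLAST directory (use an absolute path so the link resolves
#      from anywhere); remaining args = working directories (top_dir).
link_blast_db() {
    local blastdb="$1" top
    shift
    for top in "$@"; do
        # -s: symbolic link, -f: replace an existing link,
        # -n: do not follow an existing "blast" link that points to a directory
        ln -sfn "$blastdb" "$top/blast"
    done
}
```

For example, `link_blast_db /path/to/blastdb projectA projectB` gives both projects a blast entry pointing at the single downloaded copy.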

Last edited by Alex Lee; 11-30-2015 at 08:33 AM.
Old 11-30-2015, 08:47 AM   #39
Alex Lee
Junior Member
 
Location: N. Cal

Join Date: Apr 2014
Posts: 10
Default

Hi everyone, I was able to solve a problem similar to the one in the original post: the run failed due to too many open files. Here is my experience and some new things I learned.

1. My reads are about 40 GB per sample, so they are quite large. I solved my problem by balancing the number of threads against the memory requested.
2. I learned about the logs. They are useful, and there are many different ones for diagnosing problem areas. One log I looked at was run.log; I think it might be possible to continue a run by copying the commands from a successful run and changing parameters such as file names, directories, etc. I have not tried this yet, but it would be interesting.
3. BLAST did not seem to add much other than making my post-analysis longer. I would still run it, but so far I don't see much difference.
Last edited by Alex Lee; 05-21-2016 at 09:42 PM.
Old 05-18-2016, 12:23 AM   #40
bibi
Junior Member
 
Location: France

Join Date: Aug 2015
Posts: 3
Default

Hello,

For the BLAST databases, is it necessary to have all 3 databases: human_genomic, other_genomic and nt?
All times are GMT -8.