  • Tophat fusion does not produce desired fusions?

    Hi everyone,
    We have RNA-seq data from mouse samples: paired-end reads, 76 bp each. We want to detect fusions in them using TopHat-Fusion. We already know of certain major fusions between two chromosomes from FISH, SKY, etc., so we want to test whether TopHat-Fusion can detect the same ones. I get output from both the TopHat fusion search and tophat-fusion-post, but the important fusion is still missing. I'd like to know whether I am doing something wrong, as I am not experienced with TopHat's fusion search.
    Kindly let me know if I should change anything and try again.
    Here are the options I used:
    tophat2 -o tophat_out --fusion-min-dist 100000 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search --mate-inner-distance 11 --mate-std-dev 57 --fusion-ignore-chromosomes chrM -p 8 /path/to/bowtie/mm9 seq1 seq2

    Any help is much appreciated.
    Thanks,
    Himanshu
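For reference, the TopHat manual describes the mate inner distance as the expected fragment length minus the length of both reads (its example: 300 bp fragments with 50 bp ends give 200). The sketch below only does that arithmetic. The function name `inner_dist` is hypothetical, and the fragment length has to come from your own library QC. With 76 bp reads, an inner distance of 11 corresponds to a mean fragment length of about 163 bp.

```shell
#!/bin/sh
# inner_dist FRAGMENT_LEN READ_LEN  (hypothetical helper)
# Prints the expected mate inner distance for a paired-end library with
# equal-length mates: fragment length minus both reads.
inner_dist() {
    frag=$1
    read_len=$2
    echo $((frag - 2 * read_len))
}

# 76 bp mates from ~163 bp fragments leave an 11 bp gap between them.
inner_dist 163 76
```

Run as shown, this prints 11. A negative result means the two mates are expected to overlap.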

  • #2
    What flags did you use to run tophat-fusion-post? Did you use the --non-human flag?

    -Dinesh
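Below is a sketch of a tophat-fusion-post call for mouse data that uses the --non-human flag. It only prints the command unless DRY_RUN=0 is set. The index path is the placeholder from the original post, the threshold values are the ones used later in this thread (not recommendations), and the wrapper name `run_fusion_post` is hypothetical. As discussed further down the thread, it is meant to be run from the top-level directory that holds the tophat_* output folders.

```shell
#!/bin/sh
# Sketch: tophat-fusion-post for a non-human (here mouse) dataset.
# INDEX is a placeholder Bowtie index prefix; override it from the environment.
INDEX=${INDEX:-/path/to/bowtie/mm9}

# run_fusion_post (hypothetical wrapper)
# Prints the command by default; set DRY_RUN=0 to actually execute it.
run_fusion_post() {
    # --non-human: the flag discussed in this thread for non-human data.
    # Thresholds: values used later in this thread, not tool defaults.
    set -- tophat-fusion-post --non-human \
        --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 \
        "$INDEX"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

run_fusion_post
```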



    • #3
      Hi Dinesh,
      Thanks a lot for your reply. I did use the --non-human flag. Also, when I manually filter the fusions.out file, I find fusions in our wild-type samples, which is not expected. One question: how do you tell what is real and what is an artifact in the output? Also, my understanding is that tophat-fusion-post reports only known fusions and not novel ones? So I decided to manually filter the fusions.out produced in the first step. Can you suggest a better tool, or options I could change to get better results?
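One way to do the manual filtering described above is to drop every candidate that also appears in the wild-type run. The sketch below does that, assuming each line of fusions.out describes one candidate and that its first four tab-separated fields (chromosome pair, left coordinate, right coordinate, orientation) identify it. Check this against the fusions.out description for your TopHat version. The function name and file paths are hypothetical.

```shell
#!/bin/sh
# subtract_control SAMPLE_FUSIONS CONTROL_FUSIONS  (hypothetical helper)
# Prints the lines of SAMPLE_FUSIONS whose first four tab-separated fields
# (assumed: chromosome pair, left coordinate, right coordinate, orientation)
# never occur in CONTROL_FUSIONS. Matching is exact, so a control candidate
# with a slightly shifted breakpoint is NOT removed.
subtract_control() {
    awk -F '\t' -v ctrl="$2" '
        # Control file: remember every candidate key.
        FILENAME == ctrl { seen[$1 FS $2 FS $3 FS $4] = 1; next }
        # Sample file: print candidates whose key was never seen in the control.
        !(($1 FS $2 FS $3 FS $4) in seen)
    ' "$2" "$1"
}

# Hypothetical paths:
# subtract_control tophat_tumor/fusions.out tophat_wt/fusions.out > tumor_only.txt
```

The control is recognized by file name rather than by the common `NR == FNR` idiom, so an empty control file correctly removes nothing.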



      • #4
        Yes, finding fusions in the wild type is strange; I'm not sure what is happening there. I would look at the alignments of the supporting reads that tophat-fusion-post generates (the HTML output is the easiest place to look). Quoting the TopHat-Fusion paper, "As reported in Edgren et al., true fusion transcripts have reads mapping uniformly in a wide window across the fusion point, whereas false positive fusions are narrowly covered." I would also watch out for readthrough gene fusions, though some of them might be real. The paper below describes readthrough fusions:
        Background: Readthrough fusions across adjacent genes in the genome, or transcription-induced chimeras (TICs), have been estimated using expressed sequence tag (EST) libraries to involve 4-6% of all genes. Deep transcriptional sequencing (RNA-Seq) now makes it possible to study the occurrence and expression levels of TICs in individual samples across the genome.
        Methods: We performed single-end RNA-Seq on three human prostate adenocarcinoma samples and their corresponding normal tissues, as well as brain and universal reference samples. We developed two bioinformatics methods to specifically identify TIC events: a targeted alignment method using artificial exon-exon junctions within 200,000 bp from adjacent genes, and genomic alignment allowing splicing within individual reads. We performed further experimental verification and characterization of selected TIC and fusion events using quantitative RT-PCR and comparative genomic hybridization microarrays.
        Results: Targeted alignment against artificial exon-exon junctions yielded 339 distinct TIC events, including 32 gene pairs with multiple isoforms. The false discovery rate was estimated to be 1.5%. Spliced alignment to the genome was less sensitive, finding only 18% of those found by targeted alignment in 33-nt reads and 59% of those in 50-nt reads. However, spliced alignment revealed 30 cases of TICs with intervening exons, in addition to distant inversions, scrambled genes, and translocations. Our findings increase the catalog of observed TIC gene pairs by 66%. We verified 6 of 6 predicted TICs in all prostate samples, and 2 of 5 predicted novel distant gene fusions, both private events among 54 prostate tumor samples tested. Expression of TICs correlates with that of the upstream gene, which can explain the prostate-specific pattern of some TIC events and the restriction of the SLC45A3-ELK4 e4-e2 TIC to ERG-negative prostate samples, as confirmed in 20 matched prostate tumor and normal samples and 9 lung cancer cell lines.
        Conclusions: Deep transcriptional sequencing and analysis with targeted and spliced alignment methods can effectively identify TIC events across the genome in individual tissues. Prostate and reference samples exhibit a wide range of TIC events, involving more genes than estimated previously using ESTs. Tissue specificity of TIC events is correlated with expression patterns of the upstream gene. Some TIC events, such as MSMB-NCOA4, may play functional roles in cancer.
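The coverage criterion quoted above from the TopHat-Fusion paper can be made a little more concrete with a crude proxy. Given the start offsets of the supporting reads relative to the fusion point, count how many distinct offsets there are and how wide a window they cover. Everything here is an assumption: the one-offset-per-line input format and the function name are hypothetical, and extracting the offsets (for example from the supporting-read alignments tophat-fusion-post displays) is left to you.

```shell
#!/bin/sh
# breakpoint_spread < offsets.txt  (hypothetical helper)
# Input (hypothetical format): one integer per line, the start offset of a
# supporting read relative to the fusion point (negative = before it).
# Output: "<reads> <distinct_offsets> <span>", where span = max - min offset.
breakpoint_spread() {
    awk '
        NF == 0 { next }                   # ignore blank lines
        {
            v = $1 + 0                     # force numeric comparison
            n++
            if (!(v in seen)) { seen[v] = 1; distinct++ }
            if (n == 1 || v < min) min = v
            if (n == 1 || v > max) max = v
        }
        END {
            if (n == 0) { print "0 0 0"; exit }
            print n, distinct, max - min
        }
    '
}
```

With toy input, `printf '%s\n' 2 2 2 3 | breakpoint_spread` prints `4 2 1`: four reads piled on two neighbouring offsets, which is the narrow pattern. The sketch deliberately sets no cutoff, because what counts as "wide" depends on read length and library.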


        TopHat-Fusion reports both known and novel fusions, and at the end of the HTML output file it lists known cancer gene fusions from the Mitelman database. The fusions in the results list are sorted by fusion score.

        Have a look at deFuse. In my experience, if a novel fusion is found by both TopHat-Fusion and deFuse, it is an interesting candidate to follow up. However, all my analyses are on human samples, so I generally go with the default parameters, which are almost always optimized for human reads.
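The "found by both tools" heuristic above amounts to intersecting two candidate lists. The sketch below assumes each tool's candidates have already been reduced to two tab-separated gene names per line. This is a hypothetical intermediate format, and converting TopHat-Fusion or deFuse output into it is not shown. The function names are also hypothetical. Pairs are put in alphabetical order so that A/B and B/A match, which deliberately ignores which partner is 5'.

```shell
#!/bin/sh
# normalize_pairs FILE: order each gene pair alphabetically, dedupe, sort.
normalize_pairs() {
    awk -F '\t' 'NF >= 2 { if ($1 < $2) print $1 "\t" $2; else print $2 "\t" $1 }' "$1" |
        LC_ALL=C sort -u
}

# shared_pairs TOPHAT_PAIRS DEFUSE_PAIRS  (hypothetical helper)
# Prints the gene pairs present in both lists.
shared_pairs() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    normalize_pairs "$1" > "$tmp1"
    normalize_pairs "$2" > "$tmp2"
    LC_ALL=C comm -12 "$tmp1" "$tmp2"   # lines common to both sorted files
    rm -f "$tmp1" "$tmp2"
}
```

`sort` and `comm` both run under `LC_ALL=C`, so they agree on collation order. Without that, `comm` can complain about unsorted input.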

        deFuse (SourceForge): a software package for gene fusion discovery using RNA-Seq data. Its .tar.gz bundles are released periodically on the project's SourceForge Files page.


        -Dinesh
        Last edited by DineshCyanam; 09-25-2012, 07:24 PM.



        • #5
          Thanks a lot for the quick reply, Dinesh. I was thinking of using deFuse, but was not sure whether it would work well with mouse samples. I will read the paper to understand what distinguishes true fusions from false positives. But the main problem is that fusions show up in the control sample as well. Once again, thanks a lot for your help. If anyone has more input on this, comments and suggestions are most appreciated.
          Thanks,
          Himanshu



          • #6
            Hi,

            Any update on how things went with your data, and what worked?



            • #7
              Hi KMAHD,
              I am still tweaking the parameters to see whether that helps. I'm still not convinced about which of the fusions are real and which are not. I will post here if I make any progress. Are you trying to do the same?
              Thanks,
              Himanshu



              • #8
                Yes, I'm trying similar things. Regarding TopHat-Fusion, one thing that confuses me is that the output file does not list the samples in the top_dir (the main directory).

                This is more of a technical question, I guess.

                I am running the TopHat fusion search on each sample individually within the top dir. However, when I run the post-processing, the final result .txt and .html files do not list the samples. I believe the result file should list them, as shown in the example files online.

                In addition, I wanted to know whether you tried any other approach and how successful you were with it.

                Thanks



                • #9
                  Yes, that is true. I have tried FusionSeq but was not successful. I will try deFuse soon if this does not work out. Have you tried anything else yet?



                  • #10
                    help with tophat-fusion-post

                    Hi Dinesh and Himanshu,

                    It seems you have tophat-fusion-post working, something I haven't yet managed with their MCF7 example. Have you gotten their example to work? My runs of tophat-fusion-post consistently yield no fusions, even though the fusions.out files from the TopHat alignment clearly contain candidates. I've met the directory-structure requirements, etc., and have searched similar threads with no luck. I would appreciate any help!
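When tophat-fusion-post reports 0 fusions even though fusions.out has candidates, a quick check is how many candidates could clear the read and pair thresholds at all. This rough sketch assumes one candidate per line of fusions.out, with column 4 the orientation (ff/fr/rf/rr), column 5 the number of reads spanning the fusion, and column 6 the number of supporting mate pairs. That is my reading of the TopHat documentation, so verify it for your version. The combination rule (reads >= R, pairs >= P, reads + pairs >= B) is my own approximation of --num-fusion-reads, --num-fusion-pairs and --num-fusion-both. It is not tophat-fusion-post's actual filter, which applies further checks, so a non-zero count here does not guarantee any post-processing output. The function name is hypothetical.

```shell
#!/bin/sh
# count_passing FUSIONS_OUT MIN_READS MIN_PAIRS MIN_BOTH  (hypothetical helper)
# Counts candidate lines whose (assumed) spanning-read count ($5) and
# supporting-pair count ($6) meet all three thresholds.
count_passing() {
    awk -F '\t' -v r="$2" -v p="$3" -v b="$4" '
        BEGIN { r += 0; p += 0; b += 0 }   # thresholds as numbers
        # Only consider lines that look like candidates (orientation field).
        $4 ~ /^(ff|fr|rf|rr)$/ {
            reads = $5 + 0; pairs = $6 + 0
            if (reads >= r && pairs >= p && reads + pairs >= b) n++
        }
        END { print n + 0 }
    ' "$1"
}

# Thresholds from the tophat-fusion-post runs posted in this thread:
# for f in tophat_*/fusions.out; do echo "$f: $(count_passing "$f" 1 2 5)"; done
```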

                    thanks,
                    t



                    • #11
                      Hi Tankman,
                      I did manage to get it working. What directory structure did you use for tophat-fusion-post? Also, when running the script, did you run it in the top_dir?
                      Thanks,
                      Himanshu



                      • #12
                        tophat-fusion-post directory structure

                        Hi Himanshu,

                        Thanks a lot for answering. Here's my directory structure and some other info. Thanks!

                        I'm positive it's failing at the filtration step since it doesn't even BLAST anything.


                        [tankmanb01@node2-4 tophat]$ find tophat_* -name "run.log" -exec head -1 {} \;
                        /packages/tophat/2.0.4/bin/tophat -p 64 -o tophat/tophat_KPL_final --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM index/hg19 SRR064287_1.fastq SRR064287_2.fastq
                        /packages/tophat/2.0.4/bin/tophat -p 64 -o tophat/tophat_MCF7_final2 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 0 --mate-std-dev 80 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM index/hg19 SRR064286_1.fastq SRR064286_2.fastq
                        /packages/tophat/2.0.4/bin/tophat -p 45 -o tophat/tophat_SKBR_final2 --fusion-search --keep-fasta-order --bowtie1 --no-coverage-search -r 50 --mate-std-dev 80 --fusion-min-dist 100000 --fusion-anchor-length 13 --fusion-ignore-chromosomes chrM index/hg19 SKBR3_mix_1.fastq SKBR3_mix_2.fastq
                        [tankmanb01@node2-4 tophat]$ module list
                        Currently Loaded Modulefiles:
                        1) modules
                        2) 3.2.9
                        3) /gcc/4.6.3(default)
                        4) /CPAN/bioperl/1.6.901
                        5) /python/2.7.2(default)
                        6) /boost/1.49.0-gcc4.6-python272
                        7) /tophat/2.0.4
                        8) /blast/2.2.26
                        9) /bowtie/0.12.7
                        10) /samtools/0.1.18
                        [tankmanb01@node2-4 tophat]$
                        [tankmanb01@node2-4 tophat]$ ls -lrt
                        total 672
                        lrwxrwxrwx. 1 tankmanb01 tankmanb01a 65 Sep 16 20:14 blast -> /scratch/tankmanb01/projects/RNA_Cholangiocarcinoma/AN/tophat/blast
                        lrwxrwxrwx. 1 tankmanb01 tankmanb01a 71 Sep 16 20:14 refGene.txt -> /scratch/tankmanb01/projects/RNA_Cholangiocarcinoma/AN/tophat/refGene.txt
                        lrwxrwxrwx. 1 tankmanb01 tankmanb01a 71 Sep 16 20:14 ensGene.txt -> /scratch/tankmanb01/projects/RNA_Cholangiocarcinoma/AN/tophat/ensGene.txt
                        lrwxrwxrwx. 1 tankmanb01 tankmanb01a 63 Sep 16 20:14 mcl -> /scratch/tankmanb01/projects/RNA_Cholangiocarcinoma/AN/tophat/mcl
                        lrwxrwxrwx. 1 tankmanb01 tankmanb01a 65 Sep 16 20:14 index -> /scratch/tankmanb01/projects/RNA_Cholangiocarcinoma/AN/tophat/index
                        lrwxrwxrwx. 1 tankmanb01 tankmanb01a 5 Sep 16 20:23 blast_human -> blast
                        -rwxr-xr-x. 1 tankmanb01 tankmanb01a 79424 Sep 17 07:55 tophat-fusion-post_original
                        lrwxrwxrwx. 1 tankmanb01 tankmanb01a 78 Sep 17 07:57 tophat-fusion-post -> /scratch/tankmanb01/projects/RNA_Cholangiocarcinoma/AN/tophat/tophat-fusion-post
                        -rwxrwxr-x. 1 tankmanb01 tankmanb01a 80027 Sep 24 10:44 tophat-fusion-post_altered
                        drwxrwxr-x. 3 tankmanb01 tankmanb01a 32768 Sep 27 13:41 tophat_MCF7_final2
                        drwxrwxr-x. 4 tankmanb01 tankmanb01a 32768 Sep 27 13:47 tophat_KPL_final
                        drwxrwxr-x. 10 tankmanb01 tankmanb01a 32768 Sep 27 13:51 misc
                        drwxrwxr-x. 10 tankmanb01 tankmanb01a 32768 Sep 27 14:32 misc2
                        drwxrwxr-x. 6 tankmanb01 tankmanb01a 32768 Sep 27 14:52 test_fusion
                        drwxrwxr-x. 3 tankmanb01 tankmanb01a 32768 Sep 28 15:44 tophat_SKBR_final2
                        drwxrwxr-x. 7 tankmanb01 tankmanb01a 32768 Oct 1 09:44 tophatfusion_out
                        -rw-rw-r--. 1 tankmanb01 tankmanb01a 1373 Oct 1 09:45 check
                        [tankmanb01@node2-4 tophat]$
                        [tankmanb01@node2-4 tophat]$ tophat-fusion-post --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 index/hg19
                        [Mon Oct 1 09:43:51 2012] Beginning TopHat-Fusion post-processing run (v2.0.4)
                        -----------------------------------------------
                        [Mon Oct 1 09:43:51 2012] Extracting 23-mer around fusions and mapping them using Bowtie
                        samples updated
                        [Mon Oct 1 09:44:43 2012] Filtering fusions
                        Processing: tophat_MCF7_final2/fusions.out
                        Processing: tophat_SKBR_final2/fusions.out
                        0 fusions are output in ./tophatfusion_out/potential_fusion.txt
                        [Mon Oct 1 09:44:55 2012] Blasting 50-mers around fusions
                        [Mon Oct 1 09:44:55 2012] Generating read distributions around fusions
                        [Mon Oct 1 09:44:55 2012] Reporting final fusion candidates in html format
                        num of fusions: 0
                        -----------------------------------------------
                        [Mon Oct 1 09:44:55 2012] Run complete [00:01:03 elapsed]
                        [tankmanb01@node2-4 tophat]$ tophat-fusion-post --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 CLOWN
                        [Mon Oct 1 09:46:37 2012] Beginning TopHat-Fusion post-processing run (v2.0.4)
                        -----------------------------------------------
                        [Mon Oct 1 09:46:37 2012] Extracting 23-mer around fusions and mapping them using Bowtie
                        [Mon Oct 1 09:46:37 2012] Filtering fusions
                        Processing: tophat_MCF7_final2/fusions.out
                        Processing: tophat_SKBR_final2/fusions.out
                        0 fusions are output in ./tophatfusion_out/potential_fusion.txt
                        [Mon Oct 1 09:46:46 2012] Blasting 50-mers around fusions
                        [Mon Oct 1 09:46:46 2012] Generating read distributions around fusions
                        [Mon Oct 1 09:46:46 2012] Reporting final fusion candidates in html format
                        num of fusions: 0
                        -----------------------------------------------
                        [Mon Oct 1 09:46:46 2012] Run complete [00:00:09 elapsed]
                        [tankmanb01@node2-4 tophat]$
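The listing above shows the layout in the directory tophat-fusion-post is run from: tophat_* output folders containing fusions.out, plus refGene.txt, ensGene.txt and a blast directory. Below is a small preflight sketch that reports which of these are present. The list of required items is taken from this listing, so treat it as an assumption rather than a complete specification. The function name is hypothetical. The log processes only tophat_MCF7_final2 and tophat_SKBR_final2, not tophat_KPL_final, so it is worth checking each folder individually.

```shell
#!/bin/sh
# fusion_post_preflight [DIR]  (hypothetical helper)
# Reports whether DIR (default: current directory) has the items seen in the
# listing above, and how many lines each tophat_*/fusions.out contains.
# Returns non-zero if anything is missing or empty.
fusion_post_preflight() {
    dir=${1:-.}
    status=0
    for item in refGene.txt ensGene.txt blast; do
        if [ -e "$dir/$item" ]; then
            echo "ok: $item"
        else
            echo "MISSING: $item"; status=1
        fi
    done
    found=0
    for sample in "$dir"/tophat_*; do
        [ -d "$sample" ] || continue          # skip files and unmatched globs
        found=1
        name=$(basename "$sample")
        if [ -s "$sample/fusions.out" ]; then
            echo "ok: $name/fusions.out ($(wc -l < "$sample/fusions.out" | tr -d ' ') lines)"
        else
            echo "MISSING: $name/fusions.out (absent or empty)"; status=1
        fi
    done
    [ "$found" -eq 1 ] || { echo "MISSING: tophat_* output directories"; status=1; }
    return "$status"
}
```

Directory names such as tophatfusion_out or tophat-fusion-post do not match the `tophat_*` pattern, so they are not reported as samples.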



                        • #13
                          tophat-fusion-post

                          Originally posted by himanshu04:
                          Hi Tankman,
                          I did manage to get it working. What directory structure did you use for tophat-fusion-post? Also, when running the script, did you run it in the top_dir?
                          Thanks,
                          Himanshu
                          Hi Himanshu,

                          I'm wondering whether you managed to look at my post with the commands I used to try to replicate their MCF7 example for tophat-fusion-post. I'm really trying to get this to work.

                          thanks a lot,
                          tm



                          • #14
                            Hi himanshu04 and Tankman,

                            I am getting the same problem as you: with the MCF7 sample data, I get
                            "0 fusions are output in ./tophatfusion_out/potential_fusion.txt"

                            Did you manage to find a solution?

                            Best,
                            Naïra



                            • #15
                              Hi there,
                              I am running tophat-fusion on mouse samples and would like to ask a few things:
                              1. Does it make any difference whether I use the --non-human option? My samples are mm10.
                              2. Is there any other option I must add (beyond the defaults)?
                              3. I get a BLAST database error: "No alias or index file found for nucleotide database [blast/nt] in search path." I exported PATH, but I still get the error.
                              4. I have downloaded BLAST:
                                 1. ncbi-blast-2.2.28+
                                 2. I extracted est_mouse.tar.gz and mouse_genomic_transcript.tar.gz within ncbi-blast-2.2.28+/ and ncbi-blast-2.2.28+/bin
                                 3. I am not sure about other_genomic* and nt*.
                              I added the bowtie1, tophat2, blast, and samtools directories to PATH and exported it.
                              Thank you
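On question 3: the error names a relative database path, blast/nt, so tophat-fusion-post appears to be looking for a blast/ directory under the directory it is run from. PATH only controls where executables are found, so exporting it does not help here, and databases extracted into the ncbi-blast-2.2.28+ install directory would probably not be found at that path either. BLAST locates a nucleotide database through an alias file (.nal) or index files (.nin). The large nt database ships as numbered volumes (nt.00.nin, ...) tied together by nt.nal. The sketch below is a minimal check that assumes that naming, and the function name is hypothetical.

```shell
#!/bin/sh
# blastdb_present [DB] [DIR]  (hypothetical helper)
# Succeeds if DIR/blast (default DIR: current directory) holds an alias
# (.nal) or index (.nin, possibly per volume such as .00.nin) file for the
# nucleotide database DB (default: nt).
blastdb_present() {
    db=${1:-nt}
    dir=${2:-.}
    for f in "$dir/blast/$db.nal" "$dir/blast/$db.nin" "$dir"/blast/"$db".[0-9][0-9].nin; do
        if [ -e "$f" ]; then
            echo "found: $f"
            return 0
        fi
    done
    echo "no alias or index file for '$db' under $dir/blast" >&2
    return 1
}

# Run from the directory where tophat-fusion-post is started:
# blastdb_present nt
```

The same check works for any other database name an error message reports.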

                              Originally posted by DineshCyanam:
                              What flags did you use to run tophat-fusion-post? Did you use the --non-human flag?

                              -Dinesh
                              Last edited by jp.; 08-05-2013, 06:45 PM.
