  • #31
    For paired-end reads, I can't say anything.
    I prefer to use -o tophat_PRO.

    Nothing more to say.



    • #32
      Thank you so much!
      No problem, I will try to figure it out!

      Originally posted by Charitra (see #31)


      • #33
        I finally solved the problem. For everyone's benefit, here is the solution. The command I had been running was:
        tophat-fusion-post -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 bowtie

        Since I am using single-end reads, I changed
        --num-fusion-pairs 2 to --num-fusion-pairs 0

        It works now!

        Thanks
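Posts #33 and #34 come down to matching --num-fusion-pairs to the read layout. A minimal bash sketch of that choice follows. The wrapper name run_fusion_post and the TFP_CMD dry-run variable are hypothetical (they are not part of TopHat-Fusion), and the paired-end value of 2 is simply the one from the commands quoted in this thread, not a recommendation.

```shell
# Hypothetical wrapper (not part of TopHat-Fusion): run tophat-fusion-post
# with --num-fusion-pairs chosen from the read layout.
# Set TFP_CMD=echo to print the arguments instead of running the tool.
run_fusion_post() {
  local layout="$1"   # "single" or "paired"
  local index="$2"    # Bowtie index prefix, e.g. /path/to/h_sapiens/bowtie_index
  local pairs
  case "$layout" in
    single) pairs=0 ;;  # no mate pairs exist, so require none (#33, #34)
    paired) pairs=2 ;;  # value from the commands quoted in this thread
    *) echo "layout must be 'single' or 'paired'" >&2; return 1 ;;
  esac
  "${TFP_CMD:-tophat-fusion-post}" -p 8 --num-fusion-reads 1 \
    --num-fusion-pairs "$pairs" --num-fusion-both 5 "$index"
}
```

For example, run_fusion_post single /path/to/h_sapiens/bowtie_index runs the single-end variant from #34.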



        • #34
          I was getting empty tophat-fusion-post results. I am working with Ion Proton reads, i.e. single-end reads. The problem was in the tophat-fusion-post command:
          "tophat-fusion-post -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 /path/to/h_sapiens/bowtie_index".
          With single-end reads I had to specify --num-fusion-pairs 0. This solved the problem. I hope this saves someone else dealing with single-end reads a lot of stress!
          Last edited by nbahlis; 06-17-2013, 10:23 AM. Reason: typos



          • #35
            Hi All,

            The problem of getting 0 fusions can be overcome by following the method as described:

            Download the known annotations from the following link:

            Download, extract and copy the ensGene.txt, ensGtp.txt, mcl and refGene_sorted.txt files to your working directory (top_dir).

            Retain the directory (folder and file) structure suggested on the website.


            1. Directory structure

            Your working directory (top_dir) should contain the following:
            a) tophat_sample_1 (sample number one), the TopHat-Fusion search output for that sample: accepted_hits.bam, align_summary.txt, deletions.bed, fusions.out, insertions.bed, junctions.bed, logs (folder), prep_reads.info and unmapped.bam.
            (NOTE: name each output folder "tophat_sample_name"; you can have TopHat-Fusion search output for 'n' samples.)
            b) ensGene.txt
            c) ensGtp.txt
            d) mcl
            e) refGene_sorted.txt
            f) blast_human (folder), containing human_genomic*, other_genomic* and nt* from the BLAST databases
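Before running tophat-fusion-post, it can help to confirm that the layout above is in place. The bash function below is a hypothetical pre-flight check (check_fusion_post_layout is not part of TopHat-Fusion). It only tests for the names listed in this post and deliberately skips the BLAST folder, since this thread calls it blast_human in one place and blast in another.

```shell
# Hypothetical pre-flight check (not part of TopHat-Fusion) for the
# top_dir layout listed above. Prints each missing item; returns 1 if any.
check_fusion_post_layout() {
  local top_dir="${1:-.}" missing=0 found=0 item d
  # Annotation files copied into top_dir (b-e in the list above).
  for item in ensGene.txt ensGtp.txt mcl refGene_sorted.txt; do
    if [ ! -e "$top_dir/$item" ]; then
      echo "missing: $item"
      missing=1
    fi
  done
  # One tophat_* folder per sample, each holding the fusion-search fusions.out.
  for d in "$top_dir"/tophat_*/; do
    [ -d "$d" ] || continue   # glob matched nothing
    found=1
    if [ ! -e "${d}fusions.out" ]; then
      echo "missing: $(basename "$d")/fusions.out"
      missing=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "missing: tophat_* sample folder"
    missing=1
  fi
  return "$missing"
}
```

For example, check_fusion_post_layout /path/to/top_dir prints nothing and returns 0 when every listed item is present.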

            2. Running tophat-fusion-post

            Use the tophat-fusion-post.py program located in the folder "tophatfusion-0.1.0/src/" to identify potential fusions (http://tophat-fusion.sourceforge.net...n-0.1.0.tar.gz)

            Usage: /home/user/Downloads/tophatfusion-0.1.0/src/tophat-fusion-post.py -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 /home/user/Databases/hg19/hg19
            (the last argument is the prefix of the index files)

            After running the Python script, I got potential fusions from my data. I hope this helps.

            Regards,
            Arun



            • #36
              Hi, I actually got fusions, but I am still confused about the BLAST database. The instructions want a ./blast_human directory containing

              human_genomic* and nt*

              What does the wildcard * imply? There are at least 16 .tar.gz files for human_genomic alone (human_genomic.00.tar.gz and so on), and each one expands to 11 files; for example, human_genomic.00.tar.gz contains .nhd, .nhi, .nhr, .nog, etc. So do we literally download every file, extract all 11 from each, and put them into the blast subdirectory?

              I estimate that there will be at least 539 files in that subdirectory alone. Am I missing something? Thanks.



              • #37
                BLAST database for TopHat fusion

                Hi Alex,

                Yes, you need to download all the files and extract them. However, it's simple: in a Linux terminal, just run the following commands.

                Create a directory named "blast" inside the "TopHat" directory (where you ran the TopHat-Fusion command) and change into it, i.e., (top_dir)/blast.

                Next, download all the database files with a single command:

                $ wget -c "ftp://ftp.ncbi.nlm.nih.gov/blast/db/human_genomic*"

                This command downloads, one after the other, all the files whose names start with "human_genomic" into your folder.

                The wildcard '*' matches any sequence of zero or more characters after "human_genomic", so every file whose name begins with human_genomic is downloaded.

                Similarly, you can download the other databases (other_genomic* and nt*) into the same folder by replacing "human_genomic" with "other_genomic" and then with "nt".
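The per-database download can be wrapped in a loop. This is a sketch under assumptions: download_blast_dbs and the WGET_CMD dry-run override are hypothetical names, and the pattern adds a dot ("${db}.*") compared with the "human_genomic*" form above.

```shell
# Download preformatted BLAST databases from the NCBI FTP site, one wget call
# per database prefix. WGET_CMD is a hypothetical override (WGET_CMD=echo for
# a dry run); the function itself is not part of any tool.
download_blast_dbs() {
  local base="ftp://ftp.ncbi.nlm.nih.gov/blast/db" db
  for db in "$@"; do
    # Quoted so the local shell leaves '*' alone and wget expands it against
    # the FTP listing. "${db}.*" (with the dot) keeps e.g. "nt" from also
    # matching other databases whose names merely start with "nt".
    # -c resumes partially downloaded files.
    "${WGET_CMD:-wget}" -c "${base}/${db}.*" || return 1
  done
}
```

For example, from inside (top_dir)/blast: download_blast_dbs human_genomic other_genomic nt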

                NOTE: When you index a database, many index files are produced, such as .nhd, .nhi and .nhr. NCBI has already indexed these databases and packed the index files into the corresponding compressed files. You just need to download them (the previous step) and extract the .tar.gz files. Use the following Linux command to extract all of them at once (a single tar -xvzf *.gz does not work, because tar treats only the first file as the archive and the rest as names of members to extract from it):

                $ for f in *.tar.gz; do tar -xvzf "$f"; done
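If NCBI's .md5 checksum files were downloaded alongside the archives (a wildcard such as human_genomic* also matches any .md5 files on the server), each archive can be checked before extraction. A minimal bash sketch: extract_blast_archives is a hypothetical name, md5sum comes from GNU coreutils, and the .md5 files are assumed to be in md5sum's "<hash>  <file>" format.

```shell
# Sketch: verify and extract every BLAST archive in the current directory.
# tar reads one archive per invocation, hence the loop. The checksum step
# assumes each .md5 file is in md5sum format ("<hash>  <file>").
extract_blast_archives() {
  local f
  for f in *.tar.gz; do
    if [ ! -e "$f" ]; then      # glob matched nothing
      echo "no .tar.gz files found" >&2
      return 1
    fi
    if [ -e "$f.md5" ]; then
      md5sum -c "$f.md5" || return 1   # stop on a corrupt download
    fi
    tar -xzf "$f" || return 1
  done
}
```

Stopping at the first failed checksum means a truncated download is re-fetched (wget -c can resume it) instead of being extracted into a partial database.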

                If you have any further queries, let me know.

                Regards,
                Arun
                Last edited by arun; 11-26-2015, 11:04 AM.



                • #38
                  Arun, thanks for the clarification and example. In case someone else finds this useful: I also found that a symbolic link works once the BLAST database is prepared, so I only have to download it once. It works something like this:

                  ln -s /path/to/blastdb top_dir/blast

                  Thanks.
                  Last edited by Alex Lee; 11-30-2015, 08:33 AM.
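Building on the symbolic-link idea, here is a bash sketch that links one shared database folder into several working directories. link_blast_db is a hypothetical helper, and it uses the link name blast from the command above.

```shell
# Hypothetical helper: link one shared, already-extracted BLAST database
# folder into several working directories as "blast", so the databases
# are downloaded only once.
link_blast_db() {
  local blastdb top_dir
  # Absolute path, because a relative symlink target is resolved from the
  # link's own folder, not from where ln was run.
  blastdb=$(cd "$1" && pwd) || return 1
  shift
  for top_dir in "$@"; do
    if [ -d "$top_dir/blast" ] && [ ! -L "$top_dir/blast" ]; then
      echo "$top_dir/blast is a real folder; move it aside first" >&2
      return 1
    fi
    # -s symbolic, -f replace an old link, -n do not descend into an old
    # link that points at a directory.
    ln -sfn "$blastdb" "$top_dir/blast" || return 1
  done
}
```

For example, link_blast_db /path/to/blastdb run1 run2 creates run1/blast and run2/blast; re-running it simply refreshes the links.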



                  • #39
                    Hi everyone, I was able to solve a problem similar to the original post: my run failed due to too many open files. Here is my experience and some new things I learned.

                    1. My reads are about 40 GB per sample, so they are quite large. I solved the problem by balancing the number of threads and the amount of memory requested.
                    2. I learned about the logs. They are useful, and there are many different ones for diagnosing problem areas. One log I saw was run.log; I think it might be possible to continue a run by copying the commands from a successful run and changing parameters such as file names and directories. I have not tried this yet, but it would be interesting.
                    3. BLAST did not seem to add much other than making my post-analysis longer. I would still run it, but so far I don't see much difference.
                    Last edited by Alex Lee; 05-21-2016, 08:42 PM.
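For point 2, a first pass over the logs can be scripted. This bash sketch is hypothetical (scan_tophat_logs is not a TopHat tool); it assumes each sample's logs/ folder holds files ending in .log, as run.log does, and simply greps them case-insensitively.

```shell
# Hypothetical helper: print the lines matching a pattern (default "error",
# case-insensitive) from every *.log file in each sample's logs/ folder,
# grouped by file. Assumes the .log naming that run.log follows.
scan_tophat_logs() {
  local top_dir="${1:-.}" pattern="${2:-error}" log
  for log in "$top_dir"/tophat_*/logs/*.log; do
    [ -e "$log" ] || continue          # glob matched nothing
    if grep -qi -- "$pattern" "$log"; then
      echo "== $log"
      grep -i -- "$pattern" "$log"
    fi
  done
}
```

For example, scan_tophat_logs /path/to/top_dir "too many open files" looks for the open-file problem described in this post, assuming the logged message contains that phrase.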



                    • #40
                      Hello,

                      For the BLAST databases, is it necessary to have all three: human_genomic, other_genomic and nt?

                      Originally posted by arun (see #35)


                      • #41
                        Hi everyone,

                        I am also facing a similar problem. I got 72 fusions for one sample from a group of samples; however, for the other samples I get 0 fusions.

                        My command is as follows:
                        tophat-fusion-post -p 8 --num-fusion-reads 1 --num-fusion-pairs 2 --num-fusion-both 5 /path_to_indices/ucsc_hg19

                        If tophat-fusion works for one of my samples and not the others, does this mean that the other samples have no fusions?
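One way to start answering this is to check whether the fusion search produced candidates for each sample at all, since every tophat_* folder holds a fusions.out (see #35). The bash sketch below is hypothetical (count_fusion_candidates is not part of TopHat-Fusion) and assumes one candidate per line of fusions.out. An empty file presumably means there was nothing for tophat-fusion-post to report for that sample; a non-empty file that still yields 0 fusions may instead point at the filtering thresholds or the annotation and BLAST setup. Neither case by itself proves the sample has no fusions.

```shell
# Hypothetical helper: print "<sample folder><TAB><line count>" for each
# tophat_*/fusions.out, assuming one fusion candidate per line.
count_fusion_candidates() {
  local top_dir="${1:-.}" f n
  for f in "$top_dir"/tophat_*/fusions.out; do
    [ -e "$f" ] || continue          # glob matched nothing
    n=$(wc -l < "$f" | tr -d ' ')    # BSD wc pads the count with spaces
    printf '%s\t%s\n' "$(basename "$(dirname "$f")")" "$n"
  done
}
```

For example, count_fusion_candidates /path/to/top_dir prints one line per sample, which makes the 72-versus-0 comparison easy to trace back to the search step.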
