  • tophat-fusion on mouse

    Hi,
    I see that tophat-fusion-post uses these files (included in the distribution):
    ensGene.txt
    ensGtp.txt
    mcl
    refGene_sorted.txt

    But I'm not sure how to recreate them for mm9. Can anyone tell me how? If it is in the docs, I can't find it.

  • #2
    Hi,

    I'm also trying to run tophat-fusion for mm9. I see that tophat-fusion-post uses hardcoded BLAST database files, but this is easy to change. Generating the files you mention is what I haven't been able to figure out. I don't think 'mcl' is important, as it is just the Mitelman Database, included for easy checking of the results, but the rest are certainly important.

    Were you able to construct these files for mm9?

    Thanks,
    Carlos

    • #3
      Hi Carlos,
      I did manage to reconstruct the files for mm9. It just required some reverse engineering.

      I just downloaded the
      ensGene.txt
      refGene.txt
      knownGene.txt
      from UCSC, then made refGene_sorted.txt with this command (I don't remember the details, but this worked for me; shown here with current xargs and sort syntax, since newer versions reject the old "-i" and "+4n -6" forms):
      echo "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y" | tr ' ' '\n' | xargs -I{} echo "awk '\$3==\"chr{}\"' refGene.txt | sort -k5n,6" | bash > refGene_sorted.txt

      Also, the ensGtp.txt file was available from the UCSC website; I just had to do a little more digging.
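
      The same per-chromosome sort can be sketched as a small shell function. This is only an illustration, not the method used to build the files shipped with tophat-fusion. It assumes a tab-separated UCSC refGene table with the chromosome in column 3, txStart in column 5 and txEnd in column 6, and it uses the mouse chromosome list (1-19, X, Y). The function name sort_refgene is made up for this example.

```shell
# Sketch: write refGene rows grouped by chromosome, each group sorted by start, then end.
# Assumption: tab-separated UCSC refGene columns (3 = chrom, 5 = txStart, 6 = txEnd).
# sort_refgene is a hypothetical helper name, not part of tophat-fusion.
sort_refgene() {
    in_file="$1"
    tab="$(printf '\t')"
    for c in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y; do
        # Keep only rows on this chromosome, then sort numerically by columns 5 and 6.
        awk -F'\t' -v chr="chr$c" '$3 == chr' "$in_file" \
            | sort -t "$tab" -k5,5n -k6,6n
    done
}
```

      It would be called as: sort_refgene refGene.txt > refGene_sorted.txt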

      • #4
        Great! I should have recognized these names. Let me just add some details for anyone else going through the same thing.

        You can get these files from the table browser at UCSC:
        http://genome.ucsc.edu/cgi-bin/hgTables

        Just make sure you select:
        output format: "all fields from selected table"

        Sorting as rcorbett described above seems to work fine. I did it to produce refGene_sorted.txt, and I also sorted ensGene.txt, since the human annotation file in the source package appears to be sorted.

        I also filtered ensGtp.txt to keep lines containing ENSMUSP only:
        grep ENSMUSP ensGtp.txt.tmp > ensGtp.txt

        I did this because tophat-fusion-post failed, complaining that it needs three elements per line in ensGtp.txt. Filtering by ENSMUSP seems to work, since if there is a protein ID there should also be a gene ID and a transcript ID.
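
        An alternative to grepping for ENSMUSP is to test the three-element condition directly. The sketch below assumes ensGtp.txt.tmp is a tab-separated table with gene, transcript and protein columns and that any header line starts with "#". The function name filter_ensgtp is made up for this example, and whether this is exactly the check tophat-fusion-post performs is not confirmed.

```shell
# Sketch: keep only rows that have three non-empty tab-separated fields.
# Assumption: columns are gene, transcript, protein; header lines start with '#'.
# filter_ensgtp is a hypothetical helper name.
filter_ensgtp() {
    awk -F'\t' '!/^#/ && NF == 3 && $1 != "" && $2 != "" && $3 != ""' "$1"
}
```

        It would be called as: filter_ensgtp ensGtp.txt.tmp > ensGtp.txt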

        Now you just need to edit tophat-fusion-post to look for the right BLAST db. I'll be blasting against "other_genomic*" and "nt*".

        Thanks!

        • #5
          How do you edit tophat-fusion-post to look for the right BLAST db? I am running tophat-fusion on mouse with the mm9 reference, and I have finished the tophat-fusion step. I am now trying to run the tophat-fusion-post step. I have downloaded the other_genomic* and nt* databases as well as the mouse_genomic BLAST database. Fusions are detected, but I am not getting the BLAST scores or the sequence alignments.
          I get the following error: “no index or alias found for nucleotide database[blast/other_genomic] in search path [home/fusion(this is the top_dir)::]”.

          My directory layout is:
          home/fusion(top_dir)/blast/nt, home/fusion/blast/other_genomic

          What am I doing wrong?
          Any help is much appreciated.
          Thanks
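
          One thing that can be checked for an error like this is whether index or alias files actually exist under the database prefix named in the message. The sketch below assumes the standard BLAST nucleotide database file extensions (.nal for an alias file, .nin for an index, and .00.nin for the first volume of a multi-volume database) and the top_dir/blast/<name> layout described above. The function name check_blast_db is made up for this example, and a passing check does not guarantee that tophat-fusion-post will find the database.

```shell
# Sketch: report whether a BLAST nucleotide database prefix has an index or alias file.
# Assumption: databases live under <top_dir>/blast/ and use standard BLAST extensions.
# check_blast_db is a hypothetical helper name.
check_blast_db() {
    prefix="$1/blast/$2"
    if [ -e "$prefix.nal" ] || [ -e "$prefix.nin" ] || [ -e "$prefix.00.nin" ]; then
        echo "found: $prefix"
    else
        echo "missing index or alias for: $prefix"
        return 1
    fi
}
```

          For example, check_blast_db /path/to/top_dir other_genomic looks for /path/to/top_dir/blast/other_genomic.nal and the related index files. If the databases were extracted into a subdirectory (for example blast/other_genomic/other_genomic.nal), this check fails in the same way as the error above.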

          • #6
            Confusion between spanning reads and spanning mate pairs

            Can anybody please explain the difference between spanning reads and spanning mate pairs? As far as I understand, spanning reads are reads that do not contain the fusion point, whereas split reads do contain it. Spanning mate pairs would then be those spanning reads that are supported by their mate pairs, so the number of spanning mate pairs should be smaller than the number of spanning reads. This is not the case in my results. Why is that?

            Please advise, it's urgent.

            • #7
              Hi,
              Could someone tell me where I can find other_genomic* and nt* for mm9? I searched ftp://ftp.ncbi.nlm.nih.gov/blast/db/ and found only:
              1. est_mouse.tar.gz
              2. mouse_genomic_transcript.tar.gz

              Where are these files (other_genomic* and nt*)?

              When I run tophat-fusion, it says:
              blast nt not found
              I have downloaded BLAST, but it does not contain anything called blastall.

              Hoping for a reply.



              Originally posted by himanshu04
              How do you edit tophat-fusion-post to look for the right BLAST db? I am running tophat-fusion on mouse with the mm9 reference, and I have finished the tophat-fusion step. I am now trying to run the tophat-fusion-post step. I have downloaded the other_genomic* and nt* databases as well as the mouse_genomic BLAST database. Fusions are detected, but I am not getting the BLAST scores or the sequence alignments.
              I get the following error: “no index or alias found for nucleotide database[blast/other_genomic] in search path [home/fusion(this is the top_dir)::]”.

              My directory layout is:
              home/fusion(top_dir)/blast/nt, home/fusion/blast/other_genomic

              What am I doing wrong?
              Any help is much appreciated.
              Thanks
