  • gsinghal
    Junior Member
    • Jul 2010
    • 2

    How to make sense of Tophat's output file 'junctions.bed'

    This is an excerpt from junctions.bed, the TopHat output file, generated here from paired-end reads. Can somebody suggest how to make sense of the two BED blocks? Both BED blocks have the same coordinates. Also, how should I interpret the scores (which apparently represent the number of alignments spanning each junction)?


    chr20 9353709 9360718 JUNC00000552 2 + 9353709 9360718 255,0,0 2 42,18 0,6991
    chr20 9365023 9368124 JUNC00000553 1 + 9365023 9368124 255,0,0 2 35,15 0,3086
    chr20 9368172 9370544 JUNC00000554 2 + 9368172 9370544 255,0,0 2 31,19 0,2353
    chr20 9371222 9374262 JUNC00000555 7 + 9371222 9374262 255,0,0 2 40,28 0,3012
    chr20 9374285 9376179 JUNC00000556 1 + 9374285 9376179 255,0,0 2 40,10 0,1884
    chr20 9376224 9382178 JUNC00000557 5 + 9376224 9382178 255,0,0 2 41,42 0,5912
    chr20 9385955 9388573 JUNC00000558 1 + 9385955 9388573 255,0,0 2 40,10 0,2608
    chr20 9388666 9389312 JUNC00000559 4 + 9388666 9389312 255,0,0 2 39,33 0,613
    chr20 9389328 9389741 JUNC00000560 6 + 9389328 9389741 255,0,0 2 36,38 0,375
    chr20 9389783 9391703 JUNC00000561 3 + 9389783 9391703 255,0,0 2 45,20 0,1900
    Gaurav Singhal
  • Alex124
    Junior Member
    • Feb 2012
    • 2

    #2
    Explanation of junctions.bed

    [seqname] [start] [end] [id] [score] [strand] [thickStart] [thickEnd] [r,g,b] [block_count] [block_sizes] [block_locations]
    "start" is the start position of the leftmost read that contains the junction.
    "end" is the end position of the rightmost read that contains the junction.
    "id" is the junction's ID, e.g. JUNC0001.
    "score" is the number of reads that contain the junction.
    "strand" is either + or -.
    "thickStart" and "thickEnd" don't seem to have any effect on display for a junctions track. TopHat sets them as equal to start and end respectively.
    "r","g" and "b" are the red, green, and blue values. They affect the colour of the display.
    "block_count", "block_sizes" and "block_locations":
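    The block_count will always be 2. The two blocks specify the regions on either side of the junction. "block_sizes" tells you how large each region is, and "block_locations" tells you, relative to the "start" being 0, where the two blocks begin. Therefore, the first block_location will always be zero. The junction itself is the gap between the end of block 1 and the start of block 2, i.e. it runs from start + block_sizes[0] to start + block_locations[1].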

    [read_start][junction][read_end]
    [block1    ][        ][block2  ]
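
A minimal Python sketch of the arithmetic above, applied to the first line of the excerpt in the original question. The names `Junction` and `parse_junction` are made up for illustration and are not part of TopHat. It assumes whitespace-separated fields and the usual BED convention of 0-based, half-open coordinates.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    """Hypothetical container for one junctions.bed record."""
    chrom: str
    name: str
    score: int           # number of reads containing the junction
    strand: str
    intron_start: int    # 0-based position of the first base after block 1
    intron_end: int      # 0-based position where block 2 begins (exclusive end)
    left_overhang: int   # size of block 1
    right_overhang: int  # size of block 2


def parse_junction(line: str) -> Junction:
    """Split one junctions.bed line and derive the junction coordinates."""
    f = line.split()
    if len(f) < 12:
        raise ValueError("expected 12 BED fields")
    start = int(f[1])
    # block_sizes and block_locations are comma-separated lists;
    # tolerate a trailing comma, which some BED writers emit.
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    offsets = [int(x) for x in f[11].rstrip(",").split(",")]
    if int(f[9]) != 2 or len(sizes) != 2 or len(offsets) != 2:
        raise ValueError("expected exactly two blocks per junction")
    return Junction(
        chrom=f[0],
        name=f[3],
        score=int(f[4]),
        strand=f[5],
        intron_start=start + sizes[0],   # end of block 1
        intron_end=start + offsets[1],   # start of block 2
        left_overhang=sizes[0],
        right_overhang=sizes[1],
    )


line = "chr20 9353709 9360718 JUNC00000552 2 + 9353709 9360718 255,0,0 2 42,18 0,6991"
j = parse_junction(line)
print(j.name, j.intron_start, j.intron_end, j.score)
# JUNC00000552 9353751 9360700 2
```

For this record, block 1 covers the first 42 bases and block 2 the last 18, so the gap between them runs from 9353751 to 9360700 and is supported by 2 reads.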


    • carmeyeii
      Senior Member
      • Mar 2011
      • 137

      #3
      Hi,

      I don't quite understand the block_sizes and block_locations fields. What I get, though I think I'm wrong, is that the block_sizes field indicates the sizes of the 2 exons a,b (blocks) joined by the splice junction?

      And the block_locations field would indicate the position relative to the junction (feature) start position where the 2 exons a,b each begin? But this really makes no sense to me as this would mean that [as the first value of this field is 0] the first exon starts right where the splice junction begins, which is actually where it (the exon) ends.

      Thanks for sharing your knowledge,

      Carmen


      • Alex124
        Junior Member
        • Feb 2012
        • 2

        #4
        Try IGV

        The easiest way to understand this output is to load it into IGV, the Broad Institute's Integrative Genomics Viewer. You can then compare the values with what shows on the screen, try changing them to see what effect it has, etc.

        Cheers,

        Alex


        • xiongdianguang
          Junior Member
          • Apr 2012
          • 9

          #5
          Use Cufflinks or Cuffdiff to get the gene expression value?

          I used TopHat, Cufflinks, and Cuffdiff to analyze my mRNA sequencing data, and I am confused about the gene expression values. We have 7 samples in my experiment. I can use Cufflinks to produce every gene's expression value (FPKM) at each stage, and I can also use Cuffdiff to get the expression values by running it on all 7 samples together. But the expression values produced by Cufflinks and Cuffdiff are not the same, so could you give me some guidance on this? Thank you.
