  • How to make sense of Tophat's output file 'junctions.bed'

    This is an excerpt from junctions.bed, the TopHat output file, generated using paired-end reads. Can somebody suggest how to make sense of the two BED blocks? Both BED blocks appear to have the same coordinates. Also, how should the scores be interpreted? They apparently represent the number of alignments spanning the junctions.


    chr20 9353709 9360718 JUNC00000552 2 + 9353709 9360718 255,0,0 2 42,18 0,6991
    chr20 9365023 9368124 JUNC00000553 1 + 9365023 9368124 255,0,0 2 35,15 0,3086
    chr20 9368172 9370544 JUNC00000554 2 + 9368172 9370544 255,0,0 2 31,19 0,2353
    chr20 9371222 9374262 JUNC00000555 7 + 9371222 9374262 255,0,0 2 40,28 0,3012
    chr20 9374285 9376179 JUNC00000556 1 + 9374285 9376179 255,0,0 2 40,10 0,1884
    chr20 9376224 9382178 JUNC00000557 5 + 9376224 9382178 255,0,0 2 41,42 0,5912
    chr20 9385955 9388573 JUNC00000558 1 + 9385955 9388573 255,0,0 2 40,10 0,2608
    chr20 9388666 9389312 JUNC00000559 4 + 9388666 9389312 255,0,0 2 39,33 0,613
    chr20 9389328 9389741 JUNC00000560 6 + 9389328 9389741 255,0,0 2 36,38 0,375
    chr20 9389783 9391703 JUNC00000561 3 + 9389783 9391703 255,0,0 2 45,20 0,1900
    Gaurav Singhal

  • #2
    Explanation of junctions.bed

    [seqname] [start] [end] [id] [score] [strand] [thickStart] [thickEnd] [r,g,b] [block_count] [block_sizes] [block_locations]
    "start" is the start position of the leftmost read that contains the junction.
    "end" is the end position of the rightmost read that contains the junction.
    "id" is the junction's ID, e.g. JUNC0001
    "score" is the number of reads that contain the junction.
    "strand" is either + or -.
    "thickStart" and "thickEnd" don't seem to have any effect on display for a junctions track. TopHat sets them as equal to start and end respectively.
    "r","g" and "b" are the red, green, and blue values. They affect the colour of the display.
    "block_count", "block_sizes" and "block_locations":
    The block_count will always be 2. The two blocks specify the regions on either side of the junction. "block_sizes" tells you how large each region is, and "block_locations" tells you, relative to the "start" being 0, where the two blocks occur. Therefore, the first block_location will always be zero.

    [read_start][junction][read_end]
    [block1    ][        ][block2  ]
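To make the block arithmetic concrete, here is a minimal sketch that turns one junctions.bed line into the two flanking blocks and the junction (intron) between them. The function name `parse_junction` and the returned dictionary keys are hypothetical, chosen for illustration; the only assumption about the file is that it follows the standard BED convention of 0-based, half-open coordinates.

```python
def parse_junction(line):
    """Parse one junctions.bed line (hypothetical helper, not part of TopHat)."""
    f = line.split()
    chrom, start, end = f[0], int(f[1]), int(f[2])
    name, score, strand = f[3], int(f[4]), f[5]
    # block_sizes and block_locations are comma-separated; tolerate a trailing comma
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    offsets = [int(x) for x in f[11].rstrip(",").split(",")]
    # Each block is [start + offset, start + offset + size), 0-based half-open
    blocks = [(start + o, start + o + s) for o, s in zip(offsets, sizes)]
    # The junction is the gap between the end of block 1 and the start of block 2
    intron = (blocks[0][1], blocks[1][0])
    return {
        "chrom": chrom, "name": name, "score": score, "strand": strand,
        "start": start, "end": end, "blocks": blocks, "intron": intron,
    }


example = "chr20 9353709 9360718 JUNC00000552 2 + 9353709 9360718 255,0,0 2 42,18 0,6991"
junction = parse_junction(example)
print(junction["blocks"])  # [(9353709, 9353751), (9360700, 9360718)]
print(junction["intron"])  # (9353751, 9360700)
```

For the first line of the excerpt, block 1 covers the 42 bases starting at "start", block 2 covers the last 18 bases ending at "end", and the junction is the 6949-base gap between them. The first block_location is 0 because block 1 begins at "start", which is the left edge of the read coverage, not the splice site itself.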



    • #3
      Hi,

      I don't quite understand the block_sizes and block_locations fields. My understanding, though I think I'm wrong, is that the block_sizes field indicates the sizes of the two exons a, b (blocks) joined by the splice junction?

      And the block_locations field would indicate the positions, relative to the junction (feature) start position, where the two exons a, b each begin? But this makes no sense to me: since the first value of this field is 0, it would mean that the first exon starts right where the splice junction begins, which is actually where the exon ends.

      Thanks for sharing your knowledge,

      Carmen



      • #4
        Try IGV

        The easiest way to understand this output is to load it into IGV, the Broad Institute's Integrative Genomics Viewer. You can then compare the values with what shows on the screen, try changing them to see what effect it has, etc.

        Cheers,

        Alex



        • #5
          use cufflinks or cuffdiff to get the gene expression value?

          I used TopHat, Cufflinks, and Cuffdiff to analyze my mRNA sequencing data, and I am confused about the gene expression values. We have 7 samples in my experiment. I can use Cufflinks to produce each gene's expression value (FPKM) in each stage, and I can also use Cuffdiff to get the genes' expression values by running it on all 7 samples together. But the expression values produced by Cufflinks and Cuffdiff are not the same, so could you give me some guidance on that? Thank you.
