How to make sense of Tophat's output file 'junctions.bed' (http://seqanswers.com/forums/showthread.php?t=10565)

 gsinghal 04-05-2011 01:16 PM

How to make sense of Tophat's output file 'junctions.bed'

This is an excerpt from junctions.bed, TopHat's output file, generated from paired-end reads. Can somebody explain how to make sense of the two BED blocks? Both BED blocks seem to have the same coordinates. Also, how should the scores be interpreted? (Apparently they represent the number of alignments spanning each junction.)

chr20 9353709 9360718 JUNC00000552 2 + 9353709 9360718 255,0,0 2 42,18 0,6991
chr20 9365023 9368124 JUNC00000553 1 + 9365023 9368124 255,0,0 2 35,15 0,3086
chr20 9368172 9370544 JUNC00000554 2 + 9368172 9370544 255,0,0 2 31,19 0,2353
chr20 9371222 9374262 JUNC00000555 7 + 9371222 9374262 255,0,0 2 40,28 0,3012
chr20 9374285 9376179 JUNC00000556 1 + 9374285 9376179 255,0,0 2 40,10 0,1884
chr20 9376224 9382178 JUNC00000557 5 + 9376224 9382178 255,0,0 2 41,42 0,5912
chr20 9385955 9388573 JUNC00000558 1 + 9385955 9388573 255,0,0 2 40,10 0,2608
chr20 9388666 9389312 JUNC00000559 4 + 9388666 9389312 255,0,0 2 39,33 0,613
chr20 9389328 9389741 JUNC00000560 6 + 9389328 9389741 255,0,0 2 36,38 0,375
chr20 9389783 9391703 JUNC00000561 3 + 9389783 9391703 255,0,0 2 45,20 0,1900

 Alex124 02-28-2012 06:50 PM

Explanation of junctions.bed

[seqname] [start] [end] [id] [score] [strand] [thickStart] [thickEnd] [r,g,b] [block_count] [block_sizes] [block_locations]
"start" is the start position of the leftmost read that contains the junction.
"end" is the end position of the rightmost read that contains the junction.
"id" is the junction's ID, e.g. JUNC0001.
"score" is the number of reads that contain the junction.
"strand" is either + or -.
"thickStart" and "thickEnd" don't seem to have any effect on the display of a junctions track. TopHat sets them equal to start and end, respectively.
"r", "g", and "b" are the red, green, and blue values. They set the colour of the feature in the display.
"block_count", "block_sizes", and "block_locations":
The block_count will always be 2. The two blocks specify the regions on either side of the junction. "block_sizes" tells you how large each region is, and "block_locations" tells you where each block starts, as an offset from "start" (which counts as 0). Therefore, the first block_location will always be zero. The junction itself is the gap between the two blocks: it begins at start + the first block size and ends at start + the second block location.

[block1 ][ junction ][block2]
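A minimal Python sketch of the layout described above, assuming the standard BED convention of 0-based, half-open coordinates and exactly two blocks per line, as TopHat writes them. The class `Junction`, the functions `parse_junction` and `_ints`, and the method `intron` are hypothetical names for illustration, not part of TopHat. The sketch parses one line into the fields listed above and derives the junction as the gap between the two blocks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Junction:
    """One junctions.bed line (BED12); field names follow the list above.

    Hypothetical helper for illustration, not a TopHat API.
    Coordinates are assumed to be 0-based and half-open, as in BED.
    """
    seqname: str
    start: int                  # leftmost base covered by any spanning read
    end: int                    # one past the last base of the rightmost spanning read
    junction_id: str            # e.g. "JUNC00000552"
    score: int                  # number of reads spanning the junction
    strand: str                 # "+" or "-"
    thick_start: int            # TopHat sets this equal to start
    thick_end: int              # TopHat sets this equal to end
    rgb: Tuple[int, ...]        # display colour, e.g. (255, 0, 0)
    block_sizes: List[int]      # lengths of the left and right blocks
    block_locations: List[int]  # block starts as offsets from `start`

    def intron(self) -> Tuple[int, int]:
        """Return the junction as [left, right): the gap between the two blocks."""
        left = self.start + self.block_sizes[0]       # end of block 1
        right = self.start + self.block_locations[1]  # start of block 2
        return left, right


def _ints(field: str) -> List[int]:
    # BED list fields may carry a trailing comma ("42,18,"), so skip empty items.
    return [int(x) for x in field.split(",") if x]


def parse_junction(line: str) -> Junction:
    """Parse one tab- or whitespace-separated junctions.bed line."""
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 BED fields, got {len(f)}")
    block_count = int(f[9])
    sizes, locations = _ints(f[10]), _ints(f[11])
    if block_count != 2 or len(sizes) != 2 or len(locations) != 2:
        raise ValueError("expected exactly two blocks per junction")
    return Junction(
        seqname=f[0],
        start=int(f[1]),
        end=int(f[2]),
        junction_id=f[3],
        score=int(f[4]),
        strand=f[5],
        thick_start=int(f[6]),
        thick_end=int(f[7]),
        rgb=tuple(_ints(f[8])),
        block_sizes=sizes,
        block_locations=locations,
    )


if __name__ == "__main__":
    row = "chr20 9353709 9360718 JUNC00000552 2 + 9353709 9360718 255,0,0 2 42,18 0,6991"
    j = parse_junction(row)
    left, right = j.intron()
    print(j.junction_id, j.score, left, right, right - left)
```

Run on the first line of the excerpt (JUNC00000552), this prints `JUNC00000552 2 9353751 9360700 6949`: a 42 bp left block, a 6,949 bp gap, and an 18 bp right block. The second block always ends exactly at "end" (6991 + 18 = 9360718 - 9353709), which makes a handy sanity check when parsing.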

 carmeyeii 08-29-2012 12:26 PM

Hi,

I don't quite understand the block_sizes and block_locations fields. What I understand (though I think I'm wrong) is that the block_sizes field indicates the sizes of the two exons, a and b (the blocks), joined by the splice junction?

And the block_locations field would indicate where each of the two exons, a and b, begins, relative to the start position of the junction (feature)? But this makes no sense to me, because it would mean (since the first value of this field is 0) that the first exon starts right where the splice junction begins, which is actually where the exon ends.

Carmen
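Working through the first line of the excerpt (JUNC00000552: start 9353709, block_sizes 42,18, block_locations 0,6991) with the field definitions from the explanation above may help, assuming BED's usual 0-based, half-open intervals. Offset 0 marks the start of the left block, which is the part of the upstream exon covered by the spanning reads (not necessarily the whole exon). The junction begins where that block ends:

```latex
\begin{aligned}
\text{block 1} &= [\,9353709,\ 9353709 + 42\,) = [\,9353709,\ 9353751\,) \\
\text{junction} &= [\,9353709 + 42,\ 9353709 + 6991\,) = [\,9353751,\ 9360700\,), \quad 9360700 - 9353751 = 6949 \\
\text{block 2} &= [\,9353709 + 6991,\ 9353709 + 6991 + 18\,) = [\,9360700,\ 9360718\,)
\end{aligned}
```

Block 2 ends at 9360718, which is exactly the "end" value on that line.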

 Alex124 08-29-2012 08:27 PM

Try IGV

The easiest way to understand this output is to load it into IGV, the Broad Institute's Integrative Genomics Viewer. You can then compare the values with what is shown on screen, try changing them to see what effect that has, and so on.

Cheers,

Alex

 xiongdianguang 09-03-2012 07:49 AM

Use Cufflinks or Cuffdiff to get the gene expression value?

I used TopHat, Cufflinks, and Cuffdiff to analyze my mRNA sequencing data, and I am confused about the gene expression values. There are 7 samples in my experiment. I can use Cufflinks to produce each gene's expression value (FPKM) at each stage, and I can also use Cuffdiff to get each gene's expression value by running Cuffdiff on all 7 samples together. However, the expression values produced by Cufflinks and Cuffdiff are not the same. Could you explain why? Thank you.

All times are GMT -8.