SEQanswers

Old 10-22-2012, 10:42 PM   #1
upendra_35
Senior Member
 
Location: USA

Join Date: Apr 2010
Posts: 102
tophat junctions.bed file

I recently found that my junctions.bed file contains scaffold names that are not found in my reference GFF. How does this happen?

Code:
[upendra_35@vm142-14 tophat_out_3_7_8_lanes]$ head junctions.bed
track name=junctions description="TopHat junctions"
Scaffold006725    29    277    JUNC00000001    34    +    29    277    255,0,0    2    90,90    0,158
Scaffold006725    31    254    JUNC00000002    2    +    31    254    255,0,0    2    88,55    0,168
Scaffold007604    1    292    JUNC00000003    27    -    1    292    255,0,0    2    79,66    0,225
Scaffold007614    50    255    JUNC00000004    54    +    50    255    255,0,0    2    90,35    0,170
Scaffold006711    38    322    JUNC00000005    39    -    38    322    255,0,0    2    89,82    0,202
Scaffold007629    81    293    JUNC00000006    8    -    81    293    255,0,0    2    70,56    0,156
Scaffold006763    96    316    JUNC00000007    7    -    96    316    255,0,0    2    90,52    0,168
Scaffold007639    84    292    JUNC00000008    7    -    84    292    255,0,0    2    82,56    0,152
Scaffold007736    14    230    JUNC00000009    6    -    14    230    255,0,0    2    44,86    0,130
Code:
[upendra_35@vm142-14 tophat_out_3_7_8_lanes]$ tail /mydata/B.rapa_gene_model_0830.gff
Scaffold004047    glean    CDS    11    33    .    +    0    Parent=Bra041170;
Scaffold004047    glean    CDS    123    321    .    +    2    Parent=Bra041170;
Scaffold004813    glean    mRNA    190    414    0.998901    -    .    ID=Bra041171;
Scaffold004813    glean    CDS    190    414    .    -    0    Parent=Bra041171;
Scaffold004894    glean    mRNA    3    410    1    +    .    ID=Bra041172;
Scaffold004894    glean    CDS    3    410    .    +    0    Parent=Bra041172;
Scaffold005112    blat    mRNA    131    295    1.0000    +    .    ID=Bra041173;
Scaffold005112    blat    CDS    131    295    100    +    .    Parent=Bra041173;
Scaffold008211    glean    mRNA    18    251    0.970334    +    .    ID=Bra041174;
Scaffold008211    glean    CDS    18    251    .    +    0    Parent=Bra041174
Old 10-23-2012, 01:15 AM   #2
Hobbe
Member
 
Location: Uppsala, Sweden

Join Date: Apr 2010
Posts: 29

The names are taken from your genome FASTA file, not the reference GFF file. That is to be expected, since the junctions come from mapping your reads to the genome. It seems TopHat found junctions on scaffolds that have no annotation in your reference GFF.

Or did I not understand your question?
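
One way to check this is to compare the sequence names across the three files. The snippet below is only a minimal Python sketch, not something TopHat provides. The function names are made up for illustration, the BED and GFF rows are copied from the output earlier in the thread, and the FASTA records are an invented stand-in for the real genome file.

```python
def first_column_names(lines, skip_prefixes=("#",)):
    """Return the set of column-1 names, ignoring blank and header lines."""
    names = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(skip_prefixes):
            continue
        names.add(line.split()[0])
    return names


def fasta_names(lines):
    """Return the first word of every '>' header line."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


# Toy inputs. For real files, pass open file handles instead of lists.
bed_lines = [
    'track name=junctions description="TopHat junctions"',
    "Scaffold006725\t29\t277\tJUNC00000001\t34\t+\t29\t277\t255,0,0\t2\t90,90\t0,158",
    "Scaffold007604\t1\t292\tJUNC00000003\t27\t-\t1\t292\t255,0,0\t2\t79,66\t0,225",
]
gff_lines = [
    "Scaffold004047\tglean\tCDS\t123\t321\t.\t+\t2\tParent=Bra041170;",
    "Scaffold004813\tglean\tmRNA\t190\t414\t0.998901\t-\t.\tID=Bra041171;",
]
# Assumption: invented FASTA content, only the header names matter here.
fasta_lines = [
    ">Scaffold006725", "ACGT",
    ">Scaffold007604", "ACGT",
    ">Scaffold004047", "ACGT",
    ">Scaffold004813", "ACGT",
]

bed = first_column_names(bed_lines, skip_prefixes=("#", "track", "browser"))
gff = first_column_names(gff_lines)
fasta = fasta_names(fasta_lines)

not_in_gff = sorted(bed - gff)      # junction scaffolds with no annotation
not_in_fasta = sorted(bed - fasta)  # should be empty if names come from the genome

print("In junctions.bed but not in GFF:", not_in_gff)
print("In junctions.bed but not in FASTA:", not_in_fasta)
```

If the second list is empty and the first is not, the junctions sit on real scaffolds that simply have no gene models in the GFF.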
Old 10-23-2012, 08:17 AM   #3
upendra_35
Senior Member
 
Location: USA

Join Date: Apr 2010
Posts: 102

Quote:
Originally Posted by Hobbe
The names are taken from your genome fasta file, not the reference gff file. This is of course logical, since the junctions are results from your mapping of reads to the genome. Seems tophat finds junctions on scaffolds that have no information in your reference gff.

Or did I not understand your question?

Thanks, Hobbe, for the response. I just checked my FASTA file and I could find the names in there. So that means my GFF is not complete. Do you know if there is a way to get a complete GFF (probably based on RNA-seq data)? I got this one from the Brassica genome annotation guys.

I have one other related question regarding the junctions.bed file. Can I use this file to tell whether a gene is fused or not compared to the GFF (assuming the GFF is complete)?

After looking at the TopHat BAM file and transcript.gtf along with the reference GFF in IGV, I found that the two annotations disagree for some genes: sometimes a single gene in transcript.gtf is reported as two genes in the reference GFF, and sometimes two genes in transcript.gtf are reported as a single gene in the reference GFF. All I want to know is how many of these discrepancies exist in the reference annotation (GFF) compared to the Cufflinks transcripts.

Any ideas?
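
For reference, here is a rough Python sketch of what such a check on junctions.bed could look like. It is an assumption-laden illustration, not a TopHat or Cufflinks feature: it decodes each junction's intron from the BED block columns and reports junctions whose two flanking bases fall in different annotated mRNAs, which would only be a hint that two gene models might belong together. The function names and the ScaffoldX toy data are made up; the first junction line from the head output above is reused just to show the coordinate decoding.

```python
def parse_junction(line):
    """Decode a TopHat-style BED12 junction into 0-based, half-open intron coordinates."""
    f = line.split()
    chrom, start, name, strand = f[0], int(f[1]), f[3], f[5]
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    starts = [int(x) for x in f[11].rstrip(",").split(",")]
    intron_start = start + sizes[0]   # first base after the left block
    intron_end = start + starts[1]    # first base of the right block
    return chrom, intron_start, intron_end, name, strand


def parse_mrnas(lines):
    """Map scaffold -> list of (start0, end, ID) for mRNA features in a GFF."""
    genes = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.strip().split(None, 8)
        if len(cols) < 9 or cols[2] != "mRNA":
            continue
        ids = [p[3:].strip() for p in cols[8].split(";") if p.strip().startswith("ID=")]
        if ids:
            # GFF is 1-based inclusive; convert to 0-based half-open.
            genes.setdefault(cols[0], []).append((int(cols[3]) - 1, int(cols[4]), ids[0]))
    return genes


def genes_at(genes, chrom, pos):
    """IDs of all mRNAs on chrom that contain the 0-based position pos."""
    return {gid for s, e, gid in genes.get(chrom, []) if s <= pos < e}


def cross_gene_junctions(bed_lines, gff_lines):
    """Junctions whose left and right flanks lie in different annotated mRNAs."""
    genes = parse_mrnas(gff_lines)
    hits = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, i_start, i_end, name, _ = parse_junction(line)
        left = genes_at(genes, chrom, i_start - 1)   # last exonic base on the left
        right = genes_at(genes, chrom, i_end)        # first exonic base on the right
        if left and right and not (left & right):
            hits.append((name, sorted(left), sorted(right)))
    return hits


# Decoding check with the first junction shown earlier in this thread.
example = "Scaffold006725 29 277 JUNC00000001 34 + 29 277 255,0,0 2 90,90 0,158"
example_intron = parse_junction(example)[1:3]

# Hypothetical toy annotation and junctions (ScaffoldX and gene IDs are invented).
toy_gff = [
    "ScaffoldX\tglean\tmRNA\t101\t300\t.\t+\t.\tID=geneA;",
    "ScaffoldX\tglean\tmRNA\t501\t800\t.\t+\t.\tID=geneB;",
    "ScaffoldX\tglean\tmRNA\t1001\t1500\t.\t+\t.\tID=geneC;",
]
toy_bed = [
    'track name=junctions description="TopHat junctions"',
    "ScaffoldX\t210\t590\tJUNC1\t12\t+\t210\t590\t255,0,0\t2\t80,70\t0,310",
    "ScaffoldX\t1010\t1400\tJUNC2\t30\t+\t1010\t1400\t255,0,0\t2\t90,90\t0,300",
]
results = cross_gene_junctions(toy_bed, toy_gff)
print(example_intron)
print(results)
```

A check like this can only flag cases where the annotation has two genes and the reads join them. The opposite case, one annotated gene that Cufflinks splits in two, would need a comparison of the transcript models themselves rather than the junctions.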
All times are GMT -8.