10-23-2012, 08:17 AM   #3
upendra_35
Senior Member
 
Location: USA

Join Date: Apr 2010
Posts: 102

Quote:
Originally Posted by Hobbe
The names are taken from your genome FASTA file, not the reference GFF file. This is of course logical, since the junctions come from mapping your reads to the genome. It seems TopHat finds junctions on scaffolds that have no information in your reference GFF.

Or did I not understand your question?

Thanks, Hobbe, for the response. I just checked my FASTA file and I could find the names in there, so my GFF is indeed not complete. Do you know whether there is a way to get a complete GFF (probably one based on RNA-seq data)? I got this one from the Brassica genome annotation group.
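To list exactly which sequences have junctions but no annotation, one can compare the sequence names (column 1) in TopHat's junctions.bed with the FASTA headers and the GFF seqid column. This is a minimal sketch, not a standard tool; it assumes the FASTA sequence name is the header text up to the first whitespace, and the function names are hypothetical.

```python
from typing import Iterable, List, Set


def bed_seqids(lines: Iterable[str]) -> Set[str]:
    """Column 1 of a BED file such as TopHat's junctions.bed, skipping track/comment lines."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        ids.add(line.rstrip("\n").split("\t", 1)[0])
    return ids


def gff_seqids(lines: Iterable[str]) -> Set[str]:
    """Column 1 (seqid) of a GFF/GTF file, skipping comments and ## directives."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.rstrip("\n").split("\t", 1)[0])
    return ids


def fasta_seqids(lines: Iterable[str]) -> Set[str]:
    """Sequence names from FASTA headers (assumed: text after '>' up to first whitespace)."""
    return {line[1:].split()[0] for line in lines if line.startswith(">") and line[1:].strip()}


def unannotated_scaffolds(bed: Iterable[str], fasta: Iterable[str], gff: Iterable[str]) -> List[str]:
    """Sequences that carry junctions and exist in the genome FASTA but have no GFF feature."""
    junction_ids = bed_seqids(bed)
    genome_ids = fasta_seqids(fasta)
    annotated = gff_seqids(gff)
    return sorted((junction_ids & genome_ids) - annotated)
```

Each argument is any iterable of lines, so an open file handle works, e.g. `unannotated_scaffolds(open("junctions.bed"), open(genome_fasta), open(reference_gff))` with your own file paths.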

I have one other related question regarding the junctions.bed file. Can I use this file to tell whether a gene is fused relative to the GFF (assuming the GFF is complete)?
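One way junctions.bed could help: a junction whose donor side lies in one reference gene and whose acceptor side lies in a different gene is candidate evidence that the two annotated genes are really one (it could also be readthrough or mismapping, so treat hits as leads, not proof). The sketch below assumes junctions.bed is BED12, where the two blocks are the read overhangs and the gap between them is the intron, and that the GFF is GFF3 with `gene` features carrying an `ID=` attribute. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Gene = Tuple[int, int, str, str]  # (start, end, strand, gene_id), 1-based inclusive as in GFF


def load_genes(gff_lines: Iterable[str]) -> Dict[str, List[Gene]]:
    """Collect 'gene' features per seqid (assumes GFF3 with ID= in column 9)."""
    genes: Dict[str, List[Gene]] = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        gene_id = attrs.get("ID", f"{cols[0]}:{cols[3]}-{cols[4]}")
        genes[cols[0]].append((int(cols[3]), int(cols[4]), cols[6], gene_id))
    return genes


def genes_at(genes: Dict[str, List[Gene]], seqid: str, pos: int, strand: str) -> Set[str]:
    """IDs of genes covering 1-based pos; strand must match only when it is + or -."""
    return {
        gid for start, end, gstrand, gid in genes.get(seqid, [])
        if start <= pos <= end and (strand not in ("+", "-") or gstrand == strand)
    }


def bridging_junctions(bed_lines: Iterable[str], genes: Dict[str, List[Gene]]):
    """Junctions whose two splice sites fall in different (non-shared) reference genes."""
    hits = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        cols = line.rstrip("\n").split("\t")
        seqid, chrom_start, strand = cols[0], int(cols[1]), cols[5]
        block_sizes = [int(x) for x in cols[10].rstrip(",").split(",")]
        block_starts = [int(x) for x in cols[11].rstrip(",").split(",")]
        # 1-based positions of the last exon base before and the first after the intron
        left = chrom_start + block_sizes[0]
        right = chrom_start + block_starts[1] + 1
        left_genes = genes_at(genes, seqid, left, strand)
        right_genes = genes_at(genes, seqid, right, strand)
        if left_genes and right_genes and left_genes.isdisjoint(right_genes):
            hits.append((cols[3], sorted(left_genes), sorted(right_genes)))
    return hits
```

The gene lookup is a linear scan per junction, which keeps the sketch short; for a whole genome an interval tree or a sorted sweep would be faster.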

After viewing the TopHat BAM file and transcript.gtf alongside the reference GFF in IGV, I found that some of the annotated genes are fused and some are split: sometimes a single gene in transcript.gtf is reported as two genes in the reference GFF, and sometimes two genes in transcript.gtf are reported as a single gene in the reference GFF. All I want to know is how many of these discrepancies exist in the reference annotation (GFF) compared with the Cufflinks transcripts.
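Counting these discrepancies can be approximated by overlapping loci: a Cufflinks locus overlapping two or more reference genes looks "merged", and a reference gene overlapped by two or more Cufflinks loci looks "split". This is a rough sketch under stated assumptions: Cufflinks exon records carry `gene_id "..."`, reference genes are GFF3 `gene` features with `ID=`, locus extent is the min/max of exon coordinates, and a strand of `.` matches either strand. Function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Span = Tuple[str, int, int, str]  # (seqid, start, end, strand), 1-based inclusive


def cufflinks_loci(gtf_lines: Iterable[str]) -> Dict[str, Span]:
    """Extent of each Cufflinks gene_id, built from its exon records."""
    spans: Dict[str, Span] = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        match = re.search(r'gene_id "([^"]+)"', cols[8])
        if not match:
            continue
        gid, start, end = match.group(1), int(cols[3]), int(cols[4])
        if gid in spans:
            seqid, s, e, strand = spans[gid]
            spans[gid] = (seqid, min(s, start), max(e, end), strand)
        else:
            spans[gid] = (cols[0], start, end, cols[6])
    return spans


def reference_genes(gff_lines: Iterable[str]) -> Dict[str, Span]:
    """'gene' features from a GFF3 file, keyed by their ID= attribute."""
    genes: Dict[str, Span] = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        match = re.search(r"(?:^|;)ID=([^;]+)", cols[8])
        if match:
            genes[match.group(1)] = (cols[0], int(cols[3]), int(cols[4]), cols[6])
    return genes


def overlaps(a: Span, b: Span) -> bool:
    """Same seqid, compatible strand, and intersecting coordinates."""
    same_strand = a[3] == b[3] or "." in (a[3], b[3])
    return a[0] == b[0] and same_strand and a[1] <= b[2] and b[1] <= a[2]


def discrepancies(cuff: Dict[str, Span], ref: Dict[str, Span]):
    """Return (merged, split) as dicts of locus ID -> sorted overlapping IDs."""
    ref_hits: Dict[str, set] = defaultdict(set)
    merged: Dict[str, List[str]] = {}
    for cid, cspan in cuff.items():
        hit = {rid for rid, rspan in ref.items() if overlaps(cspan, rspan)}
        if len(hit) >= 2:
            merged[cid] = sorted(hit)  # one Cufflinks locus spans several reference genes
        for rid in hit:
            ref_hits[rid].add(cid)
    # one reference gene covered by several Cufflinks loci
    split = {rid: sorted(cids) for rid, cids in ref_hits.items() if len(cids) >= 2}
    return merged, split
```

`len(merged)` and `len(split)` then give the two counts. Because overlap is computed on whole locus extents, a gene nested inside another gene's intron will also be reported as merged; comparing exon-level overlap instead would be stricter.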

Any ideas?