View Single Post
Old 06-01-2011, 06:37 PM   #1
avi
Junior Member
 
Location: Michigan

Join Date: Mar 2011
Posts: 7
Question Missing Junctions in Tophat! ( after providing a known junctions & gene models files)

I am using Tophat for aligning unpaired RNAseq reads (36 bases each) and I had a question about junction calling in Tophat. I tried running Tophat with both -j (a known junctions file from Ensembl) and -G (known gene model annotations file from Ensembl) and also ran Cufflinks on the results.

In both of the runs, if I compare the results with the actual known transcripts from Ensembl, it seems like I am missing many known junctions.

To Illustrate, say if Ensembl has a Transcript with say 5 exons. Cufflinks annotates the same region as containing 3 transcripts (Exon 1-2 as Transcript 1, Exon 3-4 as Transcript 2 and Exon 5 as Transcript 3).

If I have understood the paper correctly, this is because even though Tophat has the coordinates of the known junctions it didn't find enough IUM reads spanning those particular junctions (under the given parameters) to allow it to call it junction? Is that correct..?

Under that assumption, I tried running Tophat with --min-anchor-length = 5 (reduced from default 8) and --min-isoform-fraction 0.1 (reduced from default 0.1). But I didn't get any improvement in finding more junctions. (The junctions.bed file has the exact same number of lines)

Does anyone have any suggestions on what else I can try to improve junction calling?

Also, given the Genomic coordinates of a splice junction, is there a way I can extract, from the Tophat output, the no of IUM (Initially Unmapped reads) that Tophat mapped to span that particular junction?

thanks,

Avinash
avi is offline   Reply With Quote