SEQanswers

06-01-2011, 06:37 PM   #1
avi
Junior Member

Location: Michigan

Join Date: Mar 2011
Posts: 7
Missing junctions in TopHat! (after providing known-junction and gene-model files)

I am using TopHat to align single-end RNA-seq reads (36 bases each), and I have a question about junction calling in TopHat. I tried running TopHat with -j (a known-junctions file from Ensembl) and with -G (a known gene-model annotation file from Ensembl), and also ran Cufflinks on the results.
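For reference, the TopHat manual describes the `-j`/`--raw-juncs` file as one junction per tab-delimited line, `<chrom> <left> <right> <+/->`, where `left` and `right` are zero-based coordinates of the last base of the upstream exon and the first base of the downstream exon. The sketch below converts one transcript's exons into such lines. It assumes the exons are given as 1-based inclusive `(start, end)` pairs, as Ensembl exports them; the function name `exons_to_raw_juncs` is hypothetical.

```python
def exons_to_raw_juncs(chrom, strand, exons):
    """Return TopHat raw-junction lines for one transcript (hypothetical helper).

    exons: list of (start, end) pairs, assumed 1-based inclusive (Ensembl-style).
    Output coordinates are zero-based, following the -j/--raw-juncs format:
    left = last base of the upstream exon, right = first base of the downstream exon.
    """
    ordered = sorted(exons)  # sort by genomic position, regardless of strand
    lines = []
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        left = left_end - 1      # convert 1-based to 0-based
        right = right_start - 1  # convert 1-based to 0-based
        lines.append(f"{chrom}\t{left}\t{right}\t{strand}")
    return lines
```

For example, exons `(100, 200)` and `(301, 400)` on chromosome `1` give the single line `1	199	300	+`. Sorting first means minus-strand transcripts, which Ensembl lists in transcription order, still produce left < right.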

In both runs, when I compare the results with the known Ensembl transcripts, I seem to be missing many known junctions.

To illustrate: say Ensembl has a transcript with 5 exons. Cufflinks annotates the same region as 3 separate transcripts (exons 1-2 as transcript 1, exons 3-4 as transcript 2, and exon 5 as transcript 3).
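The check described above, listing which of a known transcript's junctions were not reported, can be sketched as follows. This assumes introns are represented as `(chrom, first_intron_base, last_intron_base)` in 1-based inclusive coordinates and that the detected junctions have already been loaded into a set in the same form; both function names are hypothetical.

```python
def introns_from_exons(chrom, exons):
    """Introns between consecutive exons, as (chrom, first_base, last_base), 1-based inclusive."""
    ordered = sorted(exons)
    return [(chrom, end + 1, start - 1)
            for (_, end), (start, _) in zip(ordered, ordered[1:])]


def missing_junctions(chrom, exons, detected):
    """Known introns of one transcript that are absent from the `detected` set."""
    return [intron for intron in introns_from_exons(chrom, exons)
            if intron not in detected]
```

With five exons and only the exon 1-2 and exon 3-4 junctions detected, as in the example above, the function reports the exon 2-3 and exon 4-5 junctions as missing, which is exactly where Cufflinks split the transcript.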

If I have understood the paper correctly, this happens because, even though TopHat has the coordinates of the known junctions, it did not find enough IUM reads spanning those particular junctions (under the given parameters) to call them. Is that correct?

Under that assumption, I tried running TopHat with --min-anchor-length 5 (reduced from the default of 8) and --min-isoform-fraction 0.1 (reduced from the default of 0.15), but I saw no improvement in the number of junctions found (the junctions.bed file has exactly the same number of lines).
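Matching line counts only show that the totals agree; comparing the junction coordinates themselves shows whether two runs report the same set. A minimal sketch, assuming TopHat's junctions.bed layout: an optional `track` header followed by two-block BED12 records whose score column, per the TopHat manual, is the number of alignments spanning the junction. The function names are hypothetical, and introns are returned 1-based inclusive.

```python
def read_junctions_bed(lines):
    """Map (chrom, intron_start, intron_end) -> score from junctions.bed lines.

    Assumes two-block BED12 records; intron coordinates are 1-based inclusive.
    """
    junctions = {}
    for line in lines:
        if not line.strip() or line.startswith("track"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, chrom_start = fields[0], int(fields[1])
        score = int(fields[4])
        block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        block_starts = [int(x) for x in fields[11].rstrip(",").split(",")]
        # The 0-based half-open intron is [chrom_start + size0, chrom_start + start1);
        # shifting the start by one gives 1-based inclusive coordinates.
        intron = (chrom,
                  chrom_start + block_sizes[0] + 1,
                  chrom_start + block_starts[1])
        junctions[intron] = score
    return junctions


def compare_runs(lines_a, lines_b):
    """Return (junctions only in run A, junctions only in run B), each sorted."""
    a = read_junctions_bed(lines_a)
    b = read_junctions_bed(lines_b)
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))
```

Both functions accept any iterable of lines, so an open file handle for each run's junctions.bed can be passed in directly.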

Does anyone have any suggestions on what else I can try to improve junction calling?

Also, given the genomic coordinates of a splice junction, is there a way to extract from the TopHat output the number of IUMs (initially unmapped reads) that TopHat mapped across that particular junction?
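A related count that is recoverable from the final output is the number of spliced alignments in accepted_hits (viewed as SAM text) whose CIGAR string skips exactly a given intron. This is not an IUM-specific count: it includes every spliced alignment, and multi-mapped reads are counted once per alignment. The sketch assumes standard SAM columns (RNAME in column 3, POS in column 4, CIGAR in column 6) and 1-based inclusive intron coordinates; the function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def introns_in_alignment(pos, cigar):
    """Yield (first_intron_base, last_intron_base), 1-based, for each N operation."""
    ref = pos  # 1-based reference position of the next base to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (ref, ref + length - 1)
        if op in REF_CONSUMING:
            ref += length


def count_spanning_alignments(sam_lines, chrom, intron_start, intron_end):
    """Count SAM alignments on `chrom` whose CIGAR skips exactly the given intron."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        rname, pos, cigar = fields[2], int(fields[3]), fields[5]
        if rname != chrom or cigar == "*":
            continue
        if (intron_start, intron_end) in introns_in_alignment(pos, cigar):
            count += 1
    return count
```

For instance, an alignment at position 111 with CIGAR `20M200N16M` skips the intron 131-330, because the 20 matched bases cover 111-130.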

thanks,

Avinash
06-01-2011, 08:49 PM   #2
shurjo
Senior Member
 
Location: Rockville, MD

Join Date: Jan 2009
Posts: 126

Hi Avinash,

Quote:
In both of the runs, if I compare the results with the actual known transcripts from Ensembl, it seems like I am missing many known junctions.
For any one tissue type, a certain (possibly substantial) fraction of the known junctions will not be present simply due to tissue-specific expression of different isoforms. As such, I wouldn't worry about this part of your question.

Quote:
Also, given the Genomic coordinates of a splice junction, is there a way I can extract, from the Tophat output, the no of IUM (Initially Unmapped reads) that Tophat mapped to span that particular junction?
I doubt this is possible, unless you are MUCH better at hacking into the code and the tmp files than I am.

Best of luck,

Shurjo
08-02-2011, 04:21 PM   #3
agent99
Member
 
Location: San Francisco

Join Date: Jul 2010
Posts: 10

You could try reducing --segment-length to half of your read length (18 for 36-base reads; the default is 25). I believe you do not get any mapped splice junctions if the segment length is greater than half the read length.
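The rule of thumb in the post above is simple arithmetic: a read can only be cut into at least two segments when it is at least twice the segment length. A small sketch of that rule, assuming it holds as stated in this thread (it is not taken from TopHat's source, and both function names are hypothetical):

```python
def largest_usable_segment_length(read_length):
    """Largest segment length that is at most half the read length."""
    return read_length // 2


def segment_search_possible(read_length, segment_length):
    """True if a read can be split into at least two segments of segment_length."""
    return read_length >= 2 * segment_length
```

For 36-base reads, `largest_usable_segment_length(36)` is 18, and `segment_search_possible(36, 25)` is `False`, so under this rule the default segment length of 25 leaves these reads unsplit.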
All times are GMT -8.

