SEQanswers

Old 08-27-2010, 08:06 AM   #1
CompBio
Member
 
Location: Bristol, UK

Join Date: Aug 2009
Posts: 26
TopHat Misses Splice Junctions

I'm currently running a comparison between TopHat and other methods for finding splicing patterns in RNA-Seq data. I've got about 250 million 32-nt reads from Arabidopsis. For the genes I am interested in, using other methods I have been able to align reads across many known and novel splice junctions at 100% ID that are unique to these locations (i.e., no multi-hits), with at least 8 nt on either side of each junction (anchor regions). However, I am unable to get TopHat to find these junctions.

These are the TopHat parameters I'm using:

-g 1 # report only unique hits
-F 0.01 # report even poorly-represented junctions
--segment-mismatches=0 # enforce 100% ID for all reads
--splice-mismatches=0 # enforce 100% ID across junctions
--min-coverage-intron=10 # minimum allowed intron size for Arabidopsis
--max-coverage-intron=11000 # maximum intron size found in Arabidopsis
-i 10 # (as above)
-I 11000 # (as above)
--min-segment-intron=10 # (as above)
--max-segment-intron=11000 # (as above)
-j TAIR9_GFF3_genes.juncs # pre-processed splice junctions from gene model
-a 8 # minimum overlap/anchor of 8nt
-p 4 # allowing up to 4 threads on 8-processor machines

I can relax the constraint on splice mismatches to see if it helps, but ultimately I would like TopHat to find the junctions at 100% ID that I've already seen using other methods. I would like to make my comparison as fair as possible.

Any ideas? Am I misinterpreting/misusing any of these parameters?
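
For readers who want to reproduce the run, here is a minimal sketch of how the options listed above could be assembled into a single command line. The option values are copied from the post; the index prefix `tair9_index` and the reads file `reads.fastq` are hypothetical placeholders, and the sketch assumes the usual `tophat [options] <index> <reads>` argument order. It only builds and prints the command; it does not run TopHat.

```python
import shlex

# Options exactly as listed in the post (flag, value).
TOPHAT_OPTIONS = [
    ("-g", "1"),
    ("-F", "0.01"),
    ("--segment-mismatches", "0"),
    ("--splice-mismatches", "0"),
    ("--min-coverage-intron", "10"),
    ("--max-coverage-intron", "11000"),
    ("-i", "10"),
    ("-I", "11000"),
    ("--min-segment-intron", "10"),
    ("--max-segment-intron", "11000"),
    ("-j", "TAIR9_GFF3_genes.juncs"),
    ("-a", "8"),
    ("-p", "4"),
]


def build_tophat_command(index_prefix, reads_path, options=TOPHAT_OPTIONS):
    """Return the argument list for one TopHat run.

    index_prefix and reads_path are supplied by the caller; the names used
    in the example below are placeholders, not files from the thread.
    """
    argv = ["tophat"]
    for flag, value in options:
        if flag.startswith("--"):
            # Long options are written as --name=value, as in the post.
            argv.append(f"{flag}={value}")
        else:
            argv.extend([flag, value])
    argv.extend([index_prefix, reads_path])
    return argv


if __name__ == "__main__":
    # Hypothetical index prefix and reads file, for illustration only.
    print(shlex.join(build_tophat_command("tair9_index", "reads.fastq")))
```

Keeping the options in one list makes it easy to change a single value (for example `-g`) and rerun, so that each comparison differs in only one parameter.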
Old 08-27-2010, 08:44 AM   #2
IrisZhu
Member
 
Location: Maryland

Join Date: Jul 2010
Posts: 25

Could the reason be your "-g 1"? Is "1" too small?
I am always a bit confused about this "-g", because the default is 40. I can't imagine it will keep all the reads that map to fewer than 40 places in the genome.
Old 08-27-2010, 08:58 AM   #3
anecsulea
Member
 
Location: Lausanne

Join Date: Dec 2009
Posts: 12

Hi,

I don't know very much about Arabidopsis biology, but in mammals there is one class of genes for which TopHat is unable to find the junctions: genes that have generated duplicate copies through retrotransposition. These duplicate copies are intronless, so reads that span the splice junctions of the parental gene also map perfectly, without a gap, to the genome in the duplicated region. Since TopHat only uses reads that could not be mapped to the genome when it detects junctions, the junctions of this class of genes will not be detected.

So it might be worth checking whether these genes have (recent) retrocopies in Arabidopsis.

Hope this helps!

Best,

Anamaria
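
The retrocopy effect described in this post can be illustrated with a toy example. This is only a sketch: the sequences are invented, exact string search stands in for the aligner's contiguous (unspliced) mapping step, and the function names are hypothetical. It shows that a read spanning an exon-exon junction has no contiguous match when only the parental gene is present, but matches the genome directly once an intronless copy exists, so it would never reach a junction search that considers only unmapped reads.

```python
def exact_hits(genome, read):
    """Return every 0-based position where read matches genome exactly."""
    hits = []
    pos = genome.find(read)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(read, pos + 1)
    return hits


def left_for_junction_search(genome, read):
    """True if the read has no contiguous match, i.e. it stays 'unmapped'.

    Simplification of the behaviour described in the post: only reads that
    fail to map contiguously are used to look for junctions.
    """
    return not exact_hits(genome, read)


# Invented toy sequences.
EXON1 = "ACGTTGCAAGGCTTAC"
INTRON = "GTAAGTTTTTTTTTTTTTTCAG"
EXON2 = "CCATGGATCGTTAGCA"
SPACER = "T" * 20

PARENT_GENE = EXON1 + INTRON + EXON2  # spliced gene with one intron
RETROCOPY = EXON1 + EXON2             # intronless duplicate

# A read with 8 nt on each side of the exon-exon junction.
JUNCTION_READ = EXON1[-8:] + EXON2[:8]

genome_without_copy = PARENT_GENE + SPACER
genome_with_copy = PARENT_GENE + SPACER + RETROCOPY

if __name__ == "__main__":
    print("no retrocopy, read left for junction search:",
          left_for_junction_search(genome_without_copy, JUNCTION_READ))
    print("with retrocopy, read left for junction search:",
          left_for_junction_search(genome_with_copy, JUNCTION_READ))
```

On real data, one way to test this idea would be to take the reads that support the missed junctions and check whether they also align contiguously somewhere else in the genome.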


Old 08-27-2010, 09:01 AM   #4
CompBio
Member
 
Location: Bristol, UK

Join Date: Aug 2009
Posts: 26

Good suggestion -- relaxing the mismatches requirement may not help much if I don't allow a few multi-reads. I'll apply them both and see what happens.
Old 08-27-2010, 11:17 AM   #5
john_mu
Member
 
Location: Stanford, CA

Join Date: May 2010
Posts: 88

There is a new program called HMMsplicer that was designed for the Arabidopsis genome. Maybe you can give that a try.

I haven't personally used it, but they were benchmarking it against SpliceMap and TopHat before.
Old 08-27-2010, 01:58 PM   #6
Lee Sam
Member
 
Location: Ann Arbor, MI

Join Date: Oct 2008
Posts: 57

There is also the consideration that (to my admittedly shallow understanding) TopHat only uses the set of canonical splice signal sequences. Unfortunately, I can't recommend anything better - I was playing with the supersplat aligner which does spliced alignment but the program was very very slow and never ran to completion.
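
One way to check whether this applies to the missed junctions is to look at the two bases at each end of every intron. The sketch below classifies an intron by its donor/acceptor dinucleotides. It is an illustration, not TopHat's own logic: GT-AG is the canonical motif, GC-AG and AT-AC are the less common motifs usually grouped with it, and which of these a given TopHat version accepts should be confirmed in its manual. The coordinate convention (zero-based, last base of the left exon and first base of the right exon) and the plus-strand-only handling are assumptions made for this example.

```python
CANONICAL = ("GT", "AG")
# Less common motifs that are often treated separately from GT-AG.
SEMI_CANONICAL = {("GC", "AG"), ("AT", "AC")}


def intron_motif(genome, left, right):
    """Return (donor, acceptor) dinucleotides of a plus-strand intron.

    left  : 0-based index of the last exon base before the intron
    right : 0-based index of the first exon base after the intron
    """
    intron = genome[left + 1:right]
    if len(intron) < 4:
        raise ValueError("intron too short to carry both dinucleotides")
    return intron[:2].upper(), intron[-2:].upper()


def classify_junction(genome, left, right):
    """Label a junction as canonical, semi-canonical or non-canonical."""
    motif = intron_motif(genome, left, right)
    if motif == CANONICAL:
        return "canonical"
    if motif in SEMI_CANONICAL:
        return "semi-canonical"
    return "non-canonical"


if __name__ == "__main__":
    # Toy sequence: 4 exon bases, an 8-base intron, 4 exon bases.
    toy = "AAAA" + "GTCCCCAG" + "TTTT"
    print(classify_junction(toy, 3, 12))
```

Running the junctions found by the other methods through a check like this would show how many of the ones TopHat misses lack a GT-AG intron. Minus-strand introns would need the sequence reverse-complemented first.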
Old 08-27-2010, 03:11 PM   #7
CompBio
Member
 
Location: Bristol, UK

Join Date: Aug 2009
Posts: 26

Actually, supersplat is one of the approaches I'm comparing TopHat to. It can use a lot of memory, and the usage appears to grow with the size of the query file as well as with parameter changes. If you're not careful to split up your files, your machine can spend a lot of time swapping.

As for TopHat, I'll have to dig into their documentation again to see if I can figure out how it uses the canonical sites. Another good idea -- thanks!
Old 01-17-2011, 04:34 AM   #8
BAJ
Member
 
Location: Paris

Join Date: Nov 2008
Posts: 15

Hi,
I would be interested in how you actually compared the different algorithms.
You say you used an existing data set. Why don't you use a constructed one? And how do you know what is true or not?
Could you also please summarize your findings?
As I am planning to do similar things within the next few weeks, I would of course be very interested in any advice you could have.

Best,
Bernd
Old 04-10-2013, 04:48 PM   #9
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137

Hi!

This is a somewhat old thread, but I would like to know more about the biological constraints TopHat uses to call a splice junction. Is there any way to override them?