#1 | |
Member
Location: Sydney, Australia Join Date: Jan 2012
Posts: 61
Hi!
I'm a bit confused: when should one use tophat2's coverage search? Is there a rationale for leaving it off or on for 100bp PE reads, or is this dictated solely by the computational resources one has available? Overall, what is YOUR standard practice with this option? I have seen the manual, which states: Quote:
Also, for human, how much sense does it make to use the microexon search option? Thanks in advance!
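For readers comparing the two settings, the sketch below builds the corresponding command lines. It is illustrative only: `--coverage-search`, `--no-coverage-search` and `--microexon-search` are the TopHat options discussed in this thread, while the index name, FASTQ file names, output directories and the helper `tophat2_command` are hypothetical placeholders.

```python
import shlex
from typing import List


def tophat2_command(index: str, reads1: str, reads2: str, out_dir: str,
                    coverage_search: bool = False,
                    microexon_search: bool = False,
                    threads: int = 4) -> List[str]:
    """Build (but do not run) a tophat2 command line for paired-end reads."""
    cmd = ["tophat2", "-p", str(threads), "-o", out_dir]
    # State the coverage search choice explicitly so it is visible in logs.
    cmd.append("--coverage-search" if coverage_search else "--no-coverage-search")
    if microexon_search:
        cmd.append("--microexon-search")
    # Positional arguments: Bowtie index prefix, then the two mate files.
    cmd += [index, reads1, reads2]
    return cmd


if __name__ == "__main__":
    # Hypothetical inputs; substitute your own index and FASTQ files.
    for use_cov in (False, True):
        out = "tophat_cov_on" if use_cov else "tophat_cov_off"
        cmd = tophat2_command("hg19_index", "sample_1.fastq", "sample_2.fastq",
                              out, coverage_search=use_cov)
        print(shlex.join(cmd))
```

Printing the commands rather than running them makes it easy to check the flags before committing to a multi-day alignment.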
#2 |
Junior Member
Location: Sweden Join Date: Aug 2012
Posts: 4
I am interested in this question as well, does anyone have a good answer?
#3 |
Member
Location: Irvine CA, USA Join Date: Dec 2009
Posts: 29
I would also REALLY like to hear an answer on this. What am I giving up if I opt for a --no-coverage-search?
__________________
In science, "fact" can only mean "confirmed to such a degree that it would be perverse to withhold provisional assent." I suppose that apples might start to rise tomorrow, but the possibility does not merit equal time in physics classrooms. --Stephen Jay Gould |
#4 |
Member
Location: Irvine CA, USA Join Date: Dec 2009
Posts: 29
Sorry to simply provide a link here but since it was biostars.org that provided the answer, not seqanswers, I felt it was appropriate to give that site the credit.
Here is a thread that provides a discussion on this topic. I make no claims on its validity, but I found it useful to read. http://www.biostars.org/p/49224/
#5 | |
Member
Location: Sydney, Australia Join Date: Jan 2012
Posts: 61
Thanks for the useful link, though I disagree with the interpretation provided by the biostars poster!
From the tophat manual: Quote:
Hi! The identification of new splice sites in different genes/transcripts is still possible without coverage search! Coverage search is, according to the manual, only useful when you've got very short reads, since in that case the probability that a read will "hit" a splice junction exactly may be very low for lowly expressed transcripts. Hence you need another way of detecting splice sites, which is where coverage search comes in. To make things easier for the algorithm, the coverage search step allows only the most canonical GT-AG splice junctions (this applies only to that step; you'll still get the GC-AG and AT-AC junctions that are supported by reads).
So in summary: coverage search should be left off for "modern" Illumina data.
Last edited by dvanic; 02-18-2013 at 06:20 PM. Reason: correcting interpretation error
#6 |
Member
Location: Irvine CA, USA Join Date: Dec 2009
Posts: 29
Wow. I am so thankful for your response. Finally, I think I have enough to make a decision on my runs... Unfortunately, I think I am going to have to re-run many of them with the coverage search off, but THANKFULLY they should take much less time!
Gus
#7 |
Member
Location: Sweden Join Date: Nov 2009
Posts: 23
But I understood the manual to say that it first looks for splice sites based on reads overlapping several places, using all the different splice site types ("GT-AG", "GC-AG" and "AT-AC"), and that coverage search then adds _more_ junctions to this, not that coverage search restricts the junctions to only GT-AG introns. Hence with longer reads the payback from coverage search is diminished, but it still adds information.
#8 | |
Member
Location: Sydney, Australia Join Date: Jan 2012
Posts: 61
|
![]() Quote:
#9 |
Member
Location: Sweden Join Date: Nov 2009
Posts: 23
I see the point in leaving --coverage-search off, especially since the samples I'm running at the moment have been stuck at this step for >3 days (2*101bp, ~40-50 million reads). I don't agree that long reads should be sufficient in themselves, though. Even if the chance of covering an exon/exon boundary increases with read length, there is still a chance of missing it. For genes with low expression the read-supported evidence might not be sufficient, hence you'll get more junctions with --coverage-search.
There is also the cost per experiment versus the extra (hopefully one-time) alignment time: the experiments are expensive and I want the most from my data. But we'll see how long it takes and whether I can use the server that much.
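The disagreement here comes down to how likely a junction is to be hit by at least one read. A back-of-the-envelope sketch is below, assuming reads start uniformly at random along a spliced transcript and ignoring edge effects, mappability and paired-end information. The function names, the 2,000 nt transcript length and the 8 nt minimum anchor are illustrative assumptions, not values taken from this thread.

```python
def junction_span_probability(read_len: int, transcript_len: int,
                              min_anchor: int = 8) -> float:
    """Probability that one uniformly placed read spans a given junction
    with at least `min_anchor` bases on each side (simplified model)."""
    start_positions = transcript_len - read_len + 1   # possible read starts
    spanning_starts = read_len - 2 * min_anchor + 1   # starts that span the junction
    if start_positions <= 0 or spanning_starts <= 0:
        return 0.0
    return min(1.0, spanning_starts / start_positions)


def prob_at_least_one(n_reads: int, p_single: float) -> float:
    """Probability that at least one of n independent reads spans the junction."""
    return 1.0 - (1.0 - p_single) ** n_reads


if __name__ == "__main__":
    transcript_len = 2000  # assumed spliced transcript length in nt
    for read_len in (36, 100):
        p = junction_span_probability(read_len, transcript_len)
        for n_reads in (5, 20, 100):
            print(f"read_len={read_len} reads_on_transcript={n_reads} "
                  f"P(junction hit)={prob_at_least_one(n_reads, p):.3f}")
```

Under this model longer reads raise the per-read chance of spanning a junction, but for transcripts covered by only a handful of reads the chance of missing a junction stays well above zero, which is the regime both posters are arguing about.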
#10 | |
Member
Location: Sydney, Australia Join Date: Jan 2012
Posts: 61
|
![]() Quote:
#11 |
Member
Location: Sweden Join Date: Nov 2009
Posts: 23
I'm afraid I don't understand your point. All junctions/transcripts with a low number of reads are going to be hard to reconstruct. My thought is that by using coverage_search you'll get more reads mapping to junctions, which will then move some transcripts from the "too few" bin to the "just enough" bin in terms of number of mapped reads. This applies to junction-mapped reads in particular, since you will always (in my experience at least) have more reads mapped to the gene than to the junction.
So I'm currently comparing the output from ~70 samples +/- coverage_search to see whether the benefit justifies the 4x mapping time that coverage_search takes.
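One way to run such a comparison is to diff the junction sets from two runs of the same sample. The sketch below assumes each run produced a `junctions.bed` in the usual TopHat BED12 layout (two blocks flanking the intron, with the score column holding the number of supporting alignments). The helper names and the command-line usage are hypothetical.

```python
import sys
from typing import Dict, Iterable, Set, Tuple

Junction = Tuple[str, int, int, str]  # (chrom, intron_start, intron_end, strand)


def read_junctions(lines: Iterable[str]) -> Dict[Junction, int]:
    """Parse BED12 junction lines into {junction: supporting alignment count}."""
    junctions: Dict[Junction, int] = {}
    for line in lines:
        # Skip blank lines, the BED track header and comments.
        if not line.strip() or line.startswith(("track", "#")):
            continue
        f = line.rstrip("\n").split("\t")
        chrom, start, score, strand = f[0], int(f[1]), int(f[4]), f[5]
        block_sizes = [int(x) for x in f[10].split(",") if x]
        block_starts = [int(x) for x in f[11].split(",") if x]
        # The intron lies between the end of block 1 and the start of block 2.
        intron_start = start + block_sizes[0]
        intron_end = start + block_starts[1]
        junctions[(chrom, intron_start, intron_end, strand)] = score
    return junctions


def compare_runs(with_cov: Dict[Junction, int],
                 without_cov: Dict[Junction, int]) -> Dict[str, Set[Junction]]:
    """Split junctions into shared, coverage-search-only and no-coverage-only sets."""
    a, b = set(with_cov), set(without_cov)
    return {"shared": a & b, "only_with": a - b, "only_without": b - a}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths): script.py cov_on/junctions.bed cov_off/junctions.bed
    with open(sys.argv[1]) as fh_on, open(sys.argv[2]) as fh_off:
        on, off = read_junctions(fh_on), read_junctions(fh_off)
    result = compare_runs(on, off)
    for name, juncs in result.items():
        print(name, len(juncs))
    # Read support for the junctions gained only with coverage search.
    print(sorted(on[j] for j in result["only_with"]))
```

Looking at the read support of the junctions found only with coverage search gives a first hint as to whether they are plausible additions or low-support noise.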
#12 | ||
Member
Location: Sydney, Australia Join Date: Jan 2012
Posts: 61
My point is that median exon length in human is quite close to 100 nucleotides, and I work with 100bp PE reads.
So if I haven't managed to "hit" an exon junction with at least one read how likely is it that I will have enough coverage across the entire gene to be able to predict exons accurately? How do I prevent spurious reconstruction of transcripts and exon boundaries because of how lowly the gene is expressed? How many real single exons will be split into more than one exon because of low coverage or regions in them that have low mappability, for example due to repeats? And how do I filter these out? Quote:
Quote:
#13 |
Member
Location: Sweden Join Date: Nov 2009
Posts: 23
Firstly, I thought that coverage search defined new exons based on coverage piles and then tried to map reads to those exons and to the junctions between all such piles. Hence reads that previously would have mapped somewhere else could be remapped to a junction between two defined exons.
Regarding all the other questions, well, that's something to look into. From our initial comparison of coverage vs. non-coverage runs, I know that I get more reads mapped. Whether these are good or spurious mappings I'll see later on.
#14 | |
Member
Location: Newcastle upon Tyne Join Date: Aug 2009
Posts: 18
|
![]() Quote:
Thanks, |