  • When to use tophat2's coverage search?

    Hi!
    I'm a bit confused: when should one use tophat2's coverage search? Is there any logic to leaving it on or off for 100bp PE reads, or is this dictated solely by the computational resources one has available?

    Overall, what is YOUR standard practice with using this option?

    I have seen the manual, which states:
    Enables or disables the coverage-based search for junctions. Use when coverage search is disabled by
    default (such as for reads ≥75 bp), for maximum sensitivity. Default: no
    However, given that I am working with a small number of libraries, I can afford the extra computational time and memory requirements, provided that this "maximum sensitivity" is really worth it. The question is: how do I make that call (other than by running my libraries with and without it and then comparing; I don't really want to reinvent the wheel here)?

    Also, for human, how much sense does it make to use the microexon search option???

    Thanks in advance!
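
    For concreteness, the flags being discussed look roughly like this on the command line (a sketch only, not a recommendation; the index name, GTF file, FASTQ names, output directories and thread count are placeholders):

      # coverage search and microexon search explicitly enabled
      tophat2 --coverage-search --microexon-search -p 8 -o tophat_cov \
          -G genes.gtf genome_index reads_1.fastq reads_2.fastq

      # coverage search explicitly disabled (the default for reads >= 75 bp, per the manual quote above)
      tophat2 --no-coverage-search -p 8 -o tophat_nocov \
          -G genes.gtf genome_index reads_1.fastq reads_2.fastq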

  • #2
    I am interested in this question as well; does anyone have a good answer?

    • #3
      I would also REALLY like to hear an answer on this. What am I giving up if I opt for --no-coverage-search?
      In science, "fact" can only mean "confirmed to such a degree that it would be perverse to withhold provisional assent." I suppose that apples might start to rise tomorrow, but the possibility does not merit equal time in physics classrooms.
      --Stephen Jay Gould

      • #4
        Sorry to simply provide a link here, but since it was biostars.org that provided the answer, not seqanswers, I felt it was appropriate to give that site the credit.

        Here is a thread that provides a discussion on this topic. I make no claims on its validity, but I found it useful to read.


        • #5
          Thanks for the useful link, though I disagree with the interpretation provided by the biostars poster!

          From the tophat manual:
          The first and strongest source of evidence for a splice junction is when two segments from the same read (for reads of at least 45bp) are mapped at a certain distance on the same genomic sequence or when an internal segment fails to map - again suggesting that such reads are spanning multiple exons. With this approach, "GT-AG", "GC-AG" and "AT-AC" introns will be found ab initio. The second source is pairings of "coverage islands", which are distinct regions of piled up reads in the initial mapping. Neighboring islands are often spliced together in the transcriptome, so TopHat looks for ways to join these with an intron. We only suggest users use this second option (--coverage-search) for short reads (< 45bp) and with a small number of reads (<= 10 million). This latter option will only report alignments across "GT-AG" introns
          I've responded to this on biostars, but to repost here:
          Hi! The identification of new splice sites in different genes/transcripts is still possible without coverage search!

          Coverage search is, according to the manual, only useful when you have very short reads, since in that case the probability that a read will span a splice junction directly may be very low for relatively lowly expressed transcripts. Hence you need another way of detecting splice sites, which is where coverage search comes in. To make things easier for the algorithm, this step allows only the most canonical GT-AG splice junctions (only in this latter step; you will still get the GC-AG and AT-AC junctions that are supported directly by reads).

          So the summary is: coverage search should be left off for "modern" Illumina data.
          Last edited by dvanic; 02-18-2013, 06:20 PM. Reason: correcting interpretation error

          • #6
            Wow. I am so thankful for your response. Finally, I think I have enough to make a decision on my runs... Unfortunately, I think I am going to have to re-run many of them with the coverage search off, but THANKFULLY they should take much less time!

            Gus

            • #7
              But I understood the manual to mean that it first looks for splice sites based on reads mapping across several places, using all the different ("GT-AG", "GC-AG" and "AT-AC") splice sites, and that the coverage search then adds _more_ junctions on top of this, not that coverage search restricts the junctions to GT-AG introns. Hence with longer reads the payoff of coverage search is diminished, but it still adds information.

              • #8
                Originally posted by pettervikman View Post
                But I understood the manual to mean that it first looks for splice sites based on reads mapping across several places, using all the different ("GT-AG", "GC-AG" and "AT-AC") splice sites, and that the coverage search then adds _more_ junctions on top of this, not that coverage search restricts the junctions to GT-AG introns. Hence with longer reads the payoff of coverage search is diminished, but it still adds information.
                Hi! Yes, you're right, thank you for catching that. However, I would still argue that coverage search should be left off for longer Illumina reads and mammalian (human, mouse) transcriptomes: the median exon length in humans is ~150 nucleotides, so if you have PE 100 reads you should have some reads cross the splice junctions... I'm not sure how much I would trust novel junctions that are only supported by coverage and not by reads directly, not to mention the additional computational time it takes.
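
                To put a rough number on that reasoning (a back-of-envelope sketch, not something from the manual): assume reads land uniformly along a transcript and that a read only counts as spanning a junction if it has at least TopHat's default 8 bp anchor on each side (--min-anchor-length). For a lowly expressed ~1.5 kb transcript with, say, 20 mapped 100 bp reads (both values made up purely for illustration):

                  awk 'BEGIN {
                      read = 100; anchor = 8; tx = 1500; n = 20;   # read length, min anchor, transcript length, read count (illustrative)
                      p = (read - 2 * anchor) / tx;                # chance one uniformly placed read spans a given junction
                      printf "P(junction spanned by >= 1 read) = %.2f\n", 1 - (1 - p)^n
                  }'

                With these made-up numbers that comes out around 0.68 per junction; drop the read length to 36 bp and the spanning window shrinks from 84 bp to 20 bp, so the same calculation falls to roughly 0.24, which is the short-read regime coverage search was designed for.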

                • #9
                  I see the point in leaving --coverage-search off, especially since the samples I'm running at the moment have been stuck at this step for >3 days (2*101 bp, ~40-50 million reads). I don't agree that long reads should be sufficient in themselves, though: even though the chance of covering an exon/exon boundary increases with read length, there is still a chance of missing it. For genes with low expression this might not be enough, hence you'll get more junctions with --coverage-search.

                  Also, weighing the cost per experiment against the extra (hopefully one-time) alignment time: the experiments are expensive and I want to get the most from my data. But we'll see how long it takes and whether I can occupy the server for that long.

                  • #10
                    Even though the chance of covering an exon/exon boundary increases with read length, there is still a chance of missing it. For genes with low expression this might not be enough, hence you'll get more junctions with --coverage-search.
                    How confident can you be, though, that these junctions are real? How well can you reconstruct these genes and their isoforms if you don't have enough reads that cover splice junctions?

                    • #11
                      I'm afraid I don't understand your point. All junctions/transcripts with a low number of reads are going to be hard to reconstruct. My thought is that by using coverage_search you'll get more reads mapping to junctions, which will then move some transcripts from the "too few" bin to the "just enough" bin in terms of the number of mapped reads, at least with regard to reads mapping to junctions, especially since (in my experience at least) you always have more reads mapped to the gene than to the junction.

                      So I'm currently comparing the output from ~70 samples +/- coverage_search to see if I'll benefit from the 4x mapping time that coverage_search takes.
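
                      For what it's worth, a minimal way to make that per-sample comparison (assuming the bed_to_juncs script that ships with TopHat is on the PATH; the with_cov/ and no_cov/ directory names are placeholders) is to convert each run's junctions.bed into raw intron coordinates and diff them:

                        # junctions reported only when coverage search was enabled
                        bed_to_juncs < with_cov/junctions.bed | sort > with.juncs
                        bed_to_juncs < no_cov/junctions.bed   | sort > without.juncs
                        comm -23 with.juncs without.juncs | wc -l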

                      • #12
                        My point is that median exon length in human is quite close to 100 nucleotides, and I work with 100bp PE reads.

                        So if I haven't managed to "hit" an exon junction with at least one read, how likely is it that I will have enough coverage across the entire gene to predict exons accurately? How do I prevent spurious reconstruction of transcripts and exon boundaries when the gene is expressed at such a low level? How many real single exons will be split into more than one exon because of low coverage, or because of regions with low mappability, for example due to repeats? And how do I filter these out?


                        My thought is that by using coverage_search you'll get more reads mapping to junctions, which will then move some transcripts from the "too few" bin to the "just enough" bin in terms of the number of mapped reads.
                        Coverage search does not increase the number of reads mapping to junctions. Coverage search is when you have "piles" of reads mapping to adjacent regions in the genome and there are NO junction reads, but you infer that there is a junction and these reads are part of one transcript based on them being in an adjacent locus and having the GT-AG sequence in the putative intron between them:
                        The second source is pairings of "coverage islands", which are distinct regions of piled up reads in the initial mapping.
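
                        And on the "how much do I trust these junctions" question: TopHat records the number of alignments spanning each junction in the score column of junctions.bed, so a crude post-hoc filter is possible (a sketch; the threshold of 3 is an arbitrary illustration, and the first line is assumed to be the BED track header):

                          # keep the track line plus junctions supported by at least 3 spanning alignments
                          awk 'NR == 1 || $5 >= 3' junctions.bed > junctions.min3.bed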

                        • #13
                          Firstly, I thought that coverage search defined new exons based on coverage piles and then tried to map reads to those exons and to the junctions between all such piles. Hence reads that would previously have been mapped somewhere else could be remapped to a junction between two defined exons.

                          Regarding all the other questions, well, that's something to look into. I know that I get more reads mapped from our initial investigation comparing coverage/non-coverage. Whether these are good mappings or spurious ones, I'll see later on.

                          • #14
                            Originally posted by pettervikman View Post
                            So I'm currently comparing the output from ~70 samples +/- coverage_search to see if I'll benefit from the 4x mapping time that coverage_search takes.
                            Hi, I am wondering what conclusion you reached from the comparison. Do you think coverage search is worth the time?

                            Thanks,
