
**GenoMax** · 02-19-2014, 08:22 AM

Is that limit documented somewhere or based on personal experience?

Heng Li has mentioned pileup handling 200 GB BAMs before (albeit not from a single amplicon): http://seqanswers.com/forums/showthread.php?t=6680

**SNPsaurus** · 02-19-2014, 08:50 AM

I use

```
samtools mpileup -BQ60 -d500000 -D -f
```

for our low-variant detection. The `-d` option ("-d INT: At a position, read maximally INT reads per input BAM [250]") caps the depth of the pileup. I turn off the BAQ calculation (`-B`) because I find it depresses the scores of any variant. We only accept base qualities of 60 (`-Q60`) because our method greatly improves the quality scores; if you are looking at normal reads, you might skip that or set `-Q` to 30.
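To illustrate why the default depth cap matters for a single amplicon, here is a toy Python model of one pileup column. It is not samtools code: `pileup_counts` is a hypothetical helper that simply keeps the first `max_depth` reads in input order and then applies a base-quality filter, which is a simplification of what mpileup actually does. The default `min_baseq=13` mirrors mpileup's documented default for `-Q`.

```python
from collections import Counter
from itertools import islice


def pileup_counts(bases, quals, max_depth=250, min_baseq=13):
    """Toy pileup column: cap the depth, then drop low-quality bases.

    Hypothetical helper; keeping the first max_depth reads in input order
    is a simplification of mpileup's real behaviour.
    """
    counts = Counter()
    for base, qual in islice(zip(bases, quals), max_depth):
        if qual >= min_baseq:
            counts[base] += 1
    return counts


# 10,000 reads at one position, with a 1% 'T' variant spread evenly.
bases = ["T" if i % 100 == 99 else "A" for i in range(10_000)]
quals = [60] * len(bases)

capped = pileup_counts(bases, quals)  # default cap of 250 reads
uncapped = pileup_counts(bases, quals, max_depth=500_000, min_baseq=60)
print(capped, uncapped)
```

With the default cap, only 2 of the 100 variant reads are counted (A: 248, T: 2); with the cap raised as in `-d500000`, all 10,000 reads contribute (A: 9900, T: 100).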

**swbarnes2** · 02-19-2014, 09:28 AM

Given the error-prone nature of Illumina sequencing, there is a limit to how ultra sensitive you can be. I am skeptical that millions of reads will give you more true positives than a hundred thousand.
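A back-of-the-envelope way to see this point: if sequencing errors were independent, with a per-base error rate e spread evenly over the three non-reference bases, the number of erroneous reads supporting one particular alternate base at depth N would follow Binomial(N, e/3). The sketch below (hypothetical helper names; `alpha` is an arbitrary illustrative significance level) finds the smallest alt-read count that errors alone would reach with probability at most `alpha`, and converts it into a minimum detectable allele frequency. Under these idealised assumptions, the threshold falls with depth but never drops below e/3; real Illumina and PCR errors are context-dependent and correlated, which raises the floor further.

```python
import math


def binom_logpmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), using lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def call_threshold(depth, error_rate, alpha=1e-6):
    """Smallest alt-read count k with P(errors alone give >= k reads) <= alpha."""
    p = error_rate / 3.0  # assumption: errors split evenly over the 3 other bases
    cdf = 0.0             # running value of P(X <= k - 1)
    for k in range(depth + 1):
        if 1.0 - cdf <= alpha:
            return k
        cdf += math.exp(binom_logpmf(k, depth, p))
    return depth + 1


def min_detectable_vaf(depth, error_rate, alpha=1e-6):
    """Allele frequency corresponding to the error-only calling threshold."""
    return call_threshold(depth, error_rate, alpha) / depth


for depth in (1_000, 100_000, 1_000_000):
    # Q30 bases correspond to an error rate of about 0.001.
    print(depth, round(min_detectable_vaf(depth, 0.001), 5))
```

Going from a hundred thousand to a million reads only moves the threshold a little closer to e/3, which is the diminishing return swbarnes2 describes.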

**Bukowski** · 02-19-2014, 04:17 PM

Originally posted by swbarnes2:

> Given the error-prone nature of Illumina sequencing, there is a limit to how ultra sensitive you can be. I am skeptical that millions of reads will give you more true positives than a hundred thousand.

Agreed. The race to the bottom for ultra-sensitive variant detection seems to be conveniently ignoring the false-positive rate right now, and it's quite disconcerting. Combined with your PCR-induced errors, you're asking for trouble.

**svos** · 02-20-2014, 02:32 AM

Originally posted by Bukowski:

> Agreed. The race to the bottom for ultra-sensitive variant detection seems to be conveniently ignoring the false positive rate right now and it's quite disconcerting. Combined with your PCR induced errors, you're asking for trouble.

Of course, you're right! We are also thinking about these problems and are trying to address them with corresponding control samples.
But that is a separate question. I just wanted to know whether it is possible to map millions of reads to one and the same location, process them with (m)pileup, and call variants from them.

**swbarnes2** · 02-20-2014, 01:15 PM

Originally posted by svos:

> Of course, you're right! We are also thinking about these problems and try to face them using corresponding control samples.
> But this is another question, I just wanted to know if it would be possible to map millions of reads to one and the same location, process them with (m)pileup and call variants on it.

It's hard to say without knowing exactly how low you are trying to go, but I would NOT trust mpileup calls at allele frequencies below a few percent unless I had very solid spike-in data showing that the false-positive and false-negative rates were acceptable.
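The spike-in check described here boils down to comparing calls with a known truth set. A minimal Python sketch, assuming variants are represented as hypothetical `(position, ref, alt)` tuples and that the truth set lists every spiked-in variant; it reports false-positive and false-negative counts plus sensitivity and precision:

```python
def evaluate_calls(called, truth):
    """Compare variant calls against a spike-in truth set.

    Both arguments are sets of (position, ref, alt) tuples (hypothetical format).
    """
    tp = len(called & truth)  # spiked-in variants that were recovered
    fp = len(called - truth)  # calls with no matching spike-in
    fn = len(truth - called)  # spiked-in variants that were missed
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


truth = {(101, "A", "T"), (150, "C", "G")}
called = {(101, "A", "T"), (188, "G", "A")}
result = evaluate_calls(called, truth)
print(result)
```

Running this separately for each spike-in dilution shows the allele frequency at which sensitivity drops or false positives start to dominate.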

**svos** · 02-21-2014, 03:49 AM

Originally posted by swbarnes2:

> It's hard to say without knowing exactly how low you are trying to go, but I would NOT believe mpileup on anything less than a few % unless I had very solid spike-in data proving that the false positive and false negative rates were acceptable.

Again, you're right, but that's a separate problem... Hopefully we will have controls that allow us to perform such an analysis.

My simple question is: is this kind of variant detection technically feasible on the bioinformatics side, using e.g. (m)pileup or an alternative? Or will we already run into problems at this stage, before even considering the biological and sequencing issues?

**jkbonfield** · 02-21-2014, 09:44 AM

Perhaps one solution is to compute it in sections (say 1000 reads at a time): at each position, compute a vector over A, C, G, T and - (deletion) together with confidences, and then combine those vectors in a second round of mpileup.

It's not possible with the current code, but in principle a "reduced-reads" style notation (done formally) could provide a way to compute extreme-depth pileups in a memory-tractable manner.
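The chunked idea can be sketched in Python. This is an illustration under stated assumptions, not existing samtools functionality: reads are hypothetical `(start, aligned_seq)` pairs already projected onto the reference (`-` for a deleted base, insertions ignored), the per-allele confidences are left out, and the second round is a plain summation of count vectors rather than a second mpileup pass. Only one chunk of reads is held at a time, so memory scales with amplicon length rather than with the number of reads.

```python
from collections import defaultdict
from itertools import islice

ALLELES = "ACGT-"  # order of entries in each per-position count vector


def count_chunk(reads):
    """Per-position allele counts for one chunk of (start, aligned_seq) reads.

    Bases outside A/C/G/T/- raise ValueError in this sketch.
    """
    counts = defaultdict(lambda: [0] * len(ALLELES))
    for start, seq in reads:
        for offset, base in enumerate(seq):
            counts[start + offset][ALLELES.index(base)] += 1
    return counts


def merge(total, part):
    """Add one chunk's count vectors into the running totals (second round)."""
    for pos, vec in part.items():
        acc = total.setdefault(pos, [0] * len(ALLELES))
        for i, n in enumerate(vec):
            acc[i] += n
    return total


def chunked_pileup(reads, chunk_size=1000):
    """Process reads chunk_size at a time and sum the per-chunk vectors."""
    it = iter(reads)
    total = {}
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return total
        merge(total, count_chunk(chunk))


reads = [(0, "ACGT"), (1, "CGTA"), (2, "G-AC")] * 5
pileup = chunked_pileup(reads, chunk_size=2)
print(pileup[3])  # counts in A, C, G, T, - order at position 3
```

Because addition is associative, the result does not depend on the chunk size, which is what makes the two-round scheme safe.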



SAMtools pileup of millions of reads from a single amplicon
