**nilshomer** · 07-19-2009, 06:21 AM

Originally posted by lukemn:

Hello,

I'm doing mutation detection by ~30x Illumina genome resequencing on a haploid eukaryote.

Maq seems to be working fine otherwise, not that I have a great deal of experience here, but the final SNP list includes MASSES of ambiguous calls (i.e. C>M, G>R etc.), many with the max phred of 255. By masses I mean ~2/3 of ~1700 total filtered SNPs over the genome. From a haploid! And these are randomly distributed over the entire genome, 8 chromosomes, so it's not partial duplications or restricted to repetitive sequence.

I should say I'm manually filtering to advised thresholds (phred 40, depth 3, also looking at neighbouring quality and number of hits but these numbers are looking fine) rather than running SNPfilter, but I don't think this should matter AFAIK. Mostly using default maq settings, except for the consensus assembly (-s -q 30).

I'm moving to BWA/SAMtools to compare, but still, anyone know what could be going on here? I'm very happy to just throw these away if spurious, but not without knowing why they're getting through.

Thanks,
Luke

You could convert the MAQ alignments to the SAM format and use the SAMtools SNP caller, which itself uses the MAQ consensus caller (written by the same author as MAQ). In SAMtools I believe you can specify the ploidy so the SNP calls will never be called heterozygous. There are also a number of other parameters that are useful to tune.
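For anyone unsure what the ambiguous calls mean: M, R and the like are IUPAC codes for a two-base (heterozygous) consensus, e.g. M = A or C and R = A or G. Below is a minimal Python sketch for separating those calls from unambiguous ones after the phred 40 / depth 3 filter mentioned above. The column layout (chromosome, position, reference base, consensus base, consensus quality, depth) is an assumption for illustration, and `split_calls` is a hypothetical helper, not part of Maq or SAMtools, so check it against your own SNP file first.

```python
# IUPAC two-base ambiguity codes (standard nomenclature).
IUPAC_HET = {
    "M": "AC", "R": "AG", "W": "AT",
    "S": "CG", "Y": "CT", "K": "GT",
}


def split_calls(lines, min_qual=40, min_depth=3):
    """Split SNP rows into unambiguous and ambiguous (two-base) calls.

    Assumed tab-separated layout (hypothetical, verify on your data):
    chrom, pos, ref base, consensus base, consensus quality, depth, ...
    """
    clean, ambiguous = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty rows
        chrom, pos = fields[0], int(fields[1])
        ref, cons = fields[2].upper(), fields[3].upper()
        qual, depth = int(fields[4]), int(fields[5])
        if qual < min_qual or depth < min_depth:
            continue  # fails the manual thresholds
        if cons in IUPAC_HET:
            # Record the two bases the ambiguity code stands for.
            ambiguous.append((chrom, pos, ref, IUPAC_HET[cons], qual, depth))
        elif cons != ref:
            clean.append((chrom, pos, ref, cons, qual, depth))
    return clean, ambiguous


if __name__ == "__main__":
    rows = [
        "chr1\t100\tC\tM\t255\t30",
        "chr1\t200\tG\tA\t60\t25",
        "chr2\t5\tT\tC\t20\t2",
    ]
    clean, ambiguous = split_calls(rows)
    print("unambiguous:", clean)
    print("ambiguous:", ambiguous)
```

On a haploid sample the ambiguous list is the set worth inspecting by eye before discarding anything.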

Just curious, but what are you doing about indels?

**lukemn** · 07-19-2009, 03:11 PM

Thanks, I'll try that.

And yes another reason I'm going ahead with BWA/SAMtools is to use the handling of gapped alignments for single end reads (I had thought we were doing paired ends but it turns out not to be the case). This should reveal indels, rearrangements, I hope.

**nilshomer** · 07-19-2009, 05:31 PM

Originally posted by lukemn:

Thanks, I'll try that.

And yes another reason I'm going ahead with BWA/SAMtools is to use the handling of gapped alignments for single end reads (I had thought we were doing paired ends but it turns out not to be the case). This should reveal indels, rearrangements, I hope.

Both SHRiMP and BFAST are also able to search for indels with single-end data by using a full Smith-Waterman algorithm. Keep me updated on your progress; I would be interested in your assessment.
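As a rough illustration of why a full Smith-Waterman search can find indels from single reads, here is a minimal Python sketch of the local alignment score with a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) and the function name are assumptions chosen for the example; this is not the SHRiMP or BFAST implementation, and real aligners typically use affine gap penalties and restrict the search to candidate regions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Uses the classic Smith-Waterman recurrence with a linear gap penalty,
    keeping only two rows of the dynamic programming matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b (a[i-1] left unaligned)
            left = curr[j - 1] + gap  # gap in a (b[j-1] left unaligned)
            curr[j] = max(0, diag, up, left)  # 0 restarts the local alignment
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    reference = "ACGTATGCA"
    read = "ACGTTGCA"  # same as the reference with one base deleted
    print(smith_waterman_score(read, reference))
```

With these scores the gapped alignment of the read (eight matches, one gap) beats any ungapped placement (at most four consecutive matches), which is how the deletion becomes visible.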

**sungdugkim** · 07-20-2009, 07:43 AM

I am new to this field and would like to learn the basics.

Can you recommend any websites?

Thank you

SK

**swbarnes2** · 07-20-2009, 09:27 AM

Originally posted by lukemn:

Hello,

I'm doing mutation detection by ~30x Illumina genome resequencing on a haploid eukaryote.

Maq seems to be working fine otherwise, not that I have a great deal of experience here, but the final SNP list includes MASSES of ambiguous calls (i.e. C>M, G>R etc.), many with the max phred of 255. By masses I mean ~2/3 of ~1700 total filtered SNPs over the genome. From a haploid! And these are randomly distributed over the entire genome, 8 chromosomes, so it's not partial duplications or restricted to repetitive sequence.

I should say I'm manually filtering to advised thresholds (phred 40, depth 3, also looking at neighbouring quality and number of hits but these numbers are looking fine) rather than running SNPfilter, but I don't think this should matter AFAIK. Mostly using default maq settings, except for the consensus assembly (-s -q 30).

I'm moving to BWA/SAMtools to compare, but still, anyone know what could be going on here? I'm very happy to just throw these away if spurious, but not without knowing why they're getting through.

Thanks,
Luke

I've seen those in bacteria too, and the high-quality ones have been confirmed by Sanger sequencing. So what you are seeing is probably really in the original DNA, and not a false positive. You should Sanger-check a few, then ask the people who prepped the DNA why there appear to be two templates in their sample.

**lukemn** · 07-20-2009, 06:35 PM

I agree... there could be some contamination, especially of closely related progeny. But I would only have myself to blame for that!

Doing what I should have done before posting, I manually inspected the alignment (SAMtools tview), and most of these look like conservative variant calling by Maq: a few more sequencing errors than usual (say 3-5 reads at an average 30x coverage) happen to fall on the same base and are not representative of the consensus. Probably tunable, but good to inspect manually as well, I guess.
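To put a number on that kind of eyeballing, here is a minimal Python sketch that takes the bases covering one position and reports the major allele and its fraction. The 0.8 cutoff and the function name are arbitrary assumptions for illustration; this is a simple counting heuristic and not Maq's consensus model, which also weighs base and mapping qualities.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def haploid_call(column_bases, min_major_fraction=0.8):
    """Return (base, fraction) for one alignment column of a haploid sample.

    base is the most common nucleotide if its fraction of the usable reads
    reaches min_major_fraction (a hypothetical cutoff), otherwise None.
    """
    counts = Counter(b.upper() for b in column_bases if b.upper() in VALID_BASES)
    depth = sum(counts.values())
    if depth == 0:
        return None, 0.0  # no usable reads at this position
    base, n = counts.most_common(1)[0]
    fraction = n / depth
    return (base if fraction >= min_major_fraction else None), fraction


if __name__ == "__main__":
    # 30x coverage with 4 discordant reads at the same base.
    column = "A" * 26 + "C" * 4
    print(haploid_call(column))
    # An even split looks more like two templates than sequencing error.
    print(haploid_call("A" * 15 + "C" * 15))
```

A column with 4 discordant reads out of 30 still gets a confident major-allele call under this cutoff, while a near-even split does not and would be a better candidate for the mixed-template explanation.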

Also picking up a few extra SNPs from BWA relative to Maq.

**Torst** · 08-04-2009, 06:55 PM

Originally posted by lukemn:

I agree... there could be some contamination, especially of closely related progeny.

Some of the bacterial samples we've sequenced suggest the existence of a sub-population in the mix.

high Q ambiguous SNPs from Maq