  • high Q ambiguous SNPs from Maq

    Hello,

    I'm doing mutation detection by ~30x Illumina genome resequencing on a haploid eukaryote.

    Maq seems to be working fine otherwise, not that I have a great deal of experience here, but the final SNP list includes MASSES of ambiguous calls (i.e. C>M, G>R etc.), many with the maximum phred score of 255. By masses I mean ~2/3 of ~1700 total filtered SNPs over the genome. From a haploid! And these are randomly distributed over the entire genome, 8 chromosomes, so it's not partial duplications or restricted to repetitive sequence.

    I should say I'm manually filtering to advised thresholds (phred 40, depth 3, also looking at neighbouring quality and number of hits but these numbers are looking fine) rather than running SNPfilter, but I don't think this should matter AFAIK. Mostly using default maq settings, except for the consensus assembly (-s -q 30).
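A minimal Python sketch of the manual filter described above, which also counts how many surviving calls are two-base IUPAC ambiguity codes (M, R, W, S, Y, K). It assumes tab-separated input with the leading columns in the order the Maq manual gives for `cns2snp` output (chromosome, position, reference base, consensus base, consensus quality, read depth); the function names and the example records are made up for illustration and are not from the original post.

```python
# IUPAC codes that stand for exactly two bases; a diploid-style caller
# emits these where it believes the site is heterozygous.
IUPAC_TWO_BASE = {
    "M": "AC", "R": "AG", "W": "AT",
    "S": "CG", "Y": "CT", "K": "GT",
}


def parse_snp_line(line):
    """Parse the first six columns of one SNP line (assumed column order)."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "pos": int(fields[1]),
        "ref": fields[2].upper(),
        "cns": fields[3].upper(),
        "qual": int(fields[4]),
        "depth": int(fields[5]),
    }


def passes_filter(snp, min_qual=40, min_depth=3):
    """Apply the thresholds mentioned in the post (phred 40, depth 3)."""
    return snp["qual"] >= min_qual and snp["depth"] >= min_depth


def is_ambiguous(snp):
    """True when the consensus call is a two-base IUPAC code."""
    return snp["cns"] in IUPAC_TWO_BASE


def summarize(lines, min_qual=40, min_depth=3):
    """Return (number of calls kept, number of kept calls that are ambiguous)."""
    kept = [s for s in map(parse_snp_line, lines)
            if passes_filter(s, min_qual, min_depth)]
    ambiguous = [s for s in kept if is_ambiguous(s)]
    return len(kept), len(ambiguous)


# Invented example records, only to show the expected input shape.
example = [
    "chr1\t100\tC\tM\t255\t30",
    "chr1\t200\tG\tA\t90\t28",
    "chr2\t50\tT\tC\t20\t2",
]
kept, ambiguous = summarize(example)
```

Running `summarize` over a real SNP file (for example `summarize(open(path))`, with `path` being whatever the file is called) would give the fraction of ambiguous calls after filtering, which is the ~2/3 figure quoted above.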

    I'm moving to BWA/SAMtools to compare, but still, anyone know what could be going on here? I'm very happy to just throw these away if spurious, but not without knowing why they're getting through.

    Thanks,
    Luke

  • #2
    You could convert the MAQ alignments to the SAM format and use the SAMtools SNP caller, which itself uses the MAQ consensus caller (written by the same author as MAQ). In SAMtools I believe you can specify the ploidy so the SNP calls will never be called heterozygous. There are also a number of other parameters that are useful to tune.

    Just curious, but what are you doing about indels?


    • #3
      Thanks, I'll try that.

      And yes, another reason I'm going ahead with BWA/SAMtools is to use its handling of gapped alignments for single-end reads (I had thought we were doing paired-end sequencing, but it turns out not to be the case). This should reveal indels and rearrangements, I hope.


      • #4
        Both SHRiMP and BFAST can also search for indels in single-end data by using a full Smith-Waterman algorithm. Keep me updated on your progress; I would be interested in your assessment.


        • #5
          I am new to this field and would like to learn the basics.

          Can you recommend any websites?

          Thank you

          SK


          • #6
            I've seen those too in bacteria, and the high-quality ones have been confirmed by Sanger sequencing. So what you are seeing is probably really in the original DNA, and not a false positive. You should Sanger-check a few, then ask the people who prepped the DNA why there appear to be two templates in their sample.


            • #7
              I agree... there could be some contamination, especially of closely related progeny. But I would only have myself to blame for that!

              Doing what I should have done in the first place before posting, manually inspecting the alignment (SAMtools tview), I see that most of these are probably just conservative variant calling by Maq: a few more sequencing errors than usual (say 3-5 at an average of 30x coverage) that happen to fall on the same base and are not representative of the consensus. Probably tunable, but good to manually inspect as well, I guess.
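To make the "3-5 reads out of ~30" observation concrete, here is a rough Python sketch that takes the bases piled up at one site, finds the majority base, and reports what fraction of reads support the runner-up. The 0.2 cutoff and the function names are assumptions chosen for illustration; this is not how Maq or SAMtools decide a call.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def minor_allele_fraction(bases):
    """Return (majority base, fraction of reads supporting the second base).

    `bases` is any iterable of single-character base calls covering one
    site; anything other than A/C/G/T (e.g. N) is ignored.
    """
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    ranked = counts.most_common(2)
    major = ranked[0][0]
    minor_count = ranked[1][1] if len(ranked) > 1 else 0
    return major, minor_count / total


def haploid_call(bases, max_minor_fraction=0.2):
    """Call the majority base unless the runner-up is too well supported.

    max_minor_fraction is a hypothetical cutoff, not a recommended value.
    """
    major, fraction = minor_allele_fraction(bases)
    if major is None:
        return "N"
    return major if fraction <= max_minor_fraction else "ambiguous"


# 26 reads say C and 4 say A: on a haploid this looks like coincident
# sequencing errors rather than a real second allele.
likely_errors = haploid_call("C" * 26 + "A" * 4)

# An even split is what contamination or a mixed sample would look like.
mixed_sample = haploid_call("C" * 15 + "A" * 15)
```

A site with 4 discordant reads out of 30 has a minor fraction of about 0.13 and is called as the majority base here, while an even split is flagged for a closer look, in line with the suggestion to inspect or Sanger-check such sites.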

              Also picking up a few extra SNPs from BWA relative to Maq.


              • #8
                Originally posted by lukemn
                I agree... there could be some contamination, especially of closely related progeny.
                Some of the bacterial samples we've sequenced suggest the existence of a sub-population in the mix.
