  • SNP Frequency

    I am trying to estimate the frequency of SNP occurrence in my library as reported by the maq aligner. The quick and dirty way of doing this, I believe, is to count the bases of the reference genome that are covered by at least 3 uniquely mapped reads. That should match the cutoff specified in the SNPFilter command of maq. Does anyone know if this information is available somewhere in the maq output files?
    Also, please let me know if you think my logic for estimating the frequency is blatantly wrong. I understand that it's not perfect, but this is just my first hack at it.

  • #2
    Hi Kvarla,

    I've written several SNP callers myself. In principle, your method is good. I don't trust 3 independent fragments, however - I think it's not nearly enough. Generally, I'll use a minimum of 8x, and the base qualities of each aligned base must be over a given threshold (20 for scores out of 30, 27 for scores out of 40). Some people also require that you see fragments in both directions. YMMV.
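The criteria described above can be sketched as a per-site filter. This is only an illustration, not Anthony's caller or MAQ's logic: the `Obs` record, the `site_passes` function and its parameter names are hypothetical, and the defaults simply mirror the numbers mentioned in the post (8x depth, quality 20).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obs:
    """One aligned base at a site (hypothetical record for this sketch)."""
    base: str      # observed base, e.g. "A"
    qual: int      # base quality score
    forward: bool  # True if the read aligned to the forward strand


def site_passes(observations, min_depth=8, min_qual=20,
                require_both_strands=False):
    """Return True if a site has enough good-quality coverage to be called."""
    # Keep only bases whose quality meets the threshold.
    good = [o for o in observations if o.qual >= min_qual]
    if len(good) < min_depth:
        return False
    if require_both_strands:
        # Need at least one good base from each strand.
        strands = {o.forward for o in good}
        if len(strands) < 2:
            return False
    return True
```

The thresholds are parameters because, as the post says, they depend on the quality scale in use and on how conservative you want to be.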

    I don't think MAQ is much different than that.

    The only thing I'm not sure of is why you're doing an estimate - it's not hard to get the actual number of SNPs in your library.

    Anthony
    The more you know, the more you know you don't know. —Aristotle



    • #3
      I can easily find the number of SNPs. I'm trying to get the number of bases that are aligned at the same quality as the SNP calls but are identical to the reference. I don't think it's accurate to estimate the SNP frequency as (number of SNPs called) / (total bases aligned), since not all of the aligned bases are available for making that call.
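The point about the denominator can be made concrete with a small sketch. The function name and the numbers below are made up for illustration; the only idea taken from the post is that the denominator should be the bases that could have yielded a call, not every aligned base.

```python
def snp_frequency(n_snps, n_callable_bases):
    """SNPs per callable reference base.

    n_callable_bases is the number of reference positions that meet the
    same coverage/quality cutoffs used for SNP calling.
    """
    if n_callable_bases <= 0:
        raise ValueError("n_callable_bases must be positive")
    return n_snps / n_callable_bases


# Hypothetical numbers, purely for illustration.
n_snps = 1_200
callable_bases = 3_000_000   # positions passing the SNP-calling cutoffs
aligned_bases = 10_000_000   # every position with any aligned read

print(f"per callable base: {snp_frequency(n_snps, callable_bases):.2e}")
print(f"per aligned base:  {snp_frequency(n_snps, aligned_bases):.2e}")
```

With these made-up inputs the callable denominator gives 4 SNPs per 10,000 bases, while dividing by all aligned bases gives 1.2 per 10,000, which understates the rate.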



      • #4
        That makes sense. I typically just write out the number of bases observed with that level of coverage, since I use my own SNP callers. I'm not sure how one would get Maq to do that on its own. I can point you to one of my SNP callers, since it's open source and produces similar results to maq, but I don't think that's what you're asking.

        As far as I'm concerned, there really is no way to estimate the number of bases Maq will see without doing the actual calculation.

        Anthony



        • #5
          Will maq's pileup be of help?
          Use the verbose option to output qualities as well, and then parse out the numbers.
          --
          bioinfosm
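A minimal sketch of the "parse out the numbers" step follows. The column layout is an assumption and should be checked against your own maq output and the manual: whitespace-separated fields for chromosome, position, reference base, depth, read bases, base qualities and mapping qualities, with the last three each starting with a leading "@" marker, and qualities stored as ASCII characters offset by 33. The function names and the `qual_offset` parameter are hypothetical.

```python
def _strip_marker(column):
    """Drop a single leading '@' marker, if present (assumed layout)."""
    return column[1:] if column.startswith("@") else column


def parse_pileup_line(line, qual_offset=33):
    """Parse one verbose pileup line into (chrom, pos, ref, bases, quals)."""
    fields = line.split()
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    bases = _strip_marker(fields[4]) if len(fields) > 4 else ""
    qual_col = _strip_marker(fields[5]) if len(fields) > 5 else ""
    # ASSUMPTION: one ASCII character per base, quality = ord(char) - offset.
    quals = [ord(c) - qual_offset for c in qual_col]
    return chrom, pos, ref, bases, quals


def count_callable(lines, min_depth=3, min_qual=20):
    """Count positions with at least min_depth bases of quality >= min_qual."""
    n = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _, _, _, _, quals = parse_pileup_line(line)
        if sum(q >= min_qual for q in quals) >= min_depth:
            n += 1
    return n
```

To use it on a file, pass the open file handle as `lines`, for example `count_callable(open("pileup.txt"))`, where the file name is a placeholder. The count it returns is the denominator discussed earlier in the thread.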



          • #6
            Ah, of course. Thanks, bioinfosm, for the pointer. Serves me right for not reading the manual more carefully. Next time, before I ask, I'll make sure I RTFM.



            • #7
              Note there are some default filters on maq pileup output:
              -m INT Maximum number of mismatches allowed for a read to be used [7]
              -Q INT Maximum allowed number of quality values of mismatches [60]

              We find these subtly change results - don't expect maq pileup to generate exactly what is in the .map file.

              david



              • #8
                SNP frequency

                Hi,

                Could you please tell me how you use the pileup file for SNP frequency? I am also trying to find the variant proportion at each position. Do you know how I can calculate that? E.g., if the reference base is A and the consensus base is T/G, how can I know in what proportion each of these bases occurs?

                Thanks
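One way to approach the per-position proportion, sketched under an assumption about the pileup format that should be verified against the maq manual: in the read-bases column, "," and "." stand for bases matching the reference (forward and reverse strand), and letters are mismatches (upper case forward, lower case reverse), after a leading "@" marker. The function name is hypothetical.

```python
from collections import Counter


def allele_proportions(ref_base, read_bases):
    """Return {base: fraction of reads} for one pileup position."""
    # Drop the single leading '@' marker if present (assumed layout).
    calls = read_bases[1:] if read_bases.startswith("@") else read_bases
    counts = Counter()
    for ch in calls:
        if ch in ",.":
            counts[ref_base.upper()] += 1  # read agrees with the reference
        else:
            counts[ch.upper()] += 1        # mismatch; fold strands together
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in counts.items()}


# Reference A; eight reads: four match, three show T, one shows G.
print(allele_proportions("A", "@,,.TtG,T"))
```

This treats every read equally. To respect the quality thresholds discussed earlier in the thread, filter the bases by quality before counting.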
