  • Maq SNP filtering script bug?

    Hi,

    I ran into a problem when I tried to filter the out.snp file using Maq's SNPfilter.
    I then tested with the Maq demo data, and the same error occurred. Here is how I ran the analysis.
    I used the command "perl maq.pl SNPfilter out.snp >out.filtered.snp", and the following errors were reported:

    "Use of uninitialized value in string ne at maq.pl line 286, <> line 59.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 59.
    Use of uninitialized value in string ne at maq.pl line 286, <> line 60.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 60.
    Use of uninitialized value in string ne at maq.pl line 286, <> line 62.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 62.
    Use of uninitialized value in string ne at maq.pl line 286, <> line 101.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 101.
    Use of uninitialized value in string ne at maq.pl line 286, <> line 102.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 102.
    ..............................
    ................................................."

    So I traced the original code of maq.pl and found that line 286 reads:
    "$is_good = 0 unless ($t[4] >= $opts{q} || ($t[2] ne $t[9] && $t[4]+$t[10] >= $opts{q})); # consensus quality filter"
    But my input file, out.snp, has only 9 columns, so $t[9] and $t[10] are undefined. That is why the error occurred.
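
    One way to confirm the column count is to tally how many fields each row has. The following is a minimal Python sketch, not part of Maq; the function name field_count_histogram is hypothetical, and it assumes the .snp rows are whitespace-separated. The two sample rows use the 9-column layout described above.

```python
from collections import Counter


def field_count_histogram(lines):
    """Count how many non-empty lines have each number of whitespace-separated fields."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[len(fields)] += 1
    return dict(counts)


# Two illustrative rows in the 9-column layout (assumed whitespace-separated).
sample = [
    "gi|162446888|ref|NC_010163.1| 73416 C S 159 45 1.31 63 62",
    "gi|162446888|ref|NC_010163.1| 73802 G T 3 55 2 0 2",
]
print(field_count_histogram(sample))
```

    For a real file, the same function could be called on an open file handle, for example field_count_histogram(open("out.snp")). If every row reports 9 fields, indices 9 and 10 never exist.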

    Could anyone tell me how to solve this problem, or what I did wrong? Thanks.

  • #2
    hey! I'm not in a position to check now, but I think that SNPfilter needs the consensus as well.


    • #3
      Hi ECO,

      Below is the quote from the Maq manual. I did not see any argument indicating the consensus. Could you help me here? Thanks.


      maq.pl SNPfilter [-d minDep] [-D maxDep] [-Q maxMapQ] [-q minCnsQ] [-w indelWinSize] [-n minNeiQ] [-F in.indelpe] [-f in.indelsoa] [-s minScore] [-m maxAcross] [-a] [-N maxWinSNP] [-W densWinSize] in.cns2snp.snp > out.filtered.snp

      Rule out SNPs that are covered by few reads (specified by -d), by too many reads (specified by -D), near (specified by -w) to a potential indel, falling in a possible repetitive region (characterized by -Q), or having low-quality neighbouring bases (specified by -n). If maxWinSNP or more SNPs appear in any densWinSize window, they will also be filtered out together.

      OPTIONS:
      -d INT Minimum read depth required to call a SNP [3]
      -D INT Maximum read depth required to call a SNP (<255, otherwise ignored) [256]
      -Q INT Required maximum mapping quality of reads covering the SNP [40]
      -q INT Minimum consensus quality [20]
      -n INT Minimum adjacent consensus quality [20]
      -w INT Size of the window around the potential indels. SNPs that are close to indels will be suppressed [3]
      -F FILE The indelpe output [null]
      -f FILE The indelsoa output [null]
      -s INT Minimum score for a soa-indel to be considered [3]
      -m INT Maximum number of reads that can be mapped across a soa-indel [1]
      -a Alternative filter for single end alignment


      • #4
        I don't know what's with the code... but the command sure works

        $ maq.pl SNPfilter Seq.fasta.snp
        gi|115315570 219 A R 31 255 1.00 63 62
        gi|115315570 1005 T Y 14 255 1.00 63 62
        gi|115315570 1576 G R 255 255 1.00 63 62
        ..

        Where
        $ head Seq.fasta.snp
        gi|115315570 219 A R 31 255 1.00 63 62
        gi|115315570 1005 T Y 14 255 1.00 63 62
        gi|115315570 1576 G R 255 255 1.00 63 62
        gi|115315570 1591 C Y 255 255 1.00 63 62
        gi|115315570 1595 G R 56 255 1.00 63 62
        gi|115315570 1689 C Y 255 255 1.00 63 62
        ..
        --
        bioinfosm


        • #5
          I'll check when I get to my other computer, that perl error looks like a problem with your input file which starts on line 59.


          • #6
            row 58: gi|162446888|ref|NC_010163.1| 73416 C S 159 45 1.31 63 62
            row 59: gi|162446888|ref|NC_010163.1| 73802 G T 3 55 2 0 2
            row 60: gi|162446888|ref|NC_010163.1| 74866 A C 3 42 2 0 2
            row 61: gi|162446888|ref|NC_010163.1| 75245 G S 40 49 2 63 62
            row 62: gi|162446888|ref|NC_010163.1| 78151 C M 14 37 2 63 62


            Rows 59, 60 and 62 above are examples of the input lines that are reported as errors. The value in the fifth column of these rows is much lower than in rows 58 and 61, which are not reported.

            Could you or someone else tell me why maq.pl SNPfilter reports errors for such lines? Thanks.
            Last edited by qiudao; 10-03-2008, 09:40 AM.


            • #7
              maq SNP filter

              Hi,

              I looked a bit closer into the script, and it seems to me that this really is a bug.

              To me the line

              "$is_good = 0 unless ($t[4] >= $opts{q} || ($t[2] ne $t[9] && $t[4]+$t[10] >= $opts{q}));"

              means roughly translated:

              A snp is not good unless:
              - the consensus quality of the SNP is greater than or equal to the minimum consensus quality (-q, default: 20)

              - there is a second, different SNP and the sum of both SNP qualities is greater than or equal to the minimum consensus quality (-q, default: 20)

              I don't know what the assumptions behind this second condition are, but in your case (rows 59, 60 and 62) the first condition failed and maq.pl could not evaluate the second condition because there is no second SNP.

              I would suggest

              "$is_good = 0 unless ($t[4] >= $opts{q} || ($t[9] && $t[2] ne $t[9] && $t[4]+$t[10] >= $opts{q}));"

              should fix your problem. Now the second condition fails already if there is no second snp.
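
              The guarded condition can be restated outside Perl to see what it does on the rows from post #6. This is only a rough Python sketch of the consensus quality check, not maq.pl itself: the function name passes_cns_quality is hypothetical, rows are assumed to be whitespace-separated, the extra columns (indices 9 and 10) are read as in the description above, and none of the other SNPfilter rules are modeled.

```python
def passes_cns_quality(fields, min_q=20):
    """Guarded consensus quality check on one split .snp row (sketch only)."""
    cns_q = int(fields[4])
    if cns_q >= min_q:
        return True
    # Indices 9 and 10 are only consulted when the row actually has them,
    # which mirrors testing $t[9] before using it in the Perl line.
    if len(fields) > 10:
        return fields[2] != fields[9] and cns_q + int(fields[10]) >= min_q
    return False


# Rows 58-62 as posted earlier in the thread.
rows = [
    "gi|162446888|ref|NC_010163.1| 73416 C S 159 45 1.31 63 62",
    "gi|162446888|ref|NC_010163.1| 73802 G T 3 55 2 0 2",
    "gi|162446888|ref|NC_010163.1| 74866 A C 3 42 2 0 2",
    "gi|162446888|ref|NC_010163.1| 75245 G S 40 49 2 63 62",
    "gi|162446888|ref|NC_010163.1| 78151 C M 14 37 2 63 62",
]
for row in rows:
    fields = row.split()
    print(fields[1], passes_cns_quality(fields))
```

              With the default threshold of 20, the rows at positions 73802, 74866 and 78151 fail this check because their consensus quality is below 20 and they have no extra columns. These are the same rows (59, 60 and 62) that triggered the warnings.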

              Any other opinions?


              Cheers from Germany,

              Andy


              • #8
                Maq does come with its own SNP caller...is the general feeling that it's not a very good one, and that's why people are writing their own?


                • #9
                  Andpet,
                  Thanks for your reply. I agree with you. Checking that the second SNP is defined first will solve this problem.
                  Thank you.


                  • #10
                    Originally posted by swbarnes2
                    Maq does come with its own SNP caller...is the general feeling that it's not a very good one, and that's why people are writing their own?
                    I would also like to see more discussion of SNP filtering. The MAQ SNP calls are an inclusive list that contains false positives, but SNPfilter also removes a few good-looking ones.

                    Any thoughts?
                    --
                    bioinfosm
