  • Maq SNP filtering script bug?

    Hi,

    I ran into a problem when I tried to filter the out.snp file with Maq's SNPfilter, so I tested the Maq demo and the same error happened again. Below is how I ran the analysis.
    I used the command "perl maq.pl SNPfilter out.snp >out.filtered.snp", and the following errors were reported:

    "Use of uninitialized value in string ne at maq.pl line 286, <> line 59.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 59.
    Use of uninitialized value in string ne at maq.pl line 286, <> line 60.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 60.
    Use of uninitialized value in string ne at maq.pl line 286, <> line 62.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 62.
    Use of uninitialized value in string ne at maq.pl line 286, <> line 101.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 101.
    Use of uninitialized value in string ne at maq.pl line 286, <> line 102.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 102.
    ..............................
    ................................................."

    So I traced the original code of maq.pl and found that line 286 reads:
    "$is_good = 0 unless ($t[4] >= $opts{q} || ($t[2] ne $t[9] && $t[4]+$t[10] >= $opts{q})); # consensus quality filter"
    But my input file, out.snp, has no $t[9] or $t[10]: it only has 9 columns ($t[0] to $t[8]). That's why the error happens.
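A quick way to confirm the column count is to split each line and count the fields. The sketch below is a hypothetical helper (short_lines is not part of maq.pl); it assumes maq.pl splits each line on whitespace into @t, which the post does not show, and reports lines with fewer than the 11 fields that line 286 indexes ($t[0] to $t[10]).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of maq.pl): return the 1-based line numbers of
# records that have fewer than $min_cols whitespace-separated fields.
# Assumes the same whitespace split that produces @t in maq.pl.
sub short_lines {
    my ($min_cols, @lines) = @_;
    my @short;
    for my $i (0 .. $#lines) {
        my @t = split ' ', $lines[$i];
        push @short, $i + 1 if @t < $min_cols;
    }
    return @short;
}

# Line 286 reads $t[9] and $t[10], so it needs at least 11 columns.
my @rows = ('gi|115315570 219 A R 31 255 1.00 63 62');
my @bad  = short_lines(11, @rows);
print scalar(@bad), " of ", scalar(@rows), " line(s) have fewer than 11 columns\n";
```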

    Could anyone tell me how to solve this problem, or what I did wrong? Thanks.

  • #2
    Hey! I'm not in a position to check right now, but I think SNPfilter needs the consensus as well.


    • #3
      Hi ECO,

      Below is the quote from the maq manual. I did not see any argument for the consensus; could you help me here? Thanks.


      maq.pl SNPfilter [-d minDep] [-D maxDep] [-Q maxMapQ] [-q minCnsQ] [-w indelWinSize] [-n minNeiQ] [-F in.indelpe] [-f in.indelsoa] [-s minScore] [-m maxAcross] [-a] [-N maxWinSNP] [-W densWinSize] in.cns2snp.snp > out.filtered.snp

      Rule out SNPs that are covered by few reads (specified by -d), by too many reads (specified by -D), near (specified by -w) to a potential indel, falling in a possible repetitive region (characterized by -Q), or having low-quality neighbouring bases (specified by -n). If maxWinSNP or more SNPs appear in any densWinSize window, they will also be filtered out together.

      OPTIONS:
      -d INT Minimum read depth required to call a SNP [3]
      -D INT Maximum read depth required to call a SNP (<255, otherwise ignored) [256]
      -Q INT Required maximum mapping quality of reads covering the SNP [40]
      -q INT Minimum consensus quality [20]
      -n INT Minimum adjacent consensus quality [20]
      -w INT Size of the window around the potential indels. SNPs that are close to indels will be suppressed [3]
      -F FILE The indelpe output [null]
      -f FILE The indelsoa output [null]
      -s INT Minimum score for a soa-indel to be considered [3]
      -m INT Maximum number of reads that can be mapped across a soa-indel [1]
      -a Alternative filter for single end alignment
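The last sentence of the description (the maxWinSNP / densWinSize rule) describes a sliding-window density filter. The sketch below shows one way such a rule can work; it is not the code in maq.pl, the sub name dense_snps is hypothetical, and it assumes the positions come from a single chromosome, sorted ascending. The defaults of -N and -W are not given in the excerpt above, so the example passes them explicitly.

```perl
use strict;
use warnings;

# Hypothetical sketch of the density rule described in the manual: if
# $max_snp or more SNPs fall inside any window of $win_size bp, all of them
# are flagged. Positions must come from one chromosome, sorted ascending,
# and $win_size must be at least 1. Not the code maq.pl actually uses.
sub dense_snps {
    my ($max_snp, $win_size, @pos) = @_;
    my %flag;
    my $left = 0;
    for my $right (0 .. $#pos) {
        # Advance the left edge until the window spans fewer than $win_size bp.
        $left++ while $pos[$right] - $pos[$left] >= $win_size;
        if ($right - $left + 1 >= $max_snp) {
            $flag{$_} = 1 for @pos[$left .. $right];
        }
    }
    return grep { $flag{$_} } @pos;
}

# Example: three SNPs within 10 bp of each other, plus two isolated ones.
my @flagged = dense_snps(3, 10, 100, 103, 107, 500, 900);
print "flagged: @flagged\n";
```

With these hypothetical settings, only the cluster at 100, 103 and 107 is flagged; the isolated SNPs at 500 and 900 survive.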


      • #4
        I don't know what's going on with the code, but the command sure works:

        $ maq.pl SNPfilter Seq.fasta.snp
        gi|115315570 219 A R 31 255 1.00 63 62
        gi|115315570 1005 T Y 14 255 1.00 63 62
        gi|115315570 1576 G R 255 255 1.00 63 62
        ..

        Where
        $ head Seq.fasta.snp
        gi|115315570 219 A R 31 255 1.00 63 62
        gi|115315570 1005 T Y 14 255 1.00 63 62
        gi|115315570 1576 G R 255 255 1.00 63 62
        gi|115315570 1591 C Y 255 255 1.00 63 62
        gi|115315570 1595 G R 56 255 1.00 63 62
        gi|115315570 1689 C Y 255 255 1.00 63 62
        ..
        --
        bioinfosm


        • #5
          I'll check when I get to my other computer. That Perl error looks like a problem with your input file, starting at line 59.


          • #6
            row 58: gi|162446888|ref|NC_010163.1| 73416 C S 159 45 1.31 63 62
            row 59: gi|162446888|ref|NC_010163.1| 73802 G T 3 55 2 0 2
            row 60: gi|162446888|ref|NC_010163.1| 74866 A C 3 42 2 0 2
            row 61: gi|162446888|ref|NC_010163.1| 75245 G S 40 49 2 63 62
            row 62: gi|162446888|ref|NC_010163.1| 78151 C M 14 37 2 63 62


            The rows above (59, 60, 62) are examples of input lines reported with errors. The value in the fifth column is much lower for these rows than for rows 58 and 61, which are not reported.

            Could you or someone tell me why maq.pl SNPfilter reports errors for these lines? Thanks.
            Last edited by qiudao; 10-03-2008, 09:40 AM.


            • #7
              maq SNP filter

              Hi,

              I looked a bit closer at the script, and it seems to me that this really is a bug.

              To me, the line

              "$is_good = 0 unless ($t[4] >= $opts{q} || ($t[2] ne $t[9] && $t[4]+$t[10] >= $opts{q}));"

              roughly translates to:

              A SNP is not good unless:
              - the consensus quality of the SNP is greater than or equal to the minimum consensus quality (-q, default: 20), or

              - there is a second, different SNP and the sum of both SNP qualities is greater than or equal to the minimum consensus quality (-q, default: 20)

              I don't know what assumptions lie behind this second condition, but in your case (rows 59, 60 and 62) the first condition fails, and maq.pl cannot evaluate the second condition because there is no second SNP.
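Because || short-circuits, the right-hand side (the only place $t[9] and $t[10] are read) runs only when $t[4] is below the threshold. A minimal sketch that checks this against the fifth-column values quoted in post #6, assuming the default -q of 20; the hash %cns_q is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;

# Rows quoted in post #6, reduced to their fifth column ($t[4], consensus quality).
my %cns_q = (58 => 159, 59 => 3, 60 => 3, 61 => 40, 62 => 14);
my $min_q = 20;    # default value of -q

# With ||, the right-hand side (which reads $t[9] and $t[10]) is only
# evaluated when the left-hand test $t[4] >= $min_q is false.
my @reaches_second = grep { $cns_q{$_} < $min_q } sort { $a <=> $b } keys %cns_q;
print "rows that reach the second condition: @reaches_second\n";
```

The rows it prints (59, 60 and 62) are exactly the ones that produced warnings.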

              I would suggest

              "$is_good = 0 unless ($t[4] >= $opts{q} || (defined($t[9]) && $t[2] ne $t[9] && $t[4]+$t[10] >= $opts{q}));"

              This should fix your problem: the second condition now fails immediately when there is no second SNP, so the missing $t[9] and $t[10] are never compared or added.
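The effect of such a guard can be reproduced outside maq.pl. This is a minimal sketch, not the maq.pl source: cns_ok_original, cns_ok_guarded and count_warnings are hypothetical names, $q stands in for $opts{q}, and fields are assumed to be whitespace-split as in the rows quoted in post #6.

```perl
use strict;
use warnings;

# Line 286's consensus-quality test, wrapped in a sub. $q stands in for $opts{q}.
sub cns_ok_original {
    my ($q, @t) = @_;
    return ($t[4] >= $q || ($t[2] ne $t[9] && $t[4] + $t[10] >= $q)) ? 1 : 0;
}

# Same test, but only look at the second-best call when the column exists.
sub cns_ok_guarded {
    my ($q, @t) = @_;
    return ($t[4] >= $q || (defined $t[9] && $t[2] ne $t[9] && $t[4] + $t[10] >= $q)) ? 1 : 0;
}

# Count the warnings a check emits by temporarily hooking __WARN__.
sub count_warnings {
    my ($check, @args) = @_;
    my $n = 0;
    local $SIG{__WARN__} = sub { $n++ };
    $check->(@args);
    return $n;
}

# Row 59 from post #6: 9 columns, consensus quality 3 (< 20).
my @row59 = split ' ', 'gi|162446888|ref|NC_010163.1| 73802 G T 3 55 2 0 2';
printf "original: %d warnings, guarded: %d warnings\n",
    count_warnings(\&cns_ok_original, 20, @row59),
    count_warnings(\&cns_ok_guarded, 20, @row59);
```

The original test emits the same pair of warnings seen in the log (one in string ne, one in addition), while the guarded version rejects the row silently.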

              Any other opinions?


              Cheers from Germany,

              Andy


              • #8
                Maq does come with its own SNP caller...is the general feeling that it's not a very good one, and that's why people are writing their own?


                • #9
                  Andpet,
                  Thanks for your reply. I agree with you. Checking that the second SNP is defined first solves this problem.
                  Thank you.


                  • #10
                    Originally posted by swbarnes2:
                    Maq does come with its own SNP caller...is the general feeling that it's not a very good one, and that's why people are writing their own?
                    I would also like to see more discussion of SNP filtering. MAQ's SNP list is inclusive and contains false positives, but SNPfilter also removes a few good-looking SNPs.

                    Any thoughts?
                    --
                    bioinfosm
