Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Uniquely mapped short reads (MAQ)

    I searched for this discussion but could not find one.

    I would like to use only uniquely mapped reads for SNP calling, etc in MAQ. By uniquely mapped I mean, the top hit/best alignment is unique and no other position on reference maps with equally good score. I would want to use only these reads to do SNP calling and further analysis.. I believe its doable using MAQ, but not sure.

    Next, I would like to keep track of what reads were dumped because of non-unique mapping, apart from the regular dumped reads found using MAQ's -u option.
    --
    bioinfosm

  • #2
    if i recall correctly maq by default reports non-unique reads; it just gives them a mapping quality of 0

    Comment


    • #3
      True, is there a direct way to use just the uniquely mapping reads in order to do SNP calling..
      --
      bioinfosm

      Comment


      • #4
        Finding unique reads

        Has anyone been able to work around this? Essentially, I would like MAQ to assemble unique reads only. If it is not able to do this (which I think is an easy modification), how do I view the mapping qualities? I would probably just throw those away with scores of 0 and map the rest again from start.

        Any help would be appreciated. Thanks

        Comment


        • #5
          The ELAND/Bowtie type aligners tell your for each read how many times they aligned to your reference sequence. So you could align with one of thsoe programs first, and then sift the output, pulling out the read info on lines where the read hits only one time. Then give only those reads to MAQ.

          You would have to align twice, which is kind of silly, but it would work.

          Comment


          • #6
            i think you will be better off using bowtie for the alignment step - bowtie is nicer and faster anyway and can automatically filter out non-unique reads (use the -m 1 option)

            then split your bowtie map file into 2 million line chunks and use bowtie-maqconvert to convert into maq map files, then merge into a single maq map file to go on and perform downstream maq applications

            something along these lines
            Code:
            mkdir splitout
            split -l 2000000 -d bowtie.map splitout/bowtie.map
            for files in splitout/*
                do bowtie-maqconvert "$files" "$files".maq maq_reference.bfa;
            done
            maq mapmerge maq.map splitout/*.maq

            Comment


            • #7
              It's easy enough to exclude non-unique reads from MAQ by using the builtin features e.g.

              for assembly use

              Code:
              maq assemble -q 10 ...
              Where 10 is your quality score filter.

              Comment


              • #8
                I am slightly confused about how the SNP calls are being made etc, and where the exclusion occurs on using maq assemble -q 10

                I ran the same data with and without -q 10, all other commands remaining same. The results are interestingly different.

                The maq map alignments are different, but the number of final SNPs reported are way different - 2797 when using -q 10, and 1751 otherwise (1411 common)

                Can someone walk me through these numbers...

                Without -q 10
                Code:
                [B]Lane1[/B]
                -- 135 potential soa-indels pass the filter.
                -- == statmap report ==
                -- # single end (SE) reads: 6655133
                -- # mapped SE reads: 1676926 (/ 6655133 = 25.19%)
                
                [B]Lane2[/B]
                -- 101 potential soa-indels pass the filter.
                -- == statmap report ==
                -- # single end (SE) reads: 6925588
                -- # mapped SE reads: 2213553 (/ 6925588 = 31.96%)
                With -q 10
                Code:
                [B]Lane1[/B]
                -- 131 potential soa-indels pass the filter.
                -- == statmap report ==
                -- # single end (SE) reads: 6655133
                -- # mapped SE reads: 1676926 (/ 6655133 = 25.19%)
                
                [B]Lane2[/B]
                -- 101 potential soa-indels pass the filter.
                -- == statmap report ==
                -- # single end (SE) reads: 6925588
                -- # mapped SE reads: 2213553 (/ 6925588 = 31.96%)
                Code:
                $ diff regular.maq.aln q10.map.aln
                1,2c1,2
                < 30PERAAXX_300164-25:1:52:1644:1221    MLH1    2       -       0       0       99      99      99      0       0       1       0       36      TTGGCTGAAGGCACTTCCGTTGAGCATCTAGACG
                TT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                < 30PERAAXX_300164-25:1:80:1159:720     MLH1    2       -       0       0       96      96      96      0       0       1       0       36      TtGGCTGAAGGCaCTtCCGTTGAGCATCTAGACG
                TT    @0IIIIIBIIII#II5IIIIIIIIIIIIIIIIIII>
                ---
                > 30PERAAXX_300164-25:1:52:1644:1221    MLH1    2       -       0       0       98      98      98      0       0       1       0       36      TTGGCTGAAGGCACTTCCGTTGAGCATCTAGACG
                TT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                > 30PERAAXX_300164-25:1:80:1159:720     MLH1    2       -       0       0       95      95      95      0       0       1       0       36      TtGGCTGAAGGCaCTtCCGTTGAGCATCTAGACG
                TT    @0IIIIIBIIII#II5IIIIIIIIIIIIIIIIIII>
                
                34,44c34,44
                < 30PERAAXX_300164-25:1:33:693:1787     MLH1    14      -       0       0       99      99      99      0       0       1       0       36      ACTTCCGTTGAGCATCTAGACGTTTCCTTGGCTC
                TT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                < 30PERAAXX_300164-25:1:41:347:1531     MLH1    14      -       0       0       99      99      99      0       0       1       0       36      ACTTCCGTTGAGCATCTAGACGTTTCCTTGGCTC
                TT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                < 30PERAAXX_300164-25:1:67:393:1068     MLH1    14      -       0       0       99      99      99      0       0       1       0       36      ACTTCCGTTGAGCATCTAGaCGTTTCCTTGGCTC
                TT    HIIHIIIIIIIIIIIIIII3IIIIIIIIIIIIIIII
                < 30PERAAXX_300164-25:1:70:959:1492     MLH1    14      -       0       0       99      99      99      0       0       1       0       36      aCTTCCGTTGAGCATCTAGACGTTTCCTTGGCTC
                TT    2IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                < 30PERAAXX_300164-25:1:93:1678:647     MLH1    14      -       0       0       96      96      96      0       0       1       0       36      aCTTcCGTTGAGCATCTAGACGTTTCCTTGGCTC
                TT    +III6IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                < 30PERAAXX_300164-25:1:4:220:1034      MLH1    15      -       0       0       99      99      99      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
                TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                < 30PERAAXX_300164-25:1:9:249:1562      MLH1    15      -       0       0       99      99      99      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
                TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                < 30PERAAXX_300164-25:1:12:1505:614     MLH1    15      -       0       0       96      96      96      0       0       1       0       36      CttCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
                TC    I81IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                < 30PERAAXX_300164-25:1:13:368:1753     MLH1    15      -       0       0       99      99      99      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
                TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                < 30PERAAXX_300164-25:1:15:9:1341       MLH1    15      -       0       0       99      99      99      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
                TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                < 30PERAAXX_300164-25:1:19:639:103      MLH1    15      -       0       0       99      99      99      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
                TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                ---
                > 30PERAAXX_300164-25:1:33:693:1787     MLH1    14      -       0       0       98      98      98      0       0       1       0       36      ACTTCCGTTGAGCATCTAGACGTTTCCTTGGCTC
                TT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                > 30PERAAXX_300164-25:1:41:347:1531     MLH1    14      -       0       0       98      98      98      0       0       1       0       36      ACTTCCGTTGAGCATCTAGACGTTTCCTTGGCTC
                TT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                > 30PERAAXX_300164-25:1:67:393:1068     MLH1    14      -       0       0       98      98      98      0       0       1       0       36      ACTTCCGTTGAGCATCTAGaCGTTTCCTTGGCTC
                TT    HIIHIIIIIIIIIIIIIII3IIIIIIIIIIIIIIII
                > 30PERAAXX_300164-25:1:70:959:1492     MLH1    14      -       0       0       98      98      98      0       0       1       0       36      aCTTCCGTTGAGCATCTAGACGTTTCCTTGGCTC
                TT    2IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                > 30PERAAXX_300164-25:1:93:1678:647     MLH1    14      -       0       0       95      95      95      0       0       1       0       36      aCTTcCGTTGAGCATCTAGACGTTTCCTTGGCTC
                TT    +III6IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                > 30PERAAXX_300164-25:1:4:220:1034      MLH1    15      -       0       0       98      98      98      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
                TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                > 30PERAAXX_300164-25:1:9:249:1562      MLH1    15      -       0       0       98      98      98      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
                TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                > 30PERAAXX_300164-25:1:12:1505:614     MLH1    15      -       0       0       95      95      95      0       0       1       0       36      CttCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
                TC    I81IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                > 30PERAAXX_300164-25:1:13:368:1753     MLH1    15      -       0       0       98      98      98      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
                TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                > 30PERAAXX_300164-25:1:15:9:1341       MLH1    15      -       0       0       98      98      98      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
                TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                > 30PERAAXX_300164-25:1:19:639:103      MLH1    15      -       0       0       98      98      98      0       0       1       0       36      CTTCCGTTGAGCATCTAGACGTTTCCTTGGCTCT
                TC    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
                --
                bioinfosm

                Comment


                • #9
                  I realize that there is a -q option that one may use with maq pileup as well.
                  Would a repeat mapped read mapping to different locations with same score (mapping quality or the 7th column of reads.mapview is 0) be used to do SNP calling?

                  Thanks
                  --
                  bioinfosm

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    Essential Discoveries and Tools in Epitranscriptomics
                    by seqadmin




                    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                    04-22-2024, 07:01 AM
                  • seqadmin
                    Current Approaches to Protein Sequencing
                    by seqadmin


                    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                    04-04-2024, 04:25 PM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, Yesterday, 08:47 AM
                  0 responses
                  12 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-11-2024, 12:08 PM
                  0 responses
                  60 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-10-2024, 10:19 PM
                  0 responses
                  59 views
                  0 likes
                  Last Post seqadmin  
                  Started by seqadmin, 04-10-2024, 09:21 AM
                  0 responses
                  54 views
                  0 likes
                  Last Post seqadmin  
                  Working...
                  X