Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • hanleng
    Member
    • Mar 2012
    • 15

    Remove reads which are not uniquely mapped

    I have already had the BAM file, so how can I remove those reads which are not uniquely mapped? Can samtools do this? All some other software?
  • Heisman
    Senior Member
    • Dec 2010
    • 534

    #2
    Look in the samtools specification... pretty sure there is some type of flag for multiply mapped reads. Then you can convert the file to SAM format and grep -v it.

    Comment

    • nilshomer
      Nils Homer
      • Nov 2008
      • 1283

      #3
      Please define what uniquely mapped means to you.

      Comment

      • hanleng
        Member
        • Mar 2012
        • 15

        #4
        Originally posted by nilshomer View Post
        Please define what uniquely mapped means to you.
        Reads that can be hit only once in the genome - I know there is some cutoff for this hit, but generally, how to remove those have mutilple hits?

        Comment

        • hanleng
          Member
          • Mar 2012
          • 15

          #5
          Originally posted by Heisman View Post
          Look in the samtools specification... pretty sure there is some type of flag for multiply mapped reads. Then you can convert the file to SAM format and grep -v it.
          That is the reason I asked because I could not find it.

          Comment

          • nilshomer
            Nils Homer
            • Nov 2008
            • 1283

            #6
            I don't like this definition, since suppose a mapper X tries harder to find a hit than mapper Y, then mapper X is most likely more sensitive and specific, but will have fewer reads that only had one hit, even though a hit for a read may be much more likely than all other hits.

            That's why you should look at mapping quality. Those with mapping quality zero are ambiguous: multiple equally best alignments were found. But those with higher quality should have higher confidence.

            Comment

            • Chipper
              Senior Member
              • Mar 2008
              • 323

              #7
              In other words, use samtools view -q 1 on the .bam to get reads with a mapping quality of at least 1. Depending on your application and aligner you may want to use something like -q 20 to get more reliable hits.

              Comment

              • maubp
                Peter (Biopython etc)
                • Jul 2009
                • 1544

                #8
                Originally posted by Heisman View Post
                Look in the samtools specification... pretty sure there is some type of flag for multiply mapped reads. Then you can convert the file to SAM format and grep -v it.
                If the SAM/BAM file has secondary mappings recorded (which not all mapping software will do), then yes, you can filter them out using the FLAG bit values. However I'd recommend using 'samtools view' with the -f and/or -F switches rather than grep.

                Comment

                • Simon Anders
                  Senior Member
                  • Feb 2010
                  • 995

                  #9
                  If your aligner properly sets the MAPQ field, you can filter on this. After all, if there are two equally plausible alignments, the probability for each alignment to be correct is at most 50%, which transforms to a Phred score of 3. Hence, an aligner should never indicate a mapping quality above 3 for multiply matched reads.

                  Furthermore, many aligners follow the recommendation to use the optional tag "NH" to indicate how many alignments are reported for this read. This helps only, of course, if you instructed the aligner to report multiple alignments.

                  Comment

                  • Annibal
                    Member
                    • Mar 2012
                    • 10

                    #10
                    Originally posted by Simon Anders View Post
                    If your aligner properly sets the MAPQ field, you can filter on this. After all, if there are two equally plausible alignments, the probability for each alignment to be correct is at most 50%, which transforms to a Phred score of 3. Hence, an aligner should never indicate a mapping quality above 3 for multiply matched reads.
                    What about paired end sequencing? Uniqueness of a read alignment depends also on its mate position. One mate could map 10 times on the genome but only 1 position is valid considering its mate. How can you filter from a bowtie/bwa generated bam file only uniquely mapped paire end reads?
                    Don't want considering only "concordant" reads, since i would like to retain paired reads that map in the same region but with discordant orientation (if stranded)

                    Comment

                    Latest Articles

                    Collapse

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by SEQadmin2, 06-05-2026, 10:09 AM
                    0 responses
                    14 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-04-2026, 08:59 AM
                    0 responses
                    25 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-02-2026, 12:03 PM
                    0 responses
                    31 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-02-2026, 11:40 AM
                    0 responses
                    23 views
                    0 reactions
                    Last Post SEQadmin2  
                    Working...