Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Unique Reads from BAM/SAM

    Hi all,

    I'm using published data that includes all reads from an experiment and am not familiar with the SAM/BAM output. I'm having trouble interpreting the bitwise flag and MAPQ values to extract uniquely mapping reads and understand appropriate cut-offs.

    The SAM FAQ says to use samtools view -q 1 in order to extract "unique" reads; to which value does q refer? MAPQ?

    I want to replicate the number of acceptable reads in the publications using their criteria: "The best alignment was kept only in cases in which the second-best alignment had at least three more mismatches." What sort of q value (MAPQ) would this be?

    In all, I just want to be confident that I can filter out all unique reads.

    Thanks for any help!

    Further info: the dataset is single end bisulfite treated and only has 10 bitwise flags:
    0
    4
    16
    512
    516
    528
    1024
    1040
    1536
    1552
    Last edited by Gen2007; 01-13-2011, 10:31 AM.

  • #2
    Hope this will help:

    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

    Comment


    • #3
      The only flags you want are 0 and 16. Everything higher than 1023 is a duplicate. Everything higher than 511 failed a platform quality check. The 4's are unmapped.

      view -f 0x010 is what you want, I think.

      Comment


      • #4
        Thanks for the suggestions. Depending on how I filter I got different results (out of 24.5 million reads). Do you guys have any idea of what is contributing to the disparities?

        samtools view -q 1 Colon_tumor_FFPE_lane2.bam | wc -l : 8,481,938 reads
        view -f 0x010 Colon_tumor_FFPE_lane2.bam | wc -l : 9,597,123 reads
        awk '$2~/^(0|16)$/' Colon_tumor_FFPE_lane2.sam | wc -l : 1,237,334

        Any ideas?

        Thanks a lot!

        Comment


        • #5
          I have the same question. After searching the forum I found many people have similar questions, and people have different understandings of how "unique/multiple" mappings were defined in SAM format. From the SAM specification, the flag 0x100 indicates whether the alignment is primary or not. But this is different from "unique", because multiple mappings will still have one primary mapping and other non-primary mappings, so just counting primary mappings doesn't give you the answer. I wonder if SAM format could add an aux field for this basic information, which could be extracted easily? If I'm missing something, please correct me.

          Comment


          • #6
            Hello,
            Originally posted by Gen2007 View Post
            The SAM FAQ says to use samtools view -q 1 in order to extract "unique" reads; to which value does q refer? MAPQ?
            ...Well, the FAQ on samtools shows an example of how to filter out reads with MAPQ below a certain threshold. In that example they put 1 which means 'filter out nearly nothing!'. As they say, more stringent thresholds would be appropriate, say 15 (p-value ~0.05 that a read is incorrectly mapped) or 20 (p-value ~0.01).
            Also note that 'uniqueness' has a rather vague definition, better to think in terms of reliability (i.e. MAPQ).

            I want to replicate the number of acceptable reads in the publications using their criteria: "The best alignment was kept only in cases in which the second-best alignment had at least three more mismatches." What sort of q value (MAPQ) would this be?
            I can't tell exactly, but I would say that -q 15 or -q 20 should be stringent enough while avoiding to throw away too many genuine reads.
            The filtering may become trickier if you want to replicate *exactly* the published results.

            Just my 2p...
            Dario

            Comment


            • #7
              Thanks a lot for the reply Dario. It seemed to me that -q 1 was not removing very much, and I was not very confident in using that value as a proper filter. Let me try your suggestions. Thanks again!

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Strategies for Sequencing Challenging Samples
                by seqadmin


                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                03-22-2024, 06:39 AM
              • seqadmin
                Techniques and Challenges in Conservation Genomics
                by seqadmin



                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                Avian Conservation
                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                03-08-2024, 10:41 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Yesterday, 06:37 PM
              0 responses
              10 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, Yesterday, 06:07 PM
              0 responses
              9 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 03-22-2024, 10:03 AM
              0 responses
              51 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 03-21-2024, 07:32 AM
              0 responses
              67 views
              0 likes
              Last Post seqadmin  
              Working...
              X