  • sorrychen
    Junior Member
    • Aug 2009
    • 9

    In SAM, how can I know a read is mapped ambiguously?

    In a .sam file, how can I tell whether an alignment is ambiguous (that is, the read maps to multiple places)?
    Forgive me if this can be found in the SAM spec.

    Flag 0x0100 indicates that the alignment is not primary, which should have a different meaning. Even if an alignment is primary, does that mean it is the only mapping, or could the read also map to other places?

    I haven't found another field that provides this information. It should be there, right?
  • sorrychen
    Junior Member
    • Aug 2009
    • 9

    #2
    Oops, I think I posted in the wrong place.

    • ECO
      --Site Admin--
      • Oct 2007
      • 1360

      #3
      no worries, moving it now.

      • lh3
        Senior Member
        • Feb 2008
        • 686

        #4
        In short, what you should look at is the MAPQ column, the mapping quality. So far as I know, it is the only universal way to define the reliability of a generic alignment. For a much longer answer, see here.
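For reference, MAPQ is the fifth column of a SAM alignment line, and the SAM specification defines it as -10 log10 of the probability that the mapping position is wrong, with 255 meaning the value is not available. A minimal Python sketch of reading it follows; the helper names and the example read are made up for illustration.

```python
def parse_mapq(sam_line):
    """Return the MAPQ (5th tab-separated column) of a SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    return int(fields[4])


def mapq_to_error_prob(mapq):
    """Convert MAPQ to P(mapping position is wrong); None if MAPQ is 255."""
    if mapq == 255:  # 255 means "mapping quality not available"
        return None
    return 10 ** (-mapq / 10)


# Hypothetical alignment line, for illustration only.
example = "read1\t0\tchr1\t100\t30\t4M\t*\t0\t0\tACGT\tIIII"
print(parse_mapq(example), mapq_to_error_prob(parse_mapq(example)))
```

How a given aligner arrives at its MAPQ values is aligner-specific, so the conversion is only as meaningful as the aligner's own estimate.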

        • sorrychen
          Junior Member
          • Aug 2009
          • 9

          #5
          Thank you very much for the quick reply. This is actually more complicated than I thought.

          • sorrychen
            Junior Member
            • Aug 2009
            • 9

            #6
            I have a SAM file (generated by TopHat) with only four distinct MAPQ values (0, 1, 3 and 255), which seems strange to me. Why is there no mapping with a value of 2? The spec says 255 means unknown. Is there a way to guess the uniqueness of mappings?
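One possible way to guess uniqueness without relying on MAPQ is the optional NH tag, which the SAM specification defines as the number of reported alignments containing the read. Whether the file in question carries NH is an assumption here and needs to be checked against the actual output. A minimal sketch, with a hypothetical helper name and made-up example lines:

```python
def reported_hits(sam_line):
    """Return the NH:i value of a SAM alignment line, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


# Made-up example lines, for illustration only.
unique = "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tNH:i:1"
multi = "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tNH:i:2"
no_tag = "r3\t0\tchr1\t300\t0\t4M\t*\t0\t0\tACGT\tIIII"

for line in (unique, multi, no_tag):
    print(line.split("\t")[0], reported_hits(line))
```

Under this sketch a read would be treated as uniquely mapped when `reported_hits` returns 1, and the question stays open when it returns None.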

            • aleferna
              Senior Member
              • Sep 2009
              • 121

              #7
              Same problem as you, sorrychen.

              I'm working for a lab that needs to map some reads to the genome, but they are looking at short regions from gene deserts, so there are LOTS of repeats. I can't seem to find a reliable mapping quality formula. I have tried MIRA, BWA and plain BLAT; they give me pretty much the same alignments, but none of them gives me an understandable, probabilistically derived mapping quality index. Has anybody seen any papers on deriving mapping quality? I saw the PHRED one, but it doesn't seem to use suboptimal alignments, which doesn't make sense to me. I mean, I can have a perfect match on one chromosome, but if the same read has a suboptimal hit with one mismatch, the read should have a really low mapping score, right? Has anybody seen any papers or formulas that derive mapping scores from the second-best alignment?

              • aleferna
                Senior Member
                • Sep 2009
                • 121

                #8
                I started searching and I think I found the answer: the MAQ paper has a nice formula for calculating mapping quality. I just have to change a few things here and there to make it fit 454 data.
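The general idea can be sketched as a posterior over candidate hits. This is a simplified illustration in the spirit of the MAQ formula, not MAQ's actual implementation: it assumes each hit's likelihood is 10^(-q/10), where q is the sum of Phred base qualities at the mismatched positions, assumes equal priors for all positions, and uses a made-up function name and an arbitrary cap of 60.

```python
import math


def mapping_quality(mismatch_qual_sums, cap=60):
    """Phred-scaled probability that the best hit is wrong.

    mismatch_qual_sums: one number per candidate hit, the sum of Phred base
    qualities at that hit's mismatched positions (0 for a perfect match).
    """
    likelihoods = [10 ** (-q / 10) for q in mismatch_qual_sums]
    best = max(likelihoods)
    p_wrong = 1 - best / sum(likelihoods)  # posterior mass on all other hits
    if p_wrong <= 0:  # only one candidate hit: nothing to compete with
        return cap
    return min(cap, int(round(-10 * math.log10(p_wrong))))


print(mapping_quality([0]))      # a single perfect hit
print(mapping_quality([0, 0]))   # two equally good perfect hits
print(mapping_quality([0, 10]))  # perfect hit plus a hit with one Q10 mismatch
```

Under these assumptions, how far a one-mismatch suboptimal hit pulls the score down depends on the base quality at the mismatch: a low-quality mismatch leaves the second hit plausible, so the mapping quality drops sharply.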

                • Haneko
                  Member
                  • Jan 2010
                  • 36

                  #9
                  Hi all, I'm new to the forums. Do forgive me if this has been posted, but I couldn't find the answer.

                  Flag 0x100 means the alignment is not primary. What is a primary alignment? I'm finding some reads where every alignment has this flag, and other reads where all but one alignment has it. I'm quite confused.

                  • nilshomer
                    Nils Homer
                    • Nov 2008
                    • 1283

                    #10
                    Originally posted by Haneko
                    Hi all, I'm new to the forums. Do forgive me if this has been posted, but I couldn't find the answer.

                    Flag 0x100 means the alignment is not primary. What is a primary alignment? I'm finding some reads where every alignment has this flag, and other reads where all but one alignment has it. I'm quite confused.
                    A read may have multiple alignments, depending on the sensitivity of the aligner. The primary is typically the first or best alignment (this depends on the aligner), although this does not have to be the case. Again, depending on the aligner, you may be able to iterate through all the hits if the CC and CP tags are specified.
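To check how a particular aligner uses the 0x100 bit, one can count primary alignments per read name. A minimal sketch, assuming single-end reads and tab-separated SAM text; the function names and example lines are made up for illustration:

```python
from collections import Counter

FLAG_UNMAPPED = 0x4     # read is unmapped
FLAG_SECONDARY = 0x100  # alignment is not primary


def is_primary(flag):
    """True for a mapped alignment that does not have the 0x100 bit set."""
    return not flag & (FLAG_UNMAPPED | FLAG_SECONDARY)


def primary_counts(sam_lines):
    """Map each read name to its number of primary alignments."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        name, flag = line.split("\t")[:2]
        counts[name] += is_primary(int(flag))
    return counts


# Made-up alignments: r1 has one primary hit, r2 has only non-primary hits.
example_lines = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t10\t30\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t256\tchr1\t70\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t272\tchr3\t90\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
print(dict(primary_counts(example_lines)))
```

Reads that come out with a count of 0 are the ones where every alignment carries the flag, which is worth reporting to the aligner's developers.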

                    • Haneko
                      Member
                      • Jan 2010
                      • 36

                      #11
                      Thanks for the feedback!

                      I found a read with 10 alignments, all with the same alignment length but different numbers of mismatches. If I calculate the score for each alignment, I can find the one with the best score. Yet all the alignments have this flag, even the one with the best score. In addition, in the original map file the best-scoring alignment is the first to be reported. Does this mean my aligner defines 'primary alignment' differently?

                      Apparently I need to use this flag to filter reads when trying to reconstruct a GFF file from this SAM file for unique alignments (single alignments+multiple alignments fulfilling unique criteria).
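When the flag cannot be trusted, one alternative is to rank each read's alignments directly. The sketch below is only an illustration: it assumes the aligner writes the optional NM tag (edit distance), and it uses a hypothetical uniqueness criterion, namely that the best alignment has strictly fewer edits than the runner-up. The criteria actually intended in this thread are not spelled out.

```python
from collections import defaultdict


def edit_distance(fields):
    """Return the NM:i value from the optional fields, or None if absent."""
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def best_unique_alignments(sam_lines):
    """Keep one alignment per read when its best hit is strictly the best."""
    by_read = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        nm = edit_distance(fields)
        if nm is not None:
            by_read[fields[0]].append((nm, line))
    kept = {}
    for name, hits in by_read.items():
        hits.sort(key=lambda hit: hit[0])
        # Hypothetical criterion: a single hit, or a best hit without ties.
        if len(hits) == 1 or hits[0][0] < hits[1][0]:
            kept[name] = hits[0][1]
    return kept


# Made-up alignments, for illustration only.
example_lines = [
    "rA\t256\tchr1\t10\t0\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:2",
    "rA\t256\tchr2\t20\t0\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
    "rB\t256\tchr1\t30\t0\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1",
    "rB\t256\tchr3\t40\t0\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1",
    "rC\t0\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:3",
]
print(sorted(best_unique_alignments(example_lines)))
```

The kept lines could then be converted to GFF records; that conversion is left out here because the desired GFF attributes are not described.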

                      • nilshomer
                        Nils Homer
                        • Nov 2008
                        • 1283

                        #12
                        Originally posted by Haneko
                        Thanks for the feedback!

                        I found a read with 10 alignments, all with the same alignment length but different numbers of mismatches. If I calculate the score for each alignment, I can find the one with the best score. Yet all the alignments have this flag, even the one with the best score. In addition, in the original map file the best-scoring alignment is the first to be reported. Does this mean my aligner defines 'primary alignment' differently?

                        Apparently I need to use this flag to filter reads when trying to reconstruct a GFF file from this SAM file for unique alignments (single alignments+multiple alignments fulfilling unique criteria).
                        I am curious, what aligner are you using? It may be good to give feedback to the aligner's developer(s).

                        • darked89
                          Member
                          • Jun 2009
                          • 38

                          #13
                          Originally posted by lh3
                          In short, what you should look at is the MAPQ column, the mapping quality. So far as I know, it is the only universal way to define the reliability of a generic alignment. For a much longer answer, see here.
                          Heng Li's page has moved here

                          • clariet
                            Member
                            • Mar 2010
                            • 18

                            #14
                            I am trying to do the same thing: extract unique alignments (single alignments+multiple alignments fulfilling unique criteria).

                            How exactly could I accomplish this using samtools view -f (or -F)? I have searched everywhere I could, but I am still confused.

                            Help please!!!!!
                            clariet

                            Originally posted by Haneko
                            Thanks for the feedback!

                            I found a read with 10 alignments, all with the same alignment length but different numbers of mismatches. If I calculate the score for each alignment, I can find the one with the best score. Yet all the alignments have this flag, even the one with the best score. In addition, in the original map file the best-scoring alignment is the first to be reported. Does this mean my aligner defines 'primary alignment' differently?

                            Apparently I need to use this flag to filter reads when trying to reconstruct a GFF file from this SAM file for unique alignments (single alignments+multiple alignments fulfilling unique criteria).
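As background on the options asked about: `samtools view -f INT` keeps only alignments with all bits of INT set in the FLAG, `-F INT` drops alignments with any bit of INT set, and `-q INT` drops alignments with MAPQ below INT. The Python sketch below mirrors that logic on SAM text lines so the filtering can be inspected step by step. The function name and the default thresholds are illustrative assumptions, and a MAPQ cutoff is only meaningful if the aligner's MAPQ values are.

```python
def keep_alignment(sam_line, require=0, exclude=0x100, min_mapq=0):
    """Mimic the -f (require), -F (exclude) and -q (min_mapq) filters."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    has_required = (flag & require) == require  # -f: all required bits set
    has_excluded = (flag & exclude) != 0        # -F: any excluded bit set
    return has_required and not has_excluded and mapq >= min_mapq


# Made-up alignments, for illustration only.
primary_good = "r1\t0\tchr1\t10\t40\t4M\t*\t0\t0\tACGT\tIIII"
secondary = "r1\t256\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII"
primary_low = "r2\t16\tchr1\t70\t3\t4M\t*\t0\t0\tACGT\tIIII"

for line in (primary_good, secondary, primary_low):
    print(line.split("\t")[0], keep_alignment(line, min_mapq=10))
```

With `exclude=0x100` and `min_mapq=10`, only the first example line passes; the second is dropped for the non-primary bit and the third for its low MAPQ.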

                            • JimC
                              Member
                              • Nov 2008
                              • 10

                              #15
                              MAPQ values in TopHat output

                              Originally posted by sorrychen
                              I have a SAM file (generated by TopHat) with only four distinct MAPQ values (0, 1, 3 and 255), which seems strange to me. Why is there no mapping with a value of 2? The spec says 255 means unknown. Is there a way to guess the uniqueness of mappings?

                              Can anyone address the MAPQ output from TopHat (1.4.1)? In the accepted_hits.bam file, viewed in SAM format, the only MAPQ values are 0, 1, 3 and 255. I've looked through the whole file, and there are excellent perfect matches on genes, but no values above 3 (which isn't a good MAPQ score from what I can determine). Any suggestions or pointers to where this might be described? I've tried all the manuals etc.

                              Thanks!
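One hypothesis that would reproduce exactly this set of values, not confirmed anywhere in this thread, is that MAPQ is derived from the number of reported hits N rather than from alignment scores: a fixed 255 for N = 1, and otherwise -10 log10(1 - 1/N) truncated to an integer. The sketch below only checks the arithmetic of that assumed rule; the function name is made up.

```python
import math


def mapq_for_n_hits(n_hits):
    """MAPQ under the assumed rule: 255 if unique, else floor(-10*log10(1 - 1/N))."""
    if n_hits == 1:
        return 255  # assumption: a fixed marker value for uniquely mapped reads
    return int(-10 * math.log10(1 - 1 / n_hits))


for n in range(1, 7):
    print(n, mapq_for_n_hits(n))
```

Under this rule, 2 hits give 3, 3 or 4 hits give 1, and 5 or more hits give 0, so a value of 2 never occurs. If the rule holds for a given file, MAPQ there encodes multiplicity rather than alignment confidence, and a perfect but uniquely placed match would carry 255.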
