  • efoss
    Member
    • Jul 2011
    • 98

    removing reads that map to more than one location from gsnap-aligned bam files

    I would like to remove reads that map to more than one location from bam files I created using gsnap. Does anyone know how to do this? I would prefer not to have to realign. Also, does anyone know what MAPQ 0 means in files that have been aligned using gsnap? As I understand it, the meaning of MAPQ 0 can change depending on which aligner was used to generate the bam file.

    Thanks.

    Eric
  • Richard Finney
    Senior Member
    • Feb 2009
    • 701

    #2
    See here: https://www.biostars.org/p/56246/

    samtools view -bq 1 file.bam > unique.bam

    • efoss
      Member
      • Jul 2011
      • 98

      #3
      Fantastic! Thanks so much, Richard Finney.

      As I understand this command, it's saying to filter out anything with a mapping quality (MAPQ) score that is less than one and output that as a bam file. Is it true that, regardless of which aligner you use to create your bam file, a read that maps to more than one location will have a MAPQ score of 0?

      • Richard Finney
        Senior Member
        • Feb 2009
        • 701

        #4
        Not necessarily.
        The tags (I think) are optional and not all alignment programs go the extra mile to make sure the tags are thorough.
        I'm not sure how orthodox GSNAP is on this matter; you may wish to view the sam output tags to make sure they're what they should be.
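
One way to act on the suggestion above is to tabulate MAPQ against the NH tag, which the SAM specification defines as the number of reported alignments for the query. The sketch below assumes the file carries `NH:i:` tags at all, which is worth checking for your GSNAP version and settings. `nh_by_mapq` is a hypothetical helper name; it reads SAM text on stdin.

```shell
# Cross-tabulate MAPQ against the NH:i tag for SAM records read from stdin.
# Assumption: the aligner wrote NH:i:<n>; records without it are counted as NH=missing.
# Possible usage (requires samtools):
#   samtools view file.bam | nh_by_mapq
nh_by_mapq() {
  awk -F'\t' '
    /^@/ { next }                       # skip header lines
    {
      nh = "missing"
      for (i = 12; i <= NF; i++)        # optional fields start at column 12
        if ($i ~ /^NH:i:/) nh = substr($i, 6)
      count["MAPQ=" $5 "\tNH=" nh]++
    }
    END { for (k in count) print k "\t" count[k] }
  ' | sort
}
```

If every record with NH greater than 1 turns out to have MAPQ 0 and every record with NH equal to 1 has a higher MAPQ, then filtering on MAPQ is probably safe for that file; if the table is mixed, it is not.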

        • Brian Bushnell
          Super Moderator
          • Jan 2014
          • 2709

          #5
          Typically, a read that maps to multiple locations with a similar (internal) score will get a mapq of 3 or less, as 3 indicates at most a 50% chance that a given alignment is correct. But it varies greatly by aligner; some will always give mapq 255 for any mapped read, for example.
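
The "50% chance" figure follows from the SAM definition of mapping quality, MAPQ = -10 log10(P_wrong), rounded to the nearest integer. Two equally good locations leave each with at most a 0.5 chance of being the right one, so P_wrong is at least 0.5 and MAPQ comes out at 3 or less. A small sketch of the conversion follows; `mapq_error_prob` is a hypothetical helper, and the value 255 is reserved by the specification for "not available", so it should not be converted this way.

```shell
# mapq_error_prob Q: print the probability that the reported position is wrong,
# from the SAM definition MAPQ = -10 * log10(P_wrong).
mapq_error_prob() {
  awk -v q="$1" 'BEGIN { printf "%.3f\n", exp(-q / 10 * log(10)) }'
}

for q in 0 1 3 10 20; do
  printf 'MAPQ %2d -> P(wrong) = %s\n' "$q" "$(mapq_error_prob "$q")"
done
```

This only describes aligners that follow the definition. As noted above, some aligners use fixed values instead, so the numbers should be checked against the aligner's own documentation.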

          More importantly, even if an aligner does assign a read to multiple locations, they are not necessarily equivalent; the primary might be much better than the secondaries (and as such perhaps get a mapq well above 3). When processing a sam file as a stream, there is no simple, universal way to remove every read that maps to multiple locations without tracking read names to see how many times each one occurs, though you can make this efficient if the file is sorted by name.
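
The name-tracking idea can be sketched as a streaming filter over name-sorted SAM text. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes the aligner actually wrote each extra location as its own record, it counts alignment lines per mate, it ignores supplementary (0x800) lines when counting, and it drops the whole read or pair if either mate has more than one line. `drop_multimappers` is a hypothetical helper name.

```shell
# Keep only reads (or pairs) in which each mate has a single alignment line.
# Input: SAM text on stdin, sorted by read name. Output: filtered SAM text.
# Possible usage (requires samtools; the result is still name-sorted):
#   samtools sort -n in.bam | samtools view -h - \
#     | drop_multimappers | samtools view -b -o unique.bam -
drop_multimappers() {
  awk -F'\t' '
    function flush(   i, m, ok) {
      ok = 1
      for (m in hits) if (hits[m] > 1) ok = 0   # some mate has several locations
      if (ok) for (i = 1; i <= n; i++) print buf[i]
      n = 0
      split("", hits)                           # portable way to clear an array
    }
    /^@/ { print; next }                        # pass header lines through
    $1 != name { flush(); name = $1 }           # read name changed: finish the group
    {
      buf[++n] = $0
      if (int($2 / 2048) % 2 == 0) {            # skip supplementary lines when counting
        mate = int($2 / 64) % 4                 # 0 = unpaired, 1 = first mate, 2 = second mate
        hits[mate]++
      }
    }
    END { flush() }
  '
}
```

Because only one name group is buffered at a time, memory use stays small regardless of file size. Unmapped reads pass through unchanged, and a read whose alternative locations were never written to the file cannot be detected this way.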

          It's trivial to filter out all secondary alignments, though, with samtools.
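
For reference, secondary alignments are marked by FLAG bit 0x100 (decimal 256), and `samtools view -F` excludes records with any of the given bits set. The sketch below puts the samtools command in a comment and adds a hypothetical helper, `is_secondary`, that only decodes a FLAG value to show which bit is being tested.

```shell
# Dropping secondary alignments with samtools (requires samtools):
#   samtools view -b -F 0x100 in.bam > no_secondary.bam
#
# is_secondary FLAG: succeed if the secondary-alignment bit (0x100) is set.
is_secondary() {
  [ $(( $1 & 0x100 )) -ne 0 ]
}

for flag in 0 16 256 272; do
  if is_secondary "$flag"; then
    echo "$flag: secondary"
  else
    echo "$flag: not secondary"
  fi
done
```

Note that this keeps the primary alignment of a multi-mapped read, so it removes the extra records rather than the multi-mapped reads themselves.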
