
  • samtools flagstat issues w/ star, hisat2, gsnap & tophat2

    I am comparing these alignment tools:

    hisat2 (v2.0.4)
    gsnap (v2016-08-24)
    star (v2.5.2b)
    tophat2 (v2.1.1)

    by mapping a small set of Illumina RNA-seq samples against an annotated reference. I'm then using samtools flagstat (samtools v0.1.18) to look at the % of reads that map, but the results I'm getting don't make sense.

    For hisat2 & gsnap, flagstat reports a total read count (the top line of the output) that is larger than the number of reads actually in the fastq files. Are the BAM files generated by those aligners incompatible with flagstat?

    For star & tophat2 it's saying the total input read counts are equal to the # mapped in every case (i.e. 100% mapped for all samples), but I think that's simply because the output BAMs contain only the 'hits'. So that makes sense to me. I just don't understand the hisat2 & gsnap results.

    Can anyone explain what I'm seeing for hisat2 & gsnap? Is there an alternate tool I could use to report % reads mapped?

  • #2
    You'll want to count only the primary alignments (i.e., subtract secondary alignments from the flagstat output).
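As a rough illustration of what "count only the primary alignments" means, here is a minimal Python sketch that tallies SAM text records by type. The flag bit values come from the SAM specification; the function name `tally_sam` and the toy records are hypothetical, made up purely for this example, and a real BAM would first need to be converted to SAM text (e.g. with `samtools view -h`).

```python
# Sketch: tally SAM records by type, i.e. the bookkeeping behind
# "subtract secondary alignments from the flagstat total".
UNMAPPED = 0x4         # 4
SECONDARY = 0x100      # 256
SUPPLEMENTARY = 0x800  # 2048

def tally_sam(lines):
    """Count records in an iterable of SAM text lines (header lines are skipped)."""
    counts = {"total": 0, "secondary": 0, "supplementary": 0,
              "primary": 0, "primary_mapped": 0}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        flag = int(line.split("\t")[1])  # column 2 of a SAM record is FLAG
        counts["total"] += 1
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            counts["primary"] += 1
            if not flag & UNMAPPED:
                counts["primary_mapped"] += 1
    return counts

# Hypothetical toy records; only the leading SAM columns are filled in.
toy_sam = [
    "@HD\tVN:1.0",
    "read1\t0\tchr1\t100\t60\t50M",    # primary, mapped
    "read1\t256\tchr2\t500\t1\t50M",   # secondary hit for the same read
    "read2\t16\tchr1\t900\t60\t50M",   # primary, mapped, reverse strand
    "read3\t4\t*\t0\t0\t*",            # primary record, unmapped
]
counts = tally_sam(toy_sam)
percent_mapped = 100.0 * counts["primary_mapped"] / counts["primary"]
```

In this toy input the file holds 4 records but only 3 primary ones, which is the kind of inflation of the "total" line described in the question. Note that for paired-end data each mate has its own primary record, so the primary count corresponds to reads, not fragments.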


    • #3
      Is there a convenient way to make samtools report the number of secondary alignments? I actually see that hisat2 reports all the numbers I need in its STDOUT (it had been captured in my job log file so I didn't see it originally). As with bowtie2 it seems to be reporting numbers properly as part of the STDOUT it makes. But for gsnap I don't see any such report.

      So for gsnap, does that mean I have to rebuild the BAM file with a filter that excludes non-primary alignments? I guess I would use:

      samtools view -F 0x256

      Do I also have to filter out 'supplementary reads' as well? I am always a bit confused by whether the flag values given by explain flags are AND or OR combinations, but if I check both non-primary & supplementary I get 0x2304, so I guess I need to use:

      samtools view -F 0x2304

      and build a new bam and then run flagstat? I don't have a lot of confidence in using the -f/F filter of samtools view since I was told at one point that the only flag I can really rely on is '4' (not mapped).
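On the AND/OR question: a combined flag value is the bitwise OR of the individual bits, and as documented for `samtools view`, `-F` drops a record if any of the listed bits is set, while `-f` keeps a record only if all of them are set. The Python sketch below mimics that rule; the helper name `passes` is hypothetical and is not part of samtools.

```python
# Sketch of the selection rule documented for `samtools view -f / -F`.
SECONDARY = 0x100      # 256
SUPPLEMENTARY = 0x800  # 2048

def passes(flag, require=0, exclude=0):
    """True if every bit in `require` is set and no bit in `exclude` is set."""
    return (flag & require) == require and (flag & exclude) == 0

both = SECONDARY | SUPPLEMENTARY  # 2304 in decimal

# -F 2304: a record is dropped if it is secondary OR supplementary.
kept_by_F = [f for f in (0, 16, 256, 2048, 2064) if passes(f, exclude=both)]

# -f 2304: a record is kept only if it is secondary AND supplementary.
kept_by_f = [f for f in (0, 16, 256, 2048, 2304) if passes(f, require=both)]
```

Here `kept_by_F` ends up as `[0, 16]` and `kept_by_f` as `[2304]`, so a single `-F` with the combined decimal value is enough to exclude both kinds of record.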


      • #4
        If "samtools flagstat" doesn't output the number of secondary alignments (I don't recall the output off-hand), then the following will give you the number of mapped primary alignments (2308 = 4 + 256 + 2048, so unmapped, secondary and supplementary records are all excluded):

        Code:
        samtools view -cF 2308 alignments.bam
        Note that "-F 2304" and "-F 0x2304" are very different things. The "0x" means, "the rest is hexadecimal", so 0x2304 is 8964 in normal numbers, which isn't what you want.
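To make the decimal/hexadecimal difference concrete, the following Python sketch decodes a flag value into the bits it contains. The bit values are those defined in the SAM specification; the short labels and the `decode` helper are just illustrative names chosen for this example.

```python
# Sketch: decode a SAM FLAG value into its component bits.
FLAG_BITS = {
    0x1: "paired", 0x2: "proper pair", 0x4: "unmapped", 0x8: "mate unmapped",
    0x10: "reverse strand", 0x20: "mate reverse strand",
    0x40: "first in pair", 0x80: "second in pair",
    0x100: "secondary", 0x200: "QC fail",
    0x400: "duplicate", 0x800: "supplementary",
}

def decode(value):
    """Return the labels of all known flag bits set in `value`."""
    return [name for bit, name in FLAG_BITS.items() if value & bit]

decimal_2308 = decode(2308)    # ['unmapped', 'secondary', 'supplementary']
decimal_2304 = decode(2304)    # ['secondary', 'supplementary']
hex_0x2304 = decode(0x2304)    # ['unmapped', 'secondary', 'QC fail']

as_decimal = int("2304", 16)   # 8964, the value that "0x2304" really denotes
as_hex = hex(2304)             # '0x900', the hex spelling of decimal 2304
```

So `-F 0x2304` would filter on unmapped, secondary and QC-fail (plus a bit the specification does not define) and would leave supplementary records in place, whereas `-F 2304` or, equivalently, `-F 0x900` targets secondary and supplementary.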


        • #5
          I guess I'd also like to know if these aligners can be set to only report non-supplementary, primary alignments (per query) in the output sam file. I think with bwa mem you can use the -M argument (at least to only report primary alignments, not sure what that does to supplementary alignments). And I think bowtie2 by default already only gives you primary, non-supplementary hits in the output. (At least with a default bowtie2 run the STDOUT report it generates seems to match the flagstat results, which also match my # of input reads).

          This seems like something that would be generally useful. In my case I would very frequently want only primary, non-supplementary hits in the output sam files they build.


          • #6
            AFAIK, "bwa mem" is the only aligner that reports supplementary alignments. Many aligners will allow you to limit the number of secondary alignments reported (generally it's an option labelled something like "maximum number of alignments to report"). Most of the more experienced folks are piping the alignments through samtools anyway (i.e., directly writing BAM or CRAM files with no SAM intermediate), so filtering with it on the fly is quite simple.
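The on-the-fly filtering idea can be sketched in Python as a generator that passes header lines and primary records through and drops the rest, which is roughly the effect of placing `samtools view -F 2304` in the pipe. The name `primary_only` and the toy records are hypothetical; in practice samtools itself would do this job.

```python
# Sketch: stream filter that keeps headers and primary records only.
SECONDARY = 0x100      # 256
SUPPLEMENTARY = 0x800  # 2048

def primary_only(lines, drop=SECONDARY | SUPPLEMENTARY):
    """Yield SAM header lines and records with none of the `drop` bits set."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through untouched
            continue
        fields = line.split("\t")
        if len(fields) > 1 and (int(fields[1]) & drop) == 0:
            yield line

# Hypothetical toy stream; only the leading SAM columns are filled in.
toy_stream = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t60\t50M\n",     # primary: kept
    "r1\t256\tchr2\t500\t1\t50M\n",    # secondary: dropped
    "r2\t2048\tchr3\t700\t60\t20M\n",  # supplementary: dropped
    "r3\t4\t*\t0\t0\t*\n",             # unmapped primary record: kept
]
kept = list(primary_only(toy_stream))
```

Because the function consumes any iterable of lines, it could equally be fed `sys.stdin` when sitting between an aligner and a downstream tool. Unmapped reads are deliberately kept here so that the total still matches the number of input reads.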


            • #7
              Thanks, that's very helpful!
