  • nguyendofx
    Member
    • May 2011
    • 31

    samtools flagstat output

    Hello All,
    I ran the 'samtools flagstat' command, and it generated output that doesn't look right. Has anyone had a similar problem?

    18009870 in total
    0 QC failure
    0 duplicates
    13036424 mapped (72.38%)
    0 paired in sequencing
    0 read1
    0 read2
    0 properly paired (nan%)
    0 with itself and mate mapped
    0 singletons (nan%)
    0 with mate mapped to a different chr
    0 with mate mapped to a different chr (mapQ>=5)

    Thanks,
    Ng
  • aggp11
    Member
    • Jun 2011
    • 87

    #2
    Could you elaborate more on what you think "doesn't look right" ?

    Thanks,
    PA


    • nguyendofx
      Member
      • May 2011
      • 31

      #3
      The total and mapped read counts look good. However, every other count is 0, and they should be positive numbers. I think 'samtools flagstat' isn't detecting the flags in the BAM file?

      0 paired in sequencing
      0 read1
      0 read2
      0 properly paired (nan%)
      0 with itself and mate mapped
      0 singletons (nan%)
      0 with mate mapped to a different chr
      0 with mate mapped to a different chr (mapQ>=5)


      • aggp11
        Member
        • Jun 2011
        • 87

        #4
        Sorry if this sounds dumb, but do you have paired-end or mate-pair sequencing data? If you have single-end data, then I assume all of these fields would show 0.

        Thanks,
        PA


        • nguyendofx
          Member
          • May 2011
          • 31

          #5
          It is PE, so I expect read_1 and read_2 to both have the same positive count, not zero! I cannot explain why most of the output is zeros.


          • adaptivegenome
            Super Moderator
            • Nov 2009
            • 436

            #6
            Have you examined a couple regions of the BAM to see if the flag bits are correct for individual reads? How did you map the data?


            • aggp11
              Member
              • Jun 2011
              • 87

              #7
              I have PE data as well, and samtools flagstat works fine... Like genericforms said, check whether the flag bits are correct for the individual reads. Maybe there is something funky with the BAM file.


              • swbarnes2
                Senior Member
                • May 2008
                • 910

                #8
                flagstat just reports what the sam file flags are. Whatever software made the .sam file left most of the flags empty. If, for instance, neither the 64 nor the 128 flag is set in any sequence, none of them are going to look like read 1 or read 2.

                What software made the .sam file? bwa samse?
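As a minimal sketch of what those flag values mean, the shell function below decodes the pairing-related bits of a SAM FLAG. The bit values (1 paired, 2 properly paired, 4 unmapped, 8 mate unmapped, 64 first in pair, 128 second in pair) come from the SAM format specification; the function name describe_flag and the output labels are made up for illustration and are not part of samtools.

```shell
#!/bin/sh
# Decode the pairing-related bits of a SAM FLAG value.
# describe_flag and its labels are illustrative names, not samtools features.
describe_flag() {
    flag=$1
    out=""
    # Each entry is <bit value>:<label>; bit values follow the SAM specification.
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped 64:read1 128:read2; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( $flag & $bit )) -ne 0 ]; then
            out="$out $name"
        fi
    done
    # Trim the leading space; print "none" when no listed bit is set.
    out=${out# }
    echo "${out:-none}"
}
```

For example, `describe_flag 99` prints `paired proper_pair read1`, while `describe_flag 0` and `describe_flag 16` both print `none`. Flags 0 and 16 are what mapped reads carry after a single-end alignment, so flagstat has nothing to count as read1 or read2.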


                • nguyendofx
                  Member
                  • May 2011
                  • 31

                  #9
                  Thank you all for your input.
                  I was using novoalign. The raw sequences are in FASTQ format.

                  samtools flagstat TGACCA.sorted.bam
                  4267855 in total
                  0 QC failure
                  0 duplicates
                  3599261 mapped (84.33%)
                  0 paired in sequencing
                  0 read1
                  0 read2
                  0 properly paired (nan%)
                  0 with itself and mate mapped
                  0 singletons (nan%)
                  0 with mate mapped to a different chr
                  0 with mate mapped to a different chr (mapQ>=5)

                  # samtools view -F 64 TGACCA.sorted.bam | wc -l
                  4267855
                  # samtools view -F 128 TGACCA.sorted.bam | wc -l
                  4267855
                  # samtools view -F 4 TGACCA.sorted.bam | wc -l
                  3599261
                  # samtools view -f 4 Eyal2_TGACCA.sorted.bam | wc -l
                  668594 <== unmapped

                  It looks like the flags are missing from the SAM/BAM file.

                  Thanks,
                  Ng
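A note on reading those counts: `samtools view -F 64` excludes reads that have bit 64 set, so a count equal to the total means no read carries that bit, while `-f 64` selects the reads that do have it. The sketch below counts flagged records directly from SAM text with awk. It is a rough illustration, not a samtools feature: the function name count_flag_set is hypothetical, and the bit argument is assumed to be a single power of two.

```shell
#!/bin/sh
# Count SAM records whose FLAG (column 2) has a given bit set.
# Reads SAM text on stdin; header lines starting with "@" are skipped.
# ASSUMPTION: the argument is a single power of two (1, 2, 4, ... 64, 128).
count_flag_set() {
    awk -F '\t' -v bit="$1" '
        /^@/ { next }                      # skip header lines
        int($2 / bit) % 2 == 1 { n++ }     # portable test for "bit is set"
        END { print n + 0 }                # print 0 when nothing matched
    '
}
```

It would be used as `samtools view TGACCA.sorted.bam | count_flag_set 64`. The same number should come straight from `samtools view -c -f 64 TGACCA.sorted.bam`, where `-c` prints the count of matching records instead of the records themselves.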


                  • adaptivegenome
                    Super Moderator
                    • Nov 2009
                    • 436

                    #10
                    Talk to the novoalign guys. They are active on SEQanswers and always happy to help!


                    • adaptivegenome
                      Super Moderator
                      • Nov 2009
                      • 436

                      #11
                      How exactly did you run novoalign? Post the command string.


                      • nguyendofx
                        Member
                        • May 2011
                        • 31

                        #12
                        # non_CLC]$ samtools view -h TGACCA.sorted.bam |grep @PG
                        @PG ID:novoalign VN:V2.07.11 CL:novoalign -k -F ILM1.8 -o SAM -f TGACCA.1.fastq -d ../ref/hg19.for_exome_seq_GATK.ndx

                        The alignment runs on a cluster: it aligns the files for both reads and then merges the resulting SAM/BAM files together.


                        • zee
                          NGS specialist
                          • Apr 2008
                          • 249

                          #13
                          Hi

                          You will need to specify two input files for novoalign to do a paired-end Illumina alignment e.g.

                          novoalign -d database -f file1.fastq file2.fastq [..other options..]

                          The order of the files is also important: file1.fastq contains the left-side (forward orientation) reads of each pair, and file2.fastq contains the right-side (reverse) reads.
                          If you're doing Illumina mate-pair alignment, you need to specify "-i MP".

                          You are seeing no paired reads in flagstat because novoalign did not set the pairing flags: the alignments were done as single-end.

                          There are quite a few advanced options for novoalign that can make your life easier in this regard. Please do consult our user manual or wiki site.
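As a rough sketch of how a batch script could hand both mate files to one novoalign call, the functions below derive the read-2 file name from the read-1 name and print the resulting command as a dry run. The naming scheme is an assumption: only TGACCA.1.fastq appears in this thread, so the matching .2.fastq name is a guess. The function names mate_of and paired_commands are hypothetical; the novoalign options are the ones already shown in the thread.

```shell
#!/bin/sh
# Derive the read-2 file name from a read-1 file name.
# ASSUMPTION: mates are named <sample>.1.fastq and <sample>.2.fastq.
mate_of() {
    echo "${1%.1.fastq}.2.fastq"
}

# Print (do not run) one paired-end novoalign command per read-1 file.
# First argument: path to the novoalign index; remaining arguments: read-1 files.
paired_commands() {
    ref=$1
    shift
    for r1 in "$@"; do
        r2=$(mate_of "$r1")
        # Both mates go after -f, read 1 first, so the pairing flags can be set.
        echo "novoalign -k -F ILM1.8 -o SAM -d $ref -f $r1 $r2"
    done
}
```

Calling `paired_commands ../ref/hg19.for_exome_seq_GATK.ndx TGACCA.1.fastq` prints a single command with both files after `-f`. Printing rather than running keeps the sketch safe to try and makes it easy to check the pairing before submitting anything to a cluster.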



                          • adaptivegenome
                            Super Moderator
                            • Nov 2009
                            • 436

                            #14
                            Thanks zee! See nguyendofx, I told you these guys respond quick!


                            • nguyendofx
                              Member
                              • May 2011
                              • 31

                              #15
                              Hi zee,

                              Thanks for the input, you are right. The last novoalign command I posted was a bit confusing.

                              I run novoalign on our cluster. Here is the actual command that I used.

                              echo "novoalign -k -F ILM1.8 -o SAM -f $FILE -d $REF | samtools view -S -b -h -o $FILE.novoalign.bam - " | qsub -S /bin/bash -V -q $Q -cwd -N $NAME.novocall -t 1-$LENGTH -tc $P

                              where $FILE stands for each of the read_1 and read_2 files in turn


                              Thanks,
                              Ng


