Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • samtools flagstat output

    Hello All,
    I ran the 'samtools flagstat' command, it generated output that doesn't look right. Does anyone have similar problem?

    18009870 in total
    0 QC failure
    0 duplicates
    13036424 mapped (72.38%)
    0 paired in sequencing
    0 read1
    0 read2
    0 properly paired (nan%)
    0 with itself and mate mapped
    0 singletons (nan%)
    0 with mate mapped to a different chr
    0 with mate mapped to a different chr (mapQ>=5)

    Thanks,
    Ng

  • #2
    Could you elaborate more on what you think "doesn't look right" ?

    Thanks,
    PA

    Comment


    • #3
      The total reads and mapped reads look good. However, its 0 (zeros) for all. It must has positive numbers. I think 'samtools flagstat' doesn't detect a 'flags' on a BAM file ?

      0 paired in sequencing
      0 read1
      0 read2
      0 properly paired (nan%)
      0 with itself and mate mapped
      0 singletons (nan%)
      0 with mate mapped to a different chr
      0 with mate mapped to a different chr (mapQ>=5)

      Comment


      • #4
        Sorry if this sounds dumb, but do you have paired-end or mate-pair sequencing data? because, if you have single-end data, then i assume, that all these columns would show 0.

        Thanks,
        PA

        Comment


        • #5
          It is PE, so I expect read_1 and read_2 both should have the same positive number, not zero! I can not explain why most of output are zeros.

          Comment


          • #6
            Have you examined a couple regions of the BAM to see if the flag bits are correct for individual reads? How did you map the data?

            Comment


            • #7
              I have PE data as well and samtools flagstat works fine... Like genericforms said check if the flag bits are correct for the individual reads. May be there is something funky with the BAM file.

              Comment


              • #8
                flagstat just reports what the sam file flags are. Whatever software made the .sam file left most of the flags empty. If, for instance, neither the 64 nor the 128 flag is set in any sequence, none of them are going to look like read 1 or read 2.

                What software made the .sam file? bwa samse?

                Comment


                • #9
                  Thank you all for your input.
                  I was using novoalign software. A raw sequences is in fastq format.

                  samtools flagstat TGACCA.sorted.bam
                  4267855 in total
                  0 QC failure
                  0 duplicates
                  3599261 mapped (84.33%)
                  0 paired in sequencing
                  0 read1
                  0 read2
                  0 properly paired (nan%)
                  0 with itself and mate mapped
                  0 singletons (nan%)
                  0 with mate mapped to a different chr
                  0 with mate mapped to a different chr (mapQ>=5)

                  # samtools view -F 64 TGACCA.sorted.bam |wc -l
                  4267855
                  #samtools view -F 128 TGACCA.sorted.bam |wc -l
                  4267855
                  #samtools view -F 4 TGACCA.sorted.bam |wc -l
                  3599261
                  # samtools view -f 4 Eyal2_TGACCA.sorted.bam |wc -l
                  668594 <== unmapped

                  It looks like flags are missing on a SAM/BAM file.

                  Thanks,
                  Ng

                  Comment


                  • #10
                    Talk to the novoalign guys. They are active on SEQanswers and always happy to help!

                    Comment


                    • #11
                      How exactly did you run novoalign? Post the command string.

                      Comment


                      • #12
                        # non_CLC]$ samtools view -h TGACCA.sorted.bam |grep @PG
                        @PG ID:novoalign VN:V2.07.11 CL:novoalign -k -F ILM1.8 -o SAM -f TGACCA.1.fastq -d ../ref/hg19.for_exome_seq_GATK.ndx

                        The alignment run on a cluster, it aligns files on both read and merge SAM/BAM file together.

                        Comment


                        • #13
                          Hi

                          You will need to specify two input files for novoalign to do a paired-end Illumina alignment e.g.

                          novoalign -d database -f file1.fastq file2.fastq [..other options..]

                          The order of the files is also important. file1.fastq contains the left-side (forward orientation) of the paired-end and file2.fastq contains reads from the right-side (reverse).
                          If you're doing Illumina mate-pair alignment you need to specify "-i MP "

                          The reason why you are seeing no paired reads in flagstat because novoalign did not set them due to the alignments being done as single end.

                          There are quite a few advanced options for novoalign that can make your life easier in this regard. Please do consult our user manual or wiki site.

                          Originally posted by nguyendofx View Post
                          # non_CLC]$ samtools view -h TGACCA.sorted.bam |grep @PG
                          @PG ID:novoalign VN:V2.07.11 CL:novoalign -k -F ILM1.8 -o SAM -f TGACCA.1.fastq -d ../ref/hg19.for_exome_seq_GATK.ndx

                          The alignment run on a cluster, it aligns files on both read and merge SAM/BAM file together.

                          Comment


                          • #14
                            Thanks zee! See nguyendofx, I told you these guys respond quick!

                            Comment


                            • #15
                              Hi zee,

                              Thanks for input, you are right. The last novoalign command was a bit confuse you.

                              I run novoalign on our cluster. Here is a real command that I have used.

                              echo "novoalign -k -F ILM1.8 -o SAM -f $FILE -d $REF | samtools view -S -b -h -o $FILE.novoalign.bam - " | qsub -S /bin/bash -V -q $Q -cwd -N $NAME.novocall -t 1-$LENGTH -tc $P

                              where $FILE is all files for read_1 and read_2


                              Thanks,
                              Ng


                              Originally posted by zee View Post
                              Hi

                              You will need to specify two input files for novoalign to do a paired-end Illumina alignment e.g.

                              novoalign -d database -f file1.fastq file2.fastq [..other options..]

                              The order of the files is also important. file1.fastq contains the left-side (forward orientation) of the paired-end and file2.fastq contains reads from the right-side (reverse).
                              If you're doing Illumina mate-pair alignment you need to specify "-i MP "

                              The reason why you are seeing no paired reads in flagstat because novoalign did not set them due to the alignments being done as single end.

                              There are quite a few advanced options for novoalign that can make your life easier in this regard. Please do consult our user manual or wiki site.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              10 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              10 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              51 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              67 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X