![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
samtools flagstat output | manore | Bioinformatics | 11 | 07-24-2013 12:19 AM |
Samtools flagstat - no duplicates? | Orr Shomroni | Bioinformatics | 3 | 11-25-2011 01:46 AM |
SAMtools flagstat output interpretation | a2z@blr | Bioinformatics | 2 | 10-20-2011 02:23 PM |
Samtools flagstat | Anelda | Bioinformatics | 0 | 09-26-2011 04:55 AM |
samtools flagstat | bair | Bioinformatics | 3 | 05-28-2010 07:15 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: CA Join Date: May 2011
Posts: 31
|
![]()
Hello All,
I ran the 'samtools flagstat' command, it generated output that doesn't look right. Does anyone have similar problem? 18009870 in total 0 QC failure 0 duplicates 13036424 mapped (72.38%) 0 paired in sequencing 0 read1 0 read2 0 properly paired (nan%) 0 with itself and mate mapped 0 singletons (nan%) 0 with mate mapped to a different chr 0 with mate mapped to a different chr (mapQ>=5) Thanks, Ng |
![]() |
![]() |
![]() |
#2 |
Member
Location: Wisconsin Join Date: Jun 2011
Posts: 87
|
![]()
Could you elaborate more on what you think "doesn't look right" ?
Thanks, PA |
![]() |
![]() |
![]() |
#3 |
Member
Location: CA Join Date: May 2011
Posts: 31
|
![]()
The total reads and mapped reads look good. However, its 0 (zeros) for all. It must has positive numbers. I think 'samtools flagstat' doesn't detect a 'flags' on a BAM file ?
0 paired in sequencing 0 read1 0 read2 0 properly paired (nan%) 0 with itself and mate mapped 0 singletons (nan%) 0 with mate mapped to a different chr 0 with mate mapped to a different chr (mapQ>=5) |
![]() |
![]() |
![]() |
#4 |
Member
Location: Wisconsin Join Date: Jun 2011
Posts: 87
|
![]()
Sorry if this sounds dumb, but do you have paired-end or mate-pair sequencing data? because, if you have single-end data, then i assume, that all these columns would show 0.
Thanks, PA |
![]() |
![]() |
![]() |
#5 |
Member
Location: CA Join Date: May 2011
Posts: 31
|
![]()
It is PE, so I expect read_1 and read_2 both should have the same positive number, not zero! I can not explain why most of output are zeros.
|
![]() |
![]() |
![]() |
#6 |
Super Moderator
Location: US Join Date: Nov 2009
Posts: 437
|
![]()
Have you examined a couple regions of the BAM to see if the flag bits are correct for individual reads? How did you map the data?
|
![]() |
![]() |
![]() |
#7 |
Member
Location: Wisconsin Join Date: Jun 2011
Posts: 87
|
![]()
I have PE data as well and samtools flagstat works fine... Like genericforms said check if the flag bits are correct for the individual reads. May be there is something funky with the BAM file.
|
![]() |
![]() |
![]() |
#8 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]()
flagstat just reports what the sam file flags are. Whatever software made the .sam file left most of the flags empty. If, for instance, neither the 64 nor the 128 flag is set in any sequence, none of them are going to look like read 1 or read 2.
What software made the .sam file? bwa samse? |
![]() |
![]() |
![]() |
#9 |
Member
Location: CA Join Date: May 2011
Posts: 31
|
![]()
Thank you all for your input.
I was using novoalign software. A raw sequences is in fastq format. samtools flagstat TGACCA.sorted.bam 4267855 in total 0 QC failure 0 duplicates 3599261 mapped (84.33%) 0 paired in sequencing 0 read1 0 read2 0 properly paired (nan%) 0 with itself and mate mapped 0 singletons (nan%) 0 with mate mapped to a different chr 0 with mate mapped to a different chr (mapQ>=5) # samtools view -F 64 TGACCA.sorted.bam |wc -l 4267855 #samtools view -F 128 TGACCA.sorted.bam |wc -l 4267855 #samtools view -F 4 TGACCA.sorted.bam |wc -l 3599261 # samtools view -f 4 Eyal2_TGACCA.sorted.bam |wc -l 668594 <== unmapped It looks like flags are missing on a SAM/BAM file. Thanks, Ng |
![]() |
![]() |
![]() |
#10 |
Super Moderator
Location: US Join Date: Nov 2009
Posts: 437
|
![]()
Talk to the novoalign guys. They are active on SEQanswers and always happy to help!
|
![]() |
![]() |
![]() |
#11 |
Super Moderator
Location: US Join Date: Nov 2009
Posts: 437
|
![]()
How exactly did you run novoalign? Post the command string.
|
![]() |
![]() |
![]() |
#12 |
Member
Location: CA Join Date: May 2011
Posts: 31
|
![]()
# non_CLC]$ samtools view -h TGACCA.sorted.bam |grep @PG
@PG ID:novoalign VN:V2.07.11 CL:novoalign -k -F ILM1.8 -o SAM -f TGACCA.1.fastq -d ../ref/hg19.for_exome_seq_GATK.ndx The alignment run on a cluster, it aligns files on both read and merge SAM/BAM file together. |
![]() |
![]() |
![]() |
#13 | |
NGS specialist
Location: Malaysia Join Date: Apr 2008
Posts: 249
|
![]()
Hi
You will need to specify two input files for novoalign to do a paired-end Illumina alignment e.g. novoalign -d database -f file1.fastq file2.fastq [..other options..] The order of the files is also important. file1.fastq contains the left-side (forward orientation) of the paired-end and file2.fastq contains reads from the right-side (reverse). If you're doing Illumina mate-pair alignment you need to specify "-i MP " The reason why you are seeing no paired reads in flagstat because novoalign did not set them due to the alignments being done as single end. There are quite a few advanced options for novoalign that can make your life easier in this regard. Please do consult our user manual or wiki site. Quote:
|
|
![]() |
![]() |
![]() |
#14 |
Super Moderator
Location: US Join Date: Nov 2009
Posts: 437
|
![]()
Thanks zee! See nguyendofx, I told you these guys respond quick!
|
![]() |
![]() |
![]() |
#15 | |
Member
Location: CA Join Date: May 2011
Posts: 31
|
![]()
Hi zee,
Thanks for input, you are right. The last novoalign command was a bit confuse you. I run novoalign on our cluster. Here is a real command that I have used. echo "novoalign -k -F ILM1.8 -o SAM -f $FILE -d $REF | samtools view -S -b -h -o $FILE.novoalign.bam - " | qsub -S /bin/bash -V -q $Q -cwd -N $NAME.novocall -t 1-$LENGTH -tc $P where $FILE is all files for read_1 and read_2 Thanks, Ng Quote:
|
|
![]() |
![]() |
![]() |
#16 | |
NGS specialist
Location: Malaysia Join Date: Apr 2008
Posts: 249
|
![]()
OK, I can understand what you're doing here but if your $FILE represents "read_1 read_2" then your output BAM file will be called "read_1 read_2".bam which is going to cause some problems for you. Note that novoalign only recognizes paired-end reads from them being in separate files so you cannot concatenate them and assume it's going to pair them together.
Quote:
|
|
![]() |
![]() |
![]() |
#17 | |
Member
Location: CA Join Date: May 2011
Posts: 31
|
![]()
Yes, Every $FILE.fastq input file generates $FILE.bam. I view a BAM file on IGV, it shows both reads ???
The problem could be either 'samtool flagstat' don't understand flags on a BAM file or BAM/SAM file doesn't have flags ? Thanks, Ng Quote:
|
|
![]() |
![]() |
![]() |
#18 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
|
![]()
flagstat understands flags just fine. And right, your bam doesn't have flags. It can't possibly have the normal paired end flags, because you told Novoalign, the software making the .sam, that you didn't have paired end data!
The problem is not with the software. The problem is with the command line you wrote. Software isn't magic. It can't read your mind, and know that you have pairs of fastqs which are pairs. You have to tell it which fastq file is paired with whch other one. It takes five seconds to see how to do that on the novoalign quick start page, and your command line isn't going to do that. |
![]() |
![]() |
![]() |
#19 | |
Super Moderator
Location: US Join Date: Nov 2009
Posts: 437
|
![]() Quote:
![]() |
|
![]() |
![]() |
![]() |
#20 |
Member
Location: CA Join Date: May 2011
Posts: 31
|
![]()
Thanks you all. Its all good now
|
![]() |
![]() |
![]() |
Thread Tools | |
|
|