Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
Comparing two BAM files using SAMtools khoops66 Bioinformatics 12 02-14-2012 07:06 PM
Reverse engineering BAM files: BAM -> FASTQ gene coder Bioinformatics 3 01-03-2012 02:42 PM
Counting mapped nucleotides in a bam file moriah Bioinformatics 7 08-22-2011 01:30 AM
How to get mapped reads ID from bam files using samtools? fabrice Bioinformatics 15 08-20-2011 06:03 PM
NEw to Chip-seq and have .bam/.sam/.bam.bai files... then what? NGS newbie Bioinformatics 11 05-25-2011 07:48 AM

Thread Tools
Old 01-16-2011, 10:11 PM   #1
Location: Japan

Join Date: Oct 2010
Posts: 52
Question Counting entries in bam-files: difficulties with samtools and bamToBed

I have a bam-file 'stuff.bam' and want to know how many sequences are described in it. I am not getting the same results when using samtools and bamToBed:
[prompt]$ ~/samtools-0.1.12a/samtools view -c stuff.bam 
...ok, now I count the entries in a conversion of stuff in bed-format:
[prompt]$ bamToBed -i stuff.bam | wc -l
Why does this give different results, and how can I get the correct number?

I am using
Azazel is offline   Reply With Quote
Old 01-17-2011, 01:31 AM   #2
Location: Vienna

Join Date: Nov 2010
Posts: 10
Default BAM files can contain mapped and unmapped reads

Dear Azazel,

Using samtools view you are counting the complete number of reads in your BAM file - be it mapped or unmapped reads.

$ samtools view data.bam | wc -l
Information about which read was mapped and how can be seen in the flag field of the BAM file. See more here and here

You could use this to count the different flag tags in your file:

$ samtools view data.bam | awk '{print $2}' | sort | uniq -c
10976575 0
10968005 16
5483180 4
So 5.5M reads do not map the genome at all, and roughly 11M map to the forward respectively the reverse strand of the genome you used to map the reads.

Using BamToBed you just count the number of mapped reads. A which hasn't been mapped obviously does not have any chromosome, start, and end information and is thus not taken into account.

So the following command will give you the total 22M reads which map to the genome.

$ bamToBed -i data/bowtie_remapped/input_0.5-2h_1_o.bam | wc -l
Hope this helps,

Cheers, droog_22
droog_22 is offline   Reply With Quote
Old 01-17-2011, 01:42 AM   #3
Senior Member
Location: Australia

Join Date: Sep 2008
Posts: 136

"samtools flagstat" is probably a better tool for the job, it will give a summary of all the reads in your bam file eg
$ samtools flagstat data_1.bam 
25038903 in total
0 QC failure
0 duplicates
16921771 mapped (67.58%)
0 paired in sequencing
0 read1
0 read2
0 properly paired (-nan%)
0 with itself and mate mapped
0 singletons (-nan%)
0 with mate mapped to a different chr
0 with mate mapped to a different chr (mapQ>=5)
frozenlyse is offline   Reply With Quote
Old 01-17-2011, 06:01 AM   #4
Location: Singapore

Join Date: Oct 2009
Posts: 44

What I like to di is to count the flags that are present:

samtools view *.bam | awk ' { per[$2] += 1 }END { for (i in per)print i, per[i] }

This command will show you which flags are present and how often they occur.
DerSeb is offline   Reply With Quote
Old 01-17-2011, 07:31 PM   #5
Location: Japan

Join Date: Oct 2010
Posts: 52

OK! I understand. Just wanted to say thanks for the answers.
Azazel is offline   Reply With Quote

samtools bedtools

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

All times are GMT -8. The time now is 12:03 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO