
**maubp** · 10-12-2011, 10:11 AM

samtools idxstats?

**cascoamarillo** · 10-12-2011, 01:00 PM

Hi,

When I use this command, I get this:

ref 12398 1478064 0
* 0 0 8139515

That is: reference sequence name, sequence length, # mapped reads and # unmapped reads. What do the "0" and the "*" mean (if anything)?

best

**maubp** · 10-12-2011, 01:04 PM

From the samtools manual:

samtools idxstats <aln.bam>
Retrieve and print stats in the index file. The output is TAB-delimited, with each line consisting of reference sequence name, sequence length, # mapped reads and # unmapped reads.

The final line counts the unmapped reads that are not placed on any reference: * instead of a reference sequence name, and zero length.
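As a minimal sketch, assuming idxstats output in the TAB-delimited format described above arrives on stdin, this prints column 4 of the `*` row, i.e. the unmapped reads that are not placed on any reference. The function name `idxstats_unplaced` is hypothetical.

```shell
# Hypothetical helper: print the unmapped count from the final "*" row of
# samtools idxstats output (columns: name, length, mapped, unmapped).
idxstats_unplaced() {
  awk -F '\t' '$1 == "*" { print $4 }'
}

# Usage:
#   samtools idxstats aln.bam | idxstats_unplaced
```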

**cascoamarillo** · 10-12-2011, 01:34 PM

Thanks, now I get it. One more question: how do I get the total number of mapped reads when the reference has many sequences (several contigs)?
For example:

samtools idxstats <aln.bam>

contig00001 22002 147 0
contig00002 19783 23 0
contig00003 19528 25 0
contig00004 17742 192 0
contig00005 16684 35 0
.
.
.
contig61681 100 0 0
contig61682 100 0 0
contig61684 100 0 0
contig61685 100 0 0
contig61686 100 0 0
* 0 0 2005333

Is there a way to see the sum of all the mapped reads?

thanks in advance.

**maubp** · 10-12-2011, 02:36 PM

To get the number of mapped reads, add up all the values in column 3 (it doesn't hurt to include the final row, as it has zero there).

That's trivial in Perl, Python, etc. You can do it with a Unix one-liner too; here is one way, using cut to select just column 3 and awk to add it up:

Code:

samtools idxstats example.bam | cut -f3 | awk 'BEGIN {total=0} {total += $1} END {print total}'

There may well be a neater Unix solution...
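One possibly neater variant, as a sketch: awk can select column 3 by itself and starts its variables at zero, so both cut and the BEGIN block can be dropped. Printing with `%.0f` is a precaution of mine rather than something from the thread: it prints 0 for empty input and avoids any chance of a very large total being shown in exponent form by some awk implementations. The function name `sum_mapped` is hypothetical.

```shell
# Hypothetical helper: total mapped reads = sum of column 3 of
# samtools idxstats output. The "*" row contributes 0 to this column.
sum_mapped() {
  awk -F '\t' '{ total += $3 } END { printf "%.0f\n", total }'
}

# Usage:
#   samtools idxstats example.bam | sum_mapped
```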

**maubp** · 10-19-2011, 05:08 AM

You can also use 'samtools flagstat', as pointed out by av_d in this thread:

samtools idxstats - SEQanswers

http://seqanswers.com/forums/showpost.php?p=54361&postcount=4


**ruchira** · 01-20-2012, 12:47 AM

With the human genome as reference, when I run samtools idxstats I get a count of unmapped reads for each chromosome. But if a read was unmapped, how was it assigned to a particular chromosome?

**swbarnes2** · 01-20-2012, 09:33 AM

Originally posted by ruchira

With the human genome as reference, when I run samtools idxstats I get a count of unmapped reads for each chromosome. But if a read was unmapped, how was it assigned to a particular chromosome?

Most likely, its mate mapped to that chromosome. The SAM spec says that when a read is unmapped but its mate is mapped, the unmapped read should be given its mate's mapping coordinates. So the read has both the unmapped flag set and a mapping position.
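To see this in your own data, here is a sketch that counts unmapped records (FLAG bit 0x4) per reference name, reading SAM text on stdin, for example from `samtools view aln.bam`. Records with RNAME `*` are the truly unplaced ones; any other name means the unmapped read was given a position, typically its mate's. The function name is hypothetical, and the bit test uses integer division because POSIX awk has no bitwise operators.

```shell
# Hypothetical helper: count unmapped SAM records per RNAME (column 3).
# Reads SAM text on stdin, e.g. from samtools view aln.bam
unmapped_by_ref() {
  awk -F '\t' '
    /^@/ { next }                       # skip header lines, if present
    int($2 / 4) % 2 == 1 { n[$3]++ }    # FLAG bit 0x4 set: read unmapped
    END { for (r in n) printf "%s\t%.0f\n", r, n[r] }
  ' | LC_ALL=C sort                     # deterministic, locale-independent order
}

# Usage:
#   samtools view aln.bam | unmapped_by_ref
```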

**ruchira** · 01-20-2012, 10:04 AM

Thanks very much for your answer, swbarnes2. I had used bwa aln and then bwa sampe to generate the SAM files, so I guess bwa sampe included the chromosome information but still flagged the reads as unmapped.

I expected some reads to be completely unmapped (because they were not human DNA). Where would these appear in the idxstats output?

**nchernia** · 02-02-2012, 08:07 AM

I wanted the same information recently and thought I'd post my steps. My output from BWA is in SAM format, so it first needs to be converted to BAM, then sorted, then indexed. If your BAM file is already sorted by coordinate you can skip the sort step (samtools index needs a coordinate-sorted file).

Code:

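samtools view -Sb filename.sam > filename.bam
# Current samtools syntax; older versions took an output prefix instead: samtools sort filename.bam filename_sorted
samtools sort -o filename_sorted.bam filename.bam
samtools index filename_sorted.bam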

Then I run this awk command. The cut in the earlier one-liner is unnecessary, since awk can read the appropriate field directly.

Code:

samtools idxstats filename_sorted.bam | awk 'BEGIN {a=0;b=0} {a += $3; b+=$4 } END{print a " mapped " b " unmapped "}'
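The two totals can also be turned into a percentage. This is a sketch under the same assumption as the posts above, that column 4 summed over every row (including `*`) is the total unmapped count; `percent_mapped` is a hypothetical name.

```shell
# Hypothetical helper: mapped/unmapped totals and % mapped from
# samtools idxstats output on stdin (col 3 = mapped, col 4 = unmapped).
percent_mapped() {
  awk -F '\t' '
    { m += $3; u += $4 }
    END {
      t = m + u
      pct = (t > 0) ? 100 * m / t : 0   # avoid dividing by zero on empty input
      printf "%.0f mapped, %.0f unmapped, %.2f%% mapped\n", m, u, pct
    }
  '
}

# Usage:
#   samtools idxstats filename_sorted.bam | percent_mapped
```

Whether these counts equal read counts may depend on your aligner: if it writes several records per read, samtools flagstat gives a breakdown that may help.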

**wmyashar** · 06-21-2012, 03:57 PM

Is there a way to use samtools idxstats to get the number of mapped/unmapped reads for particular genes rather than for whole chromosomes? I get output like this:
chr1 197195432 2022423 0
chr2 181748087 1915486 0
chr3 159599783 1418344 0
chr4 155630120 1352178 0
chr5 152537259 1526530 0
.....
thank you in advance

**swbarnes2** · 06-22-2012, 10:51 AM

Originally posted by wmyashar

Is there a way to use samtools idxstats to get the number of mapped/unmapped reads for particular genes rather than for whole chromosomes?

You want to filter your BAM file by those coordinates, then count. I think samtools view can take a BED file and do that (the -L option), and so can BEDTools.
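A sketch of that approach using samtools alone, assuming a BED file of gene intervals (the file name `genes.bed` and both function names are hypothetical) and a sorted, indexed BAM. BED starts are 0-based while samtools regions are 1-based and inclusive, so 1 is added to each start. `-c` makes samtools view print a count instead of records, and `-F 4` leaves out unmapped reads, which would otherwise be counted wherever they were placed next to their mates.

```shell
# Hypothetical helper: turn BED lines (chrom, 0-based start, end, optional
# name) into "name<TAB>chrom:start-end" with 1-based inclusive coordinates.
bed_to_regions() {
  awk -F '\t' '
    NF < 3 || /^(track|browser)/ || substr($0, 1, 1) == "#" { next }
    {
      region = $1 ":" ($2 + 1) "-" $3
      name = (NF >= 4) ? $4 : region
      printf "%s\t%s\n", name, region
    }
  '
}

# Hypothetical helper: mapped-read count per BED interval (needs an index).
count_per_region() {
  bam=$1
  bed_to_regions | while read -r name region; do
    printf '%s\t%s\n' "$name" "$(samtools view -c -F 4 "$bam" "$region")"
  done
}

# Usage:
#   count_per_region aln.bam < genes.bed
# A single total over all intervals:
#   samtools view -c -F 4 -L genes.bed aln.bam
```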

**clsppb** · 08-01-2012, 09:49 AM

"Mapping" unmapped reads

Originally posted by swbarnes2

Most likely, its mate mapped to that chromosome. The SAM spec says that when a read is unmapped but its mate is mapped, the unmapped read should be given its mate's mapping coordinates. So the read has both the unmapped flag set and a mapping position.

By "its mate" are you referring to paired-end protocols?

**Gonza** · 10-29-2014, 05:34 AM

Hi all,

After sorting, indexing and counting the reads that align to each chromosome in a BAM file, I get this (example for one library only):

1 | 30,427,671 | 5,913,901 | 0
2 | 19,698,289 | 3,386,635 | 0
3 | 23,459,830 | 4,784,837 | 0
4 | 18,585,056 | 3,292,873 | 0
5 | 26,975,502 | 5,032,188 | 0
Mt | 366,924 | 37,747 | 0
Pt | 154,478 | 9,107 | 0
* | 0 | 0 | 0

The first column is the Arabidopsis chromosome, the 2nd the chromosome length (in bp), the 3rd the number of mapped reads and the 4th the number of unmapped reads. Is that correct?

Also, is it possible that, for example, I only mapped 5 million reads to chr 1 and yet I have 0 unmapped reads for every chromosome? It looks odd to me.
Any comments?
Thanks.


[BWA] Calculate % of reads not aligned to reference
