charon 07-31-2017 03:51 PM

What does "average quality" mean in samtools stats?
I used samtools stats to measure some basic metrics of an input BAM file and got the following results:

raw total sequences: 1105415
filtered sequences: 80516
sequences: 1024899
is sorted: 1
1st fragments: 1024899
last fragments: 0
reads mapped: 940001
reads mapped and paired: 0 # paired-end technology bit set + both mates mapped
reads unmapped: 84898
reads properly paired: 0 # proper-pair bit set
reads paired: 0 # paired-end technology bit set
reads duplicated: 0 # PCR or optical duplicate bit set
reads MQ0: 14800 # mapped and MQ=0
reads QC failed: 0
non-primary alignments: 0
total length: 5395194643 # ignores clipping
bases mapped: 4998712634 # ignores clipping
bases mapped (cigar): 4531562523 # more accurate
bases trimmed: 0
bases duplicated: 0
mismatches: 688215582 # from NM fields
error rate: 1.518716e-01 # mismatches / bases mapped (cigar)
average quality: 19.8
insert size average: 0.0

I was wondering how the average quality is calculated. (It's a bit higher than I expected.) Is it related to each read's mean base quality? That is, for each read, calculate its mean base quality, and then take the average over all reads?

Thanks in advance!

dpryan 08-01-2017 11:20 PM

The base qualities for the whole file are summed and then that's divided by the total number of bases in the file.
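
To make the difference from the per-read idea concrete, here is a minimal sketch of the two calculations. It only illustrates the arithmetic described above and is not the samtools source; the function names and the toy quality values are made up for the example.

```python
def pooled_mean_quality(reads):
    """Sum every base quality in the file, divide by the total base count."""
    total_quality = sum(sum(quals) for quals in reads)
    total_bases = sum(len(quals) for quals in reads)
    return total_quality / total_bases


def mean_of_read_means(reads):
    """Average each read first, then average those per-read means."""
    read_means = [sum(quals) / len(quals) for quals in reads]
    return sum(read_means) / len(read_means)


# Hypothetical data: one long high-quality read and one short low-quality read.
reads = [
    [30, 30, 30, 30, 30, 30, 30, 30],  # 8 bases at Q30
    [10, 10],                          # 2 bases at Q10
]

pooled = pooled_mean_quality(reads)   # (240 + 20) / 10 bases = 26.0
per_read = mean_of_read_means(reads)  # (30 + 10) / 2 reads = 20.0
print(pooled, per_read)
```

The pooled figure weights each read by its length, so when read lengths vary a lot (as they do with long reads), it can differ noticeably from an average of per-read means.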

charon 08-02-2017 10:24 AM

Makes sense. Thanks dpryan!

