Old 04-30-2013, 09:05 AM   #4
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161

This appears to be an interesting case.
Here is how I assess these mapping statistics.

First check the uniquely mapped reads:
Average mapped length | 199.28 : good, close to your pair length of 202.
Mismatch rate per base, % | 1.11% : a bit on the high side; you would get 0.5-0.8% for good libraries.
The splices are dominated by annotated and canonical, which is good.
The indel rate is low.
So, the reads that actually mapped uniquely - as few as they are - look fine.

The ratio of unique reads to multimappers is 7.28%/1.26% ~ 6, which means the share of multimappers is somewhat high, at least for typical human cells (I am not sure what you are sequencing). Our typical value for this ratio is 15-20.
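
If you want to compute this ratio for several runs, here is a minimal Python sketch. It assumes the statistics come from STAR's Log.final.out and that the field names are spelled exactly as in the example string below; the function names are mine, not part of STAR.

```python
def parse_star_log(text):
    """Return {field name: value string} from the text of a STAR Log.final.out."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank lines and section headers, which carry no value
        key, _, value = line.partition("|")
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Convert a value such as '7.28%' to the float 7.28."""
    return float(stats[key].rstrip("%"))


def unique_to_multi_ratio(stats):
    """Ratio of uniquely mapped reads to reads mapped to multiple loci."""
    unique = percent(stats, "Uniquely mapped reads %")
    multi = percent(stats, "% of reads mapped to multiple loci")
    return unique / multi


# Two lines copied from the statistics discussed in this thread
# (field names assumed to match your Log.final.out).
example = """\
        Uniquely mapped reads % |\t7.28%
        % of reads mapped to multiple loci |\t1.26%
"""
print(round(unique_to_multi_ratio(parse_star_log(example)), 2))  # prints 5.78
```

In practice you would pass the contents of your own Log.final.out instead of the example string.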

% of reads mapped to too many loci | 0.09% : by default "too many loci" means more than 10 (--outFilterMultimapNmax), but this number is small, so you are not missing much.

Finally - most importantly - unmapped reads.
% of reads unmapped: too short | 1.19% : this number would be large if you had poor sequencing quality. Here it is surprisingly small (we typically get ~5%).

% of reads unmapped: other | 90.17% :
this is where all the unmapped reads went, and it is very unusual.

It means that for 90% of the reads STAR could not find good anchor seeds. Two main possibilities are:
1. Contamination. Most reads have very little homology with the human genome. You can check this by BLASTing a few unmapped reads against everything.
2. Repeat regions dominate expression. The number of loci a seed could map to is limited by --winAnchorMultimapNmax = 50 by default. You could increase it to ~1000 to see if more reads get mapped (also increase --outFilterMultimapNmax to output them as multi-mappers).
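
Both checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and file paths are hypothetical, the value used for --outFilterMultimapNmax is my guess (pick whatever ceiling suits you), and --outReadsUnmapped Fastx is an extra option I added so that STAR writes out the unmapped reads, provided your STAR version supports it.

```python
import itertools


def fastq_to_fasta(fastq_lines, max_reads=5):
    """Convert the first max_reads FASTQ records (4 lines each) to FASTA text."""
    records = []
    lines = iter(fastq_lines)
    for _ in range(max_reads):
        record = list(itertools.islice(lines, 4))
        if len(record) < 4:
            break  # end of input or truncated record
        header, sequence = record[0].strip(), record[1].strip()
        records.append(">" + header[1:] + "\n" + sequence)  # drop the leading '@'
    return "\n".join(records)


def star_rerun_command(genome_dir, reads, prefix,
                       anchor_nmax=1000, multimap_nmax=1000):
    """Build a STAR argument list with the raised multimapping limits."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", prefix,
        # default is 50; ~1000 is the value suggested above
        "--winAnchorMultimapNmax", str(anchor_nmax),
        # default is 10; 1000 here is an assumed value, not a recommendation
        "--outFilterMultimapNmax", str(multimap_nmax),
        # assumption: write unmapped reads to files so they can be BLASTed
        "--outReadsUnmapped", "Fastx",
    ]


# Hypothetical paths, for illustration only.
cmd = star_rerun_command("genome_index/", ["reads_1.fastq", "reads_2.fastq"], "rerun_")
print(" ".join(cmd))
```

The FASTA text returned by fastq_to_fasta (fed with the lines of a file of unmapped reads) can be pasted into a BLAST search, and the printed command can be run in a shell or passed to subprocess.run.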