
dan 07-03-2015 02:36 AM

FASTQ alignment metrics (RNA-Seq)?

How do people judge the quality of a FASTQ (short read) alignment? In particular I'm interested in evaluating RNA-Seq alignments, typically (but not exclusively) from Illumina instruments.

What comes to mind is:
* Fraction of reads mapped
* Fraction of reads mapped uniquely
* Fraction of 'good' pairs (right orientation, right distance)

and for RNA-Seq specifically
* Fraction of reads mapping within a gene

Anything based on read mapping quality?

What other metrics can we think of?

annaprotasio 07-03-2015 02:56 AM

hi Dan,

Have a look at "samtools flagstat"

The output will look something like this, and I think it contains all the info you requested.


7276199 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
7276199 + 0 mapped (100.00%:-nan%)
7276199 + 0 paired in sequencing
3787000 + 0 read1
3489199 + 0 read2
6195536 + 0 properly paired (85.15%:-nan%)
6795026 + 0 with itself and mate mapped
481173 + 0 singletons (6.61%:-nan%)
480036 + 0 with mate mapped to a different chr
480036 + 0 with mate mapped to a different chr (mapQ>=5)

good luck
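
For anyone who wants these numbers as fractions in a script, here is a minimal sketch that parses flagstat text like the output above. The function names (parse_flagstat, fraction) are made up for illustration, and the label strings are assumed to match the output shown in this post; other samtools versions may print extra or slightly different lines.

```python
import re

# "<qc-passed> + <qc-failed> <label>", as in the flagstat output above
LINE_RE = re.compile(r"(\d+) \+ (\d+) (.+)")
# Trailing percentage annotation such as "(85.15%:-nan%)"
PCT_RE = re.compile(r"\s*\([^()]*%[^()]*\)\s*$")
# Assumed label of the total line, copied from the output in this thread
TOTAL = "in total (QC-passed reads + QC-failed reads)"


def parse_flagstat(text):
    """Return {label: (qc_passed, qc_failed)} for each recognised line."""
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip blank or unrecognised lines
        label = PCT_RE.sub("", m.group(3)).strip()
        counts[label] = (int(m.group(1)), int(m.group(2)))
    return counts


def fraction(counts, label):
    """QC-passed count for `label` divided by the QC-passed total."""
    total = counts[TOTAL][0]
    return counts[label][0] / total if total else float("nan")


sample = """\
7276199 + 0 in total (QC-passed reads + QC-failed reads)
7276199 + 0 mapped (100.00%:-nan%)
6195536 + 0 properly paired (85.15%:-nan%)
481173 + 0 singletons (6.61%:-nan%)
"""

stats = parse_flagstat(sample)
print("mapped:", fraction(stats, "mapped"))
print("properly paired:", fraction(stats, "properly paired"))
print("singletons:", fraction(stats, "singletons"))
```

Note that flagstat does not report uniquely mapped reads or reads within genes, so those two metrics need another tool.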

GenoMax 07-03-2015 03:37 AM

Also take a look at RSeQC:

Most aligners also report their own alignment statistics, e.g. BBMap, TopHat, and probably STAR as well.

maxsalm 07-03-2015 05:25 AM

FastQC may also be of general use: http://www.bioinformatics.babraham.a...ojects/fastqc/

dan 07-03-2015 07:48 AM


Originally Posted by maxsalm (Post 176856)

I agree it's useful, but it's not what I want here.

jwfoley 07-03-2015 11:49 AM

How about proportion of duplicate fragments? This will depend on whether you've done single- or paired-end reads, though, since with single RNA-seq reads you do expect a certain amount of duplication by chance (with paired reads it's a much smaller chance).
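
To make the single- vs paired-end point concrete, here is a rough sketch of a duplicate fraction computed from alignment coordinates. It assumes you have already extracted one key per fragment from your alignments; the key layouts, coordinates and the function name duplicate_fraction are hypothetical, and dedicated duplicate-marking tools apply more careful rules than this.

```python
from collections import Counter


def duplicate_fraction(fragment_keys):
    """Fraction of fragments that repeat an already-seen key.

    fragment_keys: iterable of hashable keys, one per fragment, e.g.
      single-end: (chrom, start, strand)
      paired-end: (chrom, start, mate_start, strand)
    """
    counts = Counter(fragment_keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every occurrence beyond the first of a key counts as a duplicate
    return (total - len(counts)) / total


# Made-up coordinates: two reads start at the same place on chr1
single_end = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 200, "-")]
# The same two reads with mate positions: the fragments differ in length
paired_end = [("chr1", 100, 350, "+"), ("chr1", 100, 420, "+"),
              ("chr1", 200, 50, "-")]

print(duplicate_fraction(single_end))  # one of three reads repeats a key
print(duplicate_fraction(paired_end))  # no key repeats
```

With the mate position in the key, two fragments only count as duplicates if both ends coincide, which is why chance duplication is much rarer for paired reads.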

bjackson 07-05-2015 02:53 PM

I mostly work with single-end reads, but for alignment quality I look primarily at
1) pct of reads mapped
2) pct of reads uniquely mapped

It sounds like you are also asking about post-alignment QC in general, so I would add
3) read duplication (i.e. how many reads align to the identical location) - most locations should have only one or a few reads.
4) read biotype distribution (most reads should map to protein-coding regions)
5) cumulative pct measures - I sort genes by count or FPKM and graph the number of genes vs cumulative percentage. That will tell you whether you are sinking a lot of reads into a few very abundant transcripts, in which case you might need more depth to see less common transcripts.
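
Item 5 can be sketched in a few lines. This is only an illustration that assumes per-gene read counts are already available as a dict; the gene names, counts and function names (cumulative_pct, genes_to_reach) are made up.

```python
def cumulative_pct(gene_counts):
    """Cumulative percentage of reads, genes ranked from most to least abundant."""
    ranked = sorted(gene_counts.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return []
    curve = []
    running = 0
    for count in ranked:
        running += count
        curve.append(100.0 * running / total)
    return curve


def genes_to_reach(gene_counts, pct):
    """Smallest number of top-ranked genes that account for at least pct% of reads."""
    for n_genes, value in enumerate(cumulative_pct(gene_counts), start=1):
        if value >= pct:
            return n_genes
    return None


# Hypothetical counts: one gene soaks up most of the reads
counts = {"geneA": 70, "geneB": 20, "geneC": 10}
print(cumulative_pct(counts))      # [70.0, 90.0, 100.0]
print(genes_to_reach(counts, 80))  # 2
```

Plotting the curve against gene rank gives the graph described in item 5; a curve that saturates after very few genes suggests most reads are going into a handful of transcripts.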

All times are GMT -8.
