08-05-2011, 05:30 AM
fabrice
Member
 
Location: Paris

Join Date: Oct 2009
Posts: 86
Removing duplicates from human Illumina paired-end RNA-Seq

I am working with human Illumina paired-end RNA-Seq data. My goal is to
estimate expression at the isoform level, not SNP calling.

When I used FastQC (0.94) to examine my RNA-Seq data, I found a very
high duplication level: FastQC reports about 70% duplication. So I
tried to use Picard (1.50) to remove duplicate reads.

The command is:

java -Xmx4g -jar ~/bin/picard/MarkDuplicates.jar REMOVE_DUPLICATES=true \
    INPUT=accepted_hits.bam OUTPUT=remove_accepted_hits.bam \
    METRICS_FILE=dup.txt

After running Picard, I checked again with FastQC. The result is better,
but the duplication level is still high (63%). Does this mean Picard
does not work well, or is there a problem with the FastQC report?

I then looked at the Picard output. In the METRICS_FILE, PERCENT_DUPLICATION
is 0.312927 (a fraction, i.e. about 31%), but FastQC reports a
duplication level of 70%.
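For reference, Picard's PERCENT_DUPLICATION is the fraction of mapped reads marked as duplicates (reads that share alignment start positions and orientation), with each duplicate pair counted as two reads. It can be recomputed from the other columns of the metrics file as (UNPAIRED_READ_DUPLICATES + 2 * READ_PAIR_DUPLICATES) / (UNPAIRED_READS_EXAMINED + 2 * READ_PAIRS_EXAMINED). The function below is a minimal sketch of that calculation with awk; the name `recompute_pct_dup` is hypothetical, and it assumes the usual MarkDuplicates metrics layout (a tab-separated header row starting with `LIBRARY`, one row per library, then a blank line before the histogram).

```shell
# Recompute Picard's PERCENT_DUPLICATION from a MarkDuplicates metrics file.
# Hypothetical helper; assumes the metrics table header row starts with
# "LIBRARY" and that the table ends at the first blank line.
recompute_pct_dup() {
  awk -F '\t' '
    # Header row: remember the column index of every metric by name.
    /^LIBRARY\t/ { for (i = 1; i <= NF; i++) col[$i] = i; inside = 1; next }
    # A blank line ends the metrics table (the histogram follows it).
    inside && NF == 0 { exit }
    # One data row per library: duplicate pairs count as two reads.
    inside && !/^#/ {
      urd = $(col["UNPAIRED_READ_DUPLICATES"])
      rpd = $(col["READ_PAIR_DUPLICATES"])
      ure = $(col["UNPAIRED_READS_EXAMINED"])
      rpe = $(col["READ_PAIRS_EXAMINED"])
      denom = ure + 2 * rpe
      printf "%s\t%.6f\n", $(col["LIBRARY"]), (denom > 0 ? (urd + 2 * rpd) / denom : 0)
    }
  ' "$1"
}
```

Usage: `recompute_pct_dup dup.txt` prints each library name with the recomputed fraction, which should agree with the PERCENT_DUPLICATION column. Columns are looked up by name, so the sketch does not depend on their order.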

Why are these two numbers so different?


Thanks.