11-29-2011, 08:38 PM   #1
spapillon
Junior Member
 
Location: Montreal, Canada

Join Date: Nov 2011
Posts: 4
Problems with RNA-seq analysis results

Hi everyone,

This is my first post here, so please let me know if I break a rule of conduct or anything. Tips and tricks are appreciated.

The situation:
I'm currently trying to analyze RNA-seq data from Illumina Body Map 2.0. I've built a pipeline that seems reasonable and used it to assess quality, trim, map, and analyze the RNA-seq data with the standard tools. The pipeline is for 100 bp single-end reads.

The pipeline:
Quality (FASTX-Toolkit for trimming and filtering, FastQC for reporting)
fastq_quality_filter -Q33 -q 20 -p 80 <FASTQ_FILE>
fastq_quality_trimmer -Q33 -t 20 -l 50 <FILTER_OUT>
fastqc <TRIM_OUT>

Mapping
tophat --solexa-quals <UCSC hg19 REF> <TRIM_OUT>

Analysis
cufflinks <TOPHAT_OUT>
cuffcompare -r <UCSC hg19 ANNOTATION> -R <CUFFLINKS_OUT>

The problem:
The output of cuffcompare (cuffcmp.tracking) identifies:
13586 [23.32%] novel (class code j)
6127 [10.51%] intronic (class code i)
19145 [32.86%] contained (class code c)
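
For anyone who wants to double-check these numbers, here is a minimal Python sketch of how the class codes could be tallied from the tracking file. It assumes the file is tab-separated with the class code in the fourth column, which is my reading of the cuffcompare output and worth verifying against your own file. The function names and the sample rows are made up for illustration and are not real data.

```python
from collections import Counter


def tally_class_codes(lines, code_column=3):
    """Count class codes in cuffcompare tracking rows.

    code_column is the 0-based index of the class-code field
    (assumption: the fourth tab-separated column).
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > code_column:  # skip short or malformed rows
            counts[fields[code_column]] += 1
    return counts


def tally_tracking_file(path, code_column=3):
    """Convenience wrapper that reads a tracking file from disk."""
    with open(path) as handle:
        return tally_class_codes(handle, code_column)


def novel_exceeds_contained(counts):
    """True when class codes j + i outnumber class code c."""
    return counts["j"] + counts["i"] > counts["c"]


def summarize(counts):
    """Print each class code with its count and share of all rows."""
    total = sum(counts.values())
    for code, n in counts.most_common():
        print(f"{n:>8d} [{100.0 * n / total:5.2f}%] class code {code}")


# Hypothetical rows, only to show the expected layout of each line.
example_rows = [
    "TCONS_00000001\tXLOC_000001\tGENE_A|TX_A\tj\tq1:CUFF.1|CUFF.1.1",
    "TCONS_00000002\tXLOC_000001\tGENE_A|TX_A\tc\tq1:CUFF.1|CUFF.1.2",
    "TCONS_00000003\tXLOC_000002\tGENE_B|TX_B\ti\tq1:CUFF.2|CUFF.2.1",
]
summarize(tally_class_codes(example_rows))
```

On a real run, tally_tracking_file("cuffcmp.tracking") would replace the example rows. With the figures reported above, j + i = 13586 + 6127 = 19713, compared with 19145 for c.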

In this sample, novel + intronic > contained. I'm highly dubious of the trustworthiness of these results, since one would not expect such a high number of previously unreported transcripts. If anyone could point out a flaw in the pipeline or in my interpretation of the results, I would greatly appreciate it. Do tell if I need to give more details on any part.

Best regards,

Simon

Last edited by spapillon; 11-29-2011 at 08:48 PM.