SEQanswers

spapillon 11-29-2011 08:38 PM

Problems with RNA-seq analysis results
 
Hi everyone,

This is my first post here, so please let me know if I break any rule of conduct. Tips and tricks are appreciated.

The situation:
I'm currently trying to analyze RNA-seq data from Illumina Body Map 2.0. I've built a pipeline that seems reasonable and used it to assess quality, trim, map, and analyze the RNA-seq data with the standard tools. The pipeline is for 100 bp single-end reads.

The pipeline:
Quality (FASTX-Toolkit for filtering and trimming, FastQC for reporting)
fastq_quality_filter -Q33 -q 20 -p 80 <FASTQ_FILE>
fastq_quality_trimmer -Q33 -t 20 -l 50 <FILTER_OUT>
fastqc <TRIM_OUT>

Assembling
tophat --solexa-quals <UCSC hg19 REF> <TRIM_OUT>

Analysis
cufflinks <TOPHAT_OUT>
cuffcompare -r <UCSC hg19 ANNOTATION> -R <CUFFLINKS_OUT>
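
The quality steps above pass -Q33 while the TopHat step passes --solexa-quals, so one sanity check that may be worth running is to look at the range of quality characters actually present in the FASTQ file. The following is only a rough sketch: the function name fastq_qual_range is made up for illustration, and it assumes plain four-line FASTQ records (no wrapped sequence or quality lines).

```shell
#!/bin/sh
# Hypothetical helper (not part of any toolkit): print the minimum and
# maximum ASCII codes found on the quality lines of a FASTQ file.
# Assumption: every record is exactly 4 lines, so quality is line 4, 8, 12, ...
fastq_qual_range() {
    awk '
        BEGIN {
            # Build a character -> ASCII code lookup for printable characters.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127; max = 0
        }
        NR % 4 == 0 {
            n = length($0)
            for (k = 1; k <= n; k++) {
                c = substr($0, k, 1)
                if (!(c in ord)) continue   # skip stray bytes such as CR
                v = ord[c]
                if (v < min) min = v
                if (v > max) max = v
            }
        }
        END { if (max > 0) printf "%d\t%d\n", min, max }
    ' "$1"
}
```

Usage would be something like fastq_qual_range <TRIM_OUT>. A minimum below 59 cannot come from a +64 encoding (Solexa+64 starts at ASCII 59, Phred+64 at 64), so it would point to Phred+33. A minimum of 59 or above is only suggestive, since a filtered Phred+33 file could also lack low-quality characters.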

The problems
The output of cuffcompare (cuffcmp.tracking) identifies:
13586 [23.32%] novel (class code j)
6127 [10.51%] intronic (class code i)
19145 [32.86%] contained (class code c)
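
For reference, counts and percentages like the ones above can be tallied directly from the tracking file. This is a minimal sketch, assuming the file is tab-separated with the class code in the fourth column; the function name count_class_codes is hypothetical, and the percentage here is taken over all rows of the file.

```shell
#!/bin/sh
# Hypothetical helper: tally class codes in a cuffcompare .tracking file.
# Assumption: tab-separated columns, class code in column 4.
count_class_codes() {
    awk -F '\t' '
        { n[$4]++; total++ }
        END {
            # One line per class code: code, count, share of all rows.
            for (c in n)
                printf "%s\t%d\t%.2f%%\n", c, n[c], 100 * n[c] / total
        }
    ' "$1" | LC_ALL=C sort
}
```

Usage would be count_class_codes cuffcmp.tracking. With the numbers listed above, j and i together come to 13586 + 6127 = 19713 entries (33.83%), against 19145 (32.86%) for c.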

In this sample, novel + intronic > contained. I'm highly doubtful about the reliability of these results, since one would not expect such a high number of previously unreported transcripts. If anyone could point out a flaw in the pipeline or in my interpretation of the results, I would greatly appreciate it. Let me know if I need to give more details on any part.

Best regards,

Simon

