

04-05-2012, 04:31 AM   #1
Junior Member
Location: china

Join Date: Mar 2011
Posts: 1
Questions on FPKM cutoff setting & known-gene filtering in Cuffmerge result


I am running the Tuxedo protocol and trying to discover novel transcripts from RNA-seq data of several mouse samples. As described in the protocol, I mapped the reads for each sample to the reference genome using TopHat (with the -G parameter specified to guide the mapping process), and then assembled transcripts for each sample using Cufflinks (with -b and -u specified to enable bias correction and multi-read correction).

After that I ran Cuffmerge on all sample assemblies to create a merged transcriptome (with the -g and -s parameters specified). I would like to set a cutoff on the FPKM value to filter out low-expression transcripts (or background noise) before further investigation, but the FPKM values in the Cuffmerge output "transcripts.gtf" file seem to range from 0 to 1, even though the corresponding FPKM values in each separate sample assembly (the Cufflinks output "transcripts.gtf" file) can be in the hundreds or even thousands. Does Cuffmerge apply some kind of normalization? Information on cuffmerge output on the official Cufflinks website is very limited:

cuffmerge Output

cuffmerge produces a GTF file that contains an assembly that merges together the input assemblies.

So if I stick to my plan and run the cuffmerge result through an FPKM filter, what value would be an appropriate threshold? Or should I apply the filter to each sample assembly instead (which leads to another question: whether to keep or discard a transcript that is highly expressed in one sample and lowly expressed in another)? Or should I use the combined.gtf from the cuffcompare output instead?
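To make the per-sample option concrete, here is a minimal Python sketch of filtering one Cufflinks assembly by FPKM before merging. It assumes each "transcript" record in the GTF carries `transcript_id` and `FPKM` attributes in the usual `key "value";` form. The function names, the example records and the threshold of 1.0 are placeholders of mine, not recommended values or part of any Cufflinks tool.

```python
import re

# Matches GTF attributes written as: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

# Made-up records for illustration only (two transcripts, one exon each).
EXAMPLE_GTF = [
    'chr1\tCufflinks\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "250.5";',
    'chr1\tCufflinks\texon\t100\t900\t1000\t+\t.\t'
    'gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1";',
    'chr1\tCufflinks\ttranscript\t2000\t2500\t10\t-\t.\t'
    'gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "0.3";',
    'chr1\tCufflinks\texon\t2000\t2500\t10\t-\t.\t'
    'gene_id "CUFF.2"; transcript_id "CUFF.2.1"; exon_number "1";',
]


def parse_attributes(line):
    """Return the attribute column of a GTF line as a dict ({} if malformed)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return {}
    return dict(ATTR_RE.findall(fields[8]))


def transcript_fpkms(gtf_lines):
    """Map transcript_id -> FPKM, read from the 'transcript' records only."""
    fpkms = {}
    for line in gtf_lines:
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = parse_attributes(line)
        if "transcript_id" in attrs and "FPKM" in attrs:
            fpkms[attrs["transcript_id"]] = float(attrs["FPKM"])
    return fpkms


def filter_gtf_by_fpkm(gtf_lines, min_fpkm):
    """Keep every record (transcript and exon) whose transcript has FPKM >= min_fpkm."""
    lines = list(gtf_lines)
    fpkms = transcript_fpkms(lines)
    keep = {tid for tid, value in fpkms.items() if value >= min_fpkm}
    return [ln for ln in lines if parse_attributes(ln).get("transcript_id") in keep]


if __name__ == "__main__":
    # 1.0 is an arbitrary placeholder threshold.
    for record in filter_gtf_by_fpkm(EXAMPLE_GTF, min_fpkm=1.0):
        print(record)
```

If each sample assembly were filtered this way before running cuffmerge, a transcript that is low in one sample but passes in another would presumably still reach the merged assembly through the sample where it passes.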

Another thing puzzling me: if I want to filter out known genes (those annotated in UCSC, for example), can I feed the transcripts.gtf file previously built by cuffmerge, together with a GTF file containing these genes, to cuffcompare, and simply discard transcripts marked with class code "=", "c", "j" or "e" in the resulting <outprefix>.tracking file (or, alternatively, keep only those with class code "u")?

Class Codes

If you ran cuffcompare with the -r option, tracking rows will contain the following values. If you did not use -r, the rows will all contain "-" in their class code column.
Priority  Code  Description
1         =     Complete match of intron chain
2         c     Contained
3         j     Potentially novel isoform (fragment): at least one splice junction is shared with a reference transcript
4         e     Single exon transfrag overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment.
5         i     A transfrag falling entirely within a reference intron
6         o     Generic exonic overlap with a reference transcript
7         p     Possible polymerase run-on fragment (within 2Kbases of a reference transcript)
8         r     Repeat. Currently determined by looking at the soft-masked reference sequence and applied to transcripts where at least 50% of the bases are lower case
9         u     Unknown, intergenic transcript
10        x     Exonic overlap with reference on the opposite strand
11        s     An intron of the transfrag overlaps a reference intron on the opposite strand (likely due to read mapping errors)
12        .     (.tracking file only, indicates multiple classifications)
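The class-code filtering idea above could be sketched roughly as follows in Python. This assumes the .tracking file is tab-separated with the transfrag ID, locus ID, reference ID and class code in the first four columns, followed by one column per input assembly. The function name `select_by_class_code` and the example rows are made up for illustration.

```python
# Made-up .tracking rows for illustration only.
EXAMPLE_TRACKING = [
    "TCONS_00000001\tXLOC_000001\tgeneA|txA\t=\tq1:CUFF.1|CUFF.1.1|100|250.5|200.0|300.0|35.2|800",
    "TCONS_00000002\tXLOC_000002\t-\tu\tq1:CUFF.2|CUFF.2.1|100|12.1|8.0|16.0|4.1|500",
    "TCONS_00000003\tXLOC_000003\tgeneB|txB\tj\tq1:CUFF.3|CUFF.3.1|100|40.0|30.0|50.0|9.9|1200",
]


def select_by_class_code(tracking_lines, keep_codes=("u",)):
    """Return transfrag IDs whose class code (4th column) is in keep_codes."""
    selected = []
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        transfrag_id, class_code = fields[0], fields[3]
        if class_code in keep_codes:
            selected.append(transfrag_id)
    return selected


if __name__ == "__main__":
    # Keep only intergenic ("u") transfrags as novel candidates.
    print(select_by_class_code(EXAMPLE_TRACKING))
```

Keeping a whitelist of codes (here just "u") rather than discarding "=", "c", "j" and "e" means the remaining codes (i, o, p, r, x, s) are excluded as well; whether that is desirable depends on what should count as novel.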
Any suggestion would be greatly appreciated~
zhlingl

cuffcompare, cufflinks, cuffmerge, tophat
