  • Questions on setting an FPKM cutoff & filtering known genes in Cuffmerge results

    Hello~

    I am running the Tuxedo protocol to discover novel transcripts in mouse RNA-seq data from several samples. Following the protocol, I mapped the reads of each sample to the reference genome with TopHat (using -G to guide the mapping with a reference annotation), and then assembled transcripts for each sample with Cufflinks (using -b for bias correction and -u for multi-read correction).

    After that I ran Cuffmerge on all the sample assemblies to create a merged transcriptome (with -g and -s specified). I would like to set an FPKM cutoff to filter out lowly expressed transcripts (or background noise) before further investigation. However, the FPKM values in the Cuffmerge output "transcripts.gtf" file seem to range only from 0 to 1, even though the corresponding FPKM values in each per-sample assembly (the Cufflinks output "transcripts.gtf" file) can be in the hundreds or even thousands. Does Cuffmerge apply some kind of normalization? The information on Cuffmerge output on the official Cufflinks website is very limited:

    cuffmerge Output

    cuffmerge produces a GTF file that contains an assembly that merges together the input assemblies.

    <outprefix>/merged.gtf
    So if I stick to my plan and run the Cuffmerge result through an FPKM filter, what would be an appropriate threshold? Or should I apply the filter to each sample assembly instead (which raises another question: should a transcript that is highly expressed in one sample but lowly expressed in another be kept or dropped)? Or should I use the combined.gtf from the cuffcompare output instead?
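To make the per-sample option concrete, here is a minimal Python sketch of such an FPKM filter. It assumes that each "transcript" record in the Cufflinks transcripts.gtf carries `transcript_id "..."` and `FPKM "..."` attributes in the ninth column. The function name `filter_gtf_by_fpkm` and the `min_fpkm` parameter are made up for illustration, and no particular threshold is implied.

```python
import re

# Attribute patterns assumed to match the Cufflinks GTF attribute column.
FPKM_RE = re.compile(r'FPKM "([^"]+)"')
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def filter_gtf_by_fpkm(gtf_lines, min_fpkm):
    """Keep every record (transcript and exon lines) that belongs to a
    transcript whose 'transcript' record has FPKM >= min_fpkm.

    min_fpkm is a placeholder chosen by the caller, not a recommendation.
    """
    lines = list(gtf_lines)

    # Pass 1: collect the IDs of transcripts that pass the cutoff.
    passing = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        fpkm = FPKM_RE.search(fields[8])
        tid = TID_RE.search(fields[8])
        if fpkm and tid and float(fpkm.group(1)) >= min_fpkm:
            passing.add(tid.group(1))

    # Pass 2: keep all records whose transcript_id passed.
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        tid = TID_RE.search(fields[8])
        if tid and tid.group(1) in passing:
            kept.append(line)
    return kept
```

It could be applied to one sample at a time, for example `filter_gtf_by_fpkm(open("transcripts.gtf"), min_fpkm=1.0)`, where 1.0 is an arbitrary placeholder. The sketch does not answer how to reconcile a transcript that passes in one sample but fails in another, since per-sample assemblies do not share transcript IDs.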


    Another thing that puzzles me: if I want to filter out known genes (those annotated in UCSC, for example), can I feed cuffcompare the transcripts.gtf file previously built by Cuffmerge together with a GTF file describing these genes, and then simply cross out the transcripts marked with class code =, c, j, or e in the resulting <outprefix>.tracking file (or, alternatively, keep only those with class code u)?

    Class Codes

    If you ran cuffcompare with the -r option, tracking rows will contain the following values. If you did not use -r, the rows will all contain "-" in their class code column.
    Priority  Code  Description
    1         =     Complete match of intron chain
    2         c     Contained
    3         j     Potentially novel isoform (fragment): at least one splice junction is shared with a reference transcript
    4         e     Single exon transfrag overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment.
    5         i     A transfrag falling entirely within a reference intron
    6         o     Generic exonic overlap with a reference transcript
    7         p     Possible polymerase run-on fragment (within 2Kbases of a reference transcript)
    8         r     Repeat. Currently determined by looking at the soft-masked reference sequence and applied to transcripts where at least 50% of the bases are lower case
    9         u     Unknown, intergenic transcript
    10        x     Exonic overlap with reference on the opposite strand
    11        s     An intron of the transfrag overlaps a reference intron on the opposite strand (likely due to read mapping errors)
    12        .     (.tracking file only, indicates multiple classifications)
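The class-code filter described above could be sketched roughly as follows in Python. This assumes the <outprefix>.tracking file is tab-delimited, with the transfrag ID in the first field and the class code in the fourth field; that layout should be checked against the actual file, which is why the column index is a parameter. The function name `select_by_class_code` and its parameters are hypothetical.

```python
def select_by_class_code(tracking_lines, keep_codes=("u",), code_column=3):
    """Return transfrag IDs (first field) whose class code is in keep_codes.

    code_column is the 0-based index of the class-code field; 3 is an
    assumption about the .tracking layout, not a documented guarantee.
    """
    keep = set(keep_codes)
    selected = []
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= code_column:
            continue  # skip blank or malformed rows
        if fields[code_column] in keep:
            selected.append(fields[0])
    return selected
```

Keeping only "u" is equivalent to crossing out every other code, which is stricter than dropping just =, c, j, and e: transfrags with codes such as i, o, p, r, x, or s would also be discarded unless they are added to `keep_codes`.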
    Any suggestion would be greatly appreciated~
