  • Questions on setting an FPKM cutoff & filtering known genes in Cuffmerge results

    Hello~

    I am running the Tuxedo protocol to discover novel transcripts from RNA-seq data from several mouse samples. As described in the protocol, I mapped the reads for each sample to the reference genome using TopHat (with -G specified to guide the mapping), and then assembled transcripts for each sample using Cufflinks (with -b for bias correction and -u for multi-read correction).

    After that I ran Cuffmerge on all sample assemblies to create a merged transcriptome (with -g and -s specified). I would like to set an FPKM cutoff to filter out low-expression transcripts (or background noise) before further investigation, but the FPKM values in the Cuffmerge output "transcripts.gtf" file seem to range from 0 to 1, even though the corresponding FPKM values in each separate sample assembly (the Cufflinks output "transcripts.gtf" file) can be in the hundreds or even thousands. Does Cuffmerge apply some kind of normalization? The information on cuffmerge output on the official Cufflinks website is very limited:

    cuffmerge Output

    cuffmerge produces a GTF file that contains an assembly that merges together the input assemblies.

    <outprefix>/merged.gtf

    So if I stick to my plan and run the cuffmerge result through the FPKM filter, what value would be an appropriate threshold? Or should I apply the filter to each sample assembly (which raises another question: whether to keep or discard a transcript that is highly expressed in one sample and lowly expressed in another)? Or should I use the combined.gtf from the cuffcompare output instead?
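    To make the per-sample option concrete, here is a minimal sketch of an FPKM filter over one Cufflinks "transcripts.gtf". It assumes each "transcript" record carries transcript_id and FPKM attributes in column 9; the function name and the threshold passed in are hypothetical and not a recommended cutoff.

```python
import re

# Attribute patterns as they appear in column 9 of a Cufflinks GTF record.
FPKM_RE = re.compile(r'FPKM "([^"]+)"')
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def transcripts_passing_fpkm(gtf_lines, min_fpkm):
    """Return the set of transcript_id values whose 'transcript' record
    has FPKM >= min_fpkm. (Hypothetical helper, for illustration only.)"""
    keep = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        # GTF has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        fpkm = FPKM_RE.search(fields[8])
        tid = TID_RE.search(fields[8])
        if fpkm and tid and float(fpkm.group(1)) >= min_fpkm:
            keep.add(tid.group(1))
    return keep
```

    Running this once per sample gives one set of IDs per assembly. Note that the IDs are assigned per assembly, so comparing sets across samples would still need a common set of transcript IDs (for example from cuffcompare).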


    Another thing puzzling me: if I want to filter out known genes (those annotated in UCSC, for example), can I feed the transcripts.gtf file previously built by cuffmerge, together with a GTF file describing those genes, to cuffcompare, and simply discard transcripts marked with class code =, c, j, or e in the resulting <outprefix>.tracking file (or alternatively keep only those with class code u)?

    Class Codes

    If you ran cuffcompare with the -r option, tracking rows will contain the following values. If you did not use -r, the rows will all contain "-" in their class code column.
    Priority  Code  Description
    1         =     Complete match of intron chain
    2         c     Contained
    3         j     Potentially novel isoform (fragment): at least one splice junction is shared with a reference transcript
    4         e     Single exon transfrag overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment.
    5         i     A transfrag falling entirely within a reference intron
    6         o     Generic exonic overlap with a reference transcript
    7         p     Possible polymerase run-on fragment (within 2Kbases of a reference transcript)
    8         r     Repeat. Currently determined by looking at the soft-masked reference sequence and applied to transcripts where at least 50% of the bases are lower case
    9         u     Unknown, intergenic transcript
    10        x     Exonic overlap with reference on the opposite strand
    11        s     An intron of the transfrag overlaps a reference intron on the opposite strand (likely due to read mapping errors)
    12        .     (.tracking file only, indicates multiple classifications)
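
    The class-code filter I have in mind could be sketched like this. It assumes the <outprefix>.tracking file is tab-delimited with the transfrag ID in column 1 and the class code in column 4; the function name is hypothetical, and keeping only "u" is just the example from my question.

```python
def select_transfrags_by_class(tracking_lines, keep_codes=frozenset({"u"})):
    """Return transfrag IDs (column 1) whose class code (column 4)
    is in keep_codes. (Hypothetical helper, for illustration only.)"""
    selected = []
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        transfrag_id, class_code = fields[0], fields[3]
        if class_code in keep_codes:
            selected.append(transfrag_id)
    return selected
```

    Passing a different set, for example {"u", "i", "x"}, would keep other candidate classes as well; which codes count as "novel" is exactly what I am unsure about.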
    Any suggestions would be greatly appreciated~
