Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • imsharmanitin
    Postdoc Cancer Bioinformatics
    • Dec 2014
    • 17

    Differential expression using TopHat/Cufflinks, analysis by CumeRbund

    I am trying to do differential expression analysis for 6 samples with no replicates

    The protocol used by me after following several threads and http://www.nature.com/nprot/journal/....2012.016.html
    is as follows

    1) after steps like alignment, etc

    2) I assembled transcripts for each sample
    cufflinks -p 8 -o <output directory> accepted_hits.bam

    5) Ran Cuffmerge to create a single merged transcriptome annotation:
    cuffmerge -g Homo_sapiens.GRCh37.75.gtf -s Homo_sapiens.GRCh37.75.dna.primary_assembly.fa -p 8 assemblies.txt

    6) ran cuffdiff using following options:
    --library-type fr-unstranded
    --dispersion-method blind -L C1,C2,C3,C4,C5,C6
    -b ./index/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
    -u ./merged_asm/merged.gtf

    7) used CumeRbund for analysis


    However, I was unable to get Ensemble id and instead got XLOC id for the genes.


    following the thread http://seqanswers.com/forums/showthread.php?t=18357
    I tried following two options

    1) featureNames(sigGenes)

    tracking_id gene_short_name
    1 XLOC_002638 HES4,RP11-54O7.17
    2 XLOC_005270 <NA>
    3 XLOC_005288 <NA>
    4 XLOC_005368 <NA>
    5 XLOC_007367 RP11-47A8.5
    6 XLOC_007664 <NA>
    7 XLOC_007703 <NA>

    But the names for some of XLOC were missing

    2) then i tired other method and i got error

    > names<-featureNames(sigGenes)
    > row.names(names)=names$tracking_id
    > sigGenesNames <-as.matrix(names)
    > sigGenesNames <- sigGenesNames [,-1]
    > sigGenesData<-diffData(sigGenes)
    > row.names(sigGenesData)= sigGenesData$gene_id
    Error in `row.names<-.data.frame`(`*tmp*`, value = c("XLOC_002638", "XLOC_002638", :
    duplicate 'row.names' are not allowed
    In addition: Warning message:
    non-unique values when setting 'row.names': ‘XLOC_002638’, ‘XLOC_005270’, [... truncated]


    Hence, I started to look into the ways to get Ensemble ids and the threads lead to lot of confusion and queries.

    1. should i use gtf file crated by cuffmerge or cuffcompare as input for cuffdiff ?

    2. while creating database using cummerbund should we use genome and gtf file?

    3. If the answer of 2nd question is yes, then should I use gtf from Ensemble or cuffmerge or cuffcomapre?

    4. how to get Ensemble id instead of XLOC id.

    5. Is there some error in my protocol?



    Thanks in advance.
  • csmatyi
    Member
    • Oct 2011
    • 25

    #2
    Hello everybody,

    I have a line from genes.fpkm_tracking from cuffdiff output:

    XLOC_000001 - - XLOC_000001 NM_008866 TSS1 chr1:4807892-4846735 - - 18.3357 13.5137 23.1577 OK 18.0846 13.4121 22.7572 OK

    Here the aggregated control sample value is 18.3357

    However, from the 3 control transcript.gtf files I have these 3 values:

    ../Nsg1-Hip_tophat/cufflinks_output_gtf/genes.fpkm_tracking:NM_008866 - - NM_008866 - - chr1:4807892-4846735 - - 16.1344 13.3363 18.9325 OK
    ../Nsg2-Hip_tophat/cufflinks_output_gtf/genes.fpkm_tracking:NM_008866 - - NM_008866 - - chr1:4807892-4846735 - - 15.4661 12.8039 18.1283 OK
    ../Nsg3-Hip_tophat/cufflinks_output_gtf/genes.fpkm_tracking:NM_008866 - - NM_008866 - - chr1:4807892-4846735 - - 13.0078 10.6329 15.3827 OK

    Here the aggregete value of 18.3 is larger than the individual values 16.1, 15.5, and 13.0. How does cuffdiff calculate the aggregate values from the 3 samples?
    Does it normalize? How does it do so?

    I've tried finding this in the Trapnell, 2012 Nature Protocol paper, but it doesn't say.

    Thanks, csmatyi

    Comment

    Latest Articles

    Collapse

    ad_right_rmr

    Collapse

    News

    Collapse

    Topics Statistics Last Post
    Started by SEQadmin2, 06-09-2026, 11:58 AM
    0 responses
    25 views
    0 reactions
    Last Post SEQadmin2  
    Started by SEQadmin2, 06-05-2026, 10:09 AM
    0 responses
    31 views
    0 reactions
    Last Post SEQadmin2  
    Started by SEQadmin2, 06-04-2026, 08:59 AM
    0 responses
    39 views
    0 reactions
    Last Post SEQadmin2  
    Started by SEQadmin2, 06-02-2026, 12:03 PM
    0 responses
    62 views
    0 reactions
    Last Post SEQadmin2  
    Working...