SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
Cutoffs for differential Expression analysis (DESeq and Cufflinks/Cuffdiff) rspitale RNA Sequencing 4 01-10-2013 06:50 AM
Differential expression of microRNAs with Cufflinks? gufranca Bioinformatics 2 07-30-2012 06:45 PM
Cufflinks differential expression problem Rachelly Bioinformatics 2 05-12-2011 12:08 AM
Differential expression analysis workflow in Cufflinks anna_vt Bioinformatics 4 12-19-2010 03:04 AM
help with differential gene expression with cufflinks and tophat waterboy Bioinformatics 1 11-28-2010 10:51 AM

Reply
 
Thread Tools
Old 07-28-2015, 09:08 AM   #1
imsharmanitin
Member
 
Location: cambridge,UK

Join Date: Dec 2014
Posts: 17
Cool Differential expression using TopHat/Cufflinks, analysis by CumeRbund

I am trying to do differential expression analysis for 6 samples with no replicates

The protocol used by me after following several threads and http://www.nature.com/nprot/journal/....2012.016.html
is as follows

1) after steps like alignment, etc

2) I assembled transcripts for each sample
cufflinks -p 8 -o <output directory> accepted_hits.bam

5) Ran Cuffmerge to create a single merged transcriptome annotation:
cuffmerge -g Homo_sapiens.GRCh37.75.gtf -s Homo_sapiens.GRCh37.75.dna.primary_assembly.fa -p 8 assemblies.txt

6) ran cuffdiff using following options:
--library-type fr-unstranded
--dispersion-method blind -L C1,C2,C3,C4,C5,C6
-b ./index/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa
-u ./merged_asm/merged.gtf

7) used CumeRbund for analysis


However, I was unable to get Ensemble id and instead got XLOC id for the genes.


following the thread http://seqanswers.com/forums/showthread.php?t=18357
I tried following two options

1) featureNames(sigGenes)

tracking_id gene_short_name
1 XLOC_002638 HES4,RP11-54O7.17
2 XLOC_005270 <NA>
3 XLOC_005288 <NA>
4 XLOC_005368 <NA>
5 XLOC_007367 RP11-47A8.5
6 XLOC_007664 <NA>
7 XLOC_007703 <NA>

But the names for some of XLOC were missing

2) then i tired other method and i got error

> names<-featureNames(sigGenes)
> row.names(names)=names$tracking_id
> sigGenesNames <-as.matrix(names)
> sigGenesNames <- sigGenesNames [,-1]
> sigGenesData<-diffData(sigGenes)
> row.names(sigGenesData)= sigGenesData$gene_id
Error in `row.names<-.data.frame`(`*tmp*`, value = c("XLOC_002638", "XLOC_002638", :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘XLOC_002638’, ‘XLOC_005270’, [... truncated]


Hence, I started to look into the ways to get Ensemble ids and the threads lead to lot of confusion and queries.

  1. should i use gtf file crated by cuffmerge or cuffcompare as input for cuffdiff ?

  2. while creating database using cummerbund should we use genome and gtf file?

  3. If the answer of 2nd question is yes, then should I use gtf from Ensemble or cuffmerge or cuffcomapre?

  4. how to get Ensemble id instead of XLOC id.

  5. Is there some error in my protocol?


Thanks in advance.
imsharmanitin is offline   Reply With Quote
Old 08-17-2015, 09:57 AM   #2
csmatyi
Member
 
Location: Nebraska

Join Date: Oct 2011
Posts: 25
Default

Hello everybody,

I have a line from genes.fpkm_tracking from cuffdiff output:

XLOC_000001 - - XLOC_000001 NM_008866 TSS1 chr1:4807892-4846735 - - 18.3357 13.5137 23.1577 OK 18.0846 13.4121 22.7572 OK

Here the aggregated control sample value is 18.3357

However, from the 3 control transcript.gtf files I have these 3 values:

../Nsg1-Hip_tophat/cufflinks_output_gtf/genes.fpkm_tracking:NM_008866 - - NM_008866 - - chr1:4807892-4846735 - - 16.1344 13.3363 18.9325 OK
../Nsg2-Hip_tophat/cufflinks_output_gtf/genes.fpkm_tracking:NM_008866 - - NM_008866 - - chr1:4807892-4846735 - - 15.4661 12.8039 18.1283 OK
../Nsg3-Hip_tophat/cufflinks_output_gtf/genes.fpkm_tracking:NM_008866 - - NM_008866 - - chr1:4807892-4846735 - - 13.0078 10.6329 15.3827 OK

Here the aggregete value of 18.3 is larger than the individual values 16.1, 15.5, and 13.0. How does cuffdiff calculate the aggregate values from the 3 samples?
Does it normalize? How does it do so?

I've tried finding this in the Trapnell, 2012 Nature Protocol paper, but it doesn't say.

Thanks, csmatyi
csmatyi is offline   Reply With Quote
Reply

Tags
cuffcompare, cuffdiff, cuffmerge, cumerbund, rna-seq

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:06 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO