Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
DEG for paired samples, biological replicates hypatia RNA Sequencing 24 05-07-2014 01:25 AM
running Cuffdiff with biological replicates Jane M Bioinformatics 13 07-15-2013 07:55 AM
Biological replicates with cuffdiff, cummeRbund turnersd Bioinformatics 15 11-19-2012 05:59 AM
using Cuffdiff with biological replicates Jane M RNA Sequencing 0 09-01-2011 12:42 AM
cuffdiff with biological replicates PFS Bioinformatics 1 06-14-2011 06:51 PM

Thread Tools
Old 07-02-2013, 10:50 PM   #1
Location: haifa israel

Join Date: Jun 2011
Posts: 62
Default Cuffdiff with biological replicates vs individual samples not adding up.

I am performing differential expression analysis of three different experimental conditions, each of which has 4 biological replicates, and was trying to do this using cuffdiff. We are really only interested initially in looking at currently annotated genes. I am using the Ensembl genome and gtf files from iGenomes.

STEP 1: tophat
I initially (after quality filtering fastq files) ran tophat with the following command:
For the first run (to make a transcriptome index from the GTF file):
tophat -p 8 -G Ensembl_genes.gtf --transcriptome-index=GTF_transcriptome_index/known -o SW-1B-1.tophat --no-novel-juncs --library-type=fr-firststrand genome SW-1B-1_AC_QF_FINAL.fastq;

for the rest of the 11 tophat runs using previously built transcriptome index
for k in {2..12}; do
tophat -p 8 --transcriptome-index=GTF_transcriptome_index/known -o SW-1B-$k.tophat --no-novel-juncs --library-type=fr-firststrand genome SW-1B-$k.AC_QF_FINAL.fastq;

STEP 2: cuffdiff
Then I ran cuffdiff in one of two ways. Firstly, specifying three replicate groups (named T,P,N) as such:
cuffdiff -o Diff_OUT -v -b genome.fa -p 8 -L T,P,N -u Ensembl_genes.gtf \
./SW-1B-1.tophat/accepted_hits.bam,./SW-1B-4.tophat/accepted_hits.bam,./SW-1B-7.tophat/accepted_hits.bam,./SW-1B-10.tophat/accepted_hits.bam \
./SW-1B-2.tophat/accepted_hits.bam,./SW-1B-5.tophat/accepted_hits.bam,./SW-1B-8.tophat/accepted_hits.bam,./SW-1B-11.tophat/accepted_hits.bam \

Secondly as we wanted to see the FPKMs of each sample separately we ran cuffdiff on each sample separately:
cuffdiff -o Diff_OUT_separate -v -b genome.fa -p 8 -L 1T,2T,3T,4T,1P,2P,3P,4P,1N,2N,3N,4N -u Ensembl_genes.gtf \
./SW-1B-1.tophat/accepted_hits.bam \
./SW-1B-4.tophat/accepted_hits.bam \
./SW-1B-7.tophat/accepted_hits.bam \
./SW-1B-10.tophat/accepted_hits.bam \
./SW-1B-2.tophat/accepted_hits.bam \
./SW-1B-5.tophat/accepted_hits.bam \
./SW-1B-8.tophat/accepted_hits.bam \
./SW-1B-11.tophat/accepted_hits.bam \
./SW-1B-3.tophat/accepted_hits.bam \
./SW-1B-6.tophat/accepted_hits.bam \
./SW-1B-9.tophat/accepted_hits.bam \

Step3: confusion!
Now I am really confused: The analysis using replicates as three groups gave us only a handful of significantly expressed genes.

When I look at the FPKMs of these genes in the cuffdiff which ran each sample separately- they don’t add up to what I am seeing when the samplse are grouped as biological replicates.

The gene NCKX1 was supposed to be of average FPKM 2 in T and 7.77 in P; in the cuffdiff run separately this seems to match up OK with the individual FPKMs.
But the gene RGD1359290 was supposed to go from 2.6 in T to 18.8 in P and 17.4 in N; in the individual sampes nothing is even remotely near those values in P, and there is only one high FPKM value in N.

Would love to know what is going on and whether I can trust these data!
Attached Files
File Type: pdf temp.pdf (43.9 KB, 9 views)
Noa is offline   Reply With Quote

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

All times are GMT -8. The time now is 12:51 AM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO