SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Biological replicates with cuffdiff, cummeRbund turnersd Bioinformatics 15 11-19-2012 05:59 AM
using Cuffdiff with biological replicates Jane M RNA Sequencing 0 09-01-2011 12:42 AM
overdispersion and biological replicates shilez Bioinformatics 3 08-29-2011 07:43 AM
overdispersion and biological replicates shilez RNA Sequencing 0 08-25-2011 06:10 PM
cuffdiff with biological replicates PFS Bioinformatics 1 06-14-2011 06:51 PM

Reply
 
Thread Tools
Old 09-04-2011, 11:29 PM   #1
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Question running Cuffdiff with biological replicates

Hello,

I have posted a question about Cuffdiff last week in this forum, in the subpart "rna-sequencing", but I think this place is more appropriate, since I have seen several topics about it.

Nevertheless, I haven't found information about the use of Cuffdiff with biological replicates.

I try to use Cuffdiff to find differentially expressed isoforms between 2 conditions (Raman - HM1). There are 3 replicates per condition. There are 5295 genes and 6432 isoforms.
In the documentation, it's written "If you have more than one replicate for a sample, supply the SAM files for the sample as a single comma-separated list".
So I ran Cuffdiff this way:

Quote:
cuffdiff -L Raman,HM1 -N --FDR 0.05 /path/fichier.gtf /path/Cufflinks/L2/accepted_hits.bam,/path/Cufflinks/L4/accepted_hits.bam,/path/Cufflinks/L6/accepted_hits.bam /path/Cufflinks/L3/accepted_hits.bam,/path/Cufflinks/L7/accepted_hits.bam,/path/Cufflinks/L8/accepted_hits.bam
The replicates for the condition Raman are L2, L4 and L6. The replicates for the condition HM1 are L3, L7 and L8.
I'm interested in the output files isoform_exp.diff:

Code:
test_id	gene	locus	sample_1	sample_2	status	value_1	value_2	ln(fold_change)	test_stat	p_value	significant
EHI_000010.ref	-	DS571600:2419-3622	q1	q2	NOTEST	8.28385	21.4211	0.950069	-2.32208	0.0202284	no
EHI_000130.alt1	-	DS571600:7792-8309	q1	q2	OK	108.42	6.20207	-2.86113	4.25521	2.08856e-05	yes
EHI_000130.ref	-	DS571600:7792-8309	q1	q2	OK	1152.64	2299.79	0.690763	-16.1849	0	yes
EHI_000240.ref	-	DS571186:1554-2669	q1	q2	OK	558.654	857.323	0.428284	-7.87676	3.33067e-15	yes
EHI_000250.ref	-	DS571186:2850-3551	q1	q2	OK	134.444	301.066	0.80618	-7.77203	7.77156e-15	yes
...
EHI_C00159.ref	-	EH4264:45-392	q1	q2	NOTEST	11.8076	27.8643	0.858599	-2.4726	0.0134133	no
EHI_000010.ref	-	DS571600:2419-3622	q1	q3	NOTEST	8.28385	14.2453	0.542123	-1.24073	0.214705	no
EHI_000130.alt1	-	DS571600:7792-8309	q1	q3	OK	108.42	4.26615	-3.2353	4.30991	1.63319e-05	yes
EHI_000130.ref	-	DS571600:7792-8309	q1	q3	OK	1152.64	1736.74	0.409956	-9.25409	0	yes
...
EHI_C00155.ref	-	DS571588:780-994	q5	q6	OK	4382.71	4702.45	0.0704184	-3.35392	0.000796741	yes
EHI_C00156.ref	-	DS571646:5447-6016	q5	q6	NOTEST	12.2637	40.6463	1.19826	-3.67794	0.000235123	no
EHI_C00157.ref	-	DS571705:1482-1899	q5	q6	OK	776.201	1585.64	0.714329	-16.3066	0	yes
EHI_C00159.ref	-	EH4264:45-392	q5	q6	NOTEST	9.16532	4.74531	-0.65827	1.16396	0.244442	no
Here is my problem. The file contains 96480 (15*6432) lines (instead of 6432). 15 is the number of combinations between the 6 data sets... I think Cuffdiff did not consider the 3 biological replicates as one condition !
Moreover, I don't know why I don't have the adjusted p-value information...

Did someone use Cuffdiff with replicates to get differentially expressed isoforms ?
Thanks for your help,
Jane
Jane M is offline   Reply With Quote
Old 09-07-2011, 11:35 PM   #2
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Default

My problem came from the version of Cuffdiff. To handle replicates, the version 1.0.0 at least should be used !
Finally, I got the "right files" and the following output, with the command line :

Code:
 cuffdiff /path/fichier_isoem.gtf -N -u -L Raman,HM1 /path/Cufflinks/L2/accepted_hits.bam,/path/Cufflinks/L4/accepted_hits.bam,/path/Cufflinks/L6/accepted_hits.bam /path/Cufflinks/L3/accepted_hits.bam,/path/Cufflinks/L7/accepted_hits.bam,/path/Cufflinks/L8/accepted_hits.bam
I remember that I have 6 samples: 2 conditions of 3 replicates each. I'm suprised by the output: it seems that "Inspecting maps and determining fragment length distributions" is done only for 3 samples and "Modeling fragment count overdispersion" for the 3 other samples... How do you interpret this output?

Code:
[10:47:14] Loading reference annotation.
[10:47:15] Inspecting maps and determining fragment length distributions.
> Map Properties:
>       Upper Quartile: 1053.00
>       Number of Multi-Reads: 131 (with 234 total hits)
>       Read Type: 100bp x 100bp
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 144.16
>                  Estimated Std Dev: 16.97
> Map Properties:
>       Upper Quartile: 2148.00
>       Number of Multi-Reads: 1932 (with 3473 total hits)
>       Read Type: 100bp x 100bp
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 136.57
>                  Estimated Std Dev: 15.99
> Map Properties:
>       Upper Quartile: 2419.00
>       Number of Multi-Reads: 10954 (with 23240 total hits)
>       Read Type: 100bp x 100bp
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 124.10
>                  Estimated Std Dev: 19.20
[12:39:11] Modeling fragment count overdispersion.
> Map Properties:
>       Upper Quartile: 3427.00
>       Number of Multi-Reads: 4714 (with 8673 total hits)
>       Read Type: 100bp x 100bp
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 137.82
>                  Estimated Std Dev: 24.89
> Map Properties:
>       Upper Quartile: 1900.00
>       Number of Multi-Reads: 9871 (with 19464 total hits)
>       Read Type: 100bp x 100bp
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 139.56
>                  Estimated Std Dev: 28.99
> Map Properties:
>       Upper Quartile: 2061.00
>       Number of Multi-Reads: 2252 (with 3480 total hits)
>       Read Type: 100bp x 100bp
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 146.45
>                  Estimated Std Dev: 18.72
[13:48:50] Modeling fragment count overdispersion.
[13:48:50] Calculating initial abundance estimates for multi-read correction.
> Processed 5272 loci.                         [*************************] 100%
[17:20:35] Testing for differential expression and regulation in locus.
> Processed 5272 loci.                         [*************************] 100%
Performed 6425 isoform-level transcription difference tests
Performed 0 tss-level transcription difference tests
Performed 5289 gene-level transcription difference tests
Performed 0 CDS-level transcription difference tests
Performed 0 splicing tests
Performed 0 promoter preference tests
Performing 0 relative CDS output tests
Writing isoform-level FPKM tracking
Writing TSS group-level FPKM tracking
Writing gene-level FPKM tracking
Writing CDS-level FPKM tracking
Jane M is offline   Reply With Quote
Old 09-08-2011, 05:20 AM   #3
Hunny
Member
 
Location: Beijing, China

Join Date: Apr 2011
Posts: 23
Default

Wierd.
I suggest you could use the latest release Cufflinks-1.0.3 package and run it again.

Good luck!
Hunny is offline   Reply With Quote
Old 09-08-2011, 05:50 AM   #4
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Default

It's already the latest release
Jane M is offline   Reply With Quote
Old 09-08-2011, 05:54 PM   #5
Hunny
Member
 
Location: Beijing, China

Join Date: Apr 2011
Posts: 23
Default

Hi Jane,

Quote:
I remember that I have 6 samples: 2 conditions of 3 replicates each. I'm suprised by the output: it seems that "Inspecting maps and determining fragment length distributions" is done only for 3 samples and "Modeling fragment count overdispersion" for the 3 other samples... How do you interpret this output?
I also encounter this problem when running Cuffdiff on 2 condition samples (3 replicates for each). But my output is an another variant, which "Inspecting maps and determining fragment length distributions" for 5 replicates before one "Modeling fragment count overdispersion", and then "Inspecting maps and determining fragment length distributions" for the last replicate, followed by another "Modeling fragment count overdispersion". The following posts my output.

Code:
You are using Cufflinks v1.0.3, which is the most recent release.
[15:13:54] Loading reference annotation.
[15:13:59] Inspecting maps and determining fragment length distributions.
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>	Total Map Mass: 5543986.19
>	Read Type: 81bp single-end
>	Fragment Length Distribution: Truncated Gaussian (default)
>	              Default Mean: 200
>	           Default Std Dev: 80
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>	Total Map Mass: 6792178.65
>	Read Type: 80bp single-end
>	Fragment Length Distribution: Truncated Gaussian (default)
>	              Default Mean: 200
>	           Default Std Dev: 80
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>	Total Map Mass: 5448746.17
>	Read Type: 81bp single-end
>	Fragment Length Distribution: Truncated Gaussian (default)
>	              Default Mean: 200
>	           Default Std Dev: 80
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>	Total Map Mass: 10225346.76
>	Read Type: 82bp single-end
>	Fragment Length Distribution: Truncated Gaussian (default)
>	              Default Mean: 200
>	           Default Std Dev: 80
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>	Total Map Mass: 6787293.17
>	Read Type: 80bp single-end
>	Fragment Length Distribution: Truncated Gaussian (default)
>	              Default Mean: 200
>	           Default Std Dev: 80
[15:18:25] Modeling fragment count overdispersion.
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>	Total Map Mass: 17084659.40
>	Read Type: 82bp single-end
>	Fragment Length Distribution: Truncated Gaussian (default)
>	              Default Mean: 200
>	           Default Std Dev: 80
[15:19:53] Modeling fragment count overdispersion.
[15:19:53] Testing for differential expression and regulation in locus.
> Processed 20237 loci.                        [*************************] 100%
Performed 20389 isoform-level transcription difference tests
Performed 17515 tss-level transcription difference tests
Performed 15748 gene-level transcription difference tests
Performed 16804 CDS-level transcription difference tests
Performed 1876 splicing tests
Performed 1210 promoter preference tests
Performing 1790 relative CDS output tests
Writing isoform-level FPKM tracking
Writing TSS group-level FPKM tracking
Writing gene-level FPKM tracking
Writing CDS-level FPKM tracking
I guess it is possible that the "Modeling fragment count overdispersion" program is running in another process, which is independent of the "Inspecting maps and determining fragment length distributions". If so, it could happen that the their printings mix with each other. But I am not sure.
I will try to check the source codes later in the day.

Cheers,
Hunny is offline   Reply With Quote
Old 09-09-2011, 01:38 AM   #6
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Default

Please let me know Hunny if you find some explanations in the source codes

I am also wondering about something else. With the command that I ran, I got 358 differentially expressed genes and 504 differentially expressed isoforms. Since it seems to be very few, I reran without any options. Then, I got 483 differentially expressed genes and 632 differentially expressed isoforms. That's quite a big difference... Does it mean that I should "trust" the results with upper quartile normalisation and multi reads correction?
I've done the analysis with DESeq and I got 1830 differentially expressed genes. Can I conclude that Cuffdiff is not powerful enough? Has someone encountered such a difference?
Thanks for your feedback.
Jane M is offline   Reply With Quote
Old 09-09-2011, 04:00 AM   #7
Hunny
Member
 
Location: Beijing, China

Join Date: Apr 2011
Posts: 23
Default

Hi Jane,

Your Cuffdiff output printing is actually normal !

Quote:
[10:47:15] Inspecting maps and determining fragment length distributions.
This sentence is printed before all processes below.

Quote:
> Map Properties:
.......
> Map Properties:
.......
> Map Properties:
.......
[12:39:11] Modeling fragment count overdispersion.
The first half block above might be for your first condition sample.
And the following block is likely for your second condition sample. Two "Modeling fragment count overdispersion" are run for two condition samples respectively.
Quote:
> Map Properties:
.......
> Map Properties:
.......
> Map Properties:
.......
[13:48:50] Modeling fragment count overdispersion.
To my problem, it's all because I was using the -p option of Cuffdiff for running multithreads, which affect the printing order. That could be the point.

Quote:
I am also wondering about something else. With the command that I ran, I got 358 differentially expressed genes and 504 differentially expressed isoforms. Since it seems to be very few, I reran without any options. Then, I got 483 differentially expressed genes and 632 differentially expressed isoforms. That's quite a big difference... Does it mean that I should "trust" the results with upper quartile normalisation and multi reads correction?
You mean you got different numbers of "significant yes" genes or isoforms? If so, why don't you run it again only with quartile normalisation to see what will happen?

Quote:
I've done the analysis with DESeq and I got 1830 differentially expressed genes. Can I conclude that Cuffdiff is not powerful enough? Has someone encountered such a difference?
I haven't used DESeq before. Nevertheless, I think DESeq generating much more genes than Cuffdiff doesn't mean the former is better or not.

Cheers,
Hunny is offline   Reply With Quote
Old 09-09-2011, 08:25 AM   #8
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Default

Quote:
Originally Posted by Hunny View Post
Hi Jane,

Your Cuffdiff output printing is actually normal !


This sentence is printed before all processes below.


The first half block above might be for your first condition sample.
And the following block is likely for your second condition sample. Two "Modeling fragment count overdispersion" are run for two condition samples respectively.
Ok, I see, thanks for your explanations !


Quote:
You mean you got different numbers of "significant yes" genes or isoforms? If so, why don't you run it again only with quartile normalisation to see what will happen?
Yes, that is what I mean. I'm currently running with on one hand, the UQ normalisation and on the other hand, the pre-estimation for multi reads allocation. I will compare on Monday the results.

Quote:
I haven't used DESeq before. Nevertheless, I think DESeq generating much more genes than Cuffdiff doesn't mean the former is better or not.

Cheers,
I agree, but it shows that either both or one of them are/is wrong....
Jane M is offline   Reply With Quote
Old 11-03-2011, 09:17 AM   #9
pinin4fjords
Member
 
Location: Edinburgh

Join Date: Sep 2011
Posts: 11
Default Similar results

Just wanted to say that I've also been comparing the results of DESeq and Cuffdiff in my experiment with biological replicates over two conditions (7/5 samples each). I'm also seeing a much greater number of significant results with DESeq (in my case ~40 vs 2). The normalised mean counts from DESeq and the FPKM values from Cuffdiff are roughly proportional, and fold-changes near identical, and the difference is not due to variations in multiple testing as I'd initially feared. It's in the initial estimation of significance that the difference lies, and I'm having trouble deciding which is the more sensible approach.
pinin4fjords is offline   Reply With Quote
Old 11-15-2011, 12:44 AM   #10
Jane M
Senior Member
 
Location: Paris

Join Date: Aug 2011
Posts: 239
Default

Thanks for sharing ! I haven't made any progress on this topic yet, but if I do, I will update this post
Jane M is offline   Reply With Quote
Old 07-08-2013, 10:30 AM   #11
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Hi,
I have some problem with the cuffdiff running. I run the cuffdiff for two conditions (13 samples vs. 8 samples) with the following code:

cuffdiff -o /path/result /path/reference.gtf sample1,sample2,...sample13 sample14,...sample21

I want to look at their gene expression difference. The server kept giving me the error message. And I only got 3 rows of the results after the running stopped. The error message is like the following:
PBS Job Id: 3324389.mast
Job Name: test
Exec host: comp116/11+comp116/10+comp116/9+comp116/8+comp116/7+comp116/6+comp116/5+comp116/4+comp116/3+comp116/2+comp116/1+comp116/0
An error has occurred processing your job, see below.
Post job file processing error; job 3324389.mast on host comp116/11+comp116/10+comp116/9+comp116/8+comp116/7+comp116/6+comp116/5+comp116/4+comp116/3+comp116/2+comp116/1+comp116/0


The cuffdiff is running at the back stage. I am just wondering where I can get the error message from the cuffdiff? which file?
twotwo is offline   Reply With Quote
Old 07-09-2013, 04:18 AM   #12
bioBob
Member
 
Location: Virginia

Join Date: Mar 2011
Posts: 72
Default

Hi,
your error is a PBS error. I get the same type of message every once in a while and it is generally due to high load on the server. I try to copy all the files needed to process a job to the local node and make sure all the temp files etc point to a local directory to reduce the network load during processing. You could also run your job interactively to see what the exact error is. If you run interactively and there is a network error, you will get disconnected, if it is a cuffdiff issue, cuffdiff will error out and you will get a cuffdiff message.
Bob
bioBob is offline   Reply With Quote
Old 07-09-2013, 06:06 AM   #13
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Quote:
Originally Posted by bioBob View Post
Hi,
your error is a PBS error. I get the same type of message every once in a while and it is generally due to high load on the server. I try to copy all the files needed to process a job to the local node and make sure all the temp files etc point to a local directory to reduce the network load during processing. You could also run your job interactively to see what the exact error is. If you run interactively and there is a network error, you will get disconnected, if it is a cuffdiff issue, cuffdiff will error out and you will get a cuffdiff message.
Bob
Thanks so much! I will try that.
twotwo is offline   Reply With Quote
Old 07-15-2013, 07:55 AM   #14
twotwo
Member
 
Location: Ohio

Join Date: May 2012
Posts: 40
Default

Quote:
Originally Posted by bioBob View Post
Hi,
your error is a PBS error. I get the same type of message every once in a while and it is generally due to high load on the server. I try to copy all the files needed to process a job to the local node and make sure all the temp files etc point to a local directory to reduce the network load during processing. You could also run your job interactively to see what the exact error is. If you run interactively and there is a network error, you will get disconnected, if it is a cuffdiff issue, cuffdiff will error out and you will get a cuffdiff message.
Bob
Hi, Bob,
After I ran this job for another time. There is no error message like before. However the result is still confusing. There is only 3 rows in the gene_exp.diff file. The first part of the log looks OK. They are about some locus processing... And the last part of the log is:
Performed 188097 isoform-level transcription difference tests
Performed 0 tss-level transcription difference tests
Performed 3 gene-level transcription difference tests
Performed 0 CDS-level transcription difference tests
Performed 0 splicing tests
Performed 0 promoter preference tests
Performing 0 relative CDS output tests
Writing isoform-level FPKM tracking
Writing TSS group-level FPKM tracking
Writing gene-level FPKM tracking
Writing CDS-level FPKM tracking
Writing isoform-level count tracking
Writing TSS group-level count tracking
Writing gene-level count tracking
Writing CDS-level count tracking
Writing isoform-level read group tracking
Writing TSS group-level read group tracking
Writing gene-level read group tracking
Writing CDS-level read group tracking
Writing read group info
Writing run info

I also posted my question in another post. Do you have any comments for that? Thanks!
twotwo is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 01:32 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2019, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO