Seqanswers Leaderboard Ad

**Jane M** · 09-07-2011, 11:35 PM

My problem came from the version of Cuffdiff. To handle replicates, the version 1.0.0 at least should be used !
Finally, I got the "right files" and the following output, with the command line :

Code:

 cuffdiff /path/fichier_isoem.gtf -N -u -L Raman,HM1 /path/Cufflinks/L2/accepted_hits.bam,/path/Cufflinks/L4/accepted_hits.bam,/path/Cufflinks/L6/accepted_hits.bam /path/Cufflinks/L3/accepted_hits.bam,/path/Cufflinks/L7/accepted_hits.bam,/path/Cufflinks/L8/accepted_hits.bam

I remember that I have 6 samples: 2 conditions of 3 replicates each. I'm suprised by the output: it seems that "Inspecting maps and determining fragment length distributions" is done only for 3 samples and "Modeling fragment count overdispersion" for the 3 other samples... How do you interpret this output?

Code:

[10:47:14] Loading reference annotation.
[10:47:15] Inspecting maps and determining fragment length distributions.
> Map Properties:
>       Upper Quartile: 1053.00
>       Number of Multi-Reads: 131 (with 234 total hits)
>       Read Type: 100bp x 100bp
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 144.16
>                  Estimated Std Dev: 16.97
> Map Properties:
>       Upper Quartile: 2148.00
>       Number of Multi-Reads: 1932 (with 3473 total hits)
>       Read Type: 100bp x 100bp
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 136.57
>                  Estimated Std Dev: 15.99
> Map Properties:
>       Upper Quartile: 2419.00
>       Number of Multi-Reads: 10954 (with 23240 total hits)
>       Read Type: 100bp x 100bp
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 124.10
>                  Estimated Std Dev: 19.20
[12:39:11] Modeling fragment count overdispersion.
> Map Properties:
>       Upper Quartile: 3427.00
>       Number of Multi-Reads: 4714 (with 8673 total hits)
>       Read Type: 100bp x 100bp
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 137.82
>                  Estimated Std Dev: 24.89
> Map Properties:
>       Upper Quartile: 1900.00
>       Number of Multi-Reads: 9871 (with 19464 total hits)
>       Read Type: 100bp x 100bp
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 139.56
>                  Estimated Std Dev: 28.99
> Map Properties:
>       Upper Quartile: 2061.00
>       Number of Multi-Reads: 2252 (with 3480 total hits)
>       Read Type: 100bp x 100bp
>       Fragment Length Distribution: Empirical (learned)
>                     Estimated Mean: 146.45
>                  Estimated Std Dev: 18.72
[13:48:50] Modeling fragment count overdispersion.
[13:48:50] Calculating initial abundance estimates for multi-read correction.
> Processed 5272 loci.                         [*************************] 100%
[17:20:35] Testing for differential expression and regulation in locus.
> Processed 5272 loci.                         [*************************] 100%
Performed 6425 isoform-level transcription difference tests
Performed 0 tss-level transcription difference tests
Performed 5289 gene-level transcription difference tests
Performed 0 CDS-level transcription difference tests
Performed 0 splicing tests
Performed 0 promoter preference tests
Performing 0 relative CDS output tests
Writing isoform-level FPKM tracking
Writing TSS group-level FPKM tracking
Writing gene-level FPKM tracking
Writing CDS-level FPKM tracking

**Hunny** · 09-08-2011, 05:20 AM

Wierd.
I suggest you could use the latest release Cufflinks-1.0.3 package and run it again.

Good luck!

**Jane M** · 09-08-2011, 05:50 AM

It's already the latest release

**Hunny** · 09-08-2011, 05:54 PM

Hi Jane,

I remember that I have 6 samples: 2 conditions of 3 replicates each. I'm suprised by the output: it seems that "Inspecting maps and determining fragment length distributions" is done only for 3 samples and "Modeling fragment count overdispersion" for the 3 other samples... How do you interpret this output?

I also encounter this problem when running Cuffdiff on 2 condition samples (3 replicates for each). But my output is an another variant, which "Inspecting maps and determining fragment length distributions" for 5 replicates before one "Modeling fragment count overdispersion", and then "Inspecting maps and determining fragment length distributions" for the last replicate, followed by another "Modeling fragment count overdispersion". The following posts my output.

Code:

You are using Cufflinks v1.0.3, which is the most recent release.
[15:13:54] Loading reference annotation.
[15:13:59] Inspecting maps and determining fragment length distributions.
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>	Total Map Mass: 5543986.19
>	Read Type: 81bp single-end
>	Fragment Length Distribution: Truncated Gaussian (default)
>	              Default Mean: 200
>	           Default Std Dev: 80
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>	Total Map Mass: 6792178.65
>	Read Type: 80bp single-end
>	Fragment Length Distribution: Truncated Gaussian (default)
>	              Default Mean: 200
>	           Default Std Dev: 80
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>	Total Map Mass: 5448746.17
>	Read Type: 81bp single-end
>	Fragment Length Distribution: Truncated Gaussian (default)
>	              Default Mean: 200
>	           Default Std Dev: 80
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>	Total Map Mass: 10225346.76
>	Read Type: 82bp single-end
>	Fragment Length Distribution: Truncated Gaussian (default)
>	              Default Mean: 200
>	           Default Std Dev: 80
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>	Total Map Mass: 6787293.17
>	Read Type: 80bp single-end
>	Fragment Length Distribution: Truncated Gaussian (default)
>	              Default Mean: 200
>	           Default Std Dev: 80
[15:18:25] Modeling fragment count overdispersion.
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>	Total Map Mass: 17084659.40
>	Read Type: 82bp single-end
>	Fragment Length Distribution: Truncated Gaussian (default)
>	              Default Mean: 200
>	           Default Std Dev: 80
[15:19:53] Modeling fragment count overdispersion.
[15:19:53] Testing for differential expression and regulation in locus.
> Processed 20237 loci.                        [*************************] 100%
Performed 20389 isoform-level transcription difference tests
Performed 17515 tss-level transcription difference tests
Performed 15748 gene-level transcription difference tests
Performed 16804 CDS-level transcription difference tests
Performed 1876 splicing tests
Performed 1210 promoter preference tests
Performing 1790 relative CDS output tests
Writing isoform-level FPKM tracking
Writing TSS group-level FPKM tracking
Writing gene-level FPKM tracking
Writing CDS-level FPKM tracking

I guess it is possible that the "Modeling fragment count overdispersion" program is running in another process, which is independent of the "Inspecting maps and determining fragment length distributions". If so, it could happen that the their printings mix with each other. But I am not sure.
I will try to check the source codes later in the day.

Cheers,

**Jane M** · 09-09-2011, 01:38 AM

Please let me know Hunny if you find some explanations in the source codes

I am also wondering about something else. With the command that I ran, I got 358 differentially expressed genes and 504 differentially expressed isoforms. Since it seems to be very few, I reran without any options. Then, I got 483 differentially expressed genes and 632 differentially expressed isoforms. That's quite a big difference... Does it mean that I should "trust" the results with upper quartile normalisation and multi reads correction?
I've done the analysis with DESeq and I got 1830 differentially expressed genes. Can I conclude that Cuffdiff is not powerful enough? Has someone encountered such a difference?
Thanks for your feedback.

**Hunny** · 09-09-2011, 04:00 AM

Hi Jane,

Your Cuffdiff output printing is actually normal !

[10:47:15] Inspecting maps and determining fragment length distributions.

This sentence is printed before all processes below.

> Map Properties:
.......
> Map Properties:
.......
> Map Properties:
.......
[12:39:11] Modeling fragment count overdispersion.

The first half block above might be for your first condition sample.
And the following block is likely for your second condition sample. Two "Modeling fragment count overdispersion" are run for two condition samples respectively.

> Map Properties:
.......
> Map Properties:
.......
> Map Properties:
.......
[13:48:50] Modeling fragment count overdispersion.

To my problem, it's all because I was using the -p option of Cuffdiff for running multithreads, which affect the printing order. That could be the point.

I am also wondering about something else. With the command that I ran, I got 358 differentially expressed genes and 504 differentially expressed isoforms. Since it seems to be very few, I reran without any options. Then, I got 483 differentially expressed genes and 632 differentially expressed isoforms. That's quite a big difference... Does it mean that I should "trust" the results with upper quartile normalisation and multi reads correction?

You mean you got different numbers of "significant yes" genes or isoforms? If so, why don't you run it again only with quartile normalisation to see what will happen?

I've done the analysis with DESeq and I got 1830 differentially expressed genes. Can I conclude that Cuffdiff is not powerful enough? Has someone encountered such a difference?

I haven't used DESeq before. Nevertheless, I think DESeq generating much more genes than Cuffdiff doesn't mean the former is better or not.

Cheers,

**Jane M** · 09-09-2011, 08:25 AM

Originally posted by Hunny View Post

Hi Jane,

Your Cuffdiff output printing is actually normal !

This sentence is printed before all processes below.

The first half block above might be for your first condition sample.
And the following block is likely for your second condition sample. Two "Modeling fragment count overdispersion" are run for two condition samples respectively.

Ok, I see, thanks for your explanations !

You mean you got different numbers of "significant yes" genes or isoforms? If so, why don't you run it again only with quartile normalisation to see what will happen?

Yes, that is what I mean. I'm currently running with on one hand, the UQ normalisation and on the other hand, the pre-estimation for multi reads allocation. I will compare on Monday the results.

I haven't used DESeq before. Nevertheless, I think DESeq generating much more genes than Cuffdiff doesn't mean the former is better or not.

Cheers,

I agree, but it shows that either both or one of them are/is wrong....

**pinin4fjords** · 11-03-2011, 09:17 AM

Similar results

Just wanted to say that I've also been comparing the results of DESeq and Cuffdiff in my experiment with biological replicates over two conditions (7/5 samples each). I'm also seeing a much greater number of significant results with DESeq (in my case ~40 vs 2). The normalised mean counts from DESeq and the FPKM values from Cuffdiff are roughly proportional, and fold-changes near identical, and the difference is not due to variations in multiple testing as I'd initially feared. It's in the initial estimation of significance that the difference lies, and I'm having trouble deciding which is the more sensible approach.

**Jane M** · 11-15-2011, 01:44 AM

Thanks for sharing ! I haven't made any progress on this topic yet, but if I do, I will update this post

**twotwo** · 07-08-2013, 10:30 AM

Hi,
I have some problem with the cuffdiff running. I run the cuffdiff for two conditions (13 samples vs. 8 samples) with the following code:

cuffdiff -o /path/result /path/reference.gtf sample1,sample2,...sample13 sample14,...sample21

I want to look at their gene expression difference. The server kept giving me the error message. And I only got 3 rows of the results after the running stopped. The error message is like the following:
PBS Job Id: 3324389.mast
Job Name: test
Exec host: comp116/11+comp116/10+comp116/9+comp116/8+comp116/7+comp116/6+comp116/5+comp116/4+comp116/3+comp116/2+comp116/1+comp116/0
An error has occurred processing your job, see below.
Post job file processing error; job 3324389.mast on host comp116/11+comp116/10+comp116/9+comp116/8+comp116/7+comp116/6+comp116/5+comp116/4+comp116/3+comp116/2+comp116/1+comp116/0

The cuffdiff is running at the back stage. I am just wondering where I can get the error message from the cuffdiff? which file?

**bioBob** · 07-09-2013, 04:18 AM

Hi,
your error is a PBS error. I get the same type of message every once in a while and it is generally due to high load on the server. I try to copy all the files needed to process a job to the local node and make sure all the temp files etc point to a local directory to reduce the network load during processing. You could also run your job interactively to see what the exact error is. If you run interactively and there is a network error, you will get disconnected, if it is a cuffdiff issue, cuffdiff will error out and you will get a cuffdiff message.
Bob

**twotwo** · 07-09-2013, 06:06 AM

Originally posted by bioBob View Post

Hi,
your error is a PBS error. I get the same type of message every once in a while and it is generally due to high load on the server. I try to copy all the files needed to process a job to the local node and make sure all the temp files etc point to a local directory to reduce the network load during processing. You could also run your job interactively to see what the exact error is. If you run interactively and there is a network error, you will get disconnected, if it is a cuffdiff issue, cuffdiff will error out and you will get a cuffdiff message.
Bob

Thanks so much! I will try that.

**twotwo** · 07-15-2013, 07:55 AM

Originally posted by bioBob View Post

Hi,
your error is a PBS error. I get the same type of message every once in a while and it is generally due to high load on the server. I try to copy all the files needed to process a job to the local node and make sure all the temp files etc point to a local directory to reduce the network load during processing. You could also run your job interactively to see what the exact error is. If you run interactively and there is a network error, you will get disconnected, if it is a cuffdiff issue, cuffdiff will error out and you will get a cuffdiff message.
Bob

Hi, Bob,
After I ran this job for another time. There is no error message like before. However the result is still confusing. There is only 3 rows in the gene_exp.diff file. The first part of the log looks OK. They are about some locus processing... And the last part of the log is:
Performed 188097 isoform-level transcription difference tests
Performed 0 tss-level transcription difference tests
Performed 3 gene-level transcription difference tests
Performed 0 CDS-level transcription difference tests
Performed 0 splicing tests
Performed 0 promoter preference tests
Performing 0 relative CDS output tests
Writing isoform-level FPKM tracking
Writing TSS group-level FPKM tracking
Writing gene-level FPKM tracking
Writing CDS-level FPKM tracking
Writing isoform-level count tracking
Writing TSS group-level count tracking
Writing gene-level count tracking
Writing CDS-level count tracking
Writing isoform-level read group tracking
Writing TSS group-level read group tracking
Writing gene-level read group tracking
Writing CDS-level read group tracking
Writing read group info
Writing run info

I also posted my question in another post. Do you have any comments for that? Thanks!

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 24 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 25 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 21 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

running Cuffdiff with biological replicates

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News