**leifive** · 03-18-2013, 06:18 AM

Originally posted by fangquan:

Hi Dario,

You are right. But even if you skip the cuffcompare step, you can still get some results from cuffdiff, like this:

Performed 3204 isoform-level transcription difference tests
Performed 0 tss-level transcription difference tests
Performed 3179 gene-level transcription difference tests
Performed 0 CDS-level transcription difference tests
Performed 0 splicing tests
Performed 0 promoter preference tests
Performing 0 relative CDS output tests

It's no surprise there are some zero files because "Cuffdiff requires that transcripts in the input GTF be annotated with certain attributes in order to look for changes in primary transcript expression, splicing, coding output, and promoter use."

fangquan

Hi, fangquan.
Could you tell me how you solved the problem? I am facing almost the same puzzle. I tried both merged.gtf from cuffmerge and combined.gtf from cuffcompare as input, but cuffdiff always performed 0 splicing, promoter preference, and relative CDS output tests. Thanks.
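
One way to check whether a GTF carries the attributes the quoted manual sentence refers to is to count them directly. The following is only a minimal sketch: it assumes the attributes in question are named `tss_id` and `p_id` (my understanding of the Cufflinks manual, not something stated in this thread), and the function name and sample lines are made up for illustration.

```python
import re

# Assumption: these are the attribute names cuffdiff looks for when it
# decides whether splicing, promoter, and CDS tests can be run.
REQUIRED_ATTRS = ("tss_id", "p_id")

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def count_required_attrs(gtf_lines):
    """Count distinct transcripts that carry each required attribute."""
    seen = {attr: set() for attr in REQUIRED_ATTRS}
    transcripts = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        transcripts.add(tid)
        for attr in REQUIRED_ATTRS:
            if attr in attrs:
                seen[attr].add(tid)
    counts = {attr: len(ids) for attr, ids in seen.items()}
    counts["transcripts"] = len(transcripts)
    return counts


# Hypothetical records: only the first transcript is fully annotated.
sample = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_1"; tss_id "TSS1"; p_id "P1";\n',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_2";\n',
]
print(count_required_attrs(sample))
```

For a real file, pass an open file handle instead of the `sample` list. If the `tss_id` or `p_id` counts come back as zero, missing annotation is one plausible explanation for the zero test counts, although it may not be the only one.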

**pengchy** · 04-30-2013, 11:38 PM

Hi all,

I have run tophat2/cufflinks2.1.1/cuffmerge successfully. But when I run cuffdiff2 with the merged GTF file, all the output *fpkm_tracking files have zero FPKM values, and cuffdiff2 prints this message:

Code:

[11:41:15] Loading reference annotation and sequence.
Warning: No conditions are replicated, switching to 'blind' dispersion method
[11:42:42] Inspecting maps and determining fragment length distributions.
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct parameters (--frag-len-mean and --frag-len-std-dev) be provided.
Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct parameters (--frag-len-mean and --frag-len-std-dev) be provided.
[11:53:13] Modeling fragment count overdispersion.
> Map Properties:
>       Normalized Map Mass: 0.50
>       Raw Map Mass: 0.12
>       Number of Multi-Reads: 70 (with 71 total hits)
>       Fragment Length Distribution: Truncated Gaussian (default)
>                     Default Mean: 200
>                  Default Std Dev: 80
> Map Properties:
>       Normalized Map Mass: 0.50
>       Raw Map Mass: 1.00
>       Number of Multi-Reads: 154 (with 158 total hits)
>       Fragment Length Distribution: Truncated Gaussian (default)
>                     Default Mean: 200
>                  Default Std Dev: 80
[11:55:34] Calculating preliminary abundance estimates

I have also tested the GTF file produced by cuffcompare, and the results are the same. Could anyone tell me the reason?

Thank you.

**pengchy** · 05-09-2013, 04:54 PM

I have found the reason for this problem: the coordinates in the BAM files are not consistent with those in the GTF.

Thank you.
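
A mismatch like this can be spotted before running cuffdiff. The sketch below assumes one common form of the problem, namely that the sequence names differ between the two files (for example `chr1` in the BAM but `1` in the GTF). The post does not say exactly what was inconsistent, so treat this as one possible check. The function names are made up for illustration, and the header lines are expected as text, for instance from `samtools view -H accepted_hits.bam`.

```python
def sam_header_seqnames(header_lines):
    """Collect the SN: values from the @SQ lines of a SAM/BAM text header."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the first column (sequence name) of every GTF record."""
    names = set()
    for line in gtf_lines:
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(header_lines, gtf_lines):
    """Report which sequence names are shared and which appear on one side only."""
    bam = sam_header_seqnames(header_lines)
    gtf = gtf_seqnames(gtf_lines)
    return {
        "shared": sorted(bam & gtf),
        "bam_only": sorted(bam - gtf),
        "gtf_only": sorted(gtf - bam),
    }


# Hypothetical inputs showing a UCSC-style BAM against an Ensembl-style GTF.
header = ["@HD\tVN:1.0\tSO:coordinate\n", "@SQ\tSN:chr1\tLN:1000\n"]
gtf = ['1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id "g"; transcript_id "t";\n']
print(compare_seqnames(header, gtf))
```

An empty `shared` list means no read can be assigned to any transcript, which would be consistent with all-zero FPKM values.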

**Charitra** · 07-04-2013, 01:02 AM

So, does this mean that all reference files must come from the same source (Ensembl or UCSC)?
Is it okay to ignore this warning (Warning: No conditions are replicated, switching to 'blind' dispersion method) and just let cuffdiff continue? What impact will it have?
I ignored the warning and my cuffdiff run finished. The output has everything, such as genes and differential expression data, without errors. I used only Ensembl references, so why do I still get the same warning?

I look forward to your kind reply.

Thank you.

**nsl** · 07-04-2013, 06:18 AM

If the GTF file is incompatible you will know, as your cufflinks output will show "0 FPKM" for every gene. As mentioned in the posts above, get it from iGenomes just to be safe.

If cuffdiff fell back to the 'blind' setting, that means it is assuming you have only 1 replicate per treatment. The manual says:

"This method works well when you expect the samples to have very few differentially expressed genes. If there are many differentially expressed genes, Cuffdiff will construct an overly conservative model and you may not get any significant calls. In this case, you will need more replicates in your experiment."

Check the last paragraph in the cufflinks manual.

If you have more than 1 replicate but it is still running blind, it could be that you didn't comma-separate your replicates correctly.

Hope this helps
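
To illustrate the comma-separation point, here is a minimal sketch that assembles a cuffdiff command line. Replicates of one condition are joined with commas into a single argument, and each condition is its own argument. The helper name, file names, and condition labels are hypothetical. `-o` (output directory) and `-L` (labels) are the cuffdiff options as I understand them from the manual.

```python
def build_cuffdiff_command(gtf, conditions, output_dir="cuffdiff_out"):
    """Build a cuffdiff argument list.

    conditions: mapping of label -> list of BAM paths (the replicates),
    in the order the conditions should be compared.
    """
    if len(conditions) < 2:
        raise ValueError("cuffdiff needs at least two conditions")
    cmd = ["cuffdiff", "-o", output_dir, "-L", ",".join(conditions), gtf]
    for label, bams in conditions.items():
        if not bams:
            raise ValueError(f"condition {label!r} has no BAM files")
        # Replicates of ONE condition: comma-separated, no spaces.
        # Different conditions: separate arguments.
        cmd.append(",".join(bams))
    return cmd


# Hypothetical design: 2 control replicates vs 3 treated replicates.
cmd = build_cuffdiff_command(
    "merged.gtf",
    {"control": ["c1.bam", "c2.bam"], "treated": ["t1.bam", "t2.bam", "t3.bam"]},
)
print(" ".join(cmd))
```

If the BAM files were instead separated by spaces, each file would be treated as its own condition with a single replicate, which is the situation that triggers the 'blind' warning.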

**Charitra** · 07-04-2013, 09:48 PM

Thank you so much for your expert comments.
I still have some questions, but I am searching for the answers in previous posts. I will post them here if I cannot find the answers.

However, there is something I would like to ask you. I am sorry if this is too much trouble, but I really need to move on. Please answer if possible. Thank you:
1. After the tophat alignment, I ran cufflinks on the .bam file produced by tophat, and cufflinks stated "Warning: doesnt appear to be a .bam file, trying .sam...OK.." and then continued. Do you think this might have something to do with cuffdiff going blind?
2. Could you please check this cuffdiff output? I used the iGenomes (Ensembl) reference.
Warning: couldn't find fasta record for 'HSCHR9_3_CTG35'!
This contig will not be bias corrected.
Warning: No conditions are replicated, switching to 'blind' dispersion method
[17:12:12] Inspecting maps and determining fragment length distributions.
[17:25:54] Modeling fragment count overdispersion.
> Map Properties:
> Normalized Map Mass: 21977740.46
> Raw Map Mass: 23001324.33
> Number of Multi-Reads: 493145 (with 1171488 total hits)
> Fragment Length Distribution: Empirical (learned)
> Estimated Mean: 233.43
> Estimated Std Dev: 32.95
> Map Properties:
> Normalized Map Mass: 21977740.46
> Raw Map Mass: 20859001.11
> Number of Multi-Reads: 430276 (with 1094508 total hits)
> Fragment Length Distribution: Empirical (learned)
> Estimated Mean: 242.99
> Estimated Std Dev: 19.76
[17:27:41] Calculating preliminary abundance estimates
> Processed 38664 loci. [*************************] 100%
[19:01:04] Learning bias parameters.
[19:24:10] Testing for differential expression and regulation in locus.
> Processed 38664 loci. [*************************] 100%
Performed 61095 isoform-level transcription difference tests
Performed 41310 tss-level transcription difference tests
Performed 18315 gene-level transcription difference tests
Performed 28507 CDS-level transcription difference tests
Performed 0 splicing tests
Performed 0 promoter preference tests
Performing 0 relative CDS output tests
Writing isoform-level FPKM tracking
Writing TSS group-level FPKM tracking
Writing gene-level FPKM tracking
Writing CDS-level FPKM tracking
Writing isoform-level count tracking
Writing TSS group-level count tracking
Writing gene-level count tracking
Writing CDS-level count tracking
Writing isoform-level read group tracking
Writing TSS group-level read group tracking
Writing gene-level read group tracking
Writing CDS-level read group tracking
Writing read group info
Writing run info

3. For the cuffdiff of 5 samples:
3.1 Without merging:
CuffSet instance with:
5 samples
62149 genes
273794 isoforms
146887 TSS
82429 CDS
621490 promoters
1468870 splicing
192450 relCDS
diff_expressed_gene_significant: 3183
3.2 I divided the data into 2 categories: in the first I merged two .gtf files (1 + 2), and in the second three .gtf files (3 + 4 + 5). Then I ran cuffdiff and got the following details from cummeRbund:
CuffSet instance with:
2 samples
62149 genes
273794 isoforms
146887 TSS
82429 CDS
62149 promoters
146887 splicing
19245 relCDS
diff_expressed_gene_significant: 95
(FPKM expression plot Image attached)
Does this indicate a good cuffdiff run in your experience (even though the blind method was used)?
4. I think 3.1 is cuffdiff with 1 replicate per sample, and 3.2 has 2 replicates in the first category and 3 in the second. Is that right?

Thank you in advance.

**nsl** · 07-08-2013, 06:57 AM

Hi Charitra,
I'm learning on the job like many and am not an expert.

1. This isn't a problem. I've experienced it too, but I'm not sure why that message pops up.
2. Could it be that there is no FASTA record because it is a pseudogene? Not sure on this.
3. I am not quite sure what you are asking, as I don't know the design of your experiment. Nevertheless, when you run them as 5 separate samples you have 3183 differentially expressed genes, and this number drops drastically to 95 when you run them as replicates. This indicates that there is a lot of variation. You really need replicates to draw any conclusions from your data.
4. Do your barplots represent 2 different genes? Either way, the FPKM values are low, and I would not concentrate on genes with very low FPKMs (unless of course you have prior knowledge and a reason to). Also, the error bars overlap heavily, so there is no significance.

Hope this helps
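
As a rough illustration of points 3 and 4, the sketch below counts significant genes in a cuffdiff `gene_exp.diff` file while skipping genes whose FPKM is very low in both samples. It assumes the usual tab-separated layout with `test_id`, `value_1`, `value_2`, and `significant` columns. The function name and the `min_fpkm` cutoff of 1.0 are arbitrary choices for illustration, not recommended thresholds.

```python
import csv


def significant_genes(diff_lines, min_fpkm=1.0):
    """Return test_ids flagged significant whose FPKM reaches min_fpkm
    in at least one of the two compared samples.

    diff_lines: an iterable of lines, e.g. an open gene_exp.diff file.
    """
    reader = csv.DictReader(diff_lines, delimiter="\t")
    kept = []
    for row in reader:
        if row["significant"] != "yes":
            continue
        # value_1 / value_2 hold the FPKM of sample_1 / sample_2.
        if max(float(row["value_1"]), float(row["value_2"])) < min_fpkm:
            continue  # very low expression in both samples
        kept.append(row["test_id"])
    return kept


# Usage sketch (the path is an example):
# with open("cuffdiff_out/gene_exp.diff") as fh:
#     print(len(significant_genes(fh)))
```

Comparing the count with and without the expression filter gives a feel for how many of the calls rest on barely expressed genes.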

**Charitra** · 07-08-2013, 06:39 PM

Dear nsl,
Thank you so much for your expert comments. I got the point, and thank you again for your help.
I would like to add some details on your comments no. 3 and 4:
3. My first two samples (1 and 2) are from the sensitive group, so I merged them. Samples 3, 4, and 5 are from the resistant group, so I merged them. Now I have two conditions, Sensitive vs. Resistant. I then ran cuffdiff and got 93 differentially expressed genes. Now I have some questions:
a) Sensitive and Resistant have 2 and 3 replicates, respectively. Is that true in this case?
b) If so (2 replicates in sensitive and 3 in resistant), should I specify the number of replicates when running cuffdiff/cuffmerge, since (as you may remember) it was switching to the blind method?
c) Does cuffmerge/cuffdiff detect replicates automatically and otherwise switch to blind (Warning: No conditions are replicated, switching to 'blind' dispersion method), or must an option be given to indicate the number of replicates?
4. In the attachment, XLOC_006036 is the cuffdiff ID, because cuffdiff does not give the name of the gene. So it is a single gene, CYP2C9, with cuffdiff ID XLOC_006036. In your view and experience, what FPKM value would you consider good enough, or too low, to count as differential expression?

The most important question for me: I think there are not enough replicates, since there should be at least 3, and the experiments are already done. Is there any way to get something significant out of these data? What would you recommend?

Thank you in advance.

**jp.** · 07-12-2013, 12:56 AM

Could somebody please answer my question?
My RNA-seq (PE) was conducted on 2 samples (antibiotic resistant and sensitive) without any thought for replication.
Is it possible to publish the differential gene expression and splicing results in a journal? Most researchers have told me it is not possible.

I would like an answer from this forum. What do you think I should do?
Many thanks

**nsl** · 07-12-2013, 09:12 AM

jp,

I'm afraid that is a fact. Without replication you would not be able to get a stand-alone publication.

**jp.** · 07-12-2013, 11:59 PM

One more thing:
what if I add duplicates (one more sequencing run for each of the two samples, as biological replicates)? Would duplicates be acceptable as a minimum or not?

**nsl** · 07-13-2013, 09:27 AM

jp,

I've been dealing with NGS data for a short 3 years and am not an expert. I started in 2010 with 1 replicate and, after being exposed to SEQanswers and other bioinformatics communities, realized the folly of my ways... We would never rely on unreplicated experiments for bench work, and the same goes for this stuff. I went on to have 4 replicates each, and did one set at a different time. I see quite a bit of variation in the samples that I did 6 months later. However, I am dealing with a very dynamic stage in development, and the variation may be showing the actual biology. Long story short: 2 reps are better than 1. But also be mindful of the biology you are going after. Cells, tissues, and developmental stages can all show true variation at the RNA level, and the last thing you want is false positives due to library prep and sample handling.

**jp.** · 07-13-2013, 10:51 PM

Dear nsl,
Thank you for your valuable advice. Your knowledge and experience are much greater than mine, and I really appreciate your help. However, it would be very kind of you to answer a few more of my questions below.
What is your opinion:
1. Which library size is better for human samples to study differential expression, transcript discovery, and splicing with Illumina PE sequencing: 150 bp or 50 bp (longer or shorter), or something else?
2. What if I go for single-cell sequencing?
3. If single-cell sequencing is better, can it be done on the same sequencer (PE Illumina 2000/2500)?
4. If possible, please write something about how single-cell sequencing differs in procedure from normal PE sequencing (just a few points would be okay).
5. May I have your contact number so that I can call you with a prior appointment? My e-mail ID is (med dot rdgmc at g mail dot com).
I have read a lot but always end up confused; your opinion will help me a lot.
My English is not good enough, sorry.

Thanks in advance
