**gringer** · 09-23-2011, 02:14 AM

Does it make sense to run cufflinks on the total set first to get a combined GTF file, rather than using 4 different ones for each condition?

**gfmgfm** · 09-23-2011, 06:14 AM

After running the 2 sets, I used cuffcompare to combine everything, so cuffdiff was run with the same GTF for setA and setB.

I also ran cuffdiff for setA and setB on the RefSeq GTF, and did not get consistent log2 ratios.

**gringer** · 09-23-2011, 06:17 AM

Ah right, sorry. I misread the command sequence and thought the cuffcompare line was something else (not quite sure what...).

**gringer** · 09-23-2011, 06:28 AM

Does this also happen if you use the same subsample instead of a different one (e.g. setA vs setA)? I think cufflinks includes some sampling in its FPKM calculations, so there'll be small shifts in the values, but they're unlikely to be as large as the differences you've shown.

How are you doing the read subsampling to choose setA/setB? What are the absolute FPKM values for these transcripts (small values will have comparatively larger errors)?
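To illustrate the point about small values, here is a minimal sketch of how the same absolute error moves the log2 ratio at low and high expression. The FPKM numbers and the `min_fpkm` cutoff are made up for illustration and are not taken from the thread's data or from cufflinks itself.

```python
import math

def log2_ratio(fpkm_a, fpkm_b):
    """log2 fold change between two FPKM estimates (both must be > 0)."""
    return math.log2(fpkm_a / fpkm_b)

def filter_low_expression(pairs, min_fpkm=1.0):
    """Keep (fpkm_a, fpkm_b) pairs where both values reach min_fpkm.

    min_fpkm is an arbitrary, hypothetical cutoff; pick one that suits the data.
    """
    return [(a, b) for a, b in pairs if a >= min_fpkm and b >= min_fpkm]

# Same absolute error (+/- 0.5 FPKM) around a true value of 1 and of 100.
low = log2_ratio(1.5, 0.5)       # weakly expressed transcript
high = log2_ratio(100.5, 99.5)   # strongly expressed transcript
print(f"log2 ratio near FPKM 1:   {low:.3f}")
print(f"log2 ratio near FPKM 100: {high:.3f}")
```

The weakly expressed transcript shows a log2 ratio above 1.5 while the strongly expressed one stays close to zero, so it may be worth filtering out low-FPKM transcripts before comparing ratios between the two sets.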

**gfmgfm** · 09-23-2011, 09:46 AM

Thanks for the replies.
Very good questions.

1. I haven't tried giving the same set twice. Either way, I wouldn't want to see such differences, no matter where they come from.

2. Regarding the read sampling: I wanted to choose reads randomly, but my script was too heavy, so I took the first 30M reads and the second 30M reads of the fastq file. I agree this might be problematic. However, when I used the same sets A and B to get RPKM values with the Partek software, I got beautiful correlations (I can attach scatter plots if anyone is interested). So this indicates that the sets are OK and can give consistent results.
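For what it's worth, a random split does not have to be heavy. Below is a minimal sketch, not the script used in this thread, that streams the file once and assigns each read to set A or set B by coin flip, so memory use stays constant. It assumes single-end, unwrapped FASTQ with exactly 4 lines per record; the function names and the seed are hypothetical, and the two sets come out roughly (not exactly) equal in size.

```python
import random
from itertools import islice

def read_fastq_records(handle):
    """Yield one FASTQ record at a time, assuming 4 lines per record."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield "".join(record)

def split_randomly(in_handle, out_a, out_b, seed=0):
    """Write each record to out_a or out_b with probability 0.5 each.

    Returns the number of records written to each output.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_a = n_b = 0
    for record in read_fastq_records(in_handle):
        if rng.random() < 0.5:
            out_a.write(record)
            n_a += 1
        else:
            out_b.write(record)
            n_b += 1
    return n_a, n_b

# Usage with hypothetical file names:
# with open("reads.fastq") as src, open("setA.fastq", "w") as a, \
#         open("setB.fastq", "w") as b:
#     print(split_randomly(src, a, b, seed=42))
```

Because every read is assigned independently of its position in the file, any ordering effects in the fastq (for example by lane or tile) are spread evenly across both sets.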

**rozovr** · 09-28-2011, 01:46 AM

Is Partek doing any filtering/preprocessing? If so, it seems like the sampling is a key issue. Unless you know the reads are randomly arranged in the fastq, using the halves could explain the differential expression you see where none is expected.

What about the sampling script was "heavy"?

**gfmgfm** · 09-28-2011, 01:50 AM

Thanks for the reply.
I am not sure if Partek does filtering; they say very little in their white papers.
As Partek gives high correlation, I don't think that the sampling is the problem.

**rozovr** · 10-02-2011, 02:11 AM

Did you ever resolve this, or get a response from the authors?

**gfmgfm** · 10-02-2011, 03:09 AM

Still haven't resolved the problem...

Cufflinks: a problem with the FPKM ratios?
