SEQanswers

02-06-2014, 05:44 AM   #1
lincw
Junior Member

Location: Taiwan, R.O.C.

Join Date: Dec 2012
Posts: 5
Differential expression results from different read lengths

This just happened in the past few days. We had 3 rice samples (1 control and 2 treated) for RNA-seq. Because of budget constraints, and on the suggestion of our local sequencing provider, we decided on single-end sequencing (10 million reads, 50 bp per read). When we got the sequencing results, I found that the file sizes differed hugely, from 1.5 GB to 3 GB. This was because the provider had split our samples into 2 sequencing batches: one batch was sequenced at 50 bp and the other at 150 bp.

This makes my adviser and me worry whether we can compare differential expression across these data. The provider said, "Yes, you can. Don't worry." Is that true?

My other question is about the differential expression comparison.

First, I trimmed the raw reads to the same length (50 bp per read) and used TopHat and Cufflinks to calculate RPKMs. This gave around 600 genes up-regulated by at least 2-fold. Then I thought: since we have 150 bp reads, why not use their whole length and calculate again? This time I got around 750 genes up-regulated by at least 2-fold. When I compared the 2 results, only around 220 genes appeared in both. (The raw 50 bp reads were omitted from this comparison.)
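The numbers above come from two steps: an RPKM-style normalization and a hard 2-fold cutoff. A minimal Python sketch of both, using the textbook per-gene RPKM definition (Cufflinks itself reports FPKM from an isoform-level model, so this is not its internal computation). The function names, gene names and values are all hypothetical. The toy data show one way two call sets can overlap only partially: genes sitting near a hard cutoff can cross it in one run and not the other.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def up_regulated(control: dict[str, float], treated: dict[str, float],
                 min_fold: float = 2.0) -> set[str]:
    """Genes whose treated/control ratio is at least `min_fold`.

    Genes with zero control expression are skipped because the ratio is undefined.
    """
    return {
        gene
        for gene, ctrl in control.items()
        if ctrl > 0 and treated.get(gene, 0.0) / ctrl >= min_fold
    }


# Hypothetical: 1,000 reads on a 2 kb gene out of 10 million mapped reads.
print(rpkm(1000, 2000, 10_000_000))  # 50.0

# Hypothetical expression values from two runs of the same comparison.
calls_50bp = up_regulated({"g1": 10.0, "g2": 5.0, "g3": 1.0},
                          {"g1": 25.0, "g2": 9.0, "g3": 3.0})
calls_150bp = up_regulated({"g1": 12.0, "g2": 4.0, "g3": 1.5},
                           {"g1": 30.0, "g2": 8.5, "g3": 2.0})
print(sorted(calls_50bp & calls_150bp))  # ['g1']
```

Here g2 (1.8-fold vs 2.1-fold) and g3 (3.0-fold vs 1.3-fold) each pass the cutoff in only one run, so only g1 is shared.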

The DE results confused us. Which results should we trust: those from the assembly with 50 bp reads or with 150 bp reads? If we use qPCR to validate them, what will happen?

Can anyone give me some advice?

Many thanks,

Chung-Wen

02-06-2014, 06:03 AM   #2
mastal
Senior Member

Location: uk

Join Date: Mar 2009
Posts: 667

Did you run biological replicates of any of the samples, or did you just have one sample per condition?

You can't really do DE unless you have replicates, especially since they were run under different conditions.

02-06-2014, 06:23 AM   #3
lincw
Junior Member

Location: Taiwan, R.O.C.

Join Date: Dec 2012
Posts: 5

I don't have biological replicates, so I can't claim differential expression. But I should still be able to use the RPKM values to compare the expression level of each gene, right?

02-07-2014, 02:09 AM   #4
bruce01
Senior Member

Location: .

Join Date: Mar 2011
Posts: 157

Quote:
Originally Posted by lincw
I don't have biological replicates, so I can't claim differential expression. But I should still be able to use the RPKM values to compare the expression level of each gene, right?
Even setting aside the real problem of not having replicates (i.e. no estimate of biological variation), you will have a problem caused by the differing amounts of sequence in this case. If you have 2x as much sequence in sample A as in sample B, you cannot know whether RPKM differs because of this or because of transcript abundance. You may be able to reduce the 150bp to 50bp (random sampling?), but I cannot see any reviewer accepting results from such a study, because the requisite statistical analysis is impossible and so any 'result' is conjecture. You could check RPKM and then do qPCR on genes you find interesting?
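One reading of the "random sampling" suggestion is to downsample the deeper library to the read count of the shallower one before comparing. A minimal Python sketch, assuming reads are already held in a list; the function name, seed and read counts are hypothetical. On real FASTQ files a tool such as `seqtk sample` is the usual route.

```python
import random


def downsample(reads: list[str], target: int, seed: int = 42) -> list[str]:
    """Randomly keep `target` reads, sampling without replacement.

    Libraries that already have `target` reads or fewer are returned unchanged.
    A fixed seed makes the subsample reproducible.
    """
    if len(reads) <= target:
        return list(reads)
    rng = random.Random(seed)
    return rng.sample(reads, target)


# Hypothetical: a library with twice the depth of the one it is compared against.
deep = [f"read{i}" for i in range(20_000)]
shallow_depth = 10_000
sub = downsample(deep, shallow_depth)
print(len(sub), len(set(sub)))  # 10000 10000
```

This only equalizes read counts; it does nothing about read length, and it cannot substitute for the missing biological replicates.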

02-07-2014, 02:58 AM   #5
lincw
Junior Member

Location: Taiwan, R.O.C.

Join Date: Dec 2012
Posts: 5

Thank you, I have a clearer idea about this now.

02-07-2014, 05:37 AM   #6
mbblack
Senior Member

Location: Research Triangle Park, NC

Join Date: Aug 2009
Posts: 245

Quote:
Originally Posted by lincw

The DE results confused us. Which results should we trust: those from the assembly with 50 bp reads or with 150 bp reads? If we use qPCR to validate them, what will happen?

Can anyone give me some advice?
As for picking genes for qPCR and what you will see, it is impossible to tell. It's already reasonably well known that DGE results correlate best with qPCR when the differentially expressed genes are selected by applying both a statistical threshold and a fold change threshold simultaneously.

That is, if the differentially expressed genes were both statistically significant and passed some minimum fold change cutoff (1.5-fold, 2.0-fold or whatever), then those genes will more often also be statistically significant by qPCR and change in the same direction (although the actual fold change may still not correlate terribly well, for all sorts of reasons).
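The combined filter described above can be sketched in a few lines of Python. The record layout, the 0.05 adjusted p-value cutoff and the 2-fold cutoff are illustrative assumptions, and the p-values themselves would have to come from a replicate-aware DE tool, which is exactly what this thread's data cannot support.

```python
import math
from dataclasses import dataclass


@dataclass
class DEResult:
    gene: str
    log2_fold_change: float
    padj: float  # multiple-testing-adjusted p-value


def select_de_genes(results: list[DEResult], max_padj: float = 0.05,
                    min_fold: float = 2.0) -> list[str]:
    """Keep genes that pass BOTH the significance and the fold-change cutoffs."""
    min_lfc = math.log2(min_fold)
    return [
        r.gene
        for r in results
        if r.padj <= max_padj and abs(r.log2_fold_change) >= min_lfc
    ]


# Hypothetical results table.
results = [
    DEResult("geneA", 2.3, 0.001),   # large and significant: kept
    DEResult("geneB", 2.3, 0.40),    # large but not significant: dropped
    DEResult("geneC", 0.3, 0.001),   # significant but small: dropped
    DEResult("geneD", -1.5, 0.01),   # down-regulated, passes both: kept
]
print(select_de_genes(results))  # ['geneA', 'geneD']
```

Filtering on fold change alone would also keep geneB, which is the kind of gene the post suggests is less likely to validate.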

In my personal experience, selecting differentially expressed genes solely by fold change generally gives poor or little correlation with qPCR results, at least for genes with moderate changes in expression (extremely high fold change usually correlates, but then again, those are often, at best, only the most trivially interesting genes).

Without biological replicates, you have no statistics at all to base your selection on, so the best you can do is pick genes, run the qPCR, and see what you get. But do not be surprised if you get far less validation than you wished for.
__________________
Michael Black, Ph.D.
ScitoVation LLC. RTP, N.C.

02-07-2014, 07:39 AM   #7
thomasblomquist
Member

Location: Ohio

Join Date: Jul 2012
Posts: 68

Setting aside the issue of not having biological/library replicates: yes, you can trim the 3' 100 bases from your reads to get a pseudo-50 base read length.

I've done this to compare the same library prep split between a 50 base read run on the HiSeq and a 150 base read chemistry on the MiSeq.
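Trimming the 3' 100 bases of a 150 base read amounts to keeping the first 50 bases and the matching 50 quality characters. A minimal pure-Python sketch, assuming uncompressed four-line FASTQ records; the function name is hypothetical, and an established read trimmer would normally be used on real data.

```python
import io
from typing import Iterator, TextIO


def trim_fastq(handle: TextIO, keep: int = 50) -> Iterator[str]:
    """Yield 4-line FASTQ records with sequence and qualities cut to `keep` bases.

    Keeping the 5' end (the first `keep` bases) removes the 3' tail,
    turning a 150 base read into a pseudo-50 base read.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield f"{header}\n{seq[:keep]}\n{plus}\n{qual[:keep]}\n"


# Usage with one hypothetical 150 base record.
record = "@read1\n" + "A" * 150 + "\n+\n" + "I" * 150 + "\n"
trimmed = list(trim_fastq(io.StringIO(record)))
print(trimmed[0].splitlines()[1] == "A" * 50)  # True
```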

The results were identical, fitting a Poisson sampling distribution for the sequencing sampling step. Thus normal sampling laws apply between the two platforms, despite their slightly different colonization kits, etc. However, this does not account for the cumulative sampling variance across biological replicates, which is far greater. It encompasses (but is not limited to) differential RNA extraction, differential efficiency of reverse transcription to cDNA, differential ligation efficiencies that interact with differential fragmentation between specimens, differential plateau rates in the limited-cycle PCR dscDNA creation steps, and fractionation of the library during purification... Then, on top of all that, comes the normal Poisson sampling of the prepped library on the flow cell. :-)
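The Poisson-like behaviour of the sequencing step can be illustrated by simulating technical replicates: draw the same number of reads from one fixed library many times and compare the variance of one gene's count with its mean. Under Poisson sampling the variance-to-mean ratio is close to 1; the extra biological variance listed above typically pushes that ratio well above 1, which this sketch does not model. The library size, gene fraction, replicate count and function name are arbitrary assumptions.

```python
import random
import statistics


def simulate_technical_counts(n_reads: int, gene_fraction: float,
                              n_replicates: int, seed: int = 1) -> list[int]:
    """Count reads hitting one gene when the same library is sequenced repeatedly.

    Each read independently comes from the gene with probability `gene_fraction`,
    so each count is binomial, which is close to Poisson when the fraction is small.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < gene_fraction for _ in range(n_reads))
        for _ in range(n_replicates)
    ]


counts = simulate_technical_counts(n_reads=10_000, gene_fraction=0.005, n_replicates=200)
mean = statistics.mean(counts)
var = statistics.variance(counts)
print(f"mean={mean:.1f} variance={var:.1f} ratio={var / mean:.2f}")
```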

If these libraries were prepped separately, I would be extremely cautious about comparing them and drawing any costly conclusions. A number of articles describe the problems with comparing separate library preps, not to mention the need for at least 2-3 biological preparations, depending on the fold change you expect to see.

Be cautious in interpreting your data.

-Tom

All times are GMT -8.