  • Differential expression results of different read length

    This happened just recently. We had 3 rice samples (1 control and 2 treated) for RNA-seq. Because of budget constraints and on the advice of the local sequencing provider, we decided to do single-end sequencing (10 million reads, 50 bp per read). When we got the sequencing results, I found that the file sizes differed hugely, from 1.5 GB to 3 GB. This was because the provider had split our samples into 2 sequencing batches: one batch was sequenced at 50 bp and the other at 150 bp.

    This makes my adviser and me worry about whether we can compare differential expression across these data. The provider said, "Yes, you can. Don't worry." Is that right?

    The other question is about differential expression comparison.

    First, I trimmed the raw reads to the same length (50 bp per read) and used TopHat and Cufflinks to calculate RPKMs. This gave around 600 genes up-regulated by at least 2-fold. Then I thought, since we had 150 bp reads, why not use the full read length for the calculation? This time I got around 750 genes up-regulated by at least 2-fold. When I compared the 2 results, only around 220 genes showed up in both. (The raw reads that were originally 50 bp long were omitted from the comparison.)
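To put a number on how much the two gene lists agree, a minimal sketch like the following may help. The gene IDs are placeholders generated only to match the approximate counts reported above (600, 750, and 220 shared); they are not real data, and `overlap_summary` is a hypothetical helper name.

```python
def overlap_summary(genes_a, genes_b):
    """Summarise how two gene lists overlap."""
    set_a, set_b = set(genes_a), set(genes_b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": len(shared),
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
        # Jaccard index: shared genes divided by all genes seen in either list.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Placeholder IDs sized like the counts in the post (not real gene names).
trimmed_50bp = [f"gene{i}" for i in range(600)]        # 600 genes
full_150bp = [f"gene{i}" for i in range(380, 1130)]    # 750 genes, 220 shared
summary = overlap_summary(trimmed_50bp, full_150bp)
print(summary)
```

With counts like these the Jaccard index comes out a little under 0.2, which shows how strongly the call set depends on the read length used.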

    The DE results confused us. Which results should we trust: the assembly with 50 bp reads or the one with 150 bp reads? If we use qPCR to validate them, what will happen?

    Can anyone give me some advice?

    Many thanks,

    Chung-Wen

  • #2
    Did you run biological replicates of any of the samples, or did you just have one sample per condition?

    You can't really do DE unless you have replicates, especially since they were run under different conditions.



    • #3
      I don't have biological replicates, so I can't call it DE, but I should be able to use the RPKM values to compare the expression level of each gene, right?



      • #4
        Originally posted by lincw
        I don't have biological replicates, so I can't call it DE, but I should be able to use the RPKM values to compare the expression level of each gene, right?
        Even setting aside the real problem of not having replicates (i.e. no measure of biological variation), you will have a problem caused by the differing amounts of sequence in this case. If you have 2x as much sequence in sample A as in sample B, you cannot know whether RPKM differs because of this or because of transcript abundance. You may be able to reduce the 150 bp reads to 50 bp (random sampling?), but I cannot see any reviewers accepting results from such a study, because the requisite statistical analysis is not possible and so any 'result' is conjecture. You could check RPKM and then do qPCR on the genes you find of interest.
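For reference, here is a minimal sketch of the usual RPKM definition (reads per kilobase of transcript per million mapped reads). The function name and the example numbers are hypothetical and chosen only for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical gene: 2 kb long, 500 reads in a 10-million-read library.
value_a = rpkm(500, 2000, 10_000_000)
# Same gene with twice the reads in a library twice as deep.
value_b = rpkm(1000, 2000, 20_000_000)
print(value_a, value_b)
```

The formula rescales for library size and gene length only. It cannot tell whether any remaining difference between samples comes from biology or from the different read lengths and sequencing batches, which is why the concern above still stands.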



        • #5
          Originally posted by bruce01
          Even setting aside the real problem of not having replicates (i.e. no measure of biological variation), you will have a problem caused by the differing amounts of sequence in this case. If you have 2x as much sequence in sample A as in sample B, you cannot know whether RPKM differs because of this or because of transcript abundance. You may be able to reduce the 150 bp reads to 50 bp (random sampling?), but I cannot see any reviewers accepting results from such a study, because the requisite statistical analysis is not possible and so any 'result' is conjecture. You could check RPKM and then do qPCR on the genes you find of interest.
          Thank you, I have a much clearer idea about this now.



          • #6
            Originally posted by lincw

            The DE results confused us. Which results should we trust: the assembly with 50 bp reads or the one with 150 bp reads? If we use qPCR to validate them, what will happen?

            Can anyone give me some advice?

            Many thanks,

            Chung-Wen
            As for picking genes for qPCR and what you will see, it is impossible to tell. It's already reasonably well known that DGE results correlate best with qPCR when the differentially expressed genes are selected by applying a statistical threshold and a fold change threshold simultaneously.

            That is, if the differentially expressed genes were both statistically significant and passed some minimum fold change cutoff (1.5 fold, 2.0 fold or whatever), then the qPCR genes will more often also be statistically significant and changing in the same direction (albeit the actual fold change may still not correlate terribly well, for all sorts of reasons).
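The combined selection described above can be sketched roughly as follows. The field names (`log2fc`, `padj`), the cutoffs, and the gene records are all hypothetical, and the adjusted p-values would have to come from a DE tool run on replicated data, which is exactly what this experiment lacks.

```python
import math

def select_candidates(results, max_padj=0.05, min_fold=2.0):
    """Keep genes passing both a significance cutoff and a fold-change cutoff."""
    min_abs_log2fc = math.log2(min_fold)
    return [
        r["gene"]
        for r in results
        if r["padj"] <= max_padj and abs(r["log2fc"]) >= min_abs_log2fc
    ]

# Made-up records for illustration only.
results = [
    {"gene": "geneA", "log2fc": 2.1, "padj": 0.001},  # passes both
    {"gene": "geneB", "log2fc": 3.0, "padj": 0.40},   # large change, not significant
    {"gene": "geneC", "log2fc": 0.3, "padj": 0.01},   # significant, small change
    {"gene": "geneD", "log2fc": -1.5, "padj": 0.02},  # passes both (down-regulated)
]
candidates = select_candidates(results)
print(candidates)
```

Selecting on fold change alone would also keep `geneB` here, the kind of gene that tends not to validate.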

            In my personal experience, selecting differentially expressed genes solely by fold change generally gives poor or little correlation with qPCR results, at least for genes with moderate changes in expression (extremely high fold change usually correlates, but then again, those are often, at best, only the most trivially interesting genes).

            Without biological replicates, you have zero statistics to base your selection on, so the best you can do is pick genes, run the qPCR, and see what you get. But do not be surprised if you get far less validation than you wished for.
            Michael Black, Ph.D.
            ScitoVation LLC. RTP, N.C.



            • #7
              Leaving aside the issue of not having biological/library replicates: yes, you can trim the 3' 100 bases from your reads to achieve a pseudo-50 base read length.
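A minimal sketch of that 3' trimming step, assuming plain uncompressed FASTQ with exactly four lines per record. `trim_fastq` is a hypothetical helper written for illustration, not part of any existing tool, and the demo read is made up.

```python
import os
import tempfile

def trim_fastq(in_path, out_path, keep=50):
    """Write a copy of a FASTQ file with each read cut to its first `keep` bases."""
    n_reads = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            header = src.readline().rstrip("\n")
            if not header:  # end of file
                break
            seq = src.readline().rstrip("\n")
            plus = src.readline().rstrip("\n")
            qual = src.readline().rstrip("\n")
            # Keep the 5' end; sequence and quality must be cut to the same length.
            dst.write(f"{header}\n{seq[:keep]}\n{plus}\n{qual[:keep]}\n")
            n_reads += 1
    return n_reads

# Tiny demo on a made-up 150 bp read.
with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "raw.fastq")
    trimmed = os.path.join(tmp, "trimmed.fastq")
    with open(raw, "w") as fh:
        fh.write("@read1\n" + "ACGTA" * 30 + "\n+\n" + "I" * 150 + "\n")
    n = trim_fastq(raw, trimmed)
    with open(trimmed) as fh:
        lines = fh.read().splitlines()
print(n, len(lines[1]), len(lines[3]))
```

Dedicated read trimmers do the same job and handle compressed input; the sketch only shows what "pseudo-50 base reads" means at the file level.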

              I've done this to compare the same library prep split between a 50 base read on HiSeq and a 150 base read chemistry on MiSeq.

              The results were identical, fitting a Poisson sampling distribution for the sequencing sampling step. Thus normal sampling laws apply between the two platforms, despite slightly different colonization kits, etc. However, this does not account for the cumulative sampling variance across biological replicates, which is far greater and encompasses (but is not limited to) differential extraction of RNA, differential efficiencies of RT to cDNA, differential ligation efficiencies that interact with differential fragmentation between specimens, differential plateau rates of the limited PCR ds-cDNA creation steps, fractionation of the library during purification, and so on. Then, on top of all that, comes the normal Poisson sampling of the prepped library on the flow cell. :-)

              If these libraries were prepped separately, I would be extremely cautious about comparing them and drawing any costly conclusions. There are a number of articles delineating the issues with comparing separate library preps, let alone the need for at least 2-3 biological preparations, depending on the fold change you expect to see.

              Be cautious in interpreting your data.

              -Tom
