  • RNA seq and DGE using edgeR

    Dear All,

    I would like critique and advice on how I plan to analyse my RNA-seq results.
    We have sequenced the transcriptome of a non-model eukaryote (no genome sequence available).
    The only reference we have is a set of transcript contigs from the same species, produced by another RNA-seq experiment.

    The RNAseq was done in duplicates on 1 control sample and 2 samples with different conditions using Illumina 50 nt single reads.
    The bioinformatics analysis of the differential gene expression is the following:
    - map against the reference and get a table of unique read counts per transcript.
    - then use edgeR to compute common.dispersion
    - compute exactTest with fisher test to compare control vs sample 1 and control vs sample 2.
    - get the top differentially expressed with a p-value < 0.01.

    I was wondering if this approach makes sense and if the statistical model is appropriate for my experiment (mapping on transcripts built from a de novo RNA-seq assembly).

    Thank you in advance for your help.

    Greg

  • #2
    How do you want to calculate common.dispersion if you don't have replicates?
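
To illustrate why replicates matter here: dispersion describes how much the variance of a gene's counts exceeds what Poisson sampling alone would give, so it can only be estimated when there are at least two values to compute a variance from. The Python sketch below uses a simple method-of-moments estimate, var = mu + phi * mu^2, and assumes equal library sizes. It is not the estimator edgeR implements, and the function name and counts are made up for illustration.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Rough per-gene dispersion from replicate counts (equal library sizes assumed).

    Solves var = mu + phi * mu**2 for phi. Illustration only, NOT edgeR's estimator.
    """
    if len(counts) < 2:
        # A single library gives no variance estimate, hence no dispersion.
        raise ValueError("need at least two replicates to estimate variance")
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance (n - 1 denominator)
    # Negative estimates (variance below Poisson) are truncated at zero.
    return max(0.0, (var - mu) / mu**2)


# Hypothetical duplicate counts for one gene.
duplicates = [90, 110]
phi = moment_dispersion(duplicates)
print(f"dispersion estimate: {phi:.3f}")

try:
    moment_dispersion([100])
    single_ok = True
except ValueError:
    single_ok = False
```

With only two replicates per gene such an estimate is very noisy, which is one reason to share information across genes with a common dispersion.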


    • #3
      Thanks for the reply, but all the samples (2 conditions and control) are in duplicate. That's why I am planning to use a common dispersion. Sorry if I was not clear.


      • #4
        You wrote "The RNAseq was done in duplicates on 1 control sample"; this sounds like you have one sample and sequenced it twice. I hope you have two biologically independent control samples, because just replicating the sequencing would be rather pointless.

        If so, yes, you can go ahead with a standard analysis with edgeR. (As author of DESeq, I would suggest trying it as well, of course; but both tools are quite similar.)

        Two further comments:

        - "compute exactTest with fisher test": no, the "exactTest" function of edgeR (and similarly, the "nbinomTest" of DESeq) is not performing Fisher's exact test. Fisher's exact test does not account for overdispersion (i.e., using it assumes the biological variability to be zero) and hence is inappropriate for RNA-Seq analysis. Replacing Fisher's test with something that accounts for overdispersion was the whole point of why Robinson and Smyth developed edgeR.

        - "get the top differentially expressed with a p-value < 0.01": You should not put thresholds on raw p values, but use an adjustment for multiple testing. edgeR (and DESeq) use the Benjamini-Hochberg (BH) procedure by default. Cutting BH-adjusted p values at 1% would be unusually stringent. A commonly chosen threshold is 10%, but whether this is appropriate depends, of course, on what you want to do afterwards with the result.

        Simon
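
As an illustration of the multiple-testing point above, here is a minimal Python sketch of the Benjamini-Hochberg adjustment. The function name and the example p values are invented for illustration; in practice you would take the adjusted values reported by edgeR or DESeq rather than recompute them.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values
    # never increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for five transcripts.
raw = [0.001, 0.008, 0.039, 0.041, 0.9]
adj = benjamini_hochberg(raw)
hits_at_10_percent = [p <= 0.10 for p in adj]
hits_at_1_percent = [p <= 0.01 for p in adj]
print(adj)
```

In this toy input, four transcripts pass a 10% cut on the adjusted values, while a 1% cut keeps only one, which shows how stringent the latter is.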


        • #5
          Hi Simon,

          Thanks a lot for all your advice; of course, I will give DESeq a try. I am moving from pure genomics (genome assembly, annotation) to transcriptomics, so any new expertise with different tools will definitely help.

          Getting back to the RNA-seq project, the reference transcripts come from another group working on a different condition but the same species. I was wondering what a normal percentage of reads matching the reference would be.
          If the mapping rate falls short of that, I was planning to try a de novo approach on the control sample and then redo the mapping. Does that make sense?

          Thanks a lot in advance for your help,

          Greg


          • #6
            With respect to differential expression analyses, my standard advice is to always map RNA-Seq reads against the genome, not the transcriptome, unless you have a good reason to do otherwise.

            If you must align against the transcriptome, make sure that you count for genes, not transcripts, and remove reads mapping to transcripts from more than one gene.

            How many reads you can expect to map to your transcriptome simply depends on how complete it is, and so this is hard to say.


            • #7
              I am mapping against the transcriptome because the genome is not sequenced yet. I guess a transcriptome of the same species is better than the genome of a "close" relative.


              • #8
                The main issue I see is the following: Imagine there are two genes, A and B, which share identical sequence over a part of their length. Let's say gene A is differentially expressed in your treatment-control comparison and gene B is not. If you now count those reads that map to the shared part of the genes for both genes, gene B will appear differentially expressed, too, because its count is elevated by the reads from the paralogous part of gene A.

                If you map to the genome, you can be sure to catch such cases: you will see that the read maps to both loci.

                Transcriptome references are rarely complete, and hence, it could easily happen that gene A is missing and you cannot catch this source of error. So, you will conclude that gene B is differentially expressed, even though, in reality, gene A is, and that is a gene that does not even appear in your list.
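
A toy calculation in Python makes the size of this effect concrete. All counts are hypothetical: gene B has the same number of uniquely mapping reads in both conditions, while the region it shares with gene A picks up many more reads after treatment because gene A is induced.

```python
from math import log2

# Hypothetical read counts, not taken from any real data set.
unique_b = {"control": 100, "treated": 100}  # reads unique to gene B: unchanged
shared_ab = {"control": 50, "treated": 400}  # shared region: change driven by gene A


def apparent_log2fc(unique, shared):
    """log2 fold change for gene B if shared-region reads are counted for it."""
    control = unique["control"] + shared["control"]
    treated = unique["treated"] + shared["treated"]
    return log2(treated / control)


true_log2fc = log2(unique_b["treated"] / unique_b["control"])
inflated_log2fc = apparent_log2fc(unique_b, shared_ab)
print(f"gene B, unique reads only: {true_log2fc:.2f}")
print(f"gene B, with shared reads: {inflated_log2fc:.2f}")
```

If gene A is absent from the reference, the shared reads have nowhere else to map, so nothing flags them as ambiguous.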

                There are certainly genes that are not expressed in your colleagues' condition and hence missing in your transcriptome, and this can cause more trouble than just unmapped reads.

                Hence, I would pool all transcript reads you can get hold of, assemble a transcriptome from all of them, and then map against this. And then make sure to kick out ambiguous mappings (but don't remove reads mapping to several isoforms of the same gene, because then, you would be left with nothing).
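
The filtering rule described above can be sketched in Python as follows. This assumes you already have, for each read, the list of transcripts it aligned to, plus a transcript-to-gene table (for a de novo assembly that table would have to come from the assembler's clustering or from annotation). The function and variable names are hypothetical.

```python
from collections import Counter


def count_reads_per_gene(read_hits, transcript_to_gene):
    """Count reads per gene, discarding reads that hit more than one gene.

    read_hits: dict mapping read id -> list of transcript ids it aligned to.
    transcript_to_gene: dict mapping transcript id -> gene id.
    """
    counts = Counter()
    discarded = 0
    for transcripts in read_hits.values():
        genes = {transcript_to_gene[t] for t in transcripts}
        if len(genes) == 1:
            # One gene only, even if several isoforms were hit: keep the read.
            counts[next(iter(genes))] += 1
        else:
            # Ambiguous between genes (or no hit at all): drop the read.
            discarded += 1
    return counts, discarded


# Hypothetical toy input.
transcript_to_gene = {"A.1": "geneA", "A.2": "geneA", "B.1": "geneB"}
read_hits = {
    "r1": ["A.1"],         # unique hit
    "r2": ["A.1", "A.2"],  # two isoforms of the same gene: kept
    "r3": ["A.2", "B.1"],  # two different genes: discarded
    "r4": ["B.1"],
}
gene_counts, n_discarded = count_reads_per_gene(read_hits, transcript_to_gene)
print(dict(gene_counts), n_discarded)
```

The resulting per-gene table, rather than a per-transcript table, is what would then go into edgeR or DESeq.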


                • #9
                  I fully agree about this issue. So I am hesitating between:
                  1- mapping only on the published transcriptome and rejecting the unmapped reads (which implies assuming that this transcriptome is complete, which is wrong)
                  2- mapping against the transcriptome, plus de novo assembly of the unmapped reads
                  3- doing a de novo assembly by pooling all reads I have (control + samples), treating it as a gold standard, and then mapping onto it.
                  I would prefer 2 or 3 (using, for example, Trinity). The other issue is that the data are Solexa 50 nt single reads, which could be difficult to assemble de novo.

                  What do you think?

                  Thanks for your help.

                  Greg


                  • #10
                    Hi greggime,

                    I believe a paper I recently read is closely related to your issue. Briefly, they used RNA-seq to build a reference transcriptome of ~170,000 non-redundant consensus sequences (from a fish species with no mapped genome), then used BLASTX and ESTScan to reveal that ~50,000 of these sequences were reliable coding sequences (CDS) with a high potential for translation into protein. Annotation of these CDS using gene ontology and COG databases yielded ~16,500 consensus sequences and ~10,000 putative proteins.

                    Xiang et al., BMC Genomics 2010, 11:472


                    • #11
                      Hi Croissant,

                      Thanks for pointing me to that paper.

                      Greg


                      • #12
                        Smearplot and color plot

                        Dear All,

                        I would like to know if anyone could give me advice on how to colorize my smear plot.

                        By default, differentially expressed genes are plotted in red. What I'd like to do is change the color based on the logFC: e.g. red if genes are up-regulated and green if down-regulated.

                        Does anyone know how to do that?

                        Thanks,

                        Greg
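
One way to think about this, independent of the plotting library, is to build a per-gene color vector from the sign of the logFC and the significance flag, then pass that vector to whatever scatter-plot call you use. The Python sketch below shows only that logic; the function name, argument names, and example values are hypothetical, and it does not use edgeR's own plotting function.

```python
def smear_colors(log_fc, is_de, up="red", down="green", other="black"):
    """Return one color per gene from its logFC sign and DE status."""
    colors = []
    for fc, de in zip(log_fc, is_de):
        if not de:
            colors.append(other)  # not called differentially expressed
        elif fc > 0:
            colors.append(up)     # significant and up-regulated
        else:
            colors.append(down)   # significant and down-regulated
    return colors


# Hypothetical values for three genes.
example_colors = smear_colors([2.1, -1.8, 0.3], [True, True, False])
print(example_colors)
```

The same idea carries over to R: compute the color vector first, then supply it to the plotting function's color argument.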


                        • #13
                          Originally posted by Simon Anders
                          ...

                          If you must align against the transcriptome, make sure that you count for genes, not transcripts, and remove reads mapping to transcripts from more than one gene.

                          ...

                          Hi Simon and others,

                          I am in a similar situation to Greg. We had our 8 samples (2 replicates each) sequenced and pooled all the reads to do a de novo transcriptome assembly, following your advice below.

                          Originally posted by Simon Anders

                          There are certainly genes that are not expressed in your colleagues' condition and hence missing in your transcriptome, and this can cause more trouble than just unmapped reads.

                          Hence, I would pool all transcript reads you can get hold of, assemble a transcriptome from all of them, and then map against this. And then make sure to kick out ambiguous mappings (but don't remove reads mapping to several isoforms of the same gene, because then, you would be left with nothing).

                          I am a novice in bioinformatics. So, when we align against the transcriptome, why count for genes rather than transcripts?

                          Thank you!


                          • #14
                            Originally posted by ngsseq
                            So, when we align against the transcriptome, why count for genes rather than transcripts?

                            Have you read post #8?


                            • #15
                              Originally posted by Simon Anders
                              Have you read post #8?

                              Hi Simon,

                              Sorry about that. It's my fault.
