  • dalesan
    Member
    • Feb 2011
    • 15

    Differential Expression: is it better to map reads to genome or transcriptome?

    Hello, hello.

    This may be a very naive question, but I haven't been able to find an established answer yet in the literature, nor here (bad search technique, perhaps).

    I'm about to embark upon a differential expression analysis (Arabidopsis) but before doing so, I wanted to know if any of you can comment on the potential benefits/drawbacks of using the transcriptome as a reference rather than the genome?

    Presumably, by using the transcriptome, one circumvents having to deal with junction libraries and such. On the other hand, when using a transcriptome annotation, one will have to limit the mapping to the most representative transcript isoform so as to avoid multireads.
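To make the multiread concern concrete, here is a minimal Python sketch. The exon sequences and the isoform names (`geneX.1`, `geneX.2`) are made up for illustration, and exact substring matching stands in for a real aligner. It shows that a read from a shared exon matches every isoform containing that exon, unless the reference is limited to one representative isoform.

```python
# Toy exons (hypothetical sequences, not real Arabidopsis data).
EXONS = {
    "e1": "ATGGCGTACGTTAGC",
    "e2": "GGATCCTTAGCAAGT",
    "e3": "CCGTATTGACCGGAA",
}

# Two hypothetical isoforms of one gene: geneX.2 skips exon e2.
TRANSCRIPTS = {
    "geneX.1": EXONS["e1"] + EXONS["e2"] + EXONS["e3"],
    "geneX.2": EXONS["e1"] + EXONS["e3"],
}


def hits(read, references):
    """Return the sorted names of all references that contain the read exactly."""
    return sorted(name for name, seq in references.items() if read in seq)


# A read lying entirely inside exon e1, which both isoforms share.
shared_read = EXONS["e1"][2:14]
# A read spanning the e1-e3 junction, which exists only in geneX.2.
junction_read = EXONS["e1"][-6:] + EXONS["e3"][:6]

all_isoforms = hits(shared_read, TRANSCRIPTS)
one_isoform = hits(shared_read, {"geneX.1": TRANSCRIPTS["geneX.1"]})
junction_hits = hits(junction_read, TRANSCRIPTS)

print("shared read, all isoforms:", all_isoforms)
print("shared read, representative only:", one_isoform)
print("junction read, all isoforms:", junction_hits)
```

Note the trade-off in this toy case: restricting the reference to `geneX.1` removes the multiread, but the junction read that belongs to `geneX.2` would then have nowhere to map.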

    In your experience, if we choose to disregard splicing for the moment and focus only on DE, would you map your reads to the genome or the transcriptome, and why?

    I'm thinking I might just do both and compare the results. But it would be nice to hear your thoughts.

    Thanks!!
  • TiborNagy
    Senior Member
    • Mar 2010
    • 329

    #2
    I prefer genome mapping, because it lets me check for annotation errors. However, some species do not have an assembled genome, only contigs; in that case the transcriptome may be the better choice. On the other hand, the transcriptome is sometimes smaller than the genome, so if you do not have enough computing power, you can choose the transcriptome.


    • rskr
      Senior Member
      • Oct 2010
      • 249

      #3
      I think there is more to it than having or not having compute power. I've found mappings to transcriptomes to be much cleaner; the biggest culprits in genome mappings are pseudogenes. Furthermore, I'm not sure there are statistically sound ways to find new transcripts at the same time as finding differential expression; these seem like two fundamentally different questions. For example, if one population had a different transcript than another, would there be any way to quantify that?


      • Bukowski
        Senior Member
        • Jan 2010
        • 388

        #4
        My opinion is that the transcriptome is currently not well characterised enough to serve as a suitable reference for RNA-Seq. The genomes of the model organisms may have problems of incompleteness, but at least provide a scaffold to hang your RNA-Seq off and allow the discovery phase.

        I agree if all you're doing is DE of genes with a few million reads, you might as well just map to the transcriptome. But in my experience that's rarely what people want from an RNA-Seq experiment - because that's what arrays are for.


        • dalesan
          Member
          • Feb 2011
          • 15

          #5
          Thanks guys for the replies. I appreciate your input. At this point, I think I'm going to run a test comparison between using the genome vs transcriptome to see how congruent the results are when it comes to simple DE testing.

          As Bukowski mentioned, mapping to the genome offers you much more information, including discovery of novel transcripts and isoforms. I do have another phase of my project that will consider alternative splicing and I'll definitely be mapping to the genome for this.


          • rskr
            Senior Member
            • Oct 2010
            • 249

            #6
            Originally posted by Bukowski View Post
            - because that's what arrays are for.
            Right, that's what arrays are for, but in a limited and expensive manner.


            • sazz
              Member
              • Oct 2012
              • 28

              #7
              Originally posted by dalesan View Post
              Thanks guys for the replies. I appreciate your input. At this point, I think I'm going to run a test comparison between using the genome vs transcriptome to see how congruent the results are when it comes to simple DE testing.

              As Bukowski mentioned, mapping to the genome offers you much more information, including discovery of novel transcripts and isoforms. I do have another phase of my project that will consider alternative splicing and I'll definitely be mapping to the genome for this.
              Dalesan,

              I would appreciate it if you could share your results, because I also wonder how much the results differ between genome and transcriptome mapping.


              • dalesan
                Member
                • Feb 2011
                • 15

                #8
                Originally posted by sazz View Post
                Dalesan,

                I would appreciate it if you could share your results, because I also wonder how much the results differ between genome and transcriptome mapping.
                Sure thing, sazz. Maybe by the end of next week I'll have something to share.


                • sazz
                  Member
                  • Oct 2012
                  • 28

                  #9
                  Originally posted by dalesan View Post
                  Sure thing, sazz. Maybe by the end of next week I'll have something to share.
                  Well, I have already made a comparison between genome and transcriptome mapping, with all other parameters exactly the same.

                  First of all, in my experiment I have a human cell line transduced with control and target shRNA, and for my RNA-seq I prepared 3 replicates of each. The total read count across all samples is around 110M (single-end, 50 bp).

                  I ran TopHat with the -g 1 option to get uniquely mapped reads (about 70% of reads mapped for transcriptome mapping).

                  When I compared the CuffDiff output between those two approaches, there were 1983 significantly differentially expressed genes (q<0.01) in the intersection, 107 found only with whole-genome mapping, and 94 found only with transcriptome mapping.

                  So for my data, if there is a difference, it seems to be a small one, and I don't think it will change the downstream analysis (I haven't tried yet).
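As a rough way to put a number on "a small one", the three counts reported above can be turned into overlap fractions. This is only a sketch of the arithmetic in Python; the variable names are chosen here for illustration, and the counts are the ones quoted in the post.

```python
# Counts of significantly DE genes (q < 0.01) as reported in the post.
shared = 1983              # called by both approaches
genome_only = 107          # called only with whole-genome mapping
transcriptome_only = 94    # called only with transcriptome mapping

# Total calls per approach and across both.
genome_total = shared + genome_only
transcriptome_total = shared + transcriptome_only
union = shared + genome_only + transcriptome_only

# Jaccard index: size of the intersection divided by size of the union.
jaccard = shared / union

print(f"genome calls:        {genome_total}")
print(f"transcriptome calls: {transcriptome_total}")
print(f"union of calls:      {union}")
print(f"shared / genome calls:        {shared / genome_total:.3f}")
print(f"shared / transcriptome calls: {shared / transcriptome_total:.3f}")
print(f"Jaccard index:                {jaccard:.3f}")
```

On these numbers roughly 95% of the calls from either approach are shared, and the Jaccard index is about 0.91. This says nothing about which of the discordant calls are correct.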


                  • Zapages
                    Member
                    • Oct 2012
                    • 98

                    #10
                    I used to map to transcripts, but I was told specifically never to use them again, as you get more gene isoform information by mapping to the genome.

                    Please test this out by taking the BAM files output by TopHat 2 and viewing them against the whole genome or the transcripts in IGV.


                    • rskr
                      Senior Member
                      • Oct 2010
                      • 249

                      #11
                      Originally posted by Zapages View Post
                      I used to map to transcripts, but I was told specifically never to use them again, as you get more gene isoform information by mapping to the genome.

                      Please test this out by taking the BAM files output by TopHat 2 and viewing them against the whole genome or the transcripts in IGV.
                      Well, never do genome mapping because you'll spend more time studying pseudogenes. Now, what are you going to do?

                      Anyway, it doesn't make sense to me that you would get isoforms via genome mapping that you wouldn't get via transcriptome mapping. Furthermore, why would you be looking for different isoforms when you are quantifying relative expression? Is this one of those things where you are just answering the question you want to answer?


                      • Brian Bushnell
                        Super Moderator
                        • Jan 2014
                        • 2709

                        #12
                        Originally posted by rskr View Post
                        Well, never do genome mapping because you'll spend more time studying pseudogenes. Now, what are you going to do?
                        Those are not too hard to identify, as they lack introns and typically have a lot of SNPs relative to their parent genes. Anyway, pseudogenes also interfere with DNA mapping (in human, for example, many are not in hg19); should DNA mapping be done to the transcriptome as well, to avoid interference?

                        Anyway it doesn't make sense to me that you would get isoforms via genome mapping that you wouldn't get via transcriptome mapping
                        Often the genome is fairly good, but transcriptomes of complex organisms are probably all incomplete. You can't expect a complete transcriptome from organisms with many life stages or tissue types when some isoforms and genes may only be expressed at certain times.

                        furthermore why would you be looking for different isoforms when you are quantifying relative expression? Is this one of the things where you are just answering the question you want to answer?
                        Some isoforms are tissue- or condition-specific, and if a gene changes from 99% isoform A to 99% isoform B, that could be very important. Assuming that all the isoforms of a gene are functionally identical would mean there is no reason for alternative splicing to even exist.
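A toy illustration of that isoform switch, as a Python sketch with invented read counts (the sample names and numbers are assumptions, not data from this thread): the gene-level total is identical in both conditions, so a gene-level DE test would see no change, while the isoform fractions flip completely.

```python
# Hypothetical read counts per isoform for one gene in two conditions.
counts = {
    "control": {"isoform_A": 990, "isoform_B": 10},
    "treated": {"isoform_A": 10, "isoform_B": 990},
}


def gene_total(isoform_counts):
    """Gene-level count: the sum over all isoforms of the gene."""
    return sum(isoform_counts.values())


def isoform_fractions(isoform_counts):
    """Fraction of the gene's reads assigned to each isoform."""
    total = gene_total(isoform_counts)
    return {name: n / total for name, n in isoform_counts.items()}


# Gene-level fold change between conditions.
gene_fold_change = gene_total(counts["treated"]) / gene_total(counts["control"])
control_fractions = isoform_fractions(counts["control"])
treated_fractions = isoform_fractions(counts["treated"])

print("gene-level fold change:", gene_fold_change)
print("control isoform fractions:", control_fractions)
print("treated isoform fractions:", treated_fractions)
```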

                        Mapping to a transcriptome, you'll be somewhat limited to answering questions that have already been answered, or at least asked. It's like searching for minerals only using a map of known mineral deposits; you'll never discover anything truly novel.

                        Also, mapping to a genome is more objective and repeatable. Mapping to a transcriptome is very subjective, as there are a huge number of ways to design one. Add a single gene, or a single transcript, and the mappings of all reads may be affected. So, how do you choose which transcripts and isoforms to include? All of them? Just the longest for each gene? Just a full concatenation of all exons per gene? Just the ones that were known prior to date XYZ, or also the two new ones your lab found that you think are relevant? You'll get different results based on this purely subjective decision, possibly allowing results to be tweaked as desired.
                        Last edited by Brian Bushnell; 03-16-2014, 10:11 AM.


                        • dalesan
                          Member
                          • Feb 2011
                          • 15

                          #13
                          Originally posted by Brian Bushnell View Post
                          Also, mapping to a genome is more objective and repeatable. Mapping to a transcriptome is very subjective, as there are a huge number of ways to design one. Add a single gene, or a single transcript, and the mappings of all reads may be affected. So, how do you choose which transcripts and isoforms to include? All of them? Just the longest for each gene? Just a full concatenation of all exons per gene? Just the ones that were known prior to date XYZ, or also the two new ones your lab found that you think are relevant? You'll get different results based on this purely subjective decision, possibly allowing results to be tweaked as desired.
                          Excellent points, Brian. I didn't think of it this way, in terms of the repeatability aspect. In my analysis I've limited the mapping to simply the longest isoform in the annotation. Nevertheless, I'm curious to see how the results compare when I get back to my desk tomorrow.
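The "longest isoform per gene" choice mentioned above could be sketched in Python as follows. The transcript records, the `Transcript` tuple, and the function name are hypothetical; a real annotation would be parsed from a GFF/GTF file, which is not shown here.

```python
from collections import namedtuple

# Minimal stand-in for an annotation record (hypothetical IDs and lengths).
Transcript = namedtuple("Transcript", ["transcript_id", "gene_id", "length"])

ANNOTATION = [
    Transcript("geneA.1", "geneA", 1200),
    Transcript("geneA.2", "geneA", 1850),
    Transcript("geneB.1", "geneB", 900),
    Transcript("geneC.1", "geneC", 2100),
    Transcript("geneC.2", "geneC", 2100),  # same length as geneC.1
]


def longest_isoform_per_gene(transcripts):
    """Map each gene ID to the ID of its longest transcript.

    Ties are broken by input order: the first transcript seen at the
    maximum length is kept.
    """
    best = {}
    for t in transcripts:
        current = best.get(t.gene_id)
        if current is None or t.length > current.length:
            best[t.gene_id] = t
    return {gene: t.transcript_id for gene, t in best.items()}


representatives = longest_isoform_per_gene(ANNOTATION)
print(representatives)
```

The tie for `geneC` is resolved only by file order, which is a small instance of the subjectivity Brian describes: a different but equally defensible rule would give a different reference.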


                          • gringer
                            David Eccles (gringer)
                            • May 2011
                            • 845

                            #14
                            I would recommend mapping to the genome, but using the transcriptome as a mapping template to pick up splice boundaries, etc. In other words, something like what TopHat does. Mapping to the genome makes novel isoforms a bit easier to pick up, and mapping to the transcriptome will give you more descriptive output (e.g. proper gene names) with a bit less work. I would expect that A. thaliana has a fairly well-annotated transcriptome, so you'll be losing a lot by ignoring annotated genetic features.


                            • rskr
                              Senior Member
                              • Oct 2010
                              • 249

                              #15
                              Originally posted by gringer View Post
                              I would recommend mapping to the genome, but using the transcriptome as a mapping template to pick up splice boundaries, etc. In other words, something like what TopHat does. Mapping to the genome makes novel isoforms a bit easier to pick up, and mapping to the transcriptome will give you more descriptive output (e.g. proper gene names) with a bit less work. I would expect that A. thaliana has a fairly well-annotated transcriptome, so you'll be losing a lot by ignoring annotated genetic features.
                              IMO it is obvious that TopHat went to transcriptome mapping because they were unable to solve the pseudogene problem. What remains to be seen is whether using the genome actually brings anything to the table besides huge hardware requirements and short leading and trailing non-coding isoforms. Could whatever it does bring to the table be done later, with the reads that don't map to a transcript, in an analysis separate from differential expression, such as an isoform search?

                              Furthermore, I think most poorly characterized organisms get their transcriptomes done first, since they are easier and provide the majority of the useful information, which sort of renders the argument about uncharacterized organisms moot.
