SEQanswers

Old 03-25-2011, 10:44 AM   #1
JueFish
Member
 
Location: Connecticut

Join Date: May 2010
Posts: 42
Mapping Human RNA Seq: Transcriptome vs. Genome

Would anyone like to share their opinions on the relative merits and pitfalls of using the human transcriptome vs. the human genome as a reference for mapping some SOLiD RNA-Seq runs? I am guessing this comes down to two things: the relative quality of the transcriptome sequence vs. the genome sequence (in other words, how complete the transcriptome build is relative to the genome build), and the role of splice-junction mapping algorithms (e.g. TopHat) and their effect on read mapping. Any thoughts? To be honest, I don't know much about how "complete" the human transcriptome is supposed to be (number of tissues, life stages, etc.). I'm just looking for the "best" way to do this. I could run both, but thought I'd start from first principles and go from there, as these BAM files are huge and a pain to store.

Thanks
Old 02-22-2013, 02:52 AM   #2
Derek-C
Junior Member
 
Location: U.S.A.

Join Date: Nov 2012
Posts: 7

Sorry to bump an old question, but I'm also wondering about this at the moment and I can't seem to find an answer anywhere.

What are the merits of using the human transcriptome vs human genome for RNA-Seq mapping?
Old 02-22-2013, 04:49 AM   #3
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319

Transcriptome:
+ better specificity, easier to resolve isoforms, need less seq depth (probably)
- restricted to known transcripts

Genome:
+ can find new things
- need to sequence more to do accurate isoform assignment, will miss more known splice junctions
Old 02-22-2013, 08:25 AM   #4
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181

Quote:
Originally Posted by Derek-C
Sorry to bump an old question, but I'm also wondering about this at the moment and I can't seem to find an answer anywhere.

What are the merits of using the human transcriptome vs human genome for RNA-Seq mapping?

I am of the opinion that it is better to align to the genome. With STAR it can be done very quickly.

The question is, do you believe the transcriptome annotation is really complete? ENCODE reported that well over half of the genome is transcribed (roughly 60-75%, depending on whether you count processed or primary transcripts). If you align reads only to the transcriptome, you may force some reads onto known transcripts when they would have been better placed in an unannotated region of the genome. Aligning to the genome gives those reads somewhere else to go, which reduces that kind of misassignment.

Keep in mind that hardly any genome is really complete... in fact, you should align not only to the chromosomes, but to all available random contigs and "decoy" sequences. So if genomes are never really complete - how can we expect the transcriptome to be anything close to complete?

The only advantage to transcriptome alignment is speed and memory savings... but I think with STAR this is not so much an issue anymore.
Old 02-22-2013, 08:27 AM   #5
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181

Quote:
Originally Posted by kopi-o
Transcriptome:
+ better specificity, easier to resolve isoforms, need less seq depth (probably)
- restricted to known transcripts

Genome:
+ can find new things
- need to sequence more to do accurate isoform assignment, will miss more known splice junctions

If you input a GTF file into STAR you can have it index the known splice junctions for you...
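
For anyone who wants to try this, here is a minimal sketch of that index-building step. It is a small Python helper that only assembles the STAR command line and prints it; it does not run STAR. The file names, the thread count, and the default read length of 100 (giving an overhang of read length minus 1) are assumptions for illustration. The STAR options themselves (`--runMode genomeGenerate`, `--sjdbGTFfile`, `--sjdbOverhang`) are the standard ones for building an annotation-aware index.

```python
import shlex


def star_index_command(genome_fasta, gtf, index_dir, read_length=100, threads=8):
    """Assemble (but do not run) a STAR genomeGenerate command that
    includes annotated splice junctions from a GTF file.

    All argument values are placeholders chosen by the caller.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", index_dir,
        "--genomeFastaFiles", genome_fasta,
        "--sjdbGTFfile", gtf,                    # known junctions come from here
        "--sjdbOverhang", str(read_length - 1),  # usually read length minus 1
        "--runThreadN", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = star_index_command("genome.fa", "genes.gtf", "star_index")
    print(shlex.join(cmd))
```

Printing the command rather than executing it makes it easy to inspect before committing to a long index build.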
Old 02-22-2013, 01:57 PM   #6
timydaley
Member
 
Location: Los Angeles

Join Date: Jun 2010
Posts: 26

One problem with mapping to the transcriptome is that you can misattribute reads that come from paralogous genes; see Schrider et al.'s PLoS One paper critiquing Cheung's Science paper on RNA editing. Since ~70% of the human genome is transcribed, you may also miss a lot of information by mapping only to the transcriptome.
Old 07-30-2013, 06:11 AM   #7
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250

Not to bump an old thread, but this still seems to be an open question. I think Cufflinks, for example, can use both the transcriptome annotation and the genome to resolve certain problems with pseudogenes and homologous genes, which seems like it should be the better approach. That said, I am partial to mapping to the transcriptome, at least for differential expression. "Is there evidence for a transcript that hasn't been seen before?" seems like a different question, and one whose answers can be verified with lab work. There is also a theory that the transcripts could be assembled before mapping, which should remove most of the dominant allele bias, though I don't think the assemblers are quite up to it yet.
Old 01-16-2018, 06:34 AM   #8
sudhan
Junior Member
 
Location: bangalore

Join Date: Jan 2018
Posts: 1

So finally, is it good or bad to use transcriptome references for a differential gene expression study?
Old 01-16-2018, 04:53 PM   #9
rskr
Senior Member
 
Location: Santa Fe, NM

Join Date: Oct 2010
Posts: 250

I think now you can do both at the same time. HISAT2 can build its indexes with the annotation built in, so whichever mapping best explains the data is chosen.
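
As a rough sketch of what building such an index involves: HISAT2 ships two helper scripts that pull splice sites and exons out of a GTF, and `hisat2-build` accepts the results through `--ss` and `--exon`. The Python below only assembles those three commands and prints them; nothing is executed. The file names and index prefix are made up for illustration, and the `.ss` / `.exon` output names are just a convention I chose here.

```python
import shlex


def hisat2_annotated_index_steps(genome_fasta, gtf, index_prefix):
    """Return the build steps as (argv, stdout_file) pairs.

    stdout_file is the file the command's standard output should be
    redirected to, or None when the tool writes its own output files.
    """
    splice_sites = index_prefix + ".ss"   # assumed naming, not required by HISAT2
    exons = index_prefix + ".exon"        # assumed naming, not required by HISAT2
    return [
        (["hisat2_extract_splice_sites.py", gtf], splice_sites),
        (["hisat2_extract_exons.py", gtf], exons),
        (["hisat2-build", "--ss", splice_sites, "--exon", exons,
          genome_fasta, index_prefix], None),
    ]


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    for argv, out in hisat2_annotated_index_steps("genome.fa", "genes.gtf", "genome_tran"):
        line = shlex.join(argv)
        if out is not None:
            line += " > " + shlex.quote(out)
        print(line)
```

Be aware that building an index with the annotation included is noticeably more memory-hungry than building a plain genome index, so it is worth checking the HISAT2 documentation for current requirements before starting.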
Old 01-17-2018, 09:47 AM   #10
sdriscoll
I like code
 
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

I kinda take issue with both approaches.

With alignment to the genome I always miss some alignments, because aligning RNA-Seq reads to the genome is relatively difficult. STAR misses some alignments that GSNAP picks up and, on occasion, even Bowtie 2 picks up alignments STAR misses (not spliced ones, of course). Furthermore, when I take reads that failed to align to the genome and map them directly to the transcriptome, many of those reads align. And this is true even when I only allow low error rates.

If I go the other way and map to the transcriptome first, I run some risk of assigning reads to genes when they would map more ambiguously against the genome. I have no idea how much of a problem that is, in part because I'm not confident in any aligner's ability to find all possible alignments of a read to the genome. With some data I may map to the genome first, throw out reads with MAPQ==0, and then take the remaining aligned and unaligned reads and map them to the transcriptome.

In the end, the probabilistic transcriptome methods (RSEM, eXpress, kallisto, Salmon) have been shown to produce more accurate gene expression estimates than the genome approaches (Cufflinks, StringTie, etc.). Whether accurate expression estimates are necessary for accurate differential expression calls is up for debate; I'd guess it's not as big of a deal. However, when it comes to publication we like to report TPM expression values for genes, since TPM is the closest thing to a standard that we have in RNA-Seq. To get accurate TPM you have to use some type of probabilistic isoform-level expression estimation, and it's the direct-to-transcriptome methods that seem to work the best.
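
To make the TPM point concrete, here is a minimal sketch of the arithmetic, assuming you already have an estimated read count and an effective length for each transcript (which is what the probabilistic tools estimate for you). The transcript names, counts, lengths, and the transcript-to-gene table are toy values I made up; this is not how any particular tool implements it.

```python
def tpm(counts, effective_lengths):
    """Transcripts per million from per-transcript counts and effective lengths.

    TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6
    Both arguments are dicts keyed by transcript id.
    """
    rates = {t: counts[t] / effective_lengths[t] for t in counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: rate / total * 1e6 for t, rate in rates.items()}


def gene_tpm(transcript_tpm, tx2gene):
    """Gene-level TPM is the sum of the TPMs of the gene's transcripts."""
    genes = {}
    for transcript, value in transcript_tpm.items():
        gene = tx2gene[transcript]
        genes[gene] = genes.get(gene, 0.0) + value
    return genes


if __name__ == "__main__":
    # Toy numbers, for illustration only.
    counts = {"tx1": 100, "tx2": 300, "tx3": 50}
    lengths = {"tx1": 1000.0, "tx2": 3000.0, "tx3": 500.0}
    tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
    per_transcript = tpm(counts, lengths)
    print(per_transcript)
    print(gene_tpm(per_transcript, tx2gene))
```

Because each read's contribution is divided by the length of the transcript it is assigned to, the gene-level number depends on which isoform each read is credited to. That is why isoform-level estimation matters even if you only report genes.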
__________________
/* Shawn Driscoll, Gene Expression Laboratory, Pfaff
Salk Institute for Biological Studies, La Jolla, CA, USA */
Tags
human reference quality, mapping, rna seq, solid

All times are GMT -8.