SEQanswers

Old 04-19-2014, 10:22 AM   #21
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

This is much more of a problem when mapping to a transcriptome than to a genome, which is one reason I recommend genome mapping of RNA-seq. But either way, you can do one of three things with ambiguously mapping reads, each with disadvantages:

1) Discard them, causing underrepresentation of transcripts homologous to other transcripts.
2) Pick one site at random, which will overrepresent the transcripts that occur less frequently and underrepresent the ones that occur more frequently.
3) Pick all top mapping sites, which will overrepresent everything.

There's no perfect answer; they'll all incur a bias. It probably doesn't matter too much which one you go with as long as every 'treatment' you compare uses identical methodology, and the same read length (since longer reads will have less ambiguity).
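
To make the three options concrete, here is a minimal Python sketch on made-up data. The read list, the transcript names "A" and "B", and the function names are all hypothetical, and real aligners and counting tools expose these choices through their own options rather than code like this.

```python
import random
from collections import Counter

# Toy data (hypothetical): each read lists the transcripts it maps to equally well.
reads = [
    ["A"], ["A"], ["A"], ["A"],                      # unique to A
    ["B"],                                           # unique to B
    ["A", "B"], ["A", "B"], ["A", "B"], ["A", "B"],  # ambiguous between A and B
]

def count_discard(reads):
    """Option 1: keep only uniquely mapping reads."""
    return Counter(hits[0] for hits in reads if len(hits) == 1)

def count_random(reads, seed=0):
    """Option 2: assign each read to one of its top sites at random."""
    rng = random.Random(seed)
    return Counter(rng.choice(hits) for hits in reads)

def count_all(reads):
    """Option 3: count every top mapping site."""
    return Counter(t for hits in reads for t in hits)

print("discard:", dict(count_discard(reads)))
print("random: ", dict(count_random(reads)))
print("all:    ", dict(count_all(reads)))
```

Note that option 3 reports more counts in total than there are reads (13 counts from 9 reads here), which is the "overrepresent everything" effect.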
Old 04-19-2014, 08:40 PM   #22
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

Quote:
Originally Posted by geneart View Post
Hi, I have a very basic question about read mapping. For differential expression analysis of NGS data, many papers I have read mention that they discard non-uniquely mapping reads. However, I could not find a good summarized explanation for doing so. From what I gather and understand, the more unique the read is, the more certain its location call, since the technique itself could introduce some mismatches and bring about non-specific mapping, and the depth at the unique location would still account for naturally existing SNPs, if any.
Have I understood this right, or is there a better explanation of why we take only uniquely mapping reads to perform differential expression?
Thanks in advance
I think in general that people should not be disregarding non-unique reads, especially if you're doing single-end sequencing. With RNA-seq it's entirely possible that many genes are sequenced at just absurd coverage, and duplicates are just going to be a normal result of the sampling process. If you were to remove the duplicates, you'd just be reducing your power to detect expression changes for your most highly expressed genes. Meaning, all genes would basically be capped at 1 read per base pair of their length in all samples or conditions, so if any genes are expressed above that, you can't detect changes in them if you remove duplicates.

Now, if you have some reason to believe that the libraries were over-amplified and you see lots of duplicates even in your more lowly expressed genes, then you may want to remove duplicates, but you should still keep in mind the issues mentioned above and expect that your most highly expressed genes won't be coming out as differentially expressed.
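
The cap described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions: single-end reads on one strand, duplicates defined purely by start position, a hypothetical 100 bp gene, and reads spread evenly over the start positions. The names `GENE_LENGTH`, `simulate_starts` and `dedup_count` are made up for this example.

```python
GENE_LENGTH = 100  # hypothetical gene length in base pairs

def simulate_starts(n_reads, gene_length=GENE_LENGTH):
    """Spread n_reads evenly over the possible start positions (toy model)."""
    return [i % gene_length for i in range(n_reads)]

def dedup_count(start_positions):
    """Count reads after collapsing those that share a start position."""
    return len(set(start_positions))

low = simulate_starts(50)     # condition where the gene is lowly expressed
high = simulate_starts(5000)  # condition where the gene is highly expressed

raw_fold = len(high) / len(low)                    # 100.0
dedup_fold = dedup_count(high) / dedup_count(low)  # 2.0

print("fold change, raw counts:        ", raw_fold)
print("fold change, after deduplication:", dedup_fold)
```

In this toy case a 100-fold difference shrinks to 2-fold after deduplication, because the highly expressed condition cannot exceed one read per start position.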
Old 04-20-2014, 02:34 AM   #23
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

@Wallysb01: Non-unique reads have nothing to do with duplicates. Non-unique in this context refers to multimappers, which may or may not be used in RNA-seq, depending on the tool being used and the question being asked. I think most people agree with you that removing duplicates from RNA-seq datasets is a good way to shoot yourself in the foot.
Old 04-20-2014, 02:51 AM   #24
geneart
Member
 
Location: DC area

Join Date: Sep 2011
Posts: 42

dpryan: you are correct. By non-unique I meant not that the reads themselves are non-unique, but that their mapping locations are. So in essence, I kept reads mapping to a unique location and disregarded reads mapping to multiple locations for my differential analysis.
I am looking at miRNA, and hence was wondering whether it matters that I discarded reads mapping to multiple locations. I did use single-end sequencing. I had 95% of reads mapped to the genome, while 4% of these are uniquely mapped.

As I am looking at miRNA expression in exosomes, I expect to have all other kinds of reads mapping to tRNA, rRNA, etc. Hence the ambiguity is amplified even more. That is the reason I consider only uniquely mapped reads. Is this reasonable? Any suggestions on this with respect to my final question of looking at miRNAs? Opinions appreciated. Thanks very much in advance.

Last edited by geneart; 04-20-2014 at 03:01 AM.
Old 04-20-2014, 09:42 AM   #25
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

It'd be good to know how many of the reads that map to miRNAs are also multimappers.
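
One rough way to get that number, sketched in Python on made-up data: count how many alignments each read has, then look only at reads with at least one miRNA hit. The `alignments` list and the function name are hypothetical; in practice the (read name, annotation) pairs would have to be derived from your own alignment file and annotation.

```python
from collections import Counter

# Hypothetical alignment records: (read name, annotation of the hit).
alignments = [
    ("r1", "miRNA"),
    ("r2", "miRNA"), ("r2", "tRNA"),
    ("r3", "rRNA"), ("r3", "rRNA"),
    ("r4", "miRNA"),
    ("r5", "miRNA"), ("r5", "miRNA"),
]

def mirna_multimapper_counts(alignments):
    """Return (multimapping miRNA reads, all miRNA reads)."""
    hits_per_read = Counter(name for name, _ in alignments)
    mirna_reads = {name for name, kind in alignments if kind == "miRNA"}
    multi = {name for name in mirna_reads if hits_per_read[name] > 1}
    return len(multi), len(mirna_reads)

n_multi, n_mirna = mirna_multimapper_counts(alignments)
print(f"{n_multi} of {n_mirna} miRNA-mapping reads are multimappers")
```

If most miRNA-mapping reads turn out to be multimappers, discarding them removes most of the signal you care about; if few are, the choice matters much less.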
Old 04-21-2014, 07:25 PM   #26
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

If a read from a transcript maps to multiple *genomic* locations, then you can't be confident about which particular transcript that read came from. The location reported for a multiply mapped read will not cover all of its mapped locations (and even if it did, that would mess up the statistics due to multiple counting), so the counts do not give an accurate representation of your sample.

However, mapping to a transcriptome and discarding multiply mapped reads doesn't make sense to me (assuming you're working with a species that has transcript isoforms).
Old 04-22-2014, 03:02 AM   #27
dalesan
Member
 
Location: portugal

Join Date: Feb 2011
Posts: 15

Quote:
Originally Posted by geneart View Post
Hi, I have a very basic question about read mapping. For differential expression analysis of NGS data, many papers I have read mention that they discard non-uniquely mapping reads. However, I could not find a good summarized explanation for doing so. From what I gather and understand, the more unique the read is, the more certain its location call, since the technique itself could introduce some mismatches and bring about non-specific mapping, and the depth at the unique location would still account for naturally existing SNPs, if any.
Have I understood this right, or is there a better explanation of why we take only uniquely mapping reads to perform differential expression?
Thanks in advance
From what I have understood, reads that map to multiple locations in the genome cannot be reliably used in calculating differential gene expression. This is because there is no biologically meaningful way to know where in the genome such a read really belongs, so the "true" level of gene expression becomes confounded by including multi-reads (as such a read could align to multiple genes).

Imagine you have 100 reads that do not map uniquely. How will you determine where to assign them? Do you split them evenly across locations or implement some other ad hoc solution? In any case, I think it turns out to be a guessing game that may bias your results.
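
As an illustration of the "split them evenly" idea, here is a small Python sketch in which each read contributes 1/n to each of its n locations. This is just one ad hoc rule; the gene names and the function `fractional_counts` are hypothetical.

```python
from collections import defaultdict

def fractional_counts(reads):
    """Give each of a read's n mapping locations a weight of 1/n."""
    counts = defaultdict(float)
    for hits in reads:
        weight = 1.0 / len(hits)
        for gene in hits:
            counts[gene] += weight
    return dict(counts)

# 100 hypothetical reads that each map equally well to geneX and geneY.
reads = [["geneX", "geneY"]] * 100

counts = fractional_counts(reads)
print(counts)
```

Each gene ends up with 50 counts and the total still equals the number of reads, but if the reads really came from geneX and geneY in a 90:10 ratio, the even split would hide that, which is the guessing game described above.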

Tags
differential expression, genome, mapping, transcriptome

All times are GMT -8.