SEQanswers

Old 09-07-2009, 05:48 AM   #1
CompBio
Member
 
Location: Bristol, UK

Join Date: Aug 2009
Posts: 26
Why Do a Gene's Reads Appear on Both Strands?

I've got a naive question about Illumina short reads (though it may apply to other technologies as well). I'm a computer scientist, so there are some serious gaps in my understanding of molecular biology.

When we map short reads to a genome, we see clusters on both strands, not just the strand where a gene is annotated. In fact, the read depth appears to be almost perfectly symmetrical. I've seen the same phenomenon in published papers, though I've yet to find an explanation. What would cause this to happen?
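One way to put a number on that symmetry, sketched here under the assumption that the alignments are available as SAM-format text (the function name and the example records below are made up for illustration): bit 0x10 of the FLAG field marks a read aligned to the reverse strand, and bit 0x4 marks an unmapped read.

```python
from collections import Counter

def strand_counts(sam_lines):
    """Count mapped reads per strand from SAM-format lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])          # FLAG is the second SAM column
        if flag & 0x4:                 # 0x4: read is unmapped
            continue
        # 0x10: read aligned to the reverse strand of the reference
        counts["-" if flag & 0x10 else "+"] += 1
    return counts

# Hypothetical records, for illustration only.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t105\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t110\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(strand_counts(example))
```

Restricting the input to reads that overlap a single annotated gene would give the per-gene strand balance I'm describing.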
Old 09-07-2009, 06:18 AM   #2
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

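Hey CompBio,

I'm assuming you're referring to an RNA-seq library. The read depth you're describing is a consequence of how the library is constructed. While the RNA is indeed strand specific, the conversion to double-stranded cDNA typically loses the strand information: once both strands are present and adapters are ligated without regard to orientation, a read can come from either strand of the fragment.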

It is possible to construct RNA-seq libraries that retain the strand information, but from a molecular biology standpoint it is much easier to discard it. See the following two papers for a couple of methods for maintaining strand information:

Lister, et al: http://www.ncbi.nlm.nih.gov/pubmed/18423832

Cloonan, et al: http://www.ncbi.nlm.nih.gov/pubmed/18516046
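To make the difference concrete, here is a toy simulation, not a model of any specific protocol. The function `simulate_library` and its parameters are invented for illustration. It assumes an unstranded prep yields a read from either strand with equal probability, and that a strand-preserving prep always reports one strand (real stranded protocols may consistently report the opposite strand of the gene, which is simplified away here).

```python
import random

def simulate_library(n_fragments, stranded, gene_strand="+", seed=0):
    """Return counts of reads per genomic strand for one expressed gene."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    counts = {"+": 0, "-": 0}
    for _ in range(n_fragments):
        if stranded:
            # Simplifying assumption: strand information survives intact.
            read_strand = gene_strand
        else:
            # Double-stranded cDNA: either strand is equally likely to be read.
            read_strand = rng.choice("+-")
        counts[read_strand] += 1
    return counts

print("unstranded:", simulate_library(10000, stranded=False))
print("stranded:  ", simulate_library(10000, stranded=True))
```

Under these assumptions the unstranded counts come out close to an even split, which matches the symmetrical depth you are seeing, while the stranded counts all fall on the gene's strand.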

Hope that at least helps a bit.
Old 09-07-2009, 07:56 AM   #3
CompBio
Member
 
Location: Bristol, UK

Join Date: Aug 2009
Posts: 26

That was a fast reply - thanks! However, I'm still a bit confused by what I've been reading.

For example, one review covers both of the papers you mention (http://www.ncbi.nlm.nih.gov/pubmed/18587314). The author states that in the procedure used by Lister et al. "directional information is captured", just as in the Cloonan paper. But the figures in the Lister paper show symmetrical read distributions on both strands, while in the figures in the Cloonan paper, reads appear on only one strand.

If directional information is captured, shouldn't that give the strand implicitly?
Old 09-07-2009, 11:13 AM   #4
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

It's been a while since I've read Lister's paper, but I think they have two different approaches buried in there...one which is very labor intensive and retains the strand info...

Ryan lurks on this forum occasionally...maybe he'll chime in.
Old 09-07-2009, 03:11 PM   #5
ScottC
Senior Member
 
Location: Monash University, Melbourne, Australia.

Join Date: Jan 2008
Posts: 246

Also, there's this paper which deals with strand-specific RNA-seq data from a bacterium.

A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi.

Perkins et al., PLoS Genet. 2009 Jul;5(7):e1000569.

http://www.ncbi.nlm.nih.gov/pubmed/19609351
http://www.plosgenetics.org/article/...l.pgen.1000569
Old 10-05-2009, 09:57 PM   #6
marrykonta
Guest
 

Posts: n/a
Re: Why Do a Gene's Reads Appear on Both Strands?

Hi CompBio,

Well, yes, they can appear on two strands. Thanks for sharing this query with all of us.
A different way to answer is to start with the estimate that two unrelated humans differ by about 3 million nucleotides (plus some copy number differences, etc.). This is 1 difference about every 1000 nucleotides.
Assume that the two parents of the siblings are unrelated; then each sibling receives a genome which is essentially the consensus (the 999 nucleotides which are present in both parents) and then a random choice between 2 options for the 1000th nucleotide. The two siblings should therefore differ by about 1.5 million SNPs (plus some other differences).
These 1.5 million SNP differences far outweigh the approximately 100 new mutations for each sibling (200 total differences).

My conclusion is that difference between siblings has very little to do with mutation and is mainly due to random inheritance of different sequences from the two parents.
Mutation is important over spans of hundreds and thousands of generations.

Thanks
Old 10-05-2009, 10:01 PM   #7
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Nice sneaky spammer. Copy pasting text from other similarly related websites. Bye!