#1 |
Member
Location: Raleigh, NC Join Date: Feb 2010
Posts: 30
Hi,
I wonder how reads mapped to the genome (contiguously or across junctions) are distributed. In my own experience, a surprisingly high fraction map to introns (over 30% of the reads mapped to known genes). There could be many explanations:
1) pre-mRNA
2) DNA contamination, which I would expect to be relatively uniform across all genes, but that is not what I see. However, I did find that over 15% of mapped reads were to the mitochondrial genome. It does contain genes (especially rRNA and tRNA), so not all of those reads need be from DNA, and I am not sure what this number really means.
3) erroneous mapping
4) novel exons
5) splicing that retains introns, etc.
Of course, introns are much longer than exons, so if you count reads per unit length, the intronic fraction goes down.
There is also conflicting evidence in the literature: the Mortazavi (2008) paper reported 4% intronic and 93% exonic reads, while Marioni (2008) reported a number similar to mine (32% of reads mapped to genes are intronic).
I am wondering what people on this forum have seen in their own experience.
Thanks!
Wen
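The remark above about counting reads per unit length can be made concrete with a minimal sketch. It takes the 30% intronic read share from this post and, purely as an assumption for illustration, an intron-to-exon length ratio of 19:1 (borrowed from the "exons are 1/20 of transcripts" figure quoted for bovine in post #8). The function name `density_ratio` is hypothetical.

```python
def density_ratio(intronic_read_share, intron_to_exon_length):
    """Intronic read density divided by exonic read density (reads per base).

    intronic_read_share: fraction of genic reads that fall in introns.
    intron_to_exon_length: total intron length divided by total exon length
    (an assumed value here, not something measured in this thread).
    """
    exonic_read_share = 1 - intronic_read_share
    intron_density = intronic_read_share / intron_to_exon_length
    exon_density = exonic_read_share / 1.0  # exon length is the unit of length
    return intron_density / exon_density


ratio = density_ratio(0.30, 19)
print(f"intronic density is {ratio:.1%} of exonic density")
```

Under these assumptions, a 30% intronic read share corresponds to a per-base coverage in introns that is only a few percent of the exonic coverage.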
#2 |
Member
Location: Rockville Join Date: May 2009
Posts: 40
I got similar data to yours. What is the length of your reads, and what method did you use for purification: poly-A, ribo-minus, or something else?
#3 |
Member
Location: Raleigh, NC Join Date: Feb 2010
Posts: 30
I have a limited amount of RNA (~10 ng), so I amplified it using Ambion's MessageAmp and sequenced the aRNA in a 75x2 GA run.
I read somewhere on this forum that intron retention is quite frequent, but I cannot find it anymore...
#4 |
Member
Location: NY Join Date: Mar 2009
Posts: 11
These metrics are highly annotation dependent. Consider, for example, the variation in the number of hg18 annotated bases according to the following databases (as retrieved from the UCSC Table Browser, May 31, 2010):
knownGene = 79,498,653
refGene = 66,601,430
ensGene = 70,647,021
acembly = 177,417,935
#5 |
Member
Location: Raleigh, NC Join Date: Feb 2010
Posts: 30
I am not sure it is so dependent on annotation.
The 30% intronic reads I got was the fraction of reads mapped to known genes, not of total mapped reads. If you have a less complete annotation, you get fewer exonic reads too.
#6 |
Senior Member
Location: Southern France Join Date: Aug 2009
Posts: 269
It is easier to compare if you express the proportion relative to the total number of mapped reads. The annotation does matter, but it is true that its impact should be limited if you consider the exon vs. intron ratio. It depends more on the protocol. For instance, Li et al. (PNAS, 2008) also reported about 40% exonic and 20% intronic reads, but I think that study was about microRNAs. You can find a related thread here
#7 |
Member
Location: North Carolina Join Date: Jan 2010
Posts: 82
I'm getting roughly 17% intronic.
Clearly the percentage depends on the genome/annotation, but I am wondering how people are handling this. It seems to be quite a challenge for Cufflinks (for example) when predicting transcripts. Does anyone have any strategies for filtering intronic reads (particularly ones that are likely to represent background/precursor mRNA)? Such reads seem to be vastly inflating the number of predicted transcripts I get. Cufflinks does have an option (-j) that is aimed at dealing with this, but I haven't found it to help much. Does anyone have any experience with this? Suggested values for that parameter?
Thanks!
Chris
#8 |
Member
Location: Raleigh, NC Join Date: Feb 2010
Posts: 30
I don't think the intronic read fraction depends on annotation, unless you count "intergenic" reads as intronic.
I did a highly simplified calculation to see the effect of the pre-mRNA fraction. Assuming that exons are 1/20 of transcripts (roughly right for bovine) and that reads are uniformly distributed across the transcripts, I got:
Pre-mRNA fraction | Intronic read fraction
1% | 16%
2% | 28%
5% | 49%
I think pre-mRNA "contamination" is a more likely explanation. I saw the same problem as you did: Cufflinks assembled many transcripts. Scripture appeared to outperform an earlier version of Cufflinks in this respect. It seemed to me that Scripture also models the significance of seeing reads above background.
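The simplified calculation in the post above can be reproduced with a short sketch. It assumes, as the post does, that exons make up 1/20 of the transcript length and that reads are sampled uniformly along each molecule, so each molecule contributes reads in proportion to its length. It further assumes that "pre-mRNA fraction" means the fraction of molecules that are unspliced, which is an interpretation that happens to reproduce the posted numbers rather than something the post states. The function name `intronic_read_fraction` is hypothetical.

```python
def intronic_read_fraction(pre_mrna_fraction, exon_share=1 / 20):
    """Fraction of genic reads expected to land in introns.

    Lengths are expressed in units of the full unspliced transcript, so a
    pre-mRNA has length 1 and a spliced mRNA has length exon_share.
    """
    p = pre_mrna_fraction
    # Only unspliced molecules carry intron sequence.
    intronic = p * (1 - exon_share)
    # Exon sequence is present in both unspliced and spliced molecules.
    exonic = p * exon_share + (1 - p) * exon_share
    return intronic / (intronic + exonic)


for p in (0.01, 0.02, 0.05):
    print(f"{p:.0%} pre-mRNA -> {intronic_read_fraction(p):.0%} intronic reads")
# 1% pre-mRNA -> 16% intronic reads
# 2% pre-mRNA -> 28% intronic reads
# 5% pre-mRNA -> 49% intronic reads
```

With exon_share = 1/20 the expression simplifies to 19p / (1 + 19p), which is why even a few percent of unspliced molecules is enough to produce a large intronic read fraction.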
#9 |
Member
Location: North Carolina Join Date: Jan 2010
Posts: 82
I was thinking, for example, that there could be undescribed exons in the "introns". I am not working on a standard model system... But yes, my assumption is that pre-mRNA is the problem.
Scripture looks interesting, but it raises another question: Scripture seems to rely on paired-end data? (I haven't read closely yet.) How much improvement in assembly (dealing specifically with pre-mRNA) does one get with paired-end data? Cufflinks too is primarily described for paired-end data, but the manual suggests that it "works well" with single-end. I haven't seen anything in the way of single-end assembly benchmarks.
#10 |
Member
Location: Raleigh, NC Join Date: Feb 2010
Posts: 30
I think both Cufflinks and Scripture can handle single-end data; at least their (very similar) strategy of stitching alignments together does not seem to need paired-end data. Of course, paired-end data will improve the sizes of assemblies. I personally think junction reads are much more important than paired-end reads in assembling RNA-Seq alignments. Most protocols select an insert size of around 300 bp, so the gain you get by sequencing the other end is probably not that much. Junction reads are also where alignment errors are more likely to occur, which messes up the assembly as well. I have seen apparently wrong gene models from Scripture/Cufflinks because of wrong junction alignments.