  • RNA-seq read distribution

    Hi,

    I wonder how RNA-seq reads mapped to the genome (contiguously or across junctions) are distributed between exons and introns.

    In my own data, a surprisingly high fraction of reads map to introns (over 30% of the reads mapped to known genes). There are many possible explanations:

    1) pre-mRNA
    2) DNA contamination, which I would expect to be relatively uniform across all genes, but that is not what I see. I also found that over 15% of mapped reads were to the mitochondrial genome. It does contain genes (especially rRNA and tRNA), so not all of those reads need to come from DNA, and I am not sure what this number really means.
    3) erroneous mapping
    4) novel exons
    5) splicing that retains introns
    etc.

    Of course, introns are much longer, so if you count reads per unit length, the fraction goes down.
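The length normalization mentioned above can be sketched as follows. This is only an illustration: the 30% intronic read fraction comes from the post, but the 19:1 intron-to-exon length ratio and the function name `density_ratio` are assumptions chosen for the example, not measured values.

```python
def density_ratio(intronic_read_frac, intron_to_exon_length_ratio):
    """Intronic read density divided by exonic read density (reads per base).

    intronic_read_frac: fraction of gene-mapped reads that fall in introns.
    intron_to_exon_length_ratio: total intron length / total exon length
        (hypothetical value supplied by the caller).
    """
    exonic_read_frac = 1.0 - intronic_read_frac
    # Exon length is taken as 1 unit, intron length as the given ratio.
    intron_density = intronic_read_frac / intron_to_exon_length_ratio
    exon_density = exonic_read_frac / 1.0
    return intron_density / exon_density


# 30% intronic reads, introns assumed to be 19 times longer than exons.
print(f"{density_ratio(0.30, 19.0):.3f}")  # prints 0.023
```

Under these assumed lengths, a 30% intronic read fraction corresponds to an intronic per-base coverage of only about 2% of the exonic coverage.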

    There is also conflicting evidence in the literature:

    Mortazavi et al. (2008) reported 4% intronic and 93% exonic reads, while Marioni et al. (2008) reported a number similar to what I have seen (32% of reads mapped to genes were intronic).

    I am wondering what people on this forum have seen in their experience.

    Thanks!

    Wen

  • #2
    I got data similar to yours. What is the length of your reads, and what purification method did you use: poly-A selection, RiboMinus, or something else?


    • #3
      I had a limited amount of RNA (~10 ng), so I amplified it using Ambion's MessageAmp and sequenced the aRNA in a 2x75 GA run.

      I read somewhere on this forum that intron retention is quite frequent, but I cannot find the post anymore...


      • #4
        These metrics are highly annotation dependent. Consider, for example, how much the number of annotated hg18 bases varies between the following gene annotation tracks:

        knownGene = 79,498,653
        refGene = 66,601,430
        ensGene = 70,647,021
        acembly = 177,417,935

        (as retrieved from UCSC Table Browser, May 31, 2010).
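To make the spread between these annotations concrete, here is a minimal sketch that expresses each track's annotated base count relative to the smallest one. The counts are the ones quoted above; the variable names are only illustrative.

```python
# Annotated hg18 bases per track, as quoted in the post above.
annotated_bases = {
    "knownGene": 79_498_653,
    "refGene": 66_601_430,
    "ensGene": 70_647_021,
    "acembly": 177_417_935,
}

# Use the smallest annotation as the baseline.
baseline_name = min(annotated_bases, key=annotated_bases.get)
baseline = annotated_bases[baseline_name]

# Size of each annotation relative to the baseline.
relative = {name: n / baseline for name, n in annotated_bases.items()}

for name, ratio in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name:10s} {ratio:.2f}x {baseline_name}")
```

The largest track (acembly) covers roughly 2.7 times as many bases as the smallest (refGene), so a read counted as intronic or intergenic under one annotation may well be exonic under another.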


        • #5
          I am not sure it is so dependent on annotation.

          The 30% intronic figure I got is a fraction of the reads mapped to known genes, not of all mapped reads. With a less complete annotation, the number of exonic reads goes down too.


          • #6
            It is easier to compare if you express the proportions relative to the total number of mapped reads. The annotation does matter, but it is true that its impact should be limited if you consider the exon-to-intron ratio. It depends more on the protocol. For instance, Li et al. (PNAS, 2008) also reported about 40% exonic and 20% intronic reads, but I think that study was about microRNAs. You can find a related thread here
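The two normalizations discussed in this thread (relative to gene-mapped reads versus relative to all mapped reads) can be computed side by side. The sketch below uses made-up read counts purely for illustration; the function name `intronic_fractions` and the three-way exonic/intronic/intergenic split are assumptions, not something reported by any poster.

```python
def intronic_fractions(exonic, intronic, intergenic):
    """Return (intronic fraction of gene-mapped reads,
               intronic fraction of all mapped reads)."""
    genic = exonic + intronic
    total = genic + intergenic
    return intronic / genic, intronic / total


# Hypothetical counts: 490 exonic, 210 intronic, 300 intergenic reads.
of_genic, of_total = intronic_fractions(490, 210, 300)
print(f"{of_genic:.0%} of gene-mapped reads, {of_total:.0%} of all mapped reads")
```

With these hypothetical counts the same data give 30% under one normalization and 21% under the other, which is why it helps to state the denominator when comparing numbers across papers.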


            • #7
              intronic reads

              I'm getting roughly 17% intronic.

              Clearly the percentage depends on the genome and annotation, but I am wondering how people are handling this. Intronic reads seem to make transcript prediction quite a challenge for Cufflinks (for example).

              Does anyone have any strategies for filtering intronic reads (particularly ones that are likely to represent background or precursor mRNA)? Such reads seem to be vastly inflating the number of predicted transcripts I get.

              Cufflinks does have an option (-j) that is aimed at dealing with this, but I haven't found it to help much. Does anyone have any experience with this? Suggested values for that parameter?

              Thanks!

              Chris


              • #8
                I don't think intronic read fraction depends on annotation, unless you count "intergenic" reads as intronic.

                I did a highly simplified calculation to see the effect of pre-mRNA fraction.

                Assuming that exons make up 1/20 of transcript length (roughly right for bovine) and that reads are uniformly distributed across the transcripts, I got:

                Pre-mRNA fraction    Intronic read fraction
                1%                   16%
                2%                   28%
                5%                   49%
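One reading of this simplified calculation that reproduces the table is sketched below. It assumes the pre-mRNA fraction is a fraction of transcript molecules, and that each molecule contributes reads in proportion to its length (unspliced pre-mRNA has relative length 1, mature mRNA has length equal to the exon fraction). The function name and this interpretation are assumptions; the post does not spell out the formula.

```python
def intronic_read_fraction(pre_mrna_fraction, exon_fraction=1 / 20):
    """Expected fraction of reads falling in introns.

    pre_mrna_fraction: fraction of transcript molecules that are unspliced.
    exon_fraction: exon length as a fraction of the full transcript length.
    Reads are assumed to be uniform along each molecule, so a molecule
    contributes reads in proportion to its length.
    """
    intron_fraction = 1.0 - exon_fraction
    # Only pre-mRNA molecules contain intronic sequence.
    intronic_reads = pre_mrna_fraction * intron_fraction
    # Pre-mRNA has relative length 1; mature mRNA has length exon_fraction.
    total_reads = pre_mrna_fraction * 1.0 + (1.0 - pre_mrna_fraction) * exon_fraction
    return intronic_reads / total_reads


for p in (0.01, 0.02, 0.05):
    print(f"{p:.0%} pre-mRNA -> {intronic_read_fraction(p):.0%} intronic reads")
```

Under these assumptions the result simplifies to 19p / (1 + 19p) for a pre-mRNA fraction p, which gives 16%, 28% and 49% for the three rows of the table.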

                I think pre-mRNA "contamination" is a more likely explanation.

                I saw the same problem you describe: Cufflinks assembled a large number of transcripts. Scripture appeared to outperform an earlier version of Cufflinks in this respect. It seemed to me that Scripture also models the significance of seeing reads above background.


                • #9
                  Scripture

                  I was thinking, for example, that there could be undescribed exons in the "introns". I am not working on a standard model system... But yes, my assumption is that pre-mRNA is the problem.

                  Scripture looks interesting ...

                  But it raises another question:

                  Scripture seems to rely on paired-end data? (I haven't read closely yet.)

                  How much improvement in assembly (dealing specifically with pre-mRNA) does one get with paired-end data? Cufflinks too is primarily described for paired-end data, but the manual suggests that it "works well" with single-end data. I haven't seen anything in the way of single-end assembly benchmarks.


                  • #10
                    I think both Cufflinks and Scripture can handle single-end data; at least their (very similar) strategy of stitching alignments together does not seem to need paired-end data. Of course, paired-end data will improve the sizes of assemblies.

                    I personally think junction reads are much more important than paired-end reads in assembling RNA-Seq alignments. Since most protocols select an insert size around 300 bp, the gain you get by sequencing the other end is probably not that much. Junction reads are also where alignment errors are more likely to occur, which messes up the assembly as well. I have seen apparently wrong gene models from Scripture/Cufflinks because of wrong junction alignments.
