  • Low gene count assignment for ribo-depleted RNA-Seq data

    I originally posted this question on biostars but received little response. I'll be sure to update either post if I receive any more detail.

    Until recently, we have used a poly(A) selection process to prepare our RNA-Seq libraries. In our last run we had to use a ribo-depletion approach instead, as we want to study some formalin-fixed (FF) material with degraded RNA. The facility uses Illumina's Ribo-Zero kit. We otherwise kept the same sequencing parameters: paired-end 75 bp, reverse stranded, on an Illumina HiSeq 4000.

    Since we don't know how well the FF material represents the original tissue, we also sequenced a few frozen tissue samples, with the intention of comparing the two (though they are _not_ perfectly matched). In total we have 3 FF samples and 2 frozen samples.

    Short version:

    Both frozen tissue and FFPE results show a low proportion of reads being assigned to an exon: ~25% for FFPE and ~60% for frozen samples, which I did not expect. Is this an issue, and can I still compare the two after normalisation for different effective library sizes?

    More detail:

    I ran the reads through my usual pipeline:

    FastQC: all looked OK. Some highly duplicated sequences, probably rRNA associated, but nothing too major.
    STAR alignment: ~90% of reads were uniquely mapped in all cases (similar to our poly(A) samples).
    Gene counts: I had STAR run gene counts during alignment. The results differed from what I've typically seen in the poly(A) data in terms of the % of reads that assign to a (unique) gene.

    Poly(A): we usually get 80-85%
    Ribo-depleted FF samples: 24%, 24%, 26%
    Ribo-depleted frozen samples: 58%, 59%
    So in both cases the numbers assigned are far lower than for poly(A), and this is especially bad for the FF samples. Most of the reads that were not assigned belonged in the 'no feature' category, i.e. they didn't overlap with any exon.
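
For anyone wanting to reproduce percentages like those above, here is a minimal sketch in Python (not part of the original pipeline) that tallies STAR's gene-count output, i.e. the `ReadsPerGene.out.tab` file written with `--quantMode GeneCounts`. It assumes the usual four-column layout (gene ID, unstranded, forward-stranded and reverse-stranded counts) with the `N_*` summary rows at the top, and it assumes "assigned" means assigned reads divided by all uniquely mapped reads. The function name and the toy numbers are made up for illustration.

```python
import io

# Toy stand-in for a ReadsPerGene.out.tab file (values are invented).
TOY = (
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t300\t700\t350\n"
    "N_ambiguous\t20\t5\t10\n"
    "GENE_A\t400\t100\t300\n"
    "GENE_B\t280\t195\t340\n"
)


def summarise_star_counts(handle, column=3):
    """Tally one count column of STAR gene-count output.

    column: 1 = unstranded, 2 = forward stranded, 3 = reverse stranded.
    """
    summary = {}   # the N_* rows (unmapped, multimapping, noFeature, ambiguous)
    assigned = 0   # reads counted towards exactly one gene
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, count = fields[0], int(fields[column])
        if name.startswith("N_"):
            summary[name] = count
        else:
            assigned += count
    # Assumption: uniquely mapped reads = assigned + no feature + ambiguous.
    unique = assigned + summary.get("N_noFeature", 0) + summary.get("N_ambiguous", 0)
    return {
        "assigned": assigned,
        "no_feature": summary.get("N_noFeature", 0),
        "ambiguous": summary.get("N_ambiguous", 0),
        "assigned_fraction": assigned / unique if unique else float("nan"),
    }


stats = summarise_star_counts(io.StringIO(TOY), column=3)
print(f"assigned to a gene: {stats['assigned_fraction']:.1%}")
```

For a real sample, pass an open file handle instead of the `io.StringIO` object; column 3 is chosen here only because the libraries are described as reverse stranded.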

    It occurs to me that this difference is probably due to the larger variety of RNA species: poly(A) should enrich primarily for mRNA, while ribo-depletion leaves in ncRNA species, etc. Therefore fewer reads will be mRNA and fall within an exon for gene counting purposes. I ran ezBAMqc to check the distribution of the aligned reads in the BAM files:

    FF sample



    frozen sample



    I dug out a similar plot for one of our poly(A) samples (below). The % intronic reads is indeed much lower.



    My hypothesis: the FF library is dominated by species other than mRNA.

    Does this sound like a reasonable explanation?

    Is the very low proportion of exon-assigned counts a problem (other than being wasteful)?

    Is it still reasonable to compare the gene counts of the FF and frozen samples? I would normalise for the total number of reads, but is that sufficient?
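
As a rough illustration of what "normalise for the total number of reads" could look like, here is a minimal counts-per-million sketch in Python. It assumes the library size is the number of gene-assigned reads per sample, so the reads in the "no feature" category do not affect the scaling. The gene names and counts are invented. Note that this kind of scaling only corrects for depth, not for differences in library composition.

```python
def counts_per_million(counts):
    """Scale raw gene counts to counts per million assigned reads."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no assigned reads")
    return {gene: count * 1e6 / library_size for gene, count in counts.items()}


# Hypothetical samples: the frozen library has 4x more assigned reads,
# but the relative expression of the two genes is identical.
ff_counts = {"GENE_A": 50, "GENE_B": 150}
frozen_counts = {"GENE_A": 200, "GENE_B": 600}

ff_cpm = counts_per_million(ff_counts)
frozen_cpm = counts_per_million(frozen_counts)
print(ff_cpm["GENE_A"], frozen_cpm["GENE_A"])
```

In this toy case both samples give the same CPM for each gene, because the only difference between them is the number of assigned reads.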

    Thanks for any thoughts.

  • #2
    What do you mean by your comment that the frozen and FF samples are "not perfectly matched"?

    The only unexpected aspect of these results to me is the difference between the FF and frozen numbers of intron-mapping reads. But given the large size of animal introns, it would not take a large percentage of non-spliced reads to produce a large number of intron-mapping reads.
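
To put rough numbers on the intron-size point, here is a back-of-the-envelope sketch in Python. It assumes reads are sampled in proportion to transcript length, that every transcript (spliced or not) contributes its full exonic length, and that only unspliced transcripts contribute intronic length. The gene dimensions (2 kb of exon, 40 kb of intron) and the 5% unspliced fraction are purely hypothetical.

```python
def intronic_read_fraction(unspliced_fraction, exon_length, intron_length):
    """Expected fraction of a gene's reads that fall in introns.

    Spliced and unspliced transcripts both carry the exonic sequence;
    only the unspliced ones carry the intronic sequence.
    """
    intronic = unspliced_fraction * intron_length
    exonic = exon_length
    return intronic / (intronic + exonic)


# Hypothetical gene: 2 kb of exon, 40 kb of intron, 5% of transcripts unspliced.
fraction = intronic_read_fraction(0.05, exon_length=2_000, intron_length=40_000)
print(f"intronic reads: {fraction:.0%}")
```

Under these assumptions roughly half of the gene's reads would be intronic even though only 1 in 20 transcripts is unspliced.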

    But as to why the number of intron-mapping reads would be different between FF and frozen cell RNA preps -- that has me mystified. Unless the FF cells were treated with something that would stop transcription but allow transcript maturation to continue.

    --
    Phillip



    • #3
      Originally posted by pmiguel
      What do you mean by your comment that the frozen and FF samples are "not perfectly matched"?
      Sorry, that was a bit cryptic in retrospect! I mean that the samples are clinical; the surgeon was not attempting to capture the exact same tissue in the FF and frozen samples. So whilst they are 'patient matched', we can't be certain that their epigenetic profiles are comparable, because they may have differing tumour content and differing amounts of healthy cells. I wouldn't imagine that this could lead to such a large difference in the intronic content, though?

      The only unexpected aspect of these results to me is the difference between the FF and frozen numbers of intron-mapping reads. But given the large size of animal introns, it would not take a large percentage of non-spliced reads to produce a large number of intron-mapping reads.

      But as to why the number of intron-mapping reads would be different between FF and frozen cell RNA preps -- that has me mystified. Unless the FF cells were treated with something that would stop transcription but allow transcript maturation to continue.
      I'll check, but I think the formalin fixing process was fairly standard clinical practice. Also quite straightforward for frozen tissue: mash it up (a joyful task, I'm assured), extract total RNA, run library preparation.

      Thanks for your thoughts.
      Last edited by gabe_rosser; 04-21-2017, 02:27 AM. Reason: Accidentally referred to frozen as FF



      • #4
        You should expect to see considerably lower counts to exons, mostly replaced by intron derived reads when comparing a ribo depletion to dT purified.
        I'm surprised that your FF has lower exon counts than the FFPE, though. Generally, I've seen decreased exon and increased intron counts in FFPE samples, supposedly because retained introns are protected from degradation by the nuclear envelope.

        Figure of expected mapping statistics from dT vs depletion from Clontech



        • #5
          Originally posted by cmbetts
          You should expect to see considerably lower counts to exons, mostly replaced by intron derived reads when comparing a ribo depletion to dT purified.
          I'm surprised that your FF has lower exon counts than the FFPE, though. Generally, I've seen decreased exon and increased intron counts in FFPE samples, supposedly because retained introns are protected from degradation by the nuclear envelope.
          Apologies, I think I've accidentally caused confusion by using the abbreviation FF = FFPE in my original post! This means my results do agree with what you've said: frozen has higher exon counts than FFPE.

          Thanks for the plot, that's helpful.
