  • RNA-seq read depths: observed vs. expected

    Dear all,

    We recently submitted RNA-seq samples to a local sequencing facility (20 cancer samples, 20 controls), aiming for approximately 100 million paired-end 50 bp reads per sample. However, after sequencing we found that several samples contained only 6 million reads and others 50-80 million reads. Only about 5 samples had 90 million reads or more, with one sample having 180 million reads.

    I am not sure what to make of this. There were some concerns regarding the RNA quality of some of the samples, but I am not sure whether that could lead to such low output. Our contact at the facility seems to suggest it is the RNA quality, but I wanted to ask you experts just to be sure.

    The FastQC analysis of the sequences does not show any significant quality issues; however, it appears that filtering may be occurring during or just after the sequencing itself. If anyone has any ideas, I would be most appreciative.

  • #2
    If you're pooling 40 samples together to spread across all the lanes, then it's very important to get the molar ratios correct. I suppose that much is obvious. It is also important that all the samples have a similar size distribution. Illumina clustering favors smaller inserts, so if some samples had an average size of 250 bp and others 600 bp, that could make a big difference in the clustering efficiency of each sample. RNA integrity is an issue too, but of the three, it should have the least effect on sequencing depth in pooled samples. If you got the other two right, RNA integrity shouldn't be the cause here (it can lead to poor data for other reasons, just not really to differences in relative sequencing depth).

    Also, if only 5 samples came within 90% of your expected sequencing depth, I would suspect something went wrong with the actual sequencing: either clustering didn't work well or barcodes weren't read correctly for a lot of the reads. I am curious, though: you say you have 40 samples in total and you're sequencing to 100M PE reads each. That would be 20 HiSeq lanes, as you shouldn't expect more than 200M PE reads per lane. Is that what you actually did? Or are you counting them like single-end reads, giving 400M total reads per lane?

    Most people are now sequencing about 50M reads per sample (either 2x100, so really 100M reads, but they are paired, so statistically it's still 50M, or 1x50). So if most of your samples are around 50M-80M, that should be fine.
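
    The lane arithmetic above can be written out as a small sketch. The figures (40 samples, 100M read pairs each, at most 200M read pairs per lane) are the ones quoted in this post; the function names are made up for illustration, and real lane yields vary by instrument and run.

```python
def lanes_needed(n_samples, pairs_per_sample, pairs_per_lane):
    """Lanes required to reach the target depth, rounded up to whole lanes."""
    total_pairs = n_samples * pairs_per_sample
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-total_pairs // pairs_per_lane)


def expected_pairs_per_sample(pairs_per_lane, samples_per_lane):
    """Read pairs each sample gets if a lane is shared perfectly evenly."""
    return pairs_per_lane // samples_per_lane


N_SAMPLES = 40
TARGET_PAIRS = 100_000_000   # requested read pairs per sample
LANE_PAIRS = 200_000_000     # upper bound per lane quoted in the post

print(lanes_needed(N_SAMPLES, TARGET_PAIRS, LANE_PAIRS))   # 20
print(expected_pairs_per_sample(LANE_PAIRS, 2))            # 100000000
```

    If reads are counted individually rather than as pairs, the same lane holds 400M reads, so a "100M reads per sample" target would need only 10 lanes. That is why the counting convention matters when comparing observed and expected depth.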

    • #3
      Thank you for the reply! For clarification, I meant that we are getting only 50-80 million paired-end reads, that is, only 100-160 million reads in total for a given sample. It sounds like something went wrong with the sequencing, but the facility may not want to tell us (this was not done through Illumina but at a local university).

      In terms of downstream analyses, we tried to look at alternative splicing (our main interest) with Cuffdiff on all samples, and found no significant alternative splicing events. When I filtered out samples that had fewer than 90 million paired-end reads off the sequencer, we got about 600 significant alternatively spliced genes and a lot of DE genes. I am wondering whether filtering out samples based on the resulting sequencing depth is the way to go, or whether we should question the entire set to begin with. In mapping with TopHat, in almost all samples I am seeing a lot of reads mapping to multiple places in the genome. So where we had 200M sequenced reads (100M read pairs), we observe almost 300-400M reads in the accepted_hits.bam file. This is all making me a bit nervous.
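
      A BAM file can hold more records than sequenced reads because a multi-mapped read gets one alignment record per reported location. Below is a minimal sketch that separates the two counts from SAM-format text. The function name and the toy records are made up for illustration. It keeps every read name in memory, so it is only meant for a subsample, not a full 300-400M record file.

```python
def summarize_sam(lines):
    """Count alignment records, distinct mapped reads, and multi-mapped reads.

    A 'read' here is one mate of a pair, identified by (QNAME, mate number).
    """
    alignments = 0
    seen = set()    # reads with at least one alignment record
    multi = set()   # reads with more than one alignment record
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # unmapped record
        # SAM FLAG bits 0x40 and 0x80 mark first and second mate.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        key = (name, mate)
        alignments += 1
        if key in seen:
            multi.add(key)
        seen.add(key)
    return alignments, len(seen), len(multi)


# Toy records (made up): mate 1 of r1 aligns to two locations.
toy = [
    "@HD\tVN:1.0",
    "r1\t65\tchr1\t100\t50\t50M\t=\t200\t150\t*\t*",
    "r1\t321\tchr2\t500\t0\t50M\t=\t200\t0\t*\t*",
    "r1\t129\tchr1\t200\t50\t50M\t=\t100\t-150\t*\t*",
    "r2\t65\tchr1\t900\t50\t50M\t=\t990\t140\t*\t*",
    "r2\t129\tchr1\t990\t50\t50M\t=\t900\t-140\t*\t*",
]
print(summarize_sam(toy))  # (5, 4, 1)
```

      On real data the input could come from piping `samtools view` output into a script like this. If the number of distinct reads matches what came off the sequencer, the extra records reflect multi-mapping and not extra sequencing.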

      • #4
        50-80M paired-end reads per sample with 20 replicates each for control and cancer is a huge data set for RNA-seq. Even if that's not what you paid for, you should be able to find plenty of differentially expressed isoforms, if they are there to find. And TopHat -> Cuffdiff is probably the best way to go with isoforms. The other option is to use DESeq for exon-level tests to find differentially expressed exons, then track them back to the isoforms they could be from. It's interesting that you chose 2x50 reads for isoform tests. While it's good you went for paired end, the extra 50 bp on each read would have been pretty helpful when it comes to resolving isoforms.

        I think you are right to set a read depth cutoff for including replicates. I'd suggest maybe 20M-30M PE reads, but it might depend on what your distribution of read depth per sample looks like.

        As for whether there was a problem with the sequencing, do you know how many lanes you paid for? Without knowing that, it's hard to judge just how wrong the sequencing might have gone.
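
        A depth cutoff like the one suggested above could be applied along these lines. This is a minimal sketch: the sample names and depths are invented placeholders that only echo the range mentioned in the thread, and the 20M cutoff is the lower end of the suggested range, not a validated threshold.

```python
from statistics import median


def split_by_depth(depths, min_pairs):
    """Split samples into (kept, dropped) name lists by read-pair depth."""
    kept = sorted(s for s, d in depths.items() if d >= min_pairs)
    dropped = sorted(s for s, d in depths.items() if d < min_pairs)
    return kept, dropped


# Hypothetical per-sample read-pair counts (not the poster's real data).
depths = {
    "cancer_01": 6_000_000,
    "cancer_02": 55_000_000,
    "cancer_03": 180_000_000,
    "control_01": 80_000_000,
    "control_02": 92_000_000,
}

kept, dropped = split_by_depth(depths, min_pairs=20_000_000)
print("median depth:", median(depths.values()))
print("kept:", kept)
print("dropped:", dropped)
```

        Looking at the median and the spread of depths before picking a cutoff helps avoid throwing away most of one condition, which would unbalance the cancer vs. control comparison.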

        • #5
          I just followed up on this, and it looks like we sequenced two individuals per lane, so the sequencing was duplexed. From what we are seeing, it looks like the variability in sequencing depth is specific to this set of samples and is not seen as much in other projects we have done.
