  • Number of reads per gene per sample

    Hey everyone. I'm very new to next-gen sequencing and am planning to use a MiSeq in the near future. I had a question about the number of reads. If we are looking at targeted gene expression, the number of reads required depends on how many genes we are targeting, correct? So if we were to target around 200 genes, how would we calculate how many reads are sufficient for each gene?

    Here are my thoughts and calculations using Illumina's Reagent kit V3:

    (25 million reads)/(200 targeted genes) = 125,000 reads/gene

    (125,000 reads/gene)/(50 samples) = 2,500 reads/(gene*sample)

    Would 2,500 reads per gene in each sample be overkill? Would increasing the number of samples instead be more beneficial? Thanks for all the help and I apologize in advance if this is basic stuff.
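
The arithmetic in this post can be checked with a short script. This is only a sketch: it assumes reads are split perfectly evenly across genes and samples, and the function name and the list of alternative sample counts are made up for illustration. The 25 million figure is the one quoted in the post.

```python
def reads_per_gene_per_sample(total_reads: int, n_genes: int, n_samples: int) -> float:
    """Average reads per gene per sample, assuming a perfectly even split."""
    if n_genes <= 0 or n_samples <= 0:
        raise ValueError("n_genes and n_samples must be positive")
    return total_reads / (n_genes * n_samples)


TOTAL_READS = 25_000_000  # run output quoted in the post
N_GENES = 200             # targeted genes in the post

# Reproduce the two steps from the post.
print(TOTAL_READS / N_GENES)                                   # reads per gene
print(reads_per_gene_per_sample(TOTAL_READS, N_GENES, 50))     # reads per gene per sample

# Hypothetical sample counts, to see how the average budget shrinks.
for n_samples in (25, 50, 100, 200):
    avg = reads_per_gene_per_sample(TOTAL_READS, N_GENES, n_samples)
    print(f"{n_samples:>4} samples -> {avg:,.0f} reads per gene per sample")
```

The even-split number is an average only; it says nothing about how the reads are distributed between highly and lowly expressed genes.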

  • #2
    It sounds like overkill, but there are a few factors I wonder about. Is each gene present at the same level (I wasn't entirely sure what kind of library prep you are doing)? If they are present at different levels then you need enough sequence reads to get sufficient depth on the gene with the smallest presence.

    Are all the genes the same size? Again, depends on the prep method, but if differently-sized then longer genes will either get or need more reads.

    Probably no matter the prep method you'll get some variation in presence of genes and of samples. If the genes vary 5-fold and the samples vary 5-fold, the best gene in the best sample will get 25-fold more reads than the worst. Design everything so the worst samples and the worst genes still get enough reads.

    So if some genes are expressed 100-fold higher than others and the sample variation is 5-fold, the combined range is 500-fold, and the worst sample's lowest-expressed gene may get under 10 reads even when the average is 2,500. Or maybe all these potential factors don't come into play at all!
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
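
To make the fold-variation reasoning in this reply concrete, here is a rough sketch. It assumes the reads for a gene in a sample are proportional to a gene factor times a sample factor, so the best cell gets (gene fold x sample fold) times the worst. The first function gives the floor on the worst cell implied by that assumption; the second uses an invented distribution (abundances evenly spaced on a log scale) purely as one illustrative scenario. All function names are hypothetical.

```python
def worst_case_floor(mean_reads: float, gene_fold: float, sample_fold: float) -> float:
    """Lower bound on reads for the lowest gene in the lowest sample.

    With multiplicative gene and sample effects, best = worst * gene_fold * sample_fold.
    The best cell is at least the mean, so worst >= mean / (gene_fold * sample_fold).
    """
    return mean_reads / (gene_fold * sample_fold)


def geometric_weights(n: int, fold: float) -> list[float]:
    """n relative abundances evenly spaced on a log scale from 1 to fold (an assumption)."""
    if n == 1:
        return [1.0]
    return [fold ** (i / (n - 1)) for i in range(n)]


def worst_case_geometric(total_reads: float, n_genes: int, gene_fold: float,
                         n_samples: int, sample_fold: float) -> float:
    """Reads for the lowest gene in the lowest sample under the log-spaced scenario."""
    genes = geometric_weights(n_genes, gene_fold)
    samples = geometric_weights(n_samples, sample_fold)
    return total_reads * (min(genes) / sum(genes)) * (min(samples) / sum(samples))


MEAN_READS = 2_500  # average from the original post

print(worst_case_floor(MEAN_READS, 5, 5))     # 5-fold genes, 5-fold samples
print(worst_case_floor(MEAN_READS, 100, 5))   # 100-fold genes, 5-fold samples
print(worst_case_geometric(25_000_000, 200, 100, 50, 5))
```

The floor is the pessimistic end: the actual worst cell lies somewhere between that floor and the average, depending on how expression is really distributed across the panel.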

    • #3
      Thanks for the info! The library prep uses a TruSeq Targeted RNA Expression kit (http://www.illumina.com/products/tru...sion-kits.html), so the RNA will be fragmented into roughly equal sizes. I assume the genes will be of differing lengths and present at differing levels. What you said about variation in gene presence and between samples makes sense and should be considered. Thanks. A further question: is it better to increase the reads/gene and lower the number of samples, or to increase the samples and lower the number of reads/gene? If it helps, we are looking at differential gene expression of cells grown in one condition vs. another.

      • #4
        That's a neat protocol... I hadn't noticed it before. I guess if I were looking to maximize my return using this I would look up expression levels of my targets to see what kind of range they typically have, and then balance picking genes of interest with the desire to have a small dynamic range of expression across the set of genes and conditions.

        Samples vs reads... gene expression measurements when expression is low will be dominated by sampling statistics (i.e. 4 reads vs 8 reads could be due to random sampling variation). But more sample repeats will give you much more power for all genes, particularly if there is high biological variation. I'd add samples since it will help your stats for all the genes, whereas adding reads will help just the bottom end.
        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
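
The "samples vs reads" argument in this reply can be sketched numerically. The model below is a common simplification, not something stated in the thread: read counts are treated as Poisson sampling noise on top of a biological coefficient of variation (BCV), so the squared relative variation of one measurement is 1/mean + BCV^2, and averaging over replicates divides that by the number of replicates. The BCV of 0.3, the replicate counts, and the function names are assumptions chosen for illustration.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for a Poisson variable with mean lam."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))


def relative_se(mean_reads: float, bcv: float, n_reps: int) -> float:
    """Relative standard error of the mean expression over n_reps replicates."""
    cv_squared = 1.0 / mean_reads + bcv ** 2  # sampling noise + biological noise
    return math.sqrt(cv_squared / n_reps)


# Part 1: with a true expectation of 6 reads, how often does pure sampling
# noise alone give 4 or fewer, or 8 or more?
TRUE_MEAN = 6
p_low = poisson_cdf(4, TRUE_MEAN)
p_high = 1.0 - poisson_cdf(7, TRUE_MEAN)
print(f"P(count <= 4) = {p_low:.3f}, P(count >= 8) = {p_high:.3f}")

# Part 2: double the reads or double the replicates?
BCV = 0.3  # assumed biological coefficient of variation
for label, mean_reads in (("low gene", 5), ("high gene", 2500)):
    base = relative_se(mean_reads, BCV, 3)
    more_reads = relative_se(2 * mean_reads, BCV, 3)
    more_reps = relative_se(mean_reads, BCV, 6)
    print(f"{label}: base {base:.3f}, 2x reads {more_reads:.3f}, 2x samples {more_reps:.3f}")
```

Under these assumptions, doubling the replicates lowers the error for both genes, while doubling the reads barely moves the highly expressed gene, which matches the advice to add samples.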

        • #5
          Ahh yes, that makes sense. Thanks so much for your help!
