  • RNA-Seq quality

    Hello everyone,

    we have just received our first RNA-Seq results/analysis, and since I am completely new to interpreting the data, I would like to ask for some pointers.

    My main concern is the reliability/quality of the sequencing. Our samples contained amplified mRNAs from cortical neurons (rat and human) and the sequencing was done on the Illumina HiSeq 2000 platform.

    Here are my questions:

    1. It is stated in the report file that reads should be evenly distributed on reference genes, otherwise the low level of randomness will affect following analysis. How exactly is the analysis affected by this? Our reads are all concentrated on the 3' end.

    2. The base percentage composition curves do not really overlap (A with T and C with G), and according to the figures the G-C content of our samples is very low, around 20 %. Is that normal? I can't find any data on the expected base composition of the transcriptome of rat cortical neurons. I guess it should be tissue- and species-specific, but in the mouse brain transcriptome data I was able to find, the percentages of the bases were really even.

    3. Two of our samples were sequenced twice and the subsequent analysis found that there are hundreds of differentially expressed genes between the identical samples. What does this say about the overall reliability of the results?

    4. The percentage of unmapped reads (to genome) is 60-70 %. I would say that this is really high, but again, I don't know what portion of the reads can be mapped back in a sequencing reaction. The percentage of multi-position matches also seems high, 20-35 %.

    5. Regarding the pathway analysis, there are a lot of signaling pathways that are affected in these samples, even if they have nothing to do with neurons, such as bacterial and viral infections and several types of tumors. My main concern here is how you can trust the portion of the data that seems relevant when the whole contains very unlikely details.

  • #2
    Originally posted by rat_seq
    1. It is stated in the report file that reads should be evenly distributed on reference genes, otherwise the low level of randomness will affect following analysis. How exactly is the analysis affected by this? Our reads are all concentrated on the 3' end.
    3' bias can be due to how the libraries were made (i.e., which kit was used and how purification was done) and whether the samples were degraded. I flash freeze my samples in liquid nitrogen immediately upon extraction. Since the brain is extremely metabolically demanding, you also need to be rather quick about things (I work with mice and can sacrifice, dissect and flash freeze brain subregions in about 2 minutes/mouse when I have an assistant). The biggest issue is if there are different amounts of 3' bias between your samples or, worse, between your groups.
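
    To make "different amounts of 3' bias between your samples" concrete, here is a minimal sketch of one possible per-sample metric: the fraction of reads that start in the last 20% of a transcript. The function name, the 20% cutoff and the toy positions are all assumptions for illustration, not something from the provider's report; established QC tools compute gene body coverage in more careful ways.

```python
def three_prime_fraction(read_starts, transcript_length, tail_fraction=0.2):
    """Fraction of reads starting in the last `tail_fraction` of a transcript.

    read_starts: 0-based read start positions in transcript coordinates,
    counted from the 5' end. The 20% default is an arbitrary choice.
    """
    if not read_starts:
        return float("nan")
    cutoff = transcript_length * (1.0 - tail_fraction)
    in_tail = sum(1 for pos in read_starts if pos >= cutoff)
    return in_tail / len(read_starts)


# Hypothetical read start positions on a 1000 nt transcript.
even_sample = [50, 150, 250, 350, 450, 550, 650, 750, 850, 950]
biased_sample = [700, 820, 850, 880, 900, 910, 930, 950, 970, 990]

even_frac = three_prime_fraction(even_sample, 1000)
biased_frac = three_prime_fraction(biased_sample, 1000)
print(even_frac, biased_frac)
```

    If a number like this differs a lot between samples or groups, that is the situation described above as the biggest issue.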

    Originally posted by rat_seq
    2. The base percentage composition curves do not really overlap (A with T and C with G), and according to the figures the G-C content of our samples is very low, around 20 %. Is that normal? I can't find any data on the expected base composition of the transcriptome of rat cortical neurons. I guess it should be tissue- and species-specific, but in the mouse brain transcriptome data I was able to find, the percentages of the bases were really even.
    You'd need to post an image, preferably one from FastQC to get any worthwhile feedback on this.
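
    If FastQC is not at hand, the same two quantities (per-position base composition and overall GC content) can be roughly computed by hand. This is only a sketch: the helper names are made up, the toy reads are invented, and it assumes plain 4-line FASTQ records.

```python
from collections import Counter


def parse_fastq_sequences(lines):
    """Yield the sequence line of each record, assuming 4 lines per record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def per_position_composition(sequences):
    """Return a list of Counters, one per read position (0-based)."""
    counts = []
    for seq in sequences:
        for pos, base in enumerate(seq):
            if pos == len(counts):
                counts.append(Counter())
            counts[pos][base] += 1
    return counts


def gc_fraction(sequences):
    """Overall G+C fraction over all A/C/G/T bases (N is ignored)."""
    gc = total = 0
    for seq in sequences:
        gc += seq.count("G") + seq.count("C")
        total += sum(seq.count(b) for b in "ACGT")
    return gc / total if total else float("nan")


# Two invented reads, just to show the input shape.
toy_fastq = """@r1
ACGT
+
IIII
@r2
AATT
+
IIII
""".splitlines()

seqs = list(parse_fastq_sequences(toy_fastq))
composition = per_position_composition(seqs)
overall_gc = gc_fraction(seqs)
print(composition[0], overall_gc)
```

    On real data you would pass an open file handle instead of the toy list and plot the per-position fractions.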

    Originally posted by rat_seq
    3. Two of our samples were sequenced twice and the subsequent analysis found that there are hundreds of differentially expressed genes between the identical samples. What does this say about the overall reliability of the results?
    The subsequent analyses were directly comparing technical replicates? Depending on how these were done, they're likely meaningless. Technical variability should be much less than biological variability, which you should be able to estimate with your replicates.
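
    One quick sanity check on technical replicates is to normalize each run for sequencing depth and correlate the log-scaled counts. The sketch below uses invented counts where the second run simply has twice the depth; the function names and the counts-per-million normalization are my own choices for illustration, not what the provider did.

```python
import math


def log_cpm(counts, pseudocount=1.0):
    """log2 counts per million, with a pseudocount to avoid log(0)."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented per-gene counts: run B is the same library at twice the depth.
run_a = [10, 200, 3000, 45, 0, 870]
run_b = [20, 400, 6000, 90, 0, 1740]

r = pearson(log_cpm(run_a), log_cpm(run_b))
print(round(r, 6))
```

    Raw counts differ twofold for every gene here, yet after depth normalization the two runs agree essentially perfectly. A comparison that skips this step, or that has no variance estimate from real replicates, can easily report "hundreds" of differences that mean nothing.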

    Originally posted by rat_seq
    4. The percentage of unmapped reads (to genome) is 60-70 %. I would say that this is really high, but again, I don't know what portion of the reads can be mapped back in a sequencing reaction. The percentage of multi-position matches also seems high, 20-35 %.
    OK, now this is a big problem. You should absolutely not be having such a high percentage of unmapped reads (are the mapped reads really almost all multimappers?). I take it that your sequencing provider did the analysis for you. I've had bad experiences in the past with sequencing providers producing absolutely incompetent analyses, such that I won't trust an analysis that I didn't do.

    This actually sounds like the sort of thing that would happen if your samples were switched and you received samples for a different organism. Maybe you can BLAST some of the unmapped reads and see what happens. Also, were the reads adapter-trimmed?
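
    Both checks can be started with a few lines of code. The sketch below estimates what fraction of reads contain the start of a common adapter and writes a handful of reads out as FASTA for pasting into a BLAST search. The two adapter prefixes are the widely used Illumina TruSeq and Nextera ones, but whether either applies to this library is an assumption, and the reads and function names are invented.

```python
# Widely used adapter prefixes; which one applies depends on the library kit.
ILLUMINA_ADAPTER_PREFIX = "AGATCGGAAGAGC"
NEXTERA_ADAPTER_PREFIX = "CTGTCTCTTATA"


def adapter_fraction(sequences, adapter):
    """Fraction of reads containing the adapter prefix anywhere (exact match)."""
    sequences = list(sequences)
    if not sequences:
        return float("nan")
    hits = sum(1 for seq in sequences if adapter in seq.upper())
    return hits / len(sequences)


def to_fasta(sequences, prefix="unmapped"):
    """Format reads as FASTA text, naming them prefix_1, prefix_2, ..."""
    return "".join(
        f">{prefix}_{i}\n{seq}\n" for i, seq in enumerate(sequences, start=1)
    )


# Invented reads; the first and third carry the TruSeq-style prefix.
reads = [
    "ACGTACGTAGATCGGAAGAGCAC",
    "TTTTGGGGCCCCAAAA",
    "GGGAGATCGGAAGAGC",
    "ACACACACACAC",
]

illumina_frac = adapter_fraction(reads, ILLUMINA_ADAPTER_PREFIX)
nextera_frac = adapter_fraction(reads, NEXTERA_ADAPTER_PREFIX)
fasta_text = to_fasta(reads[:2])
print(illumina_frac, nextera_frac)
print(fasta_text)
```

    A large adapter fraction would suggest the reads were never trimmed, which by itself can hurt mapping rates. Exact matching misses adapters with sequencing errors, so treat the number as a lower bound.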

    Originally posted by rat_seq
    5. Regarding the pathway analysis, there are a lot of signaling pathways that are affected in these samples, even if they have nothing to do with neurons, such as bacterial and viral infections and several types of tumors. My main concern here is how you can trust the portion of the data that seems relevant when the whole contains very unlikely details.
    I wouldn't believe anything when such a low percentage of your reads map. I would suggest that you simply redo the analysis yourselves. Regarding things like "bacterial infection", keep in mind that anything related to the immune system will cause those to pop up. Depending on your treatments, perhaps one causes a bit of neural inflammation.
    Last edited by dpryan; 09-28-2013, 01:21 PM. Reason: Fix tags
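
    When redoing the analysis, the mapping rates questioned above can be recounted directly from the alignments. This is a rough sketch over SAM text lines: it counts primary records only (so each mate of a pair counts separately), uses the standard FLAG bits for unmapped, secondary and supplementary records, and assumes the aligner writes an NH:i tag for multimappers, which not every aligner does. The toy records and function name are invented.

```python
def mapping_summary(sam_lines):
    """Count unmapped, multimapped and unique primary records in SAM text."""
    total = unmapped = multimapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary or supplementary alignment
        total += 1
        if flag & 0x4:
            unmapped += 1
            continue
        # Optional tags start at column 12; NH is the number of reported hits.
        for tag in fields[11:]:
            if tag.startswith("NH:i:") and int(tag[5:]) > 1:
                multimapped += 1
                break
    return {
        "total": total,
        "unmapped": unmapped,
        "multimapped": multimapped,
        "unique": total - unmapped - multimapped,
    }


def sam_record(*fields):
    """Join fields with tabs (helper for building the toy records)."""
    return "\t".join(fields)


toy_sam = [
    "@HD\tVN:1.6",
    sam_record("r1", "0", "chr1", "100", "60", "4M", "*", "0", "0", "ACGT", "IIII", "NH:i:1"),
    sam_record("r2", "0", "chr1", "200", "1", "4M", "*", "0", "0", "ACGT", "IIII", "NH:i:3"),
    sam_record("r2", "256", "chr2", "300", "1", "4M", "*", "0", "0", "ACGT", "IIII", "NH:i:3"),
    sam_record("r3", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII"),
]

summary = mapping_summary(toy_sam)
print(summary)
```

    Dividing the unmapped and multimapped counts by the total gives percentages comparable to the 60-70 % and 20-35 % figures in the report, which also helps clarify whether the multimapper percentage is relative to all reads or only to mapped reads.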


    • #3
      Thank you for your quick reply! Unfortunately I have no information on the adapter.


      • #4
        I also have a similar problem with base composition in the PE RNA-seq data I was given. Even the mapped reads still have this problem. I wonder if it has something to do with the library prep kit? We used Nextera.




        • #5
          Have you done QC on your data (FastQC is a popular choice)? Can you post quality plots from FastQC analysis?


          • #6
            I am not too bothered by the first 15 bp, since we all know the reason for that. For the rest of the read length, the difference between A/T and G/C is small, although the bases are not exactly 25% each. Does this look acceptable for mapped reads in RNA-seq from the Nextera kit?
            Or does it pose a problem?
            Attached Files
