
**dpryan** · 09-28-2013, 01:20 PM

Originally posted by rat_seq

1. It is stated in the report file that reads should be evenly distributed along reference genes; otherwise the low level of randomness will affect subsequent analyses. How exactly is the analysis affected by this? Our reads are all concentrated at the 3' end.

3' bias can be due to how the libraries were made (i.e., which kit was used and how purification was done) and whether the samples were degraded. I flash freeze my samples in liquid nitrogen immediately upon extraction. Since the brain is extremely metabolically demanding, you also need to be rather quick about things (I work with mice and can sacrifice, dissect and flash freeze brain subregions in about 2 minutes/mouse when I have an assistant). The biggest issue is if there are different amounts of 3' bias between your samples or, worse, between your groups.
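A rough way to quantify the 3' bias being discussed, and to check whether it differs between samples, is to bin per-base coverage along each transcript and compare the 3'-most bin with the 5'-most bin. The sketch below assumes you already have per-base coverage for one transcript ordered 5' to 3' (for minus-strand genes the array must be reversed first); the function names are hypothetical. RSeQC's geneBody_coverage.py produces a comparable aggregate profile across many genes.

```python
from typing import List, Sequence

def gene_body_profile(coverage: Sequence[float], n_bins: int = 10) -> List[float]:
    """Mean coverage in n_bins equal-width bins along one transcript (5' -> 3')."""
    length = len(coverage)
    if length < n_bins:
        raise ValueError("transcript is shorter than the number of bins")
    profile = []
    for i in range(n_bins):
        # Integer bin edges; every bin is non-empty because length >= n_bins.
        start = i * length // n_bins
        end = (i + 1) * length // n_bins
        window = coverage[start:end]
        profile.append(sum(window) / len(window))
    return profile

def three_prime_bias(coverage: Sequence[float], n_bins: int = 10) -> float:
    """Ratio of mean coverage in the 3'-most bin to the 5'-most bin.

    1.0 means even coverage; large values mean reads pile up at the 3' end.
    Returns inf when the 5'-most bin has no coverage at all.
    """
    profile = gene_body_profile(coverage, n_bins)
    if profile[0] == 0:
        return float("inf")
    return profile[-1] / profile[0]

even = [10] * 100
skewed = list(range(1, 101))
print(three_prime_bias(even))    # 1.0
print(three_prime_bias(skewed))  # about 17.4
```

What matters most is comparing this ratio across samples and groups: a uniformly high ratio mainly hurts coverage-based analyses such as isoform quantification, while a ratio that differs between groups can show up as spurious differential expression.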

Originally posted by rat_seq

2. The per-base composition curves do not really overlap (A with T and C with G), and according to the figures the GC content of our samples is very low, around 20%. Is that normal? I can't find any data on the expected base composition of the transcriptome of rat cortical neurons. I guess it should be tissue- and species-specific, but in the mouse brain transcriptome data I was able to find, the base percentages were really even.

You'd need to post an image, preferably one from FastQC to get any worthwhile feedback on this.
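Rather than relying only on the provider's plot, the GC content can be computed directly from the reads and compared with the same statistic computed on the reference transcript sequences. A minimal sketch using only the standard library, assuming plain or gzipped FASTQ with the usual four lines per record (no wrapped sequence lines); the function names are hypothetical, and N or other ambiguous symbols are excluded from the denominator.

```python
import gzip
from typing import Iterable, Iterator

def read_fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # line 2 of every record holds the bases
                yield line.strip()

def gc_fraction(seq: str) -> float:
    """Fraction of called bases (A/C/G/T) that are G or C in one read."""
    s = seq.upper()
    called = sum(s.count(b) for b in "ACGT")
    if called == 0:
        return 0.0
    return (s.count("G") + s.count("C")) / called

def mean_gc(sequences: Iterable[str]) -> float:
    """GC fraction pooled over all called bases in all reads."""
    total_gc = 0
    total_called = 0
    for seq in sequences:
        s = seq.upper()
        total_gc += s.count("G") + s.count("C")
        total_called += sum(s.count(b) for b in "ACGT")
    return total_gc / total_called if total_called else 0.0

# Usage (hypothetical file name):
# print(mean_gc(read_fastq_sequences("sample_R1.fastq.gz")))
```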

Originally posted by rat_seq

3. Two of our samples were sequenced twice, and the subsequent analysis found hundreds of differentially expressed genes between the identical samples. What does this say about the overall reliability of the results?

The subsequent analyses were directly comparing technical replicates? Depending on how these were done, they're likely meaningless. Technical variability should be much less than biological variability, which you should be able to estimate with your replicates.
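One way to make "technical variability should be much less than biological variability" concrete is the negative binomial model used by tools such as DESeq2 and edgeR, where the variance of a gene's counts is μ + φμ². Technical (sequencing) noise is often close to Poisson, i.e. φ ≈ 0, while biological replicates typically give φ > 0. The sketch below is a simple method-of-moments estimate of φ for one gene, assuming the counts are already normalized for library size; it is not the shrunken estimator those packages actually use, and the function name is hypothetical.

```python
from statistics import mean, variance
from typing import Sequence

def moment_dispersion(counts: Sequence[float]) -> float:
    """Method-of-moments negative binomial dispersion for one gene.

    Solves var = mu + phi * mu**2 for phi. The result is clipped at 0
    because sampling noise can make the sample variance smaller than the mean.
    """
    if len(counts) < 2:
        raise ValueError("need at least two replicates")
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance (n - 1 denominator)
    return max(0.0, (var - mu) / mu ** 2)

print(moment_dispersion([100, 100, 100]))  # 0.0: no extra-Poisson variation
print(moment_dispersion([50, 150]))        # 0.49
```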

Originally posted by rat_seq

4. The percentage of unmapped reads (to the genome) is 60-70%. I would say that this is really high, but again, I don't know what portion of reads normally maps back to the genome. The percentage of multi-position matches also seems high, 20-35%.

OK, now this is a big problem. You should absolutely not be having such a high percentage of unmapped reads (are the mapped reads really almost all multimappers?). I take it that your sequencing provider did the analysis for you. I've had bad experiences in the past with sequencing providers producing absolutely incompetent analyses, such that I won't trust an analysis that I didn't do.
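Before trusting a 60-70% unmapped figure, it is worth recomputing the mapping breakdown from the alignment file yourself (samtools flagstat gives a quick summary). The sketch below parses SAM text, counts each read once via its primary record, and splits mapped reads into unique and multi-mapped using the NH:i tag. Assumptions: the aligner writes NH (STAR, HISAT2 and TopHat do; a missing tag is treated as unique here), unmapped reads are actually present in the file (some aligners write them elsewhere or omit them by default), and for paired-end data each mate is counted separately. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

def mapping_summary(sam_lines: Iterable[str]) -> Counter:
    """Count total, unmapped, unique and multi-mapped reads from SAM text."""
    stats = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        stats["total"] += 1
        if flag & UNMAPPED:
            stats["unmapped"] += 1
            continue
        # Optional tags start at column 12; NH:i:<n> = number of reported hits.
        nh = next((int(f[5:]) for f in fields[11:] if f.startswith("NH:i:")), 1)
        stats["unique" if nh == 1 else "multi"] += 1
    return stats

def summary_fractions(stats: Counter) -> Dict[str, float]:
    """Express unique/multi/unmapped counts as fractions of all reads."""
    total = stats["total"]
    if total == 0:
        return {}
    return {k: stats[k] / total for k in ("unique", "multi", "unmapped")}
```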

This actually sounds like the sort of thing that would happen if your samples were switched and you received samples for a different organism. Maybe you can BLAST some of the unmapped reads and see what happens. Also, were the reads adapter-trimmed?
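Since the adapter used is unknown, one rough check, similar in spirit to Trim Galore's adapter auto-detection, is to count how many reads contain the start of common adapters. The sequences below are the standard Illumina TruSeq ("AGATCGGAAGAGC") and Nextera ("CTGTCTCTTATACACATCT") adapter prefixes. Only exact matches are considered, whereas real trimmers such as cutadapt tolerate mismatches, so treat this as a screen rather than a replacement for a trimmer. The function names and the min_overlap default are hypothetical choices.

```python
from typing import Dict, Iterable, Mapping, Tuple

ADAPTERS = {
    "Illumina TruSeq": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATACACATCT",
}

def adapter_counts(sequences: Iterable[str],
                   adapters: Mapping[str, str] = ADAPTERS) -> Tuple[int, Dict[str, int]]:
    """Return (number of reads, reads containing each full adapter prefix)."""
    counts = {name: 0 for name in adapters}
    n = 0
    for seq in sequences:
        n += 1
        for name, adapter in adapters.items():
            if adapter in seq:
                counts[name] += 1
    return n, counts

def trim_adapter(seq: str, adapter: str, min_overlap: int = 5) -> str:
    """Cut the read at a full adapter match, or at a partial adapter at the 3' end."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # The adapter may only partly fit at the end of the read: try the longest
    # adapter prefix first, down to min_overlap bases (shorter hits are too
    # likely to be chance matches).
    for k in range(min(len(adapter), len(seq)) - 1, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq

print(trim_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTERS["Illumina TruSeq"]))  # ACGTACGT
```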

Originally posted by rat_seq

5. Regarding the pathway analysis, a lot of signaling pathways are affected in these samples, even ones that have nothing to do with neurons, such as bacterial and viral infections and several types of tumors. My main concern here is: how can you trust the portion of the data that seems relevant when the whole contains very unlikely details?

I wouldn't believe anything when such a low percentage of your reads map. I would suggest that you simply redo the analysis yourselves. Regarding things like "bacterial infection", keep in mind that anything related to the immune system will cause those to pop up. Depending on your treatments, perhaps one causes a bit of neural inflammation.
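Many pathway-analysis tools score each pathway with a one-sided hypergeometric (equivalently, Fisher's exact) test of the overlap between the DE genes and the pathway's genes. Because pathways such as "bacterial infection" or cancer pathways share many immune and signaling genes, the same few DE genes can make several of them significant at once. A minimal sketch of that test, with hypothetical function names, assuming a defined universe of tested genes:

```python
from math import comb
from typing import Iterable, Set

def enrichment_pvalue(n_universe: int, n_pathway: int, n_de: int, n_overlap: int) -> float:
    """One-sided hypergeometric P(X >= n_overlap).

    n_universe: genes tested; n_pathway: universe genes in the pathway;
    n_de: DE genes; n_overlap: DE genes that fall in the pathway.
    """
    total = comb(n_universe, n_de)
    upper = min(n_pathway, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

def pathway_enrichment(universe: Set[str], pathway_genes: Iterable[str],
                       de_genes: Iterable[str]) -> float:
    """Restrict both gene lists to the universe, then test their overlap."""
    pathway = set(pathway_genes) & universe
    de = set(de_genes) & universe
    return enrichment_pvalue(len(universe), len(pathway), len(de), len(pathway & de))
```

Running this over two pathways that share most of their overlapping DE genes will give similarly small p-values for both, which is why immune-related hits tend to drag several "infection" pathways along together.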

**rat_seq** · 09-30-2013, 01:34 AM

Thank you for your quick reply! Unfortunately I have no information on the adapter.

**bbl** · 10-04-2013, 04:58 AM

I also have a similar problem with base composition in the PE RNA-seq data I was given. Even the mapped reads still have this problem. I wonder whether it has something to do with the library prep kit; we used Nextera.

Originally posted by rat_seq

Hello everyone,

We have just received our first RNA-Seq results/analysis, and since I am completely new to interpreting the data, I would like to ask for some pointers.

My main concern is the reliability/quality of the sequencing. Our samples contained amplified mRNAs from cortical neurons (rat and human) and the sequencing was done on the Illumina HiSeq 2000 platform.

Here are my questions:

1. It is stated in the report file that reads should be evenly distributed along reference genes; otherwise the low level of randomness will affect subsequent analyses. How exactly is the analysis affected by this? Our reads are all concentrated at the 3' end.

2. The per-base composition curves do not really overlap (A with T and C with G), and according to the figures the GC content of our samples is very low, around 20%. Is that normal? I can't find any data on the expected base composition of the transcriptome of rat cortical neurons. I guess it should be tissue- and species-specific, but in the mouse brain transcriptome data I was able to find, the base percentages were really even.

3. Two of our samples were sequenced twice, and the subsequent analysis found hundreds of differentially expressed genes between the identical samples. What does this say about the overall reliability of the results?

4. The percentage of unmapped reads (to the genome) is 60-70%. I would say that this is really high, but again, I don't know what portion of reads normally maps back to the genome. The percentage of multi-position matches also seems high, 20-35%.

5. Regarding the pathway analysis, a lot of signaling pathways are affected in these samples, even ones that have nothing to do with neurons, such as bacterial and viral infections and several types of tumors. My main concern here is: how can you trust the portion of the data that seems relevant when the whole contains very unlikely details?

**GenoMax** · 10-04-2013, 05:50 AM

Have you done QC on your data (FastQC is a popular choice)? Can you post quality plots from FastQC analysis?

**bbl** · 10-04-2013, 06:32 AM

I am not too bothered about the first 15 bp, for reasons we all know. For the rest of the read length, the differences between A and T and between G and C are small, although the bases are not exactly 25% each. Does this look acceptable for mapped RNA-seq reads from the Nextera kit, or does it pose a problem?

Attached file: per_base_sequence_content.png (34.1 KB)
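To put numbers on a per-base sequence content plot like the attached one, you can compute the A/C/G/T percentage at each read position and flag positions where A and T, or G and C, differ by more than a threshold. FastQC's documentation uses a difference greater than 10% as the warning level for this module (20% for a failure). The skip_first parameter lets you ignore the first bases, where composition bias is commonly attributed to random-hexamer priming or, for Nextera libraries, transposase insertion preferences. The function names are hypothetical.

```python
from typing import Dict, Iterable, List

def per_position_composition(sequences: Iterable[str]) -> List[Dict[str, float]]:
    """Percentage of A, C, G, T at each read position (N and other symbols excluded)."""
    counts: List[Dict[str, int]] = []
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):  # first read reaching this position
                counts.append({"A": 0, "C": 0, "G": 0, "T": 0})
            if base in counts[i]:
                counts[i][base] += 1
    profile = []
    for c in counts:
        total = sum(c.values())
        profile.append({b: 100.0 * n / total if total else 0.0 for b, n in c.items()})
    return profile

def flag_positions(profile: List[Dict[str, float]], threshold: float = 10.0,
                   skip_first: int = 0) -> List[int]:
    """1-based positions where |A - T| or |G - C| exceeds threshold percentage points."""
    flagged = []
    for pos, pct in enumerate(profile, start=1):
        if pos <= skip_first:
            continue
        if abs(pct["A"] - pct["T"]) > threshold or abs(pct["G"] - pct["C"]) > threshold:
            flagged.append(pos)
    return flagged
```

For example, flag_positions(profile, skip_first=15) would report only problems beyond the first 15 bases discussed above.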
