SEQanswers


NinaG 08-09-2017 07:07 AM

SNP calling from RNAseq data
 
Hello NGS fellows,

I am a newbie here and would greatly appreciate your advice about one particular experimental design.

We have data from an RNAseq experiment that was originally designed to assess differential expression. The details of the experiment are as follows:

2 modalities of the phenotype

Each phenotype is represented by 4 samples. 1 sample = 60 individuals pooled together at the stage of RNA isolation.

Molecule – polyadenylated mRNA

Sequencing chemistry – Illumina paired-end, read length - 2*100 bp

My question is whether it is valid to use this RNAseq data to call SNPs. From an earlier search, I found that most people who call SNPs from RNAseq use 40-1000 samples (= individuals), but they designed the RNAseq experiment for subsequent GWAS from the start. I see that this analysis cannot be applied to my data (at least because in my case individual flies were pooled without barcoding, 60 flies per sample). However, can I still call SNPs and upload the list to a database as a list of potential targets for GWAS, with, for example, an estimate of functional impact on protein structure? Will they be “true” SNPs, or does our experimental design make even this step invalid?

I found this paper https://www.ncbi.nlm.nih.gov/pubmed/27458203 where the authors used 2 phenotypes, each represented by 2 samples, which is almost like our experiment, but I still have doubts.

Brian Bushnell 08-09-2017 10:55 AM

I think it would be easy to find exonic SNPs that are shared by all or most of the individuals and in expressed genes. It would be difficult to say much about them - for example, even if 100% of reads indicate a SNP that does not mean it's in 100% of the individuals, and if 0% of reads indicate a SNP, that does not mean it's absent in the population. But in general, you should be able to discover approximately which SNPs are present in the population and to what extent. Using simulated data may help determine how accurate this is.
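As a rough illustration of the kind of simulation meant here, the following Python sketch models one pooled sample at a single SNP. It is only a toy model under stated assumptions: diploid individuals, no allele-specific expression, log-normally distributed per-individual expression of the gene, and no sequencing error. The function names (`simulate_pool`, `miss_rate`) and all parameter values are hypothetical and chosen for illustration; they are not taken from any tool mentioned in this thread.

```python
import random


def simulate_pool(n_individuals=60, allele_freq=0.5, depth=100,
                  expr_sd=1.0, rng=None):
    """Simulate one pooled sample at one SNP.

    Returns (true ALT allele fraction among the pooled individuals,
             observed fraction of reads carrying ALT).
    """
    rng = rng or random.Random()
    # Diploid genotypes: number of ALT alleles (0, 1 or 2) per individual.
    genotypes = [sum(rng.random() < allele_freq for _ in range(2))
                 for _ in range(n_individuals)]
    # Assumption: each individual contributes a different amount of mRNA
    # for this gene, drawn from a log-normal distribution.
    expression = [rng.lognormvariate(0.0, expr_sd)
                  for _ in range(n_individuals)]
    total_expr = sum(expression)
    # Chance that a random read carries ALT: expression-weighted ALT fraction.
    p_alt_read = sum(e * g / 2
                     for e, g in zip(expression, genotypes)) / total_expr
    alt_reads = sum(rng.random() < p_alt_read for _ in range(depth))
    true_fraction = sum(genotypes) / (2 * n_individuals)
    return true_fraction, alt_reads / depth


def miss_rate(allele_freq, depth, trials=1000, seed=0):
    """Fraction of pools that contain the SNP but show zero ALT reads."""
    rng = random.Random(seed)
    present = missed = 0
    for _ in range(trials):
        true_frac, observed = simulate_pool(allele_freq=allele_freq,
                                            depth=depth, rng=rng)
        if true_frac > 0:
            present += 1
            if observed == 0:
                missed += 1
    return missed / present if present else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(5):
        true_frac, observed = simulate_pool(allele_freq=0.3, depth=50, rng=rng)
        print(f"true ALT fraction {true_frac:.3f}, "
              f"observed in reads {observed:.3f}")
    print("miss rate, rare SNP at low depth:", miss_rate(0.01, 10))
```

Comparing the true and observed fractions over many runs, and repeating with coverage and expression spread closer to the real data, would give a first impression of how far read-level frequencies can drift from the population-level ones.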

NinaG 08-10-2017 04:49 AM

Brian Bushnell, we have already found those exonic SNPs and annotated them, but, as you say, the question is the interpretation.


Quote:

Originally Posted by Brian Bushnell (Post 210006)
But in general, you should be able to discover approximately which SNPs are present in the population and to what extent. Using simulated data may help determine how accurate this is.

But even if simulation is used to confirm the accuracy, could the data be published in a database, or will they be rejected as unreliable?

colindaven 08-10-2017 11:27 PM

Pooling is always problematic, whether or not a simulation is done, before or after.

I always try to dissuade experimentalists from pooling. There are so many biases in the data anyway. Pools are rarely if ever clean - i.e. derived from one phenotype - so a range of biological biases are there as well. Also, expression is highly divergent between individuals.

I am no GWAS expert, but I would advise against advertising this as GWAS. Perhaps follow-up tests, e.g. Sanger sequencing of PCR amplicons from individual (non-pooled) samples covering the most important identified regions, might clarify whether this is a true phenomenon or an artifact of the experimental design?

NinaG 08-11-2017 06:49 AM

colindaven, thank you! It seems that my doubts were well founded.


All times are GMT -8.