SEQanswers

Old 08-09-2017, 07:07 AM   #1
NinaG
Junior Member
 
Location: Pushchino, Russia

Join Date: Nov 2014
Posts: 7
SNP calling from RNAseq data

Hello NGS fellows,

I am a newbie here and would highly appreciate your advice about one particular experimental design.

We have data from an RNAseq experiment that was originally designed to assess differential expression. The details of the experiment are as follows:

2 modalities of the phenotype

Each phenotype is represented by 4 samples. 1 sample = 60 individuals pooled together at the stage of RNA isolation.

Molecule – polyadenylated mRNA

Sequencing chemistry – Illumina paired-end, read length 2 × 100 bp

My question is whether it is valid to use this RNAseq data to call SNPs. From my earlier search, most people who call SNPs from RNAseq use 40-1000 samples (= individuals), but they designed the RNAseq experiment for subsequent GWAS from the start. I see that this kind of analysis cannot be applied to my data (if only because in my case individual flies were pooled without barcoding, 60 flies per sample). However, can I still call SNPs and upload the list to a database as a list of potential targets for GWAS, with, for example, an estimate of the functional impact on protein structure? Will they be “true” SNPs, or does our experimental design make even this step invalid?

I found this paper, https://www.ncbi.nlm.nih.gov/pubmed/27458203, where the authors used 2 phenotypes, each represented by 2 samples, which is close to our design, but I still have doubts.
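
One way to put rough numbers on the "will they be true SNPs" question is a back-of-the-envelope calculation of how likely a variant is to be present in a pool of 60 flies and then seen in the reads. The sketch below is only an illustration under strong assumptions that are not part of the experiment description: diploid individuals, chromosomes sampled independently from the population, and every copy contributing reads equally. The function names and the threshold of 2 supporting reads are hypothetical.

```python
from math import comb


def prob_allele_in_pool(allele_freq, n_individuals=60, ploidy=2):
    """Probability that at least one copy of the allele is present in the pool.

    Assumes chromosomes are drawn independently at the given population
    allele frequency (an idealisation, not a property of the real data).
    """
    n_chromosomes = n_individuals * ploidy
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


def prob_at_least_k_alt_reads(read_fraction, depth, k=2):
    """Binomial probability of observing at least k alt-allele reads at a site.

    read_fraction is the expected share of reads carrying the alt allele;
    depth is the number of reads covering the site.
    """
    p_fewer = sum(
        comb(depth, i) * read_fraction**i * (1.0 - read_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Illustrative values only: population allele frequencies and read depths.
    for freq in (0.005, 0.01, 0.05, 0.2):
        in_pool = prob_allele_in_pool(freq)
        seen_30x = prob_at_least_k_alt_reads(freq, depth=30)
        seen_300x = prob_at_least_k_alt_reads(freq, depth=300)
        print(f"freq={freq:<6} in pool={in_pool:.3f}  "
              f">=2 alt reads at 30x={seen_30x:.3f}  at 300x={seen_300x:.3f}")
```

Under these assumptions, common alleles are almost always present in the pool and detectable at moderate depth, while rare ones are easily missed in weakly expressed genes.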
Old 08-09-2017, 10:55 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,638

I think it would be easy to find exonic SNPs that are shared by all or most of the individuals and in expressed genes. It would be difficult to say much about them - for example, even if 100% of reads indicate a SNP that does not mean it's in 100% of the individuals, and if 0% of reads indicate a SNP, that does not mean it's absent in the population. But in general, you should be able to discover approximately which SNPs are present in the population and to what extent. Using simulated data may help determine how accurate this is.
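
A minimal sketch of the kind of simulation mentioned above, to show why the fraction of reads carrying a SNP need not match the fraction of chromosomes carrying it in the pool. Everything here is an assumption made for illustration: the function name simulate_pool, the log-normal spread of per-individual expression, the default depth of 200 reads, and the absence of allele-specific expression or sequencing error.

```python
import random


def simulate_pool(n_individuals=60, allele_freq=0.2, depth=200,
                  expr_sigma=1.0, seed=0):
    """Simulate one SNP site in one pooled RNA sample.

    Returns (alt-allele frequency among the pooled chromosomes,
             fraction of simulated reads carrying the alt allele).
    """
    rng = random.Random(seed)

    # Diploid genotypes: number of alt copies (0, 1 or 2) per individual.
    genotypes = [sum(rng.random() < allele_freq for _ in range(2))
                 for _ in range(n_individuals)]

    # Per-individual expression of the gene. Log-normal is an assumption
    # standing in for unequal contributions of individuals to the pool.
    expression = [rng.lognormvariate(0.0, expr_sigma)
                  for _ in range(n_individuals)]

    true_freq = sum(genotypes) / (2 * n_individuals)

    # Expected alt fraction among transcripts: genotype weighted by expression
    # (both alleles of a heterozygote are assumed equally expressed).
    total_expr = sum(expression)
    alt_expr = sum(e * g / 2 for e, g in zip(expression, genotypes))
    p_alt_read = alt_expr / total_expr

    # Sample reads covering the site.
    alt_reads = sum(rng.random() < p_alt_read for _ in range(depth))
    return true_freq, alt_reads / depth


if __name__ == "__main__":
    for seed in range(5):
        true_freq, read_fraction = simulate_pool(seed=seed)
        print(f"seed={seed}  pool allele freq={true_freq:.3f}  "
              f"read fraction={read_fraction:.3f}")
```

Raising expr_sigma or lowering depth widens the gap between the two numbers, which is one way to gauge how far read-based frequencies from a pool like this could be trusted.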
Old 08-10-2017, 04:49 AM   #3
NinaG
Junior Member
 
Location: Pushchino, Russia

Join Date: Nov 2014
Posts: 7

Brian Bushnell, we have already found those exonic SNPs and annotated them, but, as you say, the question is the interpretation.


Quote:
Originally Posted by Brian Bushnell
But in general, you should be able to discover approximately which SNPs are present in the population and to what extent. Using simulated data may help determine how accurate this is.

But even if we use simulation to confirm the accuracy, could the data be published in a database, or would they be rejected as unreliable?
Old 08-10-2017, 11:27 PM   #4
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 391

Pooling is always problematic, whether or not a simulation is done before or after.

I always try to dissuade experimentalists from pooling. There are so many biases in the data anyway. Pools are rarely if ever clean - i.e. derived from one phenotype - so a range of biological biases are there as well. Also, expression is highly divergent between individuals.

I am no GWAS expert, but I would advise against advertising this as GWAS. Perhaps follow-up tests, such as Sanger sequencing of PCR amplicons from individual (non-pooled) samples for the most important identified regions, might clarify whether this is a true phenomenon or an artifact of the experimental design?
Old 08-11-2017, 06:49 AM   #5
NinaG
Junior Member
 
Location: Pushchino, Russia

Join Date: Nov 2014
Posts: 7

colindaven, thank you! It seems my doubts were justified.