Thread | Thread Starter | Forum | Replies | Last Post |
Variant Calling from paired end RNAseq data | ron128 | Bioinformatics | 2 | 03-30-2013 01:33 AM |
SNP calling and allele specific expression by RNAseq | lewewoo | Bioinformatics | 12 | 11-25-2012 11:50 PM |
SNP calling on 454 data | bioinfosm | 454 Pyrosequencing | 13 | 12-23-2009 04:35 AM |
SNP calling on 454 data | bioinfosm | Bioinformatics | 0 | 10-15-2008 11:53 AM |
#1 |
Junior Member
Location: Pushchino, Russia Join Date: Nov 2014
Posts: 7
Hello NGS fellows,
I am a newbie here and would highly appreciate your advice about one particular experimental design. We have data from an RNAseq experiment that was originally designed to assess differential expression. The details of the experiment are as follows:

- 2 modalities of the phenotype
- Each phenotype is represented by 4 samples
- 1 sample = 60 individuals pooled together at the RNA isolation stage
- Molecule: polyadenylated mRNA
- Sequencing chemistry: Illumina paired-end, read length 2 × 100 bp

My question is whether it is valid to use this RNAseq data to call SNPs. I searched beforehand and found that most people calling SNPs from RNAseq use 40-1000 samples (= individuals), but they designed their RNAseq experiments for subsequent GWAS from the start. I see that this kind of analysis cannot be applied to my data (at least because in my case individual flies were pooled without barcoding, 60 flies per sample).

However, can I still call SNPs and upload the list to a database as a list of potential targets for GWAS, with, for example, an estimate of the functional impact on protein structure? Will they be "true" SNPs, or does our experimental design make even this step invalid?

I found this paper https://www.ncbi.nlm.nih.gov/pubmed/27458203 in which the authors used 2 phenotypes, each represented by 2 samples, which is almost like our experiment, but I still have doubts.
#2 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
I think it would be easy to find exonic SNPs that are shared by all or most of the individuals and in expressed genes. It would be difficult to say much about them - for example, even if 100% of reads indicate a SNP that does not mean it's in 100% of the individuals, and if 0% of reads indicate a SNP, that does not mean it's absent in the population. But in general, you should be able to discover approximately which SNPs are present in the population and to what extent. Using simulated data may help determine how accurate this is.
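A toy simulation can illustrate why the fraction of reads carrying a SNP in a pool need not match the fraction of alleles in the pooled individuals. The sketch below is only an illustration under stated assumptions, not a validated pipeline: it assumes diploid individuals with Hardy-Weinberg genotypes, log-normally distributed per-individual expression of the gene, no allele-specific expression, and no sequencing error. The function name `simulate_pool` and all parameter values are hypothetical, apart from the 60 individuals per pool mentioned in the first post.

```python
import random


def simulate_pool(n_individuals=60, allele_freq=0.3, depth=200,
                  expr_sigma=1.0, rng=None):
    """Simulate one pooled RNA-seq sample at a single biallelic site.

    Returns (true alt-allele fraction among pooled individuals,
             observed alt-allele fraction among reads).
    All defaults except n_individuals are hypothetical.
    """
    if rng is None:
        rng = random.Random()
    alt_weight = 0.0    # expression-weighted alt-allele contribution
    total_weight = 0.0  # total expression contributed to the pool
    alt_alleles = 0     # alt-allele count across all genotypes
    for _ in range(n_individuals):
        # Diploid genotype: number of alt alleles (0, 1 or 2).
        dosage = int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
        alt_alleles += dosage
        # Assumed per-individual expression level (log-normal, always > 0).
        expr = rng.lognormvariate(0.0, expr_sigma)
        total_weight += expr
        alt_weight += expr * dosage / 2.0
    true_fraction = alt_alleles / (2 * n_individuals)
    # Probability that a read drawn from the pooled RNA carries the alt allele.
    p_alt_read = alt_weight / total_weight
    alt_reads = sum(rng.random() < p_alt_read for _ in range(depth))
    return true_fraction, alt_reads / depth


if __name__ == "__main__":
    rng = random.Random(42)
    errors = []
    for _ in range(1000):
        true_f, obs_f = simulate_pool(rng=rng)
        errors.append(abs(obs_f - true_f))
    print(f"mean |observed - true| allele fraction: {sum(errors) / len(errors):.3f}")
```

Raising `expr_sigma` (more expression variation between individuals) or lowering `depth` should widen the gap between the read-level and individual-level fractions, which is the effect the simulation is meant to quantify.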
#3 |
Junior Member
Location: Pushchino, Russia Join Date: Nov 2014
Posts: 7
Brian Bushnell, we have already found those exonic SNPs and annotated them, but, as you say, the question is how to interpret them.
Even if we use simulation to confirm the accuracy, could the data be published in a database, or would they be rejected as unreliable?
#4 |
Senior Member
Location: Germany Join Date: Oct 2008
Posts: 415
Pooling is always problematic, regardless of whether simulation is done beforehand, afterwards, or not at all.
I always try to dissuade experimentalists from pooling. There are so many biases in the data anyway. Pools are rarely if ever clean (i.e. derived from one phenotype), so a range of biological biases are present as well. Also, expression is highly divergent between individuals. I am no GWAS expert, but I would advise against advertising this as GWAS. Perhaps follow-up tests, such as Sanger sequencing of PCR amplicons from individual (non-pooled) samples covering the most important identified regions, might clarify whether this is a true phenomenon or an artifact of the experimental design?
#5 |
Junior Member
Location: Pushchino, Russia Join Date: Nov 2014
Posts: 7
colindaven, thank you! It seems my doubts were well founded.
Tags |
rna seq, rna seq experiment design, snp calling |