#1 |
Member
Location: Montreal Join Date: Oct 2009
Posts: 63
I have paired-end (76 bp) output from a GA on which I would like to try SNP discovery. The hiccup is that there is no reference genome for my species.
Does anyone have any ideas, or know of a tool that could do this? Most of the tools that do SNP discovery well work on a pre-aligned dataset. If I were to assemble the data, is there something that could convert ace -> (SNP discovery tool format) to do the work? Thanks
#2 |
Member
Location: Norway Join Date: Aug 2008
Posts: 35
Hi,
you could do a de novo assembly with one of several tools, such as SOAPdenovo, Velvet, ABySS, MIRA etc., and then use the contigs as a reference (separately or joined into a single sequence) to align your reads back to. Mosaik will output an assembly in GigaBayes format for SNP discovery. I have also used SOAP to align my reads back to contigs generated by SOAPdenovo, and then used MapView to view the alignment and find SNPs. Commercial software like SeqMan NGen will do de novo assembly and SNP detection together, from what I understand.
Matt
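To make the "align the reads back to the contigs and look for mismatches" idea concrete, here is a minimal Python sketch. It is not how Mosaik, GigaBayes, SOAP or MapView work internally: the function name `call_snps`, the ungapped `(start, read)` alignment format and the `min_depth` / `min_alt_fraction` thresholds are all assumptions made purely for illustration.

```python
from collections import Counter


def call_snps(reference, alignments, min_depth=4, min_alt_fraction=0.2):
    """Report candidate SNPs from ungapped read alignments to one contig.

    reference  -- contig consensus sequence (str)
    alignments -- iterable of (start, read_seq); start is the 0-based
                  position of the read's first base on the contig
    Returns a list of (pos, ref_base, alt_base, alt_count, depth).
    All thresholds are illustrative, not recommended values.
    """
    # One base counter per contig position (a toy "pileup").
    columns = [Counter() for _ in reference]
    for start, read in alignments:
        for i, base in enumerate(read):
            pos = start + i
            if 0 <= pos < len(reference):
                columns[pos][base] += 1

    snps = []
    for pos, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        # Bases that disagree with the contig consensus at this position.
        alt_counts = {b: n for b, n in counts.items() if b != reference[pos]}
        if not alt_counts:
            continue
        alt, n = max(alt_counts.items(), key=lambda kv: kv[1])
        if n / depth >= min_alt_fraction:
            snps.append((pos, reference[pos], alt, n, depth))
    return snps


if __name__ == "__main__":
    contig = "ACGTACGT"
    reads = [(0, "ACGTACGT"), (0, "ACGTACGT"), (0, "ACGAACGT"), (0, "ACGAACGT")]
    print(call_snps(contig, reads))
```

Because the contig consensus is built from the same reads, a variable site shows up as a mixed column rather than as a uniform mismatch, which is why the sketch thresholds on the fraction of non-consensus bases.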
#3 |
Member
Location: Montreal Join Date: Oct 2009
Posts: 63
I thought about this and, without any good reason, wondered whether some kind of bias would be added to the results, since the reads used to build the assembly would be aligned back to themselves.
Can't hurt to try though (except for a few lost CPU hours :-) ). Thanks
#4 |
Member
Location: Norway Join Date: Aug 2008
Posts: 35
I can't think of any reason why this wouldn't work myself... but I stand to be corrected.
Matt
#5 |
Junior Member
Location: Nebraska, USA Join Date: Jun 2009
Posts: 2
I am in the middle of trying this approach for SNP discovery. My starting material was normalized cDNA from several individuals. I used SSAKE for the assembly and maq to look for SNPs. I am hoping to test some of the putative SNPs soon.
#6 |
Member
Location: it Join Date: Oct 2009
Posts: 40
Hi,
why can't you try using ESTs as the reference for aligning?
#7 |
Member
Location: Montreal Join Date: Oct 2009
Posts: 63
There are no ESTs for my fungal genome, as far as I know.
I tried MattB's approach and it seemed to work well. I get somewhat more SNPs than would be expected, but the lab will validate a few as QC.
#8 |
Member
Location: Norway Join Date: Aug 2008
Posts: 35
I'd be suspicious about SNPs only found on the last one or two bases of your reads (I posted a separate thread on this), as they could well be remnants of adaptor sequence (adaptor trimming won't work when only one or a few bases of adaptor are present on the ends of your reads).
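The read-end check described above could be sketched roughly as follows. The function name `is_read_end_only`, its inputs and the `edge=2` default (taken from the "last one or two bases" in the post) are assumptions for illustration; the offsets are assumed to be given in the original read orientation, i.e. by sequencing cycle, so that a high offset always means the 3' end of the read.

```python
def is_read_end_only(read_offsets, read_length, edge=2):
    """Return True if every supporting read carries the variant base
    within the last `edge` bases of the read.

    read_offsets -- 0-based offsets of the variant base within each
                    supporting read, in original read orientation
    read_length  -- length of the reads (e.g. 76)
    """
    if not read_offsets:
        return False  # no supporting reads, nothing to flag
    return all(offset >= read_length - edge for offset in read_offsets)


if __name__ == "__main__":
    # Variant seen only at the last two cycles of 76 bp reads: suspicious.
    print(is_read_end_only([74, 75, 75], 76))
    # Variant also seen in the middle of a read: not flagged.
    print(is_read_end_only([74, 30], 76))
```

A putative SNP for which this returns True is a candidate for leftover adaptor bases rather than real variation and is worth a closer look before validation.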
#9 |
Junior Member
Location: Memphis Join Date: Mar 2009
Posts: 6
Is there a need to obtain flanking sequence to design a genotyping assay? If so, how will you get sufficient flanking sequence if you are mapping short reads to the contig consensus seqs (assuming no reference genome)?
#10 |
Member
Location: Norway Join Date: Aug 2008
Posts: 35
Boonie, it depends on the type of genotyping assay (i.e. the number of SNPs) that you are interested in. For the Illumina Infinium iSelect assay, Illumina specifies a minimum of 50 bp on EITHER side of the SNP for probe design, so short contigs in theory aren't such a problem (although it would be nice to have 50 bp on both sides so Illumina can pick the 'best' probe). For other genotyping applications like Sequenom iPLEX, you will need more flanking sequence on both sides.
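As a small illustration of the flanking-sequence requirement, the sketch below checks whether a SNP on a contig has enough sequence on either side or on both sides. The 50 bp default comes from the post; the function names and the 1-based coordinate convention are assumptions for illustration, and the real design rules of any given assay should be taken from the vendor.

```python
def flank_lengths(contig_length, snp_pos):
    """Return (left, right) flanking lengths for a 1-based SNP position."""
    if not 1 <= snp_pos <= contig_length:
        raise ValueError("SNP position lies outside the contig")
    return snp_pos - 1, contig_length - snp_pos


def has_enough_flank(contig_length, snp_pos, min_flank=50, both_sides=False):
    """Check the flanking requirement on either side (default) or both."""
    left, right = flank_lengths(contig_length, snp_pos)
    if both_sides:
        return left >= min_flank and right >= min_flank
    return left >= min_flank or right >= min_flank


if __name__ == "__main__":
    # SNP near the start of a 120 bp contig: fine on one side only.
    print(has_enough_flank(120, 10))
    print(has_enough_flank(120, 10, both_sides=True))
```

Running the check with `both_sides=True` is a quick way to see how many candidate SNPs would survive for assays that need sequence on both sides.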
#11 | |
Junior Member
Location: Canada Join Date: Mar 2010
Posts: 1
This is great, MattB.
I am trying to develop SNPs from a de novo assembled EST library. How did you join the contigs into a single sequence? Did you put them together in some sort of order, or simply join all the contig sequences? Thanks.
#12 | |
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
Let us know how it goes.
-drd
#13 |
Member
Location: Norway Join Date: Aug 2008
Posts: 35
We just joined the contigs in the order they were output by the de novo assembler, so essentially at random. Since I posted that, however, I have been using the CLC NGS Cell software to perform de novo assembly, reference-guided alignment and SNP detection on the contigs separately...
So naturally, if the alignment/SNP detection software can handle thousands of separate contigs, then this is probably preferable, and it makes life easier if you are BLASTing your assembled ESTs...
Matt
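For anyone who still needs a single joined sequence, here is a minimal Python sketch of joining contigs in assembler output order while keeping an index to map positions back to the original contigs. The run of `N`s between contigs is an assumption added here (it is not something the post describes) to discourage reads from aligning across an artificial junction; the function names and `spacer_len` are hypothetical.

```python
from bisect import bisect_right


def join_contigs(contigs, spacer_len=100):
    """Concatenate contigs in the given order, separated by runs of N.

    contigs -- list of (name, sequence) in assembler output order
    Returns (joined_sequence, index) where index is a list of
    (start_offset, name, length) entries, one per contig.
    """
    spacer = "N" * spacer_len
    parts, index, offset = [], [], 0
    for i, (name, seq) in enumerate(contigs):
        if i:  # spacer goes between contigs, not before the first one
            parts.append(spacer)
            offset += spacer_len
        index.append((offset, name, len(seq)))
        parts.append(seq)
        offset += len(seq)
    return "".join(parts), index


def to_contig_coord(index, pos):
    """Map a 0-based position on the joined sequence back to
    (contig_name, 0-based position), or None if it falls in a spacer."""
    starts = [start for start, _, _ in index]
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    start, name, length = index[i]
    local = pos - start
    return (name, local) if local < length else None


if __name__ == "__main__":
    joined, idx = join_contigs([("c1", "ACGT"), ("c2", "GGCC")], spacer_len=3)
    print(joined)
    print(to_contig_coord(idx, 8))
```

Keeping the index around means SNP positions called on the joined sequence can be translated back to per-contig coordinates before BLASTing or assay design.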
#14 |
Member
Location: Cape Town Join Date: May 2009
Posts: 19
Hi, we are starting a project aiming to detect SNPs in a species without a reference genome.
I have also thought about assembling my short reads de novo and using the resulting contigs as a reference. From your experience, what is the best NGS technology for an approach like this? We are deciding between 454 Titanium and Solexa (75 bp reads). Also, how many individuals are necessary for reliable SNP detection? Thanks for your help!
P
#15 |
Member
Location: Montreal Join Date: Oct 2009
Posts: 63
We worked with hybrid assemblies, using the bigger PE 454 to build bigger scaffolds (we used 8k because our lab had trouble with the 20k protocol) and Illumina's 76 bp short-insert PE to get greater depth of coverage (we didn't use the 5k long inserts, again because the lab had some trouble in the past).
We used wgs-celera to assemble, remapped the reads and used samtools to call the SNPs. It worked rather well. The drawback is the cost, since you need double the number of libraries.
#16 | |
Member
Location: Montreal Join Date: Oct 2009
Posts: 63
But starting from an assembly which won't be perfect to begin with, I don't really know, but it should probably be around the same. Actually, you could use only one individual for the 454 run, and use all the individuals (separately) for the alignment part:
Use individual A 454 PE + individual A GA PE to assemble.
Use all individuals on that assembly to find SNPs.
#17 |
Member
Location: Norway Join Date: Aug 2008
Posts: 35
We will be using paired-end 75 bp Illumina reads for our next project, since we believe the higher sequence output will outweigh the longer read lengths of 454. Ultimately, if you are just trying to identify SNPs more or less at random, then you don't necessarily need big contigs, just enough to have sufficient flanking sequence.
Depth will of course depend on what you sequence in the first place, but I'd suggest transcriptome or reduced representation library sequencing to ensure adequate depth without resorting to huge amounts of sequencing. We have used 10-20 pooled individuals. I think it is reasonably important that these individuals are representative of the samples in any downstream SNP genotyping that you have in mind (if that is what you plan to do).
#18 |
Member
Location: Montreal Join Date: Oct 2009
Posts: 63
I agree it depends on what you want.
In our case we wanted the assembly (we're working on finishing... the painful part), but if the only things of interest are SNPs, long PE libraries aren't necessary, as you mentioned. The transcriptome is fine for exonic SNPs, but if you're looking for regulatory or other SNPs, it's not really an option.
#19 |
Member
Location: Norway Join Date: Aug 2008
Posts: 35
Yep, I agree with lletourn that the optimal strategy very much depends on what type of SNPs you want to find and what you want to do with them afterwards.
#20 |
Member
Location: Montreal Join Date: Oct 2009
Posts: 63
Again, it depends (I hate that sentence and it keeps cropping up).
The more individuals are pooled, the fewer rare SNPs you'll see, unless you have higher coverage. But you'll see more of the SNPs that are 'frequent' in your population. If you want 'all' the SNPs between a ref and an individual, with coverage around 30x you probably won't have false negatives using GA. But if you have 2 individuals pooled, your reads are spread between them, so you'll miss rarer SNPs. So if you want population genetics, pool away; if you want a specific mutation for a phenotype (say ENU-induced), don't pool. (This is extreme, since you know only one individual has the mutation, but the same goes for rare diseases.) BTW, I never thanked you for the first reply... thanks :-)
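The pooling trade-off can be put into rough numbers with a simple binomial model. The sketch below is only that: it assumes every read at a site is an independent draw, that all pooled individuals contribute equally, and it ignores sequencing and mapping errors. The function names and the `min_reads=2` requirement are assumptions for illustration.

```python
from math import comb


def pooled_allele_fraction(n_individuals, n_copies=1):
    """Expected read fraction of an allele present in `n_copies`
    chromosome copies among `n_individuals` pooled diploid individuals,
    assuming each individual contributes equally to the pool."""
    return n_copies / (2 * n_individuals)


def prob_allele_seen(depth, allele_fraction, min_reads=2):
    """Binomial probability that at least `min_reads` of `depth` reads
    at a site carry the allele."""
    p_too_few = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1 - p_too_few


if __name__ == "__main__":
    depth = 30
    # Heterozygous site in a single individual.
    print(prob_allele_seen(depth, pooled_allele_fraction(1)))
    # Allele carried once among 10 pooled individuals.
    print(prob_allele_seen(depth, pooled_allele_fraction(10)))
```

Under these assumptions, a heterozygous site in a single individual at 30x is seen in two or more reads almost every time, while an allele carried once among 10 pooled individuals is seen in fewer than half of the cases at the same depth, which is the effect described in the post.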