SEQanswers

Old 11-04-2009, 06:20 AM   #1
lletourn
Member
 
Location: Montreal

Join Date: Oct 2009
Posts: 63
SNP discovery without a reference

I have paired-end (76 bp) output from a GA on which I would like to try SNP discovery. The hiccup is that there is no reference genome for my species.

Does anyone have any ideas, or know of any tool that could do this?

Most of the tools that do SNP discovery well work on a pre-aligned dataset. If I were to assemble the data, is there something that could convert ACE to a SNP discovery tool's input format to do the work?

Thanks
Old 11-06-2009, 06:49 PM   #2
MattB
Member
 
Location: Norway

Join Date: Aug 2008
Posts: 35
Default

Hi,

you could do a de novo assembly with several tools, such as SOAPdenovo, Velvet, Abyss, MIRA etc. and then use the contigs as a reference (separately or joined into a single sequence) to align your reads back to. Mosaik will output an assembly in Gigabayes format for SNP discovery. I have also used SOAP to align my reads back to contigs generated by SOAPdenovo, and then used MapView to view the alignment and find SNPs.

Commercial software like Seqman NGen will do de novo assembly and SNP detection together from what I understand.

Matt
Old 11-09-2009, 04:21 AM   #3
lletourn
Member
 
Location: Montreal

Join Date: Oct 2009
Posts: 63
Default

I thought about this and wondered, without any good reason, whether some kind of bias would be added to the results, since the reads used to build the assembly would be aligned back to themselves.

Can't hurt to try, though (except for a few lost CPU hours :-) )

thanks
Old 11-09-2009, 04:28 AM   #4
MattB
Member
 
Location: Norway

Join Date: Aug 2008
Posts: 35
Default

I can't think of any reason why this wouldn't work myself... but I stand to be corrected. In fact, I think it makes for an interesting comparison between the de novo assembly program and its parameters and the corresponding parameters in the reference-guided assembler.

Matt
Old 11-13-2009, 08:43 AM   #5
Nick Miller
Junior Member
 
Location: Nebraska, USA

Join Date: Jun 2009
Posts: 2
Default

I am in the middle of trying this approach for SNP discovery. My starting material was normalized cDNA from several individuals. I used SSAKE for the assembly and MAQ to look for SNPs. I am hoping to test some of the putative SNPs soon.
Old 11-16-2009, 01:57 AM   #6
bioenvisage
Member
 
Location: it

Join Date: Oct 2009
Posts: 40
Default

Hi,


Why can't you try using ESTs as the reference for aligning?
Old 11-16-2009, 04:47 AM   #7
lletourn
Member
 
Location: Montreal

Join Date: Oct 2009
Posts: 63
Default

There are no ESTs for my fungal genome, as far as I know.

I tried MattB's approach and it seemed to work well. I have somewhat more SNPs than would be expected, but the lab will validate a few as QC.
Old 11-16-2009, 04:52 AM   #8
MattB
Member
 
Location: Norway

Join Date: Aug 2008
Posts: 35
Default

I'd be suspicious about SNPs only found on the last one or two bases of your reads (I posted a separate thread on this), as they could well be remnants of adaptor sequence (adaptor trimming won't work when only one or few bases of adaptor are present on the ends of your reads).
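
A minimal sketch of how such a check could look, assuming you already have, for each candidate SNP, the start position and length of every read that supports it. The function name, the ungapped-alignment simplification, and the `edge=2` default are all assumptions for illustration, not part of any tool mentioned in this thread. It checks both ends of the read because, without strand information, either end could be the 3' end.

```python
def only_at_read_ends(snp_pos, supporting_reads, edge=2):
    """Return True if every read supporting the SNP carries it within
    `edge` bases of either end of the read.

    snp_pos          -- 0-based position of the SNP on the contig
    supporting_reads -- list of (read_start, read_len) tuples, 0-based,
                        ungapped alignments assumed (hypothetical input)
    """
    offsets = []
    for read_start, read_len in supporting_reads:
        offset = snp_pos - read_start
        if 0 <= offset < read_len:  # ignore reads that do not cover the SNP
            offsets.append((offset, read_len))
    if not offsets:
        return False  # nothing covers the position, so nothing to flag
    return all(o < edge or o >= n - edge for o, n in offsets)


if __name__ == "__main__":
    # Two 76 bp reads that both carry the SNP on their last base.
    print(only_at_read_ends(175, [(100, 76), (100, 76)]))
```

A SNP flagged this way is not necessarily false, but it is a reasonable candidate to drop or to inspect by eye before sending it for validation.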
Old 11-18-2009, 12:29 PM   #9
Boonie
Junior Member
 
Location: Memphis

Join Date: Mar 2009
Posts: 6
Default

Is there a need to obtain flanking sequence to design a genotyping assay? If so, how will you get sufficient flanking sequence if you are mapping short reads to the contig consensus sequences (assuming no reference genome)?
Old 11-18-2009, 10:18 PM   #10
MattB
Member
 
Location: Norway

Join Date: Aug 2008
Posts: 35
Default

Boonie, it depends on the type of genotyping assay (i.e. the number of SNPs) that you are interested in. For the Illumina Infinium iSelect assay, Illumina specify a minimum of 50 bp on EITHER side of the SNP for probe design, so short contigs in theory aren't such a problem (although it would be nice to have 50 bp on both sides so Illumina can pick the 'best' probe). For other genotyping applications like Sequenom iPlex, you will need more flanking sequence on both sides.
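
As a rough illustration of the flanking-sequence check, here is a small sketch that asks whether a SNP at a given position on a contig has enough sequence around it. The 50 bp default comes from the figure quoted above; the function names and the `both_sides` switch are assumptions made up for this example, and it ignores other constraints a real probe design would have (for instance, neighbouring SNPs in the flank).

```python
def flank_lengths(contig_len, snp_pos):
    """Return (left, right) flank lengths in bp for a 0-based SNP position."""
    return snp_pos, contig_len - snp_pos - 1


def has_enough_flank(contig_len, snp_pos, min_flank=50, both_sides=False):
    """True if the SNP has at least `min_flank` bp of contig sequence on
    either side (default) or on both sides (both_sides=True)."""
    left, right = flank_lengths(contig_len, snp_pos)
    if both_sides:
        return left >= min_flank and right >= min_flank
    return left >= min_flank or right >= min_flank


if __name__ == "__main__":
    # An 80 bp contig with a SNP at position 60: enough on the left only.
    print(has_enough_flank(80, 60), has_enough_flank(80, 60, both_sides=True))
```

Running the stricter `both_sides=True` check first and falling back to the one-sided check keeps the candidates with the most design freedom at the top of the list.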
Old 03-02-2010, 04:38 PM   #11
little_beetle
Junior Member
 
Location: Canada

Join Date: Mar 2010
Posts: 1
Default

This is great, MattB.
I am trying to develop SNPs from a de novo assembled EST library.
How did you join the contigs into a single sequence? Do you put them together in some sort of order, or just simply join all the contig sequences?
Thanks.

Old 03-02-2010, 06:31 PM   #12
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

Quote:
Originally Posted by MattB View Post
Hi,

you could do a de novo assembly with several tools, such as SOAPdenovo, Velvet, Abyss, MIRA etc. and then use the contigs as a reference (separately or joined into a single sequence) to align your reads back to. Mosaik will output an assembly in Gigabayes format for SNP discovery. I have also used SOAP to align my reads back to contigs generated by SOAPdenovo, and then used MapView to view the alignment and find SNPs.

Commercial software like Seqman NGen will do de novo assembly and SNP detection together from what I understand.

Matt
Once you have your de novo assembly, treat that as your reference (as MattB is saying here). After that, remap the reads back to the "new" reference and pile up the alignments. Finally, you can set up your filters to try to get the best SNPs possible.
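
To make the "pile up and filter" step concrete, here is a toy sketch. It assumes the pileup has already been reduced to per-position base counts (a plain dict, not the output format of any particular tool), and the three thresholds are illustrative assumptions that you would tune for your own data.

```python
def filter_snps(pileup, reference, min_depth=10, min_alt_reads=3,
                min_alt_fraction=0.2):
    """Return candidate SNPs as (pos, ref_base, alt_base, alt_reads, depth).

    pileup    -- dict: 0-based position -> dict of base -> read count
                 (hypothetical, simplified representation)
    reference -- contig sequence used as the reference
    """
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to trust anything here
        ref_base = reference[pos]
        alts = {b: n for b, n in counts.items() if b != ref_base}
        if not alts:
            continue  # every read agrees with the reference
        alt_base, alt_reads = max(alts.items(), key=lambda kv: kv[1])
        if alt_reads >= min_alt_reads and alt_reads / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, alt_reads, depth))
    return calls


if __name__ == "__main__":
    ref = "ACGT"
    toy_pileup = {
        0: {"A": 20},
        1: {"C": 12, "T": 8},
        2: {"G": 3, "A": 2},
        3: {"T": 29, "C": 1},
    }
    for call in filter_snps(toy_pileup, ref):
        print(call)
```

Requiring both a minimum number of alternate reads and a minimum fraction keeps a single sequencing error at a deeply covered site from being called, while still allowing real variants at moderate depth.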

Let us know how it goes.
__________________
-drd
Old 03-02-2010, 10:44 PM   #13
MattB
Member
 
Location: Norway

Join Date: Aug 2008
Posts: 35
Default

We just joined the contigs in the order they were output by the de novo assembler, so essentially at random. Since I posted that, however, I have been using the CLC NGS Cell software to perform de novo assembly, reference-guided alignment and SNP detection on the contigs separately...

So naturally if the alignment/SNP detection software can handle thousands of separate contigs, then this is probably preferable, and makes life easier if you are BLASTing your assembled ESTs...
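
For anyone who does need the joined-sequence route, a minimal sketch of one way to do it is below. It joins contigs in the order given and keeps an index so that SNP positions on the joined sequence can be mapped back to the original contig. The run of N's between contigs is an assumption added here to stop reads aligning across a junction; it is not something described above, and the function names are made up for illustration.

```python
from bisect import bisect_right


def join_contigs(contigs, spacer_len=100):
    """Join contigs into one sequence, separated by runs of N.

    contigs -- list of (name, sequence) in assembler output order
    Returns (joined_sequence, index), where index is a list of
    (start_in_joined, contig_name, contig_length).
    """
    parts, index, pos = [], [], 0
    spacer = "N" * spacer_len
    for i, (name, seq) in enumerate(contigs):
        if i > 0:
            parts.append(spacer)
            pos += spacer_len
        index.append((pos, name, len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), index


def to_contig_coords(index, joined_pos):
    """Map a 0-based position on the joined sequence back to
    (contig_name, offset), or None if it falls in a spacer."""
    starts = [start for start, _, _ in index]
    i = bisect_right(starts, joined_pos) - 1
    if i < 0:
        return None
    start, name, length = index[i]
    offset = joined_pos - start
    return (name, offset) if offset < length else None


if __name__ == "__main__":
    joined, idx = join_contigs([("c1", "ACGT"), ("c2", "GGCC")], spacer_len=3)
    print(joined, to_contig_coords(idx, 8))
```

Keeping the index alongside the joined sequence means the SNP list can be reported per contig afterwards, which is what you need if you later BLAST individual contigs.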

Matt
Old 03-25-2010, 02:09 AM   #14
pfranchini
Member
 
Location: Cape Town

Join Date: May 2009
Posts: 19
Default

Hi, we are starting a project aiming to detect SNPs in a species without a reference genome.
I have also thought of assembling my short reads de novo and using the resulting contigs as a reference.
From your experience, what is the best NGS technology for an approach like this? We are deciding between 454 Titanium and Solexa (75 bp reads).
Also, how many individuals are necessary for reliable SNP detection?
Thanks for your help!
P
Old 03-25-2010, 04:42 AM   #15
lletourn
Member
 
Location: Montreal

Join Date: Oct 2009
Posts: 63
Default

We worked with hybrid assemblies, using the larger paired-end 454 libraries to build bigger scaffolds (we used 8k because our lab had trouble with the 20k protocol) and Illumina 76 bp short-insert PE reads to get greater depth of coverage (we didn't use the 5k long inserts, again because the lab had had some trouble with them in the past).

We used wgs-celera to assemble, remapped the reads, and used samtools to call the SNPs.

It worked rather well. The drawback is cost, since you need double the number of libraries.
Old 03-25-2010, 04:47 AM   #16
lletourn
Member
 
Location: Montreal

Join Date: Oct 2009
Posts: 63
Default

Quote:
Originally Posted by pfranchini View Post
how many individuals are necessary for a reliable SNPs detection?
P
I'm still not sure about the right answer here. For mapping to a reference, to eliminate many of the false positives, I would go as high as 25x-30x coverage (that is for hets; for homozygous sites, lower would still be good).

But starting from an assembly that won't be perfect to begin with, I don't really know, though it should probably be around the same.

Actually, you could use only one individual for the 454 run and use all the individuals (separately) for the alignment part:

Use individual A 454 PE + individual A GA PE to assemble.
Use all individuals on that assembly to find SNPs.
Old 03-25-2010, 05:06 AM   #17
MattB
Member
 
Location: Norway

Join Date: Aug 2008
Posts: 35
Default

We will be using paired-end 75bp Illumina reads for our next project, since we believe the higher sequence output will outweigh the longer read lengths of 454. Ultimately, if you are just trying to identify SNPs more or less at random then you don't necessarily need big contigs, just enough to have sufficient flanking sequence.

Depth will of course be related to what you originally sequence, but I'd suggest transcriptome or reduced representation library sequencing to ensure adequate depth without resorting to huge amounts of sequencing.

We have used 10-20 pooled individuals. I think it is reasonably important that these individuals are representative of the samples in any downstream SNP genotyping that you have in mind (if that is what you plan to do).
Old 03-25-2010, 05:11 AM   #18
lletourn
Member
 
Location: Montreal

Join Date: Oct 2009
Posts: 63
Default

I agree it depends on what you want.

In our case we wanted the assembly (we're working on finishing... the painful part), but if the only thing of interest is SNPs, long PE libraries aren't necessary, as you mentioned.

The transcriptome is fine for exonic SNPs, but if you're looking at regulatory or other variants, it's not really an option.
Old 03-25-2010, 05:15 AM   #19
MattB
Member
 
Location: Norway

Join Date: Aug 2008
Posts: 35
Default

yep, agree with lletourn that the optimal strategy very much depends on what type of SNPs you want to find and what you want to do with them afterwards
Old 03-25-2010, 05:16 AM   #20
lletourn
Member
 
Location: Montreal

Join Date: Oct 2009
Posts: 63
Default

Quote:
Originally Posted by MattB View Post
We have used 10-20 pooled individuals
Again it depends (I hate that sentence and it keeps cropping up).

The more individuals are pooled, the fewer rare SNPs you'll see, unless you have higher coverage.

But you will see more of the SNPs that are frequent in your population.

If you want 'all' the SNPs between a ref and an individual, with coverage around 30x you probably won't have false negatives using GA.

But if you have 2 individuals pooled, your reads are spread between them, so you'll miss rarer SNPs.

So if you want population genetics, pool away.
If you want a specific mutation for a phenotype (say ENU-induced), don't pool. (This is an extreme case, since you know only one individual has the mutation, but the same goes for rare diseases.)
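
A back-of-the-envelope sketch of that trade-off, under loudly simplified assumptions: diploid individuals, equal representation of every individual in the pool, no sequencing error, and a SNP counted as "seen" once at least `min_alt_reads` reads carry it. The function names and the default of 3 reads are made up for illustration.

```python
from math import comb


def prob_at_least(k, depth, p):
    """P(X >= k) for X ~ Binomial(depth, p)."""
    below = sum(comb(depth, i) * p**i * (1 - p) ** (depth - i) for i in range(k))
    return 1.0 - below


def detection_probability(depth, n_individuals, carrier_chromosomes=1,
                          min_alt_reads=3):
    """Chance of seeing an allele in at least `min_alt_reads` reads when it
    sits on `carrier_chromosomes` of the 2 * n_individuals chromosomes
    in the pool (diploid, equal-representation assumption)."""
    p = carrier_chromosomes / (2 * n_individuals)
    return prob_at_least(min_alt_reads, depth, p)


if __name__ == "__main__":
    for n in (1, 2, 10, 20):
        print(n, round(detection_probability(30, n), 3))
```

At a fixed total depth, the probability of picking up an allele carried by a single chromosome drops quickly as the pool grows, which is the arithmetic behind "pool for population genetics, don't pool for a single rare mutation".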

BTW, I never thanked you for the first reply...thanks :-)

Tags
de novo, illumina, snp, snp discovery, solexa
