SEQanswers

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
De novo SNP calling in absence of complete reference assembly | fcr | De novo discovery | 15 | 09-21-2012 03:34 AM
RNA-seq SNP-calling without a complete reference | shoegame2001 | RNA Sequencing | 6 | 07-04-2012 01:55 AM
Editing fasta , reference base in snp calling samtools | moriah | Bioinformatics | 2 | 08-10-2011 12:11 AM
SNP calling from a reference sequence | blackrabite | Genomic Resequencing | 2 | 05-21-2011 09:48 PM
reference-free SNP discovery | Marius | De novo discovery | 5 | 03-30-2011 12:23 PM

12-20-2010, 07:13 AM   #1
Marius
8armed
 
Location: Germany

Join Date: Dec 2010
Posts: 29
Hierarchical reference-free SNP calling

Dear all,

I'm aware that several similar questions have already been posted (some of them a bit dated, given how fast this field is moving), but I'm wondering how you would solve my specific case most efficiently:

I have Illumina short reads from which I want to call SNPs WITHOUT using a reference genome. The reads are defined by a specific restriction enzyme site in the genome, and I have them for several individuals per population, across several populations. On average each locus is covered by about 25 reads per individual, which allows me to first find SNPs within an individual (heterozygous positions), then compare all individuals belonging to the same population (looking for WITHIN-population SNPs), and ultimately compare the populations with each other (3 "hierarchical" steps). If possible, I'd like the SNP calling to be quality-aware.

One of the problems I see is getting consensus sequences for an individual without a reference. I imagine a program would do this by building stacks of reads that belong to the same locus in the genome (as I said, about 25 reads per locus on average). Since there will already be heterozygous sites within an individual, collapsing these stacks into a consensus sequence should probably use the IUPAC ambiguity codes at polymorphic sites.
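A minimal Python sketch of the stacking and collapsing step described above. It assumes equal-length reads (as expected when every read starts at the same restriction site), quality strings already decoded to integer Phred scores, and a diploid organism, so at most two alleles per site. The greedy Hamming-distance merge, the function names (`build_stacks`, `consensus`) and the thresholds `max_mismatch`, `min_q` and `min_minor_frac` are hypothetical illustration choices, not taken from any existing program.

```python
from collections import Counter

# IUPAC codes for one base or an unordered pair of bases (diploid assumption).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def build_stacks(reads, max_mismatch=2):
    """Group (sequence, phred_scores) reads into stacks, one per putative locus.

    Identical sequences are stacked first; each distinct sequence, most
    abundant first, then joins the first existing stack whose seed differs
    from it at no more than max_mismatch positions, or starts a new stack.
    """
    exact = {}
    for seq, qual in reads:
        exact.setdefault(seq, []).append((seq, qual))
    stacks = []  # list of (seed_sequence, member_reads)
    for seq in sorted(exact, key=lambda s: len(exact[s]), reverse=True):
        for seed, members in stacks:
            if hamming(seq, seed) <= max_mismatch:
                members.extend(exact[seq])
                break
        else:
            stacks.append((seq, list(exact[seq])))
    return stacks


def consensus(members, min_q=20, min_minor_frac=0.2):
    """Collapse a stack into one sequence, using IUPAC codes at heterozygous sites.

    Bases with Phred score below min_q are ignored. The two most frequent
    remaining bases are kept if each makes up at least min_minor_frac of the
    column; a column with no usable base becomes 'N'.
    """
    out = []
    for i in range(len(members[0][0])):
        counts = Counter(seq[i] for seq, qual in members
                         if qual[i] >= min_q and seq[i] in "ACGT")
        total = sum(counts.values())
        alleles = [base for base, n in counts.most_common(2)
                   if n / total >= min_minor_frac] if total else []
        out.append(IUPAC[frozenset(alleles)] if alleles else "N")
    return "".join(out)
```

The mismatch threshold is what lets both alleles of a heterozygous locus land in the same stack; set it too high and distinct but similar loci (e.g. paralogs) get collapsed together.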

Do you have suggestions (e.g. programs or a pipeline) for how to do this? In particular, anything that builds such stacks and then produces a consensus sequence without a reference would help a lot. Once I've done that for every individual, I could again build stacks from the individual consensus sequences per population and compare these among the populations.
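The three hierarchical comparisons can be sketched in Python once each individual has one consensus sequence per locus, written with IUPAC codes. This sketch assumes the consensus sequences for a locus are already the same length and position-aligned, and that individual names are unique across populations. Here "differs between populations" simply means the observed allele sets differ; that is a deliberately crude, hypothetical criterion, and a real analysis would compare allele frequencies instead. The names `alleles_at` and `hierarchical_snps` are illustrative only.

```python
# Bases represented by each IUPAC code (up to two alleles, diploid assumption).
BASES = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def alleles_at(seqs, i):
    """Union of alleles seen at position i in a group of consensus sequences ('N' adds nothing)."""
    found = set()
    for s in seqs:
        found |= set(BASES.get(s[i], ""))
    return found


def hierarchical_snps(populations):
    """populations: {population: {individual: consensus_seq}} for ONE locus.

    Returns (het, within, between):
      het     -> {individual: heterozygous positions}
      within  -> {population: positions with more than one allele in that population}
      between -> positions whose allele sets differ between populations
    """
    first_pop = next(iter(populations.values()))
    length = len(next(iter(first_pop.values())))
    # Step 1: heterozygous sites are the two-base IUPAC codes in each individual.
    het = {ind: [i for i, b in enumerate(seq) if len(BASES.get(b, "")) == 2]
           for pop in populations.values() for ind, seq in pop.items()}
    # Step 2: sites with more than one allele among the individuals of a population.
    within = {name: [i for i in range(length) if len(alleles_at(pop.values(), i)) > 1]
              for name, pop in populations.items()}
    # Step 3: sites where the populations' allele sets are not all identical.
    between = []
    for i in range(length):
        sets = {frozenset(alleles_at(pop.values(), i)) for pop in populations.values()}
        sets.discard(frozenset())  # ignore populations with only 'N' at this site
        if len(sets) > 1:
            between.append(i)
    return het, within, between
```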

Thanks a lot for your help,

Marius

Last edited by Marius; 12-20-2010 at 02:11 PM.
12-27-2010, 09:38 AM   #2
Marius

It seems the best approach would be a de novo assembly (contigs) of all my reads, with the reads of all individuals pooled together. I expect about 40,000 loci, so I'd get about 40,000 contigs. I would then use these as a reference for consensus calling in each individual, since I have many replicates of each locus per individual to check for heterozygous positions.
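Using the pooled contigs as a reference means assigning each individual's reads to a contig. In practice a short-read aligner (e.g. BWA or Bowtie) would normally do this step; the Python sketch below is a hypothetical, ungapped stand-in for illustration only. It assumes reads and contigs have equal length, treats an `N` in the contig as matching any base, and discards reads that match two contigs equally well. The function names and the `max_mismatch` value are arbitrary example choices.

```python
def mismatches(read, contig):
    """Ungapped mismatch count; an 'N' in the contig is treated as matching any base."""
    return sum(1 for r, c in zip(read, contig) if c != "N" and r != c)


def assign_reads(reads, contigs, max_mismatch=3):
    """Assign one individual's reads to pooled contigs by ungapped comparison.

    reads:   list of read sequences from one individual
    contigs: {contig_name: sequence} built from the pooled reads
    Returns {contig_name: [assigned reads]}. A read is dropped if its best
    contig has more than max_mismatch mismatches or if two contigs tie.
    """
    hits = {name: [] for name in contigs}
    for read in reads:
        scored = sorted((mismatches(read, seq), name) for name, seq in contigs.items())
        if not scored:
            continue
        best, best_name = scored[0]
        tied = len(scored) > 1 and scored[1][0] == best
        if best <= max_mismatch and not tied:
            hits[best_name].append(read)
    return hits
```

Each list of assigned reads is then one per-individual stack, ready for consensus calling with ambiguity codes at heterozygous sites.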

What would be the best assembler for building these contigs? Nice features to have would be:
- I'd like to treat only good-quality reads as "true" reads and use only these to build the contigs (so there should be some kind of quality filtering before contig building).
- Once I have these 40,000 contigs (I would probably just name them 1 to 40,000), I'd like to align the reads of every single individual to them to call the individual consensus sequence for every locus (which can be heterozygous). Note that when building the contigs I will have many reads for every locus (the sum of all replicates of all individuals at that locus), which will already carry different alleles (SNPs). So the contig sequences (the consensus of all these biological replicates) should have an "N" at these positions, so that all variants can later align correctly to this locus when I call the individual consensus sequences.
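The quality filtering asked for in the first point could look like the following Python sketch. It assumes well-formed FASTQ input with four lines per record and Phred+33 encoded qualities (the encoding used from Illumina 1.8 onward; older Illumina pipelines, 1.3 to 1.7, used Phred+64, so check `offset` for your data). The cut-offs `min_mean` and `min_base` and the rule of dropping reads containing `N` are hypothetical example choices.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of 4-line FASTQ records.

    Assumes complete records; a truncated final record is not handled.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def keep_read(seq, qual, min_mean=30, min_base=20, offset=33):
    """True if the read has no 'N', mean Phred >= min_mean and every base >= min_base."""
    scores = [ord(ch) - offset for ch in qual]
    if not scores or "N" in seq.upper():
        return False
    return sum(scores) / len(scores) >= min_mean and min(scores) >= min_base
```

Because `read_fastq` accepts any iterable of lines, an open file handle can be passed to it directly, and only the reads for which `keep_read` returns `True` would go into contig building.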
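The second point, a pooled consensus with `N` at polymorphic positions and contigs simply numbered from 1, is sketched below in Python. It assumes each locus has already been collected into a stack of equal-length reads pooled across all individuals. A position is masked when the second most common base reaches `min_minor_frac` of the reads; that threshold is a hypothetical value meant to separate real SNPs from occasional sequencing errors, and the function names are illustrative only.

```python
from collections import Counter


def masked_consensus(stack, min_minor_frac=0.05):
    """Majority base per column of equal-length pooled reads; 'N' where the
    second most common base reaches min_minor_frac (a likely SNP) or where
    the column has no A/C/G/T at all."""
    out = []
    for column in zip(*stack):
        counts = Counter(b for b in column if b in "ACGT")
        total = sum(counts.values())
        ranked = counts.most_common(2)
        if total == 0 or (len(ranked) == 2 and ranked[1][1] / total >= min_minor_frac):
            out.append("N")
        else:
            out.append(ranked[0][0])
    return "".join(out)


def to_fasta(stacks):
    """Number the loci 1..N and return their masked consensus sequences as FASTA text."""
    records = []
    for number, stack in enumerate(stacks, start=1):
        records.append(f">{number}\n{masked_consensus(stack)}")
    return "\n".join(records) + "\n"
```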
All times are GMT -8.