SEQanswers

02-13-2014, 03:21 AM   #1
JdeBruin
Member
 
Location: South Africa

Join Date: Jun 2013
Posts: 25
SNP calling on low coverage NGS data

Hi,

I am new to SNP calling and would appreciate some input. I want to call SNPs for a specific organism. I have two hurdles: I do not have a reference genome to map against, and my coverage is very low, 4x per organism.

Any suggestions would be appreciated.
02-13-2014, 04:27 AM   #2
TiborNagy
Senior Member
 
Location: Budapest

Join Date: Mar 2010
Posts: 329

The problem is this: if you do not have a reference genome, you need a de novo assembly. You can find heterozygous positions in a de novo assembly (these can be called SNPs), but in your case, at 4x, only about two reads per allele would support this hypothesis.
So either use these sequences to create a reference and sequence other samples to find SNPs, or sequence the initial sample again at higher coverage.
02-13-2014, 07:48 AM   #3
SNPsaurus
Registered Vendor
 
Location: Eugene, OR

Join Date: May 2013
Posts: 521

4X read depth can be even more misleading, since at heterozygous loci you will sometimes get 2 reads per allele, but also 3 reads for one allele vs 1 read for the other, or even 4 reads for one allele and none for the other. On the other hand, any allele supported by a few reads is probably real; you can't fully genotype the sample, but you can identify SNPs.
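
To put rough numbers on those splits, here is a minimal sketch. It assumes each read at a heterozygous site samples either allele with probability 0.5, independently, and that the site is covered by exactly 4 reads; real data add sequencing error and mapping bias on top of this. The function name is only illustrative.

```python
from math import comb

def allele_split_probabilities(depth, p_alt=0.5):
    """P(k of `depth` reads carry the alternate allele) at a heterozygous site.

    Assumes independent reads, each drawn from the alternate allele
    with probability p_alt (a simple binomial model).
    """
    return {
        k: comb(depth, k) * p_alt**k * (1 - p_alt) ** (depth - k)
        for k in range(depth + 1)
    }

probs = allele_split_probabilities(4)
p_balanced = probs[2]               # 2 reads per allele
p_skewed = probs[1] + probs[3]      # 3 reads vs 1 read
p_one_sided = probs[0] + probs[4]   # 4 reads vs 0 reads: looks homozygous

print(f"2/2 split: {p_balanced:.3f}")
print(f"3/1 split: {p_skewed:.3f}")
print(f"4/0 split: {p_one_sided:.3f}")
```

Under this model one in eight heterozygous sites with exactly 4 reads shows only one allele, and half show a 3/1 split that is hard to tell apart from a sequencing error on a single read.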

Genotyping by sequencing was developed to solve this problem. By sequencing only 10,000-100,000 loci across a genome, the reads attain good depth with a small amount of sequencing. We do genotyping by sequencing, as do other companies and academic cores. You don't need a reference genome for the informatics; instead, the reads can be compared from sample to sample, using a core set of reads representing the loci as the reference.
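
The depth argument is plain arithmetic, sketched below. The genome size, locus length, and locus count are hypothetical numbers picked for illustration (they are not from this thread), and the calculation is idealized: it assumes every sequenced base lands on a targeted locus.

```python
def mean_depth(total_sequenced_bases, target_bases):
    """Average depth when the sequenced bases are spread evenly over the target."""
    return total_sequenced_bases / target_bases

# Hypothetical example values, for illustration only.
genome_size = 500_000_000           # 500 Mb genome (assumed)
sequenced_bases = 4 * genome_size   # the amount of sequencing that gives 4x genome-wide

n_loci = 50_000                     # within the 10,000-100,000 range mentioned above
locus_length = 100                  # bases per locus (assumed)

whole_genome_depth = mean_depth(sequenced_bases, genome_size)
reduced_depth = mean_depth(sequenced_bases, n_loci * locus_length)

print(f"whole genome: {whole_genome_depth:.0f}x")
print(f"reduced representation: {reduced_depth:.0f}x")
```

With these made-up values the same sequencing effort is far more than the loci need, which is why many samples can share a run while each still reaches good depth.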
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
02-13-2014, 09:06 AM   #4
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

You are in trouble. 4x is not enough for de novo assembly, and it is not really enough for calling homozygous SNPs, let alone heterozygous ones.
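
One way to see how thin 4x is: a rough sketch assuming per-position depth follows a Poisson distribution with mean 4. That is a common idealization, not something measured on this data, and real libraries are usually less even than Poisson.

```python
from math import exp, factorial

def poisson_cdf(k, mean):
    """P(depth <= k) when depth is Poisson-distributed with the given mean."""
    return sum(mean**i * exp(-mean) / factorial(i) for i in range(k + 1))

mean_depth_4x = 4.0
p_uncovered = poisson_cdf(0, mean_depth_4x)   # positions with no reads at all
p_at_most_2 = poisson_cdf(2, mean_depth_4x)   # positions with 0, 1 or 2 reads

print(f"P(depth = 0)  = {p_uncovered:.3f}")
print(f"P(depth <= 2) = {p_at_most_2:.3f}")
```

Under this assumption roughly a quarter of positions have two reads or fewer, which is too little evidence for a confident call of any kind.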

If these are all the same species, I'd combine all the reads together, and do de novo assembly on that. Then align each sample's reads to that assembly, call SNPs on those alignments, and realize that you are going to miss most of the heterozygous SNPs, and many homozygous ones too.
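
As a rough outline of that workflow, the sketch below only builds the shell commands and does not run anything. The tool choice (bwa, samtools, bcftools) is an assumption for illustration, since no specific software is named above; the file names are hypothetical, and the assembly step is left as a placeholder because the assembler is up to you.

```python
def pooled_assembly_snp_commands(samples, assembly="pooled_assembly.fa"):
    """Return shell command strings for: pool reads, assemble, align, call SNPs.

    `samples` maps a sample name to its FASTQ path (hypothetical names,
    single-end reads, no spaces in paths).
    """
    names = sorted(samples)
    fastqs = " ".join(samples[name] for name in names)
    commands = [
        # 1. Pool the reads from every sample.
        f"cat {fastqs} > pooled.fq",
        # 2. Placeholder: run a de novo assembler of your choice here.
        f"# assemble pooled.fq into {assembly}",
        # 3. Index the pooled assembly so it can serve as the reference.
        f"bwa index {assembly}",
    ]
    for name in names:
        bam = f"{name}.bam"
        commands += [
            # 4. Align each sample separately and sort the alignments.
            f"bwa mem {assembly} {samples[name]} | samtools sort -o {bam} -",
            f"samtools index {bam}",
            # 5. Call variants per sample against the pooled assembly.
            f"bcftools mpileup -f {assembly} {bam} | bcftools call -mv -Oz -o {name}.vcf.gz",
        ]
    return commands

commands = pooled_assembly_snp_commands(
    {"sampleA": "sampleA.fq", "sampleB": "sampleB.fq"}
)
for command in commands:
    print(command)
```

Calling per sample keeps the outline simple; with default settings, expect many sites to be missed or miscalled at this depth, as noted above.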
04-22-2015, 01:01 AM   #5
Naibin Duan
Junior Member
 
Location: ithaca

Join Date: Jan 2014
Posts: 3
4X coverage data without a reference is almost impossible to call

In my opinion, calling SNPs from 4X coverage resequencing data without any reference is almost impossible.
Best.
Naibin

All times are GMT -8.