Old 02-13-2014, 03:21 AM   #1
Location: South Africa

Join Date: Jun 2013
Posts: 25
SNP calling on low coverage NGS data


I am new to SNP calling and would appreciate some input. I want to try to call SNPs for a specific organism. The hurdles I face are that I do not have a reference genome to map against and that I have very low coverage, 4x per organism.

Any suggestions would be appreciated.
JdeBruin
Old 02-13-2014, 04:27 AM   #2
Senior Member
Location: Budapest

Join Date: Mar 2010
Posts: 329

The problem is the following: if you do not have a reference genome, you need a de novo assembly. You can find heterozygous positions in a de novo assembly (which can be called SNPs), but in your case only two reads support this hypothesis.
So, use these sequences to create a reference and sequence other samples to find SNPs, or sequence the initial sample at higher coverage.
TiborNagy
Old 02-13-2014, 07:48 AM   #3
Registered Vendor
Location: Eugene, OR

Join Date: May 2013
Posts: 521

4X read depth can be even more misleading, since at heterozygous loci you will sometimes get 2 reads per allele, but also 3 reads to one allele vs 1 read for the other, or even 4 reads to one allele and no reads for the other. On the other hand, any allele supported by a few reads is probably real; you just can't fully genotype the sample but you can identify SNPs.
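
To put rough numbers on those read splits, here is a minimal sketch. It assumes exactly 4 reads cover the site, each read samples either allele with probability 0.5, and there are no sequencing errors or mapping bias. The function name `het_split_probabilities` is made up for illustration.

```python
from math import comb

def het_split_probabilities(depth, p=0.5):
    """Map (reads_allele_a, reads_allele_b) -> probability at a heterozygous site.

    Assumes a simple binomial model: each read independently comes from
    allele A with probability p (0.5 = no allele bias, no errors).
    """
    return {
        (k, depth - k): comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(depth + 1)
    }

probs = het_split_probabilities(4)
for (a, b), pr in sorted(probs.items()):
    print(f"{a} reads vs {b} reads: {pr:.4f}")

# Chance that one of the two alleles is not seen at all (4 vs 0 or 0 vs 4).
one_allele_unseen = probs[(4, 0)] + probs[(0, 4)]
print(f"one allele unseen: {one_allele_unseen:.4f}")
```

Under these assumptions the even 2 vs 2 split happens only 37.5% of the time, a 3 vs 1 split (in either direction) 50% of the time, and in 12.5% of cases one allele gets no reads at all.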

Genotyping by sequencing was developed to solve this problem. By sequencing only 10,000-100,000 loci across a genome, the reads attain good depth with a small amount of sequencing. We do genotyping by sequencing, as do other companies and academic cores. You don't need a reference for the informatics; the reads can simply be compared from sample to sample, using a core set of reads representing the loci as a reference.
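
As a back-of-the-envelope illustration of why fewer loci means more depth, the sketch below divides a read budget evenly over the loci. The figure of 1,000,000 reads per sample is a made-up number for illustration only, and real libraries will not spread reads perfectly evenly.

```python
def mean_depth_per_locus(total_reads, n_loci):
    """Average reads per locus, assuming reads are spread evenly over loci."""
    return total_reads / n_loci

# Hypothetical read budget per sample (an assumption, not from this thread).
reads_per_sample = 1_000_000

for n_loci in (10_000, 100_000):
    depth = mean_depth_per_locus(reads_per_sample, n_loci)
    print(f"{n_loci} loci -> about {depth:.0f}x per locus")
```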
Providing nextRAD genotyping and PacBio sequencing services.
SNPsaurus
Old 02-13-2014, 09:06 AM   #4
Senior Member
Location: San Diego

Join Date: May 2008
Posts: 912

You are in trouble. 4x is not enough for de novo assembly, and it is not really enough for calling homozygous SNPs, let alone heterozygous ones.

If these are all the same species, I'd combine all the reads together, and do de novo assembly on that. Then align each sample's reads to that assembly, call SNPs on those alignments, and realize that you are going to miss most of the heterozygous SNPs, and many homozygous ones too.
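
A rough way to see how many heterozygous SNPs get missed at this depth is sketched below. It assumes depth at a site is Poisson-distributed around the mean coverage, that reads split evenly between the two alleles, and that a caller needs at least 2 reads for each allele. All three are simplifying assumptions rather than the behaviour of any particular SNP caller, and the function names are made up.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

def p_het_detectable(mean_depth, min_reads_per_allele=2):
    """Probability that both alleles at a het site reach the read threshold.

    With Poisson depth and a 50/50 allele split, the read counts of the two
    alleles are independent Poisson(mean_depth / 2) variables.
    """
    lam = mean_depth / 2
    p_one_allele = 1 - poisson_cdf(min_reads_per_allele - 1, lam)
    return p_one_allele**2

for depth in (4, 10, 30):
    print(f"{depth}x: {p_het_detectable(depth):.3f}")
```

Under these assumptions, only about 35% of heterozygous sites have two or more reads for both alleles at 4x mean coverage.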
swbarnes2
Old 04-22-2015, 01:01 AM   #5
Naibin Duan
Junior Member
Location: ithaca

Join Date: Jan 2014
Posts: 3
4X coverage data without reference is almost impossible to call

In my opinion, calling SNPs from 4X coverage resequencing data without any reference is almost impossible.

All times are GMT -8.