SEQanswers

06-16-2016, 08:34 AM   #1
donquijotes
Junior Member
 
Location: Michigan

Join Date: Jul 2015
Posts: 7
dbSNP database for just NA12878

I may be completely wrong here, so please correct me; I'm mostly used to doing RNA splicing analysis.

I have sequenced sheared DNA from about 10 NA12878 cells using Illumina and a library prep that uses a non-proofreading polymerase, so we expect it to introduce lots of errors, even early on. What I would like to do is figure out how many false positives it gives me (allele frequency >= 15%).
To do that I need the NA12878 reference genome and its dbSNP database for heterozygous loci, right? If I just align to NA12878, then the het loci that are endogenous "SNPs" would look like false positives. Where can I find the NA12878-specific dbSNP?
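
One way to picture the comparison described above, as a minimal sketch rather than a validated pipeline: take the variants called from the library, drop everything that appears in a known NA12878 variant set, and count what remains at or above the 15% allele frequency cutoff. The names `Call`, `load_truth_sites`, and `candidate_false_positives` are hypothetical, the records are made-up toy data, and the sketch assumes the calls and the known-variant VCF use the same reference build and the same representation of each variant.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple

Site = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


class Call(NamedTuple):
    """One variant called from the library (hypothetical record type)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    allele_freq: float  # fraction of reads supporting the alt allele


def load_truth_sites(vcf_lines: Iterable[str]) -> Set[Site]:
    """Collect (chrom, pos, ref, alt) keys from the lines of a known-variant VCF."""
    sites: Set[Site] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # multi-allelic records list alts comma-separated
            sites.add((chrom, pos, ref, alt))
    return sites


def candidate_false_positives(
    calls: Iterable[Call], truth: Set[Site], min_af: float = 0.15
) -> List[Call]:
    """Keep calls at or above min_af that are absent from the known-variant set."""
    return [
        c for c in calls
        if c.allele_freq >= min_af and (c.chrom, c.pos, c.ref, c.alt) not in truth
    ]


# Toy data, invented purely for illustration.
truth_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t.\tPASS\t.",
    "chr1\t2000\t.\tC\tT,A\t.\tPASS\t.",
]
calls = [
    Call("chr1", 1000, "A", "G", 0.48),  # known het site, not counted
    Call("chr1", 1500, "G", "T", 0.22),  # not in the known set, above cutoff
    Call("chr1", 1800, "T", "C", 0.05),  # below the 15% cutoff
    Call("chr1", 2000, "C", "A", 0.51),  # known alt of a multi-allelic site
]

truth = load_truth_sites(truth_vcf)
fps = candidate_false_positives(calls, truth)
for c in fps:
    print(f"{c.chrom}:{c.pos} {c.ref}>{c.alt} AF={c.allele_freq:.2f}")
```

A call that is missing from the known set is only a candidate false positive: the known set may not cover every region of the genome, so calls outside its covered regions cannot be judged this way.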

If there is a more logical way of doing this analysis, please feel free to call me out.

Thank you!

06-16-2016, 08:57 AM   #2
donquijotes

I've actually found this thread, which might help:

http://seqanswers.com/forums/showthread.php?t=23093

Two files that I found useful:

ftp://ftp.1000genomes.ebi.ac.uk/vol1...populations.md
shows the populations that the VCF files came from. NA12878 should be in CEU: Utah residents (CEPH) with Northern and Western European ancestry.

ftp://ftp.1000genomes.ebi.ac.uk/vol1.../2010_07/trio/
has all the different datasets.

If someone else has better options or suggestions, please let me know.

06-20-2016, 06:11 AM   #3
sklages
Senior Member

Location: Berlin, DE

Join Date: May 2008
Posts: 628

This might be interesting for you: http://www.illumina.com/platinumgenomes/

06-20-2016, 06:55 AM   #4
donquijotes

Thank you, sklages! It looks like people update these files frequently, so the databases should be way better than the 2010 version that the 1000 Genomes pilot study offers.

06-20-2016, 07:43 AM   #5
HESmith
Senior Member

Location: Bethesda MD

Join Date: Oct 2009
Posts: 509

Note that some of your false positives will be alignment artifacts rather than polymerase errors. If it's important to discriminate between those classes, see this paper.

06-28-2016, 09:06 AM   #6
donquijotes

HESmith, that was a fantastic paper, thank you for sharing!

Tags
allele frequency, dbsnp, false positives, na12878
