  • donquijotes
    Junior Member
    • Jul 2015
    • 7

    dbSNP database for just NA12878

    I may be completely wrong here, so please correct me; I'm mostly used to doing RNA splicing analysis.

    I have sequenced sheared DNA from about 10 NA12878 cells using Illumina and a library prep that uses a non-proofreading polymerase, so we expect it to introduce lots of errors, even early on. What I would like to do is figure out how many false positives it gives me (allele frequency >= 15%).
    To do that I need the NA12878 reference genome and its dbSNP database for heterozygous loci, right? If I just align to NA12878, the het loci that are endogenous "SNPs" would look like false positives. Where can I find the NA12878-specific dbSNP?
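    A minimal sketch of the filtering step described above, assuming you already have per-site reference/alternate read counts from a pileup and a VCF listing NA12878's own known variant calls (a per-sample truth set rather than the population-wide dbSNP). The function names, the count format and the toy data are hypothetical. A site counts as a candidate false positive if its allele frequency is at least 15% and it is absent from the truth set.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical site key: (chromosome, 1-based position)
Site = Tuple[str, int]


def load_known_sites(vcf_lines: Iterable[str]) -> Set[Site]:
    """Collect (chrom, pos) for every record in a truth-set VCF (assumed input)."""
    sites: Set[Site] = set()
    for line in vcf_lines:
        # Skip blank lines, meta lines (##) and the #CHROM header line
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        sites.add((chrom, int(pos)))
    return sites


def candidate_false_positives(
    counts: Dict[Site, Tuple[int, int]],
    known: Set[Site],
    min_af: float = 0.15,
) -> Dict[Site, float]:
    """Return {site: AF} for sites with AF >= min_af that are NOT known variants.

    counts maps each site to (ref_reads, alt_reads); this format is an assumption.
    """
    result: Dict[Site, float] = {}
    for site, (ref, alt) in counts.items():
        depth = ref + alt
        if depth == 0:
            continue  # no coverage, no allele frequency
        af = alt / depth
        if af >= min_af and site not in known:
            result[site] = af
    return result


if __name__ == "__main__":
    # Toy truth set with one known het site (made-up data)
    truth = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12878",
        "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT\t0/1",
    ]
    counts = {
        ("chr1", 1000): (10, 10),  # known het site, excluded
        ("chr1", 2000): (16, 4),   # AF 0.20, not in truth set
        ("chr1", 3000): (19, 1),   # AF 0.05, below threshold
    }
    print(candidate_false_positives(counts, load_known_sites(truth)))
```

    This matches on position only and ignores which alternate allele was observed, so multi-allelic sites and indels would need extra handling.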

    If there is another more logical way of doing this analysis please feel free to call me out.

    Thank you!
  • donquijotes
    Junior Member
    • Jul 2015
    • 7

    #2
    I've actually found this thread that might help.


    Two files that I found useful:

    ftp://ftp.1000genomes.ebi.ac.uk/vol1...populations.md
    shows the population each VCF file came from. NA12878 should belong to CEU: Utah residents (CEPH) with Northern and Western European ancestry.

    ftp://ftp.1000genomes.ebi.ac.uk/vol1.../2010_07/trio/
    has all the different datasets

    If someone else has better options or suggestions, please let me know.


    • sklages
      Senior Member
      • May 2008
      • 628

      #3
      This might be interesting for you: http://www.illumina.com/platinumgenomes/


      • donquijotes
        Junior Member
        • Jul 2015
        • 7

        #4
        Thank you, sklages! It looks like these files are updated frequently, so they should be much better than the 2010 version offered by the 1000 Genomes pilot study.


        • HESmith
          Senior Member
          • Oct 2009
          • 512

          #5
          Note that some of your false-positives will be alignment artifacts rather than polymerase errors. If it's important to discriminate those classes, see this paper.


          • donquijotes
            Junior Member
            • Jul 2015
            • 7

            #6
            HESmith, that was a fantastic paper. Thank you for sharing!
