  • Need help identifying SNPs and allele frequencies

    Hi everyone,

    Long-time reader, first-time poster looking for advice.

    I have 100 bp single-end (SE) Illumina data (~500x coverage) and a short (~150 kb) reference genome. My difficulties stem from the fact that my DNA isn't from a single individual. Instead, it is a pool of an unknown (but large) number of individuals. My goal is to identify SNPs and accurately quantify allele frequencies at these sites.

    Currently, I'm mapping my reads back to the reference genome using bowtie. But mapping back to a reference is almost certainly biasing my allele frequencies in favor of the reference allele. Does anyone have a suggestion for alternative methods that eliminate or correct for this bias? I've considered de novo assembly (e.g. velvet), but I've been told that pooled DNA causes problems for velvet.

    I also have strong evidence of reads mis-mapping in some regions. I tried throwing out reads that map to multiple regions, but that didn't seem to solve the problem. Is there a technique for identifying mis-mapped reads, or for identifying problematic regions post hoc?

    Thanks for any thoughts/suggestions you may have,
    Dave
    Last edited by dkenned1; 11-06-2011, 01:28 PM. Reason: posted to wrong section
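    As a rough illustration of the allele-frequency goal described above, here is a minimal sketch that estimates per-site frequencies from the base calls covering one position. The function name `allele_frequencies` and the `min_count` cutoff are hypothetical, chosen only for this example. It does nothing to correct reference bias; it only shows the counting step.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def allele_frequencies(bases, min_count=2):
    """Estimate allele frequencies at one site from pooled reads.

    bases: iterable of single-character base calls covering the site.
    min_count: hypothetical cutoff; alleles seen fewer times than this
    are treated as likely sequencing errors and dropped.
    """
    # Count only unambiguous base calls (skips N and other symbols).
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    kept = {b: n for b, n in counts.items() if n >= min_count}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {b: n / total for b, n in kept.items()}

# Made-up example: 500 reads over one site, with a single stray T call.
site = "A" * 450 + "G" * 49 + "T"
freqs = allele_frequencies(site)
```

    With deep coverage (~500x), a fixed count cutoff is a crude error filter; a cutoff based on base quality or a binomial error model would be a reasonable alternative.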

  • #2
    It looks like you have two issues here. One is genotyping SNPs in pooled DNA samples. There are lots of threads on SeqAnswers you can look at, but here is one in particular:

    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


    The second issue, regarding mapping, is more complex. Without really knowing anything about your project or your data, I would at least suggest you try a couple of different alignment tools to see if the problem is aligner-independent. For example, BWA and Stampy are both well regarded for accurately mapping reads.
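    One way to check whether the problem is aligner-independent is to compare where each read lands under two aligners. The sketch below assumes both aligners wrote standard SAM output for the same single-end reads; the function names and the 1 kb window size are hypothetical choices for illustration, not part of any tool.

```python
from collections import Counter

def parse_sam(lines):
    """Map read name -> (reference name, 1-based position) from SAM lines."""
    placements = {}
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        name, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:    # read is unmapped
            continue
        if flag & 0x900:  # secondary or supplementary alignment
            continue
        placements[name] = (rname, pos)
    return placements

def discordant_windows(sam_a, sam_b, window=1000):
    """Count reads placed differently by two aligners, per reference window."""
    a, b = parse_sam(sam_a), parse_sam(sam_b)
    hits = Counter()
    for name in a.keys() & b.keys():
        if a[name] != b[name]:
            # Credit each distinct window the read was placed in.
            wins = {(r, (p - 1) // window * window + 1) for r, p in (a[name], b[name])}
            for w in wins:
                hits[w] += 1
    return hits

# Made-up example records: r1 agrees, r2 is placed differently.
sam_a = ["@HD\tVN:1.0", "r1\t0\tref\t100\t60\t100M\t*\t0\t0\t*\t*",
         "r2\t0\tref\t5000\t60\t100M\t*\t0\t0\t*\t*"]
sam_b = ["r1\t0\tref\t100\t60\t100M\t*\t0\t0\t*\t*",
         "r2\t0\tref\t12000\t60\t100M\t*\t0\t0\t*\t*"]
hits = discordant_windows(sam_a, sam_b)
```

    Windows that collect many discordant reads are candidates for the problematic regions mentioned in the original post. Exact position matching is strict; aligners that clip reads differently may disagree by a few bases without the read being mis-mapped.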
