  • SNPs from populations

    Hi,
    I'm a cetacean population geneticist in San Diego, CA, and I'm trying to use Illumina resequencing of nuclear loci (from capture-array enriched libraries) from 50-150 individuals to detect SNPs in the population, and determine the genotypes of the individuals. I use CLC Genomics Workbench to analyze the data, but have found that it (and other software that I've looked at) is focused only on finding SNPs in individual alignments to a reference sequence, not comparing across individual assemblies.

    Does anyone have suggestions for how I can 1) assemble all of my reads from individuals to a set of reference sequences (CLC does this), 2) do SNP discovery based on variation in reads from the reference for each individual and each locus (CLC also does this), and then 3) compare the SNP positions across all individuals and loci to generate tables of SNP genotypes for each individual and position within each locus?

    Thanks!

  • #2
    This is easy to do with some simple code in Perl, Python, or another favorite language, provided the information you describe in step 2 of your question is written out to simple text files.

    If you convert your genotypes to "1", "2", "3" (AA, AB & BB respectively) & have it all in a single table with columns
    markerName, individual, genotype

    then you can get a consolidated view in Excel (or a similar program) using PivotTables (rows = individuals; columns = markers; values = average of genotype). The average is an ugly trick: since there is only one value per cell, you get that value back.

    If CLC generates such text files, it shouldn't be too bad to merge and fix them manually, though it would be easy to learn the necessary Perl/Python to do this.
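As a rough illustration of the pivot described above, here is a minimal Python sketch using only the standard library. It assumes the genotypes are already in a long list of (markerName, individual, genotype) records; the function name `pivot_genotypes`, the marker names, and the "NA" placeholder for absent combinations are hypothetical, not something CLC produces.

```python
from collections import defaultdict


def pivot_genotypes(records, missing="NA"):
    """Turn long (marker, individual, genotype) records into a wide table.

    `records` is a list of 3-tuples. Returns (header, rows) with one row per
    individual and one column per marker. `missing` is a placeholder chosen
    here for marker/individual pairs that have no record.
    """
    # Column order: every marker seen in any record, sorted by name.
    markers = sorted({marker for marker, _, _ in records})

    # Collect each individual's calls keyed by marker.
    by_individual = defaultdict(dict)
    for marker, individual, genotype in records:
        by_individual[individual][marker] = genotype

    header = ["individual"] + markers
    rows = [
        [individual] + [calls.get(marker, missing) for marker in markers]
        for individual, calls in sorted(by_individual.items())
    ]
    return header, rows


if __name__ == "__main__":
    # Hypothetical example data using the 1/2/3 coding (AA, AB, BB).
    example = [
        ("locusA_150", "ind1", "2"),
        ("locusA_175", "ind2", "3"),
        ("locusA_150", "ind2", "1"),
    ]
    header, rows = pivot_genotypes(example)
    print("\t".join(header))
    for row in rows:
        print("\t".join(row))
```

Note that a cell left as "NA" only means no record existed for that individual and marker; it says nothing about whether the individual is homozygous for the reference allele.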



    • #3
      ->krobson, we've tried to compile the data with R scripts, but the problem is that the SNP detection software finds different SNPs in different individuals, and there is also associated SNP quality data that I'd like to compile so I can tell whether a SNP is likely to be valid. For example, one individual may have a SNP at site 150, but the next will have a SNP at site 175, and so on. There's no way to know all of the sites in a sequence that have SNPs in the population of samples unless the assemblies are all aligned and each SNP site found in any one assembly is evaluated in every other assembly. If we use scripts to combine the data from individual assemblies, we have to assume that an individual with no SNP detected at a site is homozygous for the reference allele there, but that's a bad assumption when quality and coverage vary across individuals.
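One way to picture the evaluation described above is the following Python sketch. It is not a feature of CLC or of any tool mentioned in this thread. It assumes two inputs are available per individual: the SNP calls and the read depth at each site. It takes the union of SNP sites across all individuals, and at sites where an individual has no call it records homozygous reference only when depth reaches a threshold, and missing otherwise. The function name, the data layout, the "ref/ref" and "missing" labels, and the `min_depth=8` cutoff are all hypothetical choices for illustration.

```python
def build_genotype_table(snp_calls, coverage, min_depth=8):
    """Evaluate every SNP site found in any individual across all individuals.

    snp_calls: {individual: {(locus, position): genotype}}
    coverage:  {individual: {(locus, position): read_depth}}
    min_depth: hypothetical cutoff below which a site with no SNP call is
               treated as missing rather than homozygous reference.
    Returns (sites, table), where table[individual][site] is a genotype,
    "ref/ref", or "missing".
    """
    # Union of all sites where at least one individual has a SNP call.
    sites = sorted({site for calls in snp_calls.values() for site in calls})

    table = {}
    for individual in sorted(snp_calls):
        row = {}
        for site in sites:
            if site in snp_calls[individual]:
                # A SNP was called here for this individual.
                row[site] = snp_calls[individual][site]
            elif coverage.get(individual, {}).get(site, 0) >= min_depth:
                # No SNP call, but enough reads to accept the reference allele.
                row[site] = "ref/ref"
            else:
                # Too little coverage to say anything about this site.
                row[site] = "missing"
        table[individual] = row
    return sites, table


if __name__ == "__main__":
    # Hypothetical data echoing the example of SNPs at sites 150 and 175.
    calls = {
        "ind1": {("locusA", 150): "A/G"},
        "ind2": {("locusA", 175): "C/T"},
    }
    depth = {
        "ind1": {("locusA", 175): 30},
        "ind2": {("locusA", 150): 3},
    }
    sites, table = build_genotype_table(calls, depth)
    for individual, row in table.items():
        print(individual, [row[site] for site in sites])
```

The same pattern could carry the per-SNP quality values alongside each genotype, so that low-quality calls can be filtered before the table is used.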



      • #4
        (deleted due to misunderstanding of question)
        Last edited by kopi-o; 05-11-2011, 12:38 AM.
