  • SOAPsnp - SNP information file

    SOAPsnp is able to take a SNP information file for use in calling SNPs. Unfortunately, the format of this file seems to be specific to SOAPsnp. I can pull data from NCBI's dbSNP archive but this file is in a different format AND does not contain all the information that SOAPsnp takes as input. I can parse the dbSNP files (one for each chromosome) and write to the SOAPsnp format but if this can be pulled from somewhere I don't know of, that would be preferred.

    Does anyone use the -s option when using SOAPsnp?
    If you use it, did you have to generate your own SNP info file?

    Also, it looks like some of the attributes that SOAPsnp takes can be "zeroed" out if the data is unknown - e.g.
    frequency of A,
    frequency of C,
    frequency of T,
    frequency of G
    ...or given a default value if unknown - e.g.
    whether the SNP is validated by experiment (1 is true, 0 is false),
    whether the SNP is actually an indel (1 is true, 0 is false).

    This is not the method I would prefer, but I guess it's better than not using the -s option at all.

    Thanks,
    Dan
    Last edited by dmb; 02-15-2011, 01:10 PM.

  • #2
    follow up

    To those that might be interested...

    I found somewhat of a solution. I used the Table Browser from UCSC to extract SNP data using the following parameters.
    group: All Tables
    database: hg19
    table: snp131
    region: genome
    I output this to a file (tab-delim) that has the following header information:
    #bin
    chrom
    chromStart
    chromEnd
    name
    score
    strand
    refNCBI
    refUCSC
    observed
    molType
    class
    valid
    avHet
    avHetSE
    func
    locType
    weight

    One of the rows of data looks like this:
    585 chr1 10491 10492 rs55998931 0 + C C C/T genomic single unknown 0 0 near-gene-5 exact 1

    Using some simple R code I should be able to create the SOAPsnp SNP info file.
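One way the conversion could look, sketched in Python rather than the R mentioned above. It assumes the export has separate `class` and `valid` columns (the example row has 18 fields), that `chromStart` is 0-based as in UCSC tables, and that the output is tab-separated in the field order described for SOAPsnp in this thread. The function names, the indel class list, and the rule for the validated flag are illustrative assumptions, not anything confirmed by the SOAPsnp documentation.

```python
import io

# Classes treated as indels: an assumption made for this sketch.
INDEL_CLASSES = {"insertion", "deletion", "in-del"}


def ucsc_row_to_soapsnp(row):
    """Map one UCSC snp131 row (dict keyed by column name) to one output line."""
    # UCSC chromStart is 0-based, so add 1 to get a 1-based position.
    position = int(row["chromStart"]) + 1
    # Assumption: anything other than "unknown" in `valid` counts as validated.
    validated = 0 if row["valid"] == "unknown" else 1
    is_indel = 1 if row["class"] in INDEL_CLASSES else 0
    fields = [
        row["chrom"], position,
        0,            # no allele frequency information available
        validated, is_indel,
        0, 0, 0, 0,   # frequencies of A, C, T, G zeroed out (unknown)
        row["name"],  # rs number
    ]
    return "\t".join(str(f) for f in fields)


def convert(in_handle, out_handle):
    """Stream a Table Browser export line by line; return the rows written."""
    header = in_handle.readline().lstrip("#").rstrip("\r\n").split("\t")
    written = 0
    for line in in_handle:
        values = line.rstrip("\r\n").split("\t")
        if len(values) != len(header):
            continue  # skip blank or malformed lines
        out_handle.write(ucsc_row_to_soapsnp(dict(zip(header, values))) + "\n")
        written += 1
    return written


if __name__ == "__main__":
    columns = ["#bin", "chrom", "chromStart", "chromEnd", "name", "score",
               "strand", "refNCBI", "refUCSC", "observed", "molType", "class",
               "valid", "avHet", "avHetSE", "func", "locType", "weight"]
    example = ["585", "chr1", "10491", "10492", "rs55998931", "0", "+", "C",
               "C", "C/T", "genomic", "single", "unknown", "0", "0",
               "near-gene-5", "exact", "1"]
    out = io.StringIO()
    convert(io.StringIO("\t".join(columns) + "\n" + "\t".join(example) + "\n"), out)
    print(out.getvalue(), end="")
```

Because `convert` reads one line at a time, it should cope with a genome-wide export without loading the whole table into memory. For real files, the same function can be called with handles from `open(...)` instead of `io.StringIO`.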

    If anyone has another method, I would love to hear about it.


    • #3
      The file from UCSC has 26,033,054 rows.


      • #4
        As far as I know (from my experience), the format of the dbSNP file that SOAPsnp needs is:
        chromosome name (should match the reference chromosome name),
        position,
        whether allele frequency information is included (0 = no, 1 = yes),
        whether this marker is validated (0/1),
        whether this marker is an indel (0/1),
        frequency of A,
        frequency of C,
        frequency of T,
        frequency of G,
        the SNP's ID number

        So you'd better not use "-"; SOAPsnp may not recognize it (possibly a bug in SOAPsnp, I guess).
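A minimal sketch of a sanity check for lines in this layout, written in Python. It assumes the ten fields listed above are whitespace-separated, and that flags must be 0 or 1 and frequencies must be numbers between 0 and 1. The function name and the exact rules are hypothetical and only illustrate how a "-" placeholder could be caught before the file is passed to SOAPsnp.

```python
# Column positions follow the field order described in this post.
FLAG_COLUMNS = {2: "has_allele_freq", 3: "validated", 4: "is_indel"}
FREQ_COLUMNS = {5: "freq_A", 6: "freq_C", 7: "freq_T", 8: "freq_G"}


def check_snp_info_line(line):
    """Return a list of problems found in one line (empty list if none)."""
    fields = line.split()
    if len(fields) != 10:
        return [f"expected 10 fields, found {len(fields)}"]
    problems = []
    if not fields[1].isdigit():
        problems.append(f"position is not a non-negative integer, got {fields[1]!r}")
    for idx, name in FLAG_COLUMNS.items():
        if fields[idx] not in ("0", "1"):
            problems.append(f"{name} must be 0 or 1, got {fields[idx]!r}")
    for idx, name in FREQ_COLUMNS.items():
        try:
            value = float(fields[idx])
        except ValueError:
            # This is where a "-" placeholder gets reported.
            problems.append(f"{name} is not a number, got {fields[idx]!r}")
            continue
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name} is outside [0, 1], got {fields[idx]!r}")
    return problems
```

Running it over every line of a generated file and printing any non-empty result is a cheap way to find placeholders that were not replaced with 0.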


        • #5
          Originally posted by dmb
          SOAPsnp is able to take a SNP information file for use in calling SNPs. Unfortunately, the format of this file seems to be specific to SOAPsnp. I can pull data from NCBI's dbSNP archive but this file is in a different format AND does not contain all the information that SOAPsnp takes as input. I can parse the dbSNP files (one for each chromosome) and write to the SOAPsnp format but if this can be pulled from somewhere I don't know of, that would be preferred.

          Does anyone use the -s option when using SOAPsnp?
          If you use it, did you have to generate your own SNP info file?

          Also, it looks like some of the attributes that SOAPsnp takes can be "zeroed" out if the data is unknown - e.g.
          frequency of A,
          frequency of C,
          frequency of T,
          frequency of G
          ...or given a default value if unknown - e.g.
          whether the SNP is validated by experiment (1 is true, 0 is false),
          whether the SNP is actually an indel (1 is true, 0 is false).

          This is not the method I would prefer, but I guess it's better than not using the -s option at all.

          Thanks,
          Dan
          I have the same problem!
