  • Maone
    Junior Member
    • Jun 2011
    • 4

    Qs in exome sequencing data analysis

    Hello everyone,

    I am Maone, and I have just started a project using exome sequencing plus filter analysis to identify the causative gene for a rare genetic disorder. I have no background in Linux.

    We now have our exome sequencing results back and would like to narrow the variants down with filters such as: exclude common variants found in dbSNP, keep only genes expressed in muscle, etc. Could anyone give me some advice on what I should do? For example:
    1. Is there any Windows-based software for this, or is Linux a must-have if I want to continue my project?
    2. If it is solely for this project, how much Linux do I have to learn? Could anyone recommend some good books or websites to start with?
    3. Where can I access databases such as dbSNP, muscle genes, and fibroblast genes?

    Thank you.

    Maone
    Last edited by Maone; 06-15-2011, 11:00 AM.
  • ulz_peter
    Senior Member
    • Feb 2010
    • 219

    #2
    Quite a lot of questions at once. If you've got a Linux computer within reach, try to get some basic command-line knowledge (Google is your friend).

    If you've got your SNPs in any one of these formats: MAQ, GFF, VCF or CASAVA, you could use the SeattleSeq SNP annotation tool (http://gvs.gs.washington.edu/SeattleSeqAnnotation/). That gives you annotations like dbSNP, conservation, HapMap frequencies, genes and a lot more. The result is stored as a CSV file (as far as I remember), and you can download it and filter the SNPs using Excel.

    As the gene names are stored as well, you could search the literature for muscle-associated genes and screen the results for those genes.

    I think that's the easiest way of doing it without looking too deeply into Linux (which I would actually recommend to anyone dealing with NGS data, as long as they have the time to do that).
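
    For anyone who would rather script the filtering than do it in Excel, here is a minimal Python sketch of the same idea: drop variants that already have a dbSNP ID and keep only those in a list of muscle-associated genes. The column names (`rsID`, `geneList`), the comma delimiter, the "none" placeholder and the example rows are all assumptions for illustration; check the header of your own downloaded file and adjust them.

```python
import csv
import io

def filter_variants(rows, muscle_genes, rsid_col="rsID", gene_col="geneList"):
    """Keep variants with no dbSNP ID whose gene is in muscle_genes.

    rsid_col and gene_col are assumed column names, not confirmed ones.
    """
    kept = []
    for row in rows:
        rsid = (row.get(rsid_col) or "").strip().lower()
        # Treat empty or placeholder values as "not in dbSNP" (assumption).
        in_dbsnp = rsid not in {"", "0", "none", "na"}
        gene = (row.get(gene_col) or "").strip()
        if not in_dbsnp and gene in muscle_genes:
            kept.append(row)
    return kept

# Tiny made-up table standing in for a downloaded annotation file.
example_csv = """chromosome,position,rsID,geneList
1,1000,rs123,TTN
2,2000,none,DMD
3,3000,none,BRCA2
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
novel_muscle = filter_variants(rows, muscle_genes={"TTN", "DMD"})
for row in novel_muscle:
    print(row["chromosome"], row["position"], row["geneList"])
```

    To use it on a real file, replace `io.StringIO(example_csv)` with `open("your_file.csv", newline="")` and fill `muscle_genes` with the gene names you collected from the literature.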


    • Maone
      Junior Member
      • Jun 2011
      • 4

      #3
      Thank you very much for your advice.

      Actually, I already have a CSV file exported from DNAnexus.com, using their nucleotide-level variation with the settings Genome: hg18, Gene annotations: RefSeq Genes.

      Following your advice, I opened the CSV file in Excel and went through the data. Now I have some new questions about interpreting it:
      1. In the "where_in_transcript" column, I have CDS, non-coding exon, introns, upstream and downstream, and UTRs. If I am only looking for exonic mutations, should I look solely at CDS?
      2. For some variants, I get duplicate rows with the same Var_index, where the only difference is in "transcript_name",
      e.g. NM_002026, NM_054034, NM_212474, NM_212475, NM_212476 and NM_212478 are all FN1 transcript variants.
      Is it common practice to count them as one variant in a gene?
      3. Do the columns "var_seq1" and "var_seq2" indicate homozygous or heterozygous variants? I found that if they are the same, the zygosity of the variant is homozygous; otherwise it is heterozygous.
      Please bear with my basic questions; I am only just starting to learn.

      Thanks again
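
      Questions 2 and 3 can be sketched in Python as follows: rows that share a Var_index are merged into one record listing all of its transcripts, and zygosity is derived by comparing var_seq1 with var_seq2. The column names are the ones quoted above; the example rows are made up, and the zygosity rule is only the observation described in question 3, not something confirmed by the DNAnexus documentation.

```python
import csv
import io
from collections import OrderedDict

def collapse_by_variant(rows):
    """Merge rows sharing a Var_index into one record per variant."""
    merged = OrderedDict()
    for row in rows:
        key = row["Var_index"]
        if key not in merged:
            # Assumed rule: identical alleles mean homozygous, else heterozygous.
            same = row["var_seq1"] == row["var_seq2"]
            merged[key] = {
                "Var_index": key,
                "transcripts": [],
                "locations": set(),
                "zygosity": "hom" if same else "het",
            }
        merged[key]["transcripts"].append(row["transcript_name"])
        merged[key]["locations"].add(row["where_in_transcript"])
    return list(merged.values())

# Made-up rows: variant 7 is reported once per transcript isoform.
example_csv = """Var_index,transcript_name,where_in_transcript,var_seq1,var_seq2
7,NM_002026,CDS,A,A
7,NM_054034,CDS,A,A
8,NM_212474,intron,C,T
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
variants = collapse_by_variant(rows)
for v in variants:
    print(v["Var_index"], v["zygosity"], len(v["transcripts"]), sorted(v["locations"]))
```

      Keeping the set of locations per variant makes it possible to notice cases where the same position is CDS in one isoform and intronic or UTR in another.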


      • ulz_peter
        Senior Member
        • Feb 2010
        • 219

        #4
        I have actually never worked with data from DNAnexus, so I can't really help you with that. Didn't it come with a manual? That should explain everything.

        Be sure not to discard intronic SNPs too quickly; they could contain a splice site mutation.
        I guess the duplicates in the file are just the same SNP, at the same genomic location, reported once for each isoform of the gene.
        No idea about the var_seq1 and var_seq2 columns...
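
        One rough way to avoid throwing away splice site candidates is to flag intronic variants that sit right next to an exon boundary before filtering the rest. The sketch below assumes you have exon start/end coordinates from some annotation source (the DNAnexus export described above may not contain them), and the 2-base window only covers the canonical splice dinucleotides; variants further into the intron can also affect splicing. The function name and coordinates are hypothetical.

```python
def near_splice_site(position, exons, window=2):
    """Return True if an intronic position lies within `window` bases of an exon edge.

    exons: list of (start, end) tuples, 1-based inclusive, on one chromosome.
    """
    for start, end in exons:
        if start <= position <= end:
            return False  # inside an exon, so not intronic
    for start, end in exons:
        upstream = start - window <= position < start
        downstream = end < position <= end + window
        if upstream or downstream:
            return True
    return False

# Made-up exon coordinates for illustration.
exons = [(100, 200), (300, 400)]
for pos in (150, 201, 250, 298):
    print(pos, near_splice_site(pos, exons))
```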


        • Maone
          Junior Member
          • Jun 2011
          • 4

          #5
          Thanks, ulz_peter. I did read their manual but found no clue about this. I will be more careful with intronic SNPs.
