  • Qs in exome sequencing data analysis

    Hello everyone,

    I am Maone, and I have just started a project using exome sequencing plus filtering analysis to identify the causative gene for a rare genetic disorder. I have no background in Linux.

    We have now got our exome sequencing results back, and we would like to narrow the variants down with filters such as excluding common variants in dbSNP, keeping only genes expressed in muscle, etc. Could anyone give me some advice on what I should do? For example:
    1. Is there any Windows-based software, or is Linux a must-have if I want to continue my project?
    2. For this project alone, how much Linux do I have to learn? Could anyone recommend some good books or websites to start with?
    3. Where can I access databases such as muscle-expressed genes, dbSNP and fibroblast-expressed genes?

    Thank you.

    Maone
    Last edited by Maone; 06-15-2011, 11:00 AM.

  • #2
    Quite a lot of questions at once. If you've got a Linux computer within reach, try to get some basic command-line knowledge (Google is your friend).

    If you've got your SNPs in any of these formats: MAQ, GFF, VCF or CASAVA, you could use the SeattleSeq SNP annotation tool (http://gvs.gs.washington.edu/SeattleSeqAnnotation/). It gives you annotations such as dbSNP, conservation, HapMap frequencies, genes and a lot more. If I remember correctly, the result is stored as a CSV file, which you can download and filter in Excel.
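    If you'd rather not do the filtering by hand in Excel, here is a minimal Python sketch of the "drop variants already in dbSNP" step on such a CSV export. The column name `rsID` and the set of "empty" markers are assumptions, so check them against the header of your own file. Also keep in mind that dbSNP contains rare and disease-associated variants too, so filtering on a population-frequency column (if your file has one) is safer than dropping everything with an rs ID.

```python
import csv
import io

# Assumed column name and "missing" markers: check the header and the
# empty-value convention of your own annotation file before relying on them.
RSID_COLUMN = "rsID"
MISSING_MARKERS = {"", ".", "none", "0"}


def novel_variants(handle, rsid_column=RSID_COLUMN, missing=MISSING_MARKERS):
    """Return the rows whose dbSNP ID field is empty, i.e. not in dbSNP."""
    reader = csv.DictReader(handle)
    kept = []
    for row in reader:
        rsid = (row.get(rsid_column) or "").strip().lower()
        if rsid in missing:
            kept.append(row)
    return kept


# Made-up example rows, for illustration only.
sample = (
    "chromosome,position,rsID,geneList\n"
    "chr2,1000,rs123,GENE_A\n"
    "chr2,2000,,GENE_A\n"
)
print([row["position"] for row in novel_variants(io.StringIO(sample))])  # ['2000']
```

    For a real file, pass an open handle instead, e.g. `with open("annotations.csv", newline="") as handle:` (the filename is hypothetical).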

    As the gene names are stored as well, you could search the literature for muscle-associated genes and screen the results for those genes.
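    The gene screen can be sketched the same way: keep only rows annotated with at least one gene from a candidate list you compiled from the literature. The column name `geneList` and the comma/semicolon separators are assumptions about the file layout, and the two genes below (DMD, dystrophin; TTN, titin) are only illustrative muscle genes, not a recommended list.

```python
import re

# Hypothetical gene column; a variant may be annotated with several genes,
# so the field is split on commas and semicolons.
GENE_COLUMN = "geneList"


def screen_by_genes(rows, candidate_genes, gene_column=GENE_COLUMN):
    """Keep rows annotated with at least one gene from candidate_genes."""
    wanted = {gene.upper() for gene in candidate_genes}
    hits = []
    for row in rows:
        field = row.get(gene_column) or ""
        genes = {g.strip().upper() for g in re.split(r"[,;]", field) if g.strip()}
        if genes & wanted:  # any overlap with the candidate list
            hits.append(row)
    return hits


# Illustrative candidate list; build your own for the phenotype you study.
candidates = {"DMD", "TTN"}
rows = [  # made-up rows for illustration
    {"position": "100", "geneList": "TTN"},
    {"position": "200", "geneList": "GENE_A"},
    {"position": "300", "geneList": "GENE_B;DMD"},
]
print([row["position"] for row in screen_by_genes(rows, candidates)])  # ['100', '300']
```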

    I think that's the easiest way of doing it without looking too deeply into Linux (which I would actually recommend to anyone dealing with NGS data, as long as they have the time).


    • #3
      Thank you very much for your advice.

      Actually, I have a CSV file exported from DNAnexus.com using their nucleotide-level variation analysis, with the settings Genome: hg18 and Gene annotations: RefSeq Genes.

      Taking your advice, I opened the CSV file in Excel and went through the data. Now I have some new questions about interpreting it:
      1. In the "where_in_transcript" column, I have CDS, non-coding exon, intron, upstream, downstream and UTR entries. If I am only looking for exonic mutations, should I look solely at CDS?
      2. Some variants appear as duplicates with the same Var_index, the only difference being the "transcript_name".
      e.g. NM_002026, NM_054034, NM_212474, NM_212475, NM_212476 and NM_212478 are all FN1 transcript variants.
      Is the usual approach to count them as one variant in a gene?
      3. Do the columns "var_seq1" and "var_seq2" tell whether a variant is homozygous or heterozygous? I found that if they are the same, the variant's zygosity is homozygous; otherwise it is heterozygous.
      Please bear with my dumb questions; I have only just started learning.
      Please bear my dumb questions, I only start my learning.

      Thanks again
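      If var_seq1 and var_seq2 really are the two alleles of each call (an assumption based only on the observation in question 3, not on DNAnexus documentation), the zygosity rule can be written down explicitly. This minimal sketch also assumes every row is a non-reference call; a row where both alleles equal the reference base would be homozygous reference, which this function does not distinguish.

```python
def zygosity(var_seq1, var_seq2):
    """Classify a call from its two alleles, following the observation in
    the post: identical alleles -> homozygous, different -> heterozygous.
    Assumes var_seq1/var_seq2 are the two called alleles (not documented)."""
    first = var_seq1.strip().upper()
    second = var_seq2.strip().upper()
    return "homozygous" if first == second else "heterozygous"


def add_zygosity(rows):
    """Attach a 'zygosity' field to each row dict (modifies rows in place)."""
    for row in rows:
        row["zygosity"] = zygosity(row["var_seq1"], row["var_seq2"])
    return rows


print(zygosity("A", "A"))  # homozygous
print(zygosity("A", "G"))  # heterozygous
```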


      • #4
        I have actually never worked with data from DNAnexus, so I can't really help you with that. Didn't it come with a manual? That should explain everything.

        Be careful not to discard intronic SNPs too quickly; they could be splice-site mutations.
        I guess the duplicates in the file are just the same SNP at the same genomic location, reported once for each isoform of the gene.
        No idea about the var_seq1 and var_seq2 columns...
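        If that guess is right (one row per isoform at the same position), the rows can be collapsed on Var_index while keeping every transcript and every where_in_transcript label. Keeping all labels also helps with the intronic point: a variant that is intronic in one isoform can lie in an exon of another, so a filter applied to the collapsed record does not drop it just because one row says "intron". A minimal sketch; the column names are the ones mentioned in this thread, and the example rows are made up.

```python
def collapse_by_var_index(rows, key="Var_index",
                          per_isoform=("transcript_name", "where_in_transcript")):
    """Merge rows sharing the same variant index into one record.

    Columns in per_isoform (which differ between isoforms) become lists of
    their distinct values; all other columns are taken from the first row.
    """
    merged = {}  # dicts keep insertion order (Python 3.7+)
    for row in rows:
        index = row[key]
        if index not in merged:
            record = dict(row)
            for column in per_isoform:
                record[column] = []
            merged[index] = record
        for column in per_isoform:
            value = row.get(column, "")
            if value not in merged[index][column]:
                merged[index][column].append(value)
    return list(merged.values())


def touches_cds(record):
    """True if any isoform places the variant in a coding exon."""
    return "CDS" in record["where_in_transcript"]


rows = [  # made-up rows for illustration
    {"Var_index": "1", "transcript_name": "NM_A", "where_in_transcript": "CDS"},
    {"Var_index": "1", "transcript_name": "NM_B", "where_in_transcript": "intron"},
    {"Var_index": "2", "transcript_name": "NM_A", "where_in_transcript": "intron"},
]
for record in collapse_by_var_index(rows):
    print(record["Var_index"], record["transcript_name"],
          record["where_in_transcript"], touches_cds(record))
# 1 ['NM_A', 'NM_B'] ['CDS', 'intron'] True
# 2 ['NM_A'] ['intron'] False
```

        Variant 2 is intronic in every isoform listed, but it should still be checked for splice-site proximity rather than discarded; that check needs exon coordinates, which this sketch does not use.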


        • #5
          Thanks, ulz_peter. I did read their manual but found no clue about this. I will be more careful with intronic SNPs.
