  • Qs in exome sequencing data analysis

    Hello everyone,

    I am Maone, and I have just started a project using exome sequencing plus filter analysis to find the causative gene for a rare genetic disorder. I have no background in Linux.

    We now have our exome sequencing results back and would like to narrow down the variants with filters such as excluding common variants in dbSNP, keeping only genes expressed in muscle, etc. Could anyone give me some advice on what I should do? For example:
    1. Is there any Windows-based software I could use, or is Linux a must-have if I want to continue my project?
    2. If it is solely for this project, how much Linux do I have to learn? Could anyone recommend some good books or websites to start with?
    3. Where can I access databases such as dbSNP, or lists of muscle genes and fibroblast genes?

    Thank you.

    Maone
    Last edited by Maone; 06-15-2011, 11:00 AM.

  • #2
    Quite a lot of questions at once. If you have a Linux computer within reach, try to pick up some basic command-line knowledge (Google is your friend).

    If you have your SNPs in any one of these formats: MAQ, GFF, VCF or CASAVA, you could use the SeattleSeq SNP annotation tool (http://gvs.gs.washington.edu/SeattleSeqAnnotation/). That gives you annotations like dbSNP, conservation, HapMap frequencies, genes and a lot more. The result is stored as a CSV file (as far as I remember), and you can download it and filter the SNPs using Excel.

    As the gene names are stored as well, you could search the literature for muscle-associated genes and screen the results for those genes...

    I think that's the easiest way of doing it without looking too deeply into Linux (which I would actually recommend learning to anyone dealing with NGS data, as long as they have the time).
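
If Excel gets unwieldy, the same filtering could be sketched in a few lines of Python. This is only an illustration: the column names (`gene`, `in_dbsnp`), the sample rows with made-up positions, and the small gene set are all assumptions, not the actual SeattleSeq output format, so they would need adjusting to whatever the downloaded file really contains.

```python
import csv
import io

# Hypothetical annotated export. Column names and positions are made up
# for illustration; a real annotation file will use different headers.
SAMPLE_CSV = """chrom,pos,gene,in_dbsnp
2,1000,TTN,no
2,2000,TTN,yes
X,3000,DMD,no
7,4000,ACTB,no
"""

# Hypothetical list of muscle-associated genes gathered from the literature.
MUSCLE_GENES = {"TTN", "DMD"}


def filter_variants(rows, genes_of_interest):
    """Keep variants that are absent from dbSNP and lie in a gene of interest."""
    return [
        row for row in rows
        if row["in_dbsnp"] == "no" and row["gene"] in genes_of_interest
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = filter_variants(rows, MUSCLE_GENES)
for row in kept:
    print(row["chrom"], row["pos"], row["gene"])
```

For a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, and the gene set by your own curated list.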


    • #3
      In reply to ulz_peter (#2):
      Thank you very much for your advice.

      Actually, I already have a CSV file exported from DNAnexus.com using their nucleotide-level variation with the settings Genome: hg18, Gene annotations: RefSeq Genes.

      Taking your advice, I opened the CSV file in Excel and went through the data. Now I have some new questions about interpreting it:
      1. In the "where_in_transcript" column, I have CDS, non-coding exon, introns, upstream and downstream, and UTRs. If I am only looking for exonic mutations, should I look solely at CDS?
      2. For some variants, I get duplicate rows with the same Var_index whose only difference is the "transcript_name",
      e.g. NM_002026 NM_054034 NM_212474 NM_212475 NM_212476 NM_212478 are all FN1 transcript variants.
      Is it standard practice to count them as one variant in the gene?
      3. Among the column names, do "var_seq1" and "var_seq2" indicate homozygous or heterozygous variants? I found that if they are the same, the zygosity of the variant is homozygous; otherwise it is heterozygous.
      Please bear with my basic questions; I have only just started learning.

      Thanks again
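
The observation in question 3 can be written down as a tiny Python check. This is a sketch of that observation only: it assumes `var_seq1` and `var_seq2` hold the two called alleles, which nobody in this thread has confirmed against the DNAnexus documentation.

```python
def zygosity(var_seq1, var_seq2):
    """Label a call by comparing its two allele columns.

    Assumption: var_seq1 and var_seq2 are the two called alleles,
    as inferred from the exported data, not from any manual.
    """
    same = var_seq1.strip().upper() == var_seq2.strip().upper()
    return "homozygous" if same else "heterozygous"


print(zygosity("A", "A"))
print(zygosity("A", "G"))
```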


      • #4
        I have actually never worked with data from DNAnexus, so I can't really help you with that. Didn't it come with a manual? That should explain everything.

        Be sure not to discard intronic SNPs too quickly; they could include a splice-site mutation.
        I guess the duplicates in the file are just the same SNP, at the same genomic location, reported for the different isoforms of the same gene.
        No idea about the var_seq1 and the var_seq2 columns...
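
If that guess about the duplicates is right, the rows could be collapsed to one entry per variant while keeping intronic calls in play. A rough Python sketch follows; the `Var_index`, `transcript_name` and `where_in_transcript` column names come from the question above, but the `gene` column, the placeholder rows, and the exact category strings ("CDS", "intron", "upstream") are assumptions about the export.

```python
import csv
import io
from collections import defaultdict

# Small made-up sample. Only the FN1 accessions come from the thread;
# the other rows are placeholders, and which accession is CDS or intron
# here is invented for illustration.
SAMPLE_CSV = """Var_index,gene,transcript_name,where_in_transcript
1,FN1,NM_002026,CDS
1,FN1,NM_054034,CDS
1,FN1,NM_212474,intron
2,GENE_A,TRANSCRIPT_A,intron
3,GENE_B,TRANSCRIPT_B,upstream
"""

# Categories to retain; introns stay in because of possible splice-site hits.
KEEP = {"CDS", "intron"}


def collapse_by_variant(rows):
    """Group per-transcript rows into one record per Var_index."""
    grouped = defaultdict(lambda: {"gene": None, "transcripts": [], "locations": set()})
    for row in rows:
        entry = grouped[row["Var_index"]]
        entry["gene"] = row["gene"]
        entry["transcripts"].append(row["transcript_name"])
        entry["locations"].add(row["where_in_transcript"])
    return dict(grouped)


variants = collapse_by_variant(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# A variant is a candidate if any of its transcript annotations is in KEEP.
candidates = {idx: v for idx, v in variants.items() if v["locations"] & KEEP}

for idx, v in candidates.items():
    print(idx, v["gene"], len(v["transcripts"]), sorted(v["locations"]))
```

Counting each `Var_index` once avoids inflating the variant tally for genes with many isoforms, while the per-transcript annotations remain available for a closer look.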


        • #5
          Thanks, ulz_peter. I did read their manual but found no clue about this. I will be more careful with intronic SNPs.
