  • ANNOVAR to annotate variants called with the SAMtools pipeline

    Hello,

    I'm new to this forum and bioinformatics for that matter.

    I'm studying functional genomics and ecology of hybridization in flying squirrels, with the goal of identifying SNPs between two flying squirrel species, and profiling hybrid individuals based on a handful of those variable markers.

    I've de novo assembled a flying squirrel transcriptome with Trinity, and have called SNPs using the SAMtools variant calling pipeline (mpileup etc).

    I would now like to use ANNOVAR to annotate the SNPs that SAMtools has called. ANNOVAR provides detailed documentation on annotating SNPs based on human genomic databases. I believe that the closest open-source annotated genome to my study species is the thirteen-lined ground squirrel (SpeTri2), followed by the mouse (mm9).

    I'm confused as to which databases from the SpeTri2 annotated assembly I should download and use with ANNOVAR, since the ANNOVAR quick-start guide calls for many human genomic databases.

    Any info would be great.

    Cheers.
    Last edited by MGCBrown; 05-25-2017, 09:59 AM.

  • #2
    It seems that a more pressing issue is that the SAMtools variant calling pipeline has created vCalendar files and not VCF (variant call format) files.

    I followed this pipeline:
    samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf
    bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

    Does anyone know why I would end up with a vCalendar file? I'm guessing it is a completely different format from a VCF file?


    • #3
      On your computer .vcf extension may be associated with the calendar app.

      If you use a programmer's editor (Notepad++ for Windows or BBEdit/TextEdit for macOS) you should be able to view the vcf file (File --> Open --> Find the file). If all looks ok then you would want to remove that extension association.
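A quick check that does not depend on file associations is to look at the file's first few bytes. The sketch below is a minimal, hypothetical helper (the name `vcf_kind` is not from this thread): it assumes a plain-text VCF starts with `##`, an uncompressed BCF starts with the ASCII characters `BCF`, and a gzip/BGZF-compressed file starts with the bytes `1f 8b`.

```shell
# Hypothetical helper (not from the thread): classify a variant file by its
# leading bytes instead of trusting the .vcf extension.
vcf_kind() {
    # Dump the first three bytes as hex, e.g. "424346" for the ASCII text "BCF".
    magic=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case "$magic" in
        2323*)  echo "text VCF" ;;              # "##", start of a VCF header line
        424346) echo "uncompressed BCF" ;;      # "BCF", binary BCF magic
        1f8b*)  echo "gzip/BGZF compressed" ;;  # compressed VCF or BCF
        *)      echo "unknown" ;;
    esac
}

# Example: vcf_kind var.flt.vcf
```

If this reports BCF rather than text VCF, `bcftools view -O v` in bcftools 1.x should be able to write the records out as plain-text VCF. Whether that applies to the file discussed here is only an assumption.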


      • #4
        Thanks for the tips!

        I've opened the .vcf file in Notepad++. I've copied two parts of the file below: the first looks OK, but the second, not so much. Do you think something went wrong during the SAMtools pipeline?


        ##contig=<ID=TRINITY_DN393704_c0_g1_i1,length=379,IDX=665834>
        ##contig=<ID=TRINITY_DN393738_c0_g1_i1,length=262,IDX=665835>
        ##contig=<ID=TRINITY_DN393713_c0_g1_i1,length=215,IDX=665836>
        ##contig=<ID=TRINITY_DN393739_c0_g1_i1,length=208,IDX=665837>
        ##contig=<ID=TRINITY_DN393753_c0_g1_i1,length=358,IDX=665838>
        ##contig=<ID=TRINITY_DN393757_c0_g1_i1,length=304,IDX=665839>
        ##contig=<ID=TRINITY_DN393784_c0_g1_i1,length=556,IDX=665840>
        ##contig=<ID=TRINITY_DN393725_c0_g1_i1,length=221,IDX=665841>
        ##contig=<ID=TRINITY_DN393774_c0_g1_i1,length=245,IDX=665842>
        ##contig=<ID=TRINITY_DN393710_c0_g1_i1,length=382,IDX=665843>
        ##contig=<ID=TRINITY_DN393772_c0_g1_i1,length=276,IDX=665844>
        ##contig=<ID=TRINITY_DN393785_c0_g1_i1,length=531,IDX=665845>
        ##contig=<ID=TRINITY_DN393793_c0_g1_i1,length=274,IDX=665846>
        ##contig=<ID=TRINITY_DN393728_c0_g1_i1,length=291,IDX=665847>
        ##contig=<ID=TRINITY_DN393767_c0_g1_i1,length=226,IDX=665848>
        ##contig=<ID=TRINITY_DN393716_c0_g1_i1,length=222,IDX=665849>



            AG7<*>  õ €? €? ŽB ˆE B ¢D 0B òD 0B òD ÈA @D ÈA @D 5 €? €? 
        À¿ €? €? €?  a ,,,$ $${ *
        @     T7<*>  õ @ ÞB ¨ÓE °B rE HB @œD  % @   1 , &{ *
        A     T7<*>  õ @ ¬B @qE °B rE HB @œD  % @   1 , "{ *
        B     A7<*>  õ @ ÐB ÄE °B rE HB @œD  % @   1 , { *
        C     C7<*>  õ @ àB @ØE °B rE HB @œD  % @   1 , &{ *
        D     A7<*>  õ @ ÆB h³E °B rE HB @œD  % @   1 , { *
        E     G7<*>  õ @ ÌB ²E °B rE HB @œD  % @   1 , #{ *
        F     A7<*>  õ @ ¶B è…E °B rE HB @œD  % @   1 , %{ *
        G     T7<*>  õ @ âB èÜE °B rE HB @œD  % @   1 , &{ *
        H     A7<*>  õ @ àB ÝE °B rE HB @œD  % @   1 , ${ *
        I     G7<*>  õ @ ÜB PÑE °B rE HB @œD  % @   1 , %{ *
        J     A7<*>  õ @ ÄB ºE °B rE HB @œD  % @   1 , { *
        K     C7<*>  õ @ ÀB ±E °B rE HB @œD  % @   1 , { *
        L     T7<*>  õ @ ÊB ÈE °B rE HB @œD  % @   1 , { *
        M     T7<*>  õ @ ¸B €¼E °B rE HB @œD  % @   1 , { *
        N     G7<*>  õ @ ÚB ˆÖE °B rE HB @œD  % @   1 , !{ *
        O     A7<*>  õ @ ¢B ŒE °B rE HB @œD  % @   1 , { *
        P     A7<*>  õ
