  • dindel output

    Hello All,

    I ran Dindel and am now trying to understand its output. I could not find an example that describes each column.
    Can someone please point me to the right place, or help me understand the glf and VCF formats?

    Thanks!
    EHC

    (1) glf file:
    - What do the columns marked nBQT, nmmBQT, mLogBQ, nMMLeft, nMMRight, and glf mean?
    - Which column is the likelihood score? Can one infer its quality?
    - What is the dip.map line?
    - Why do some positions appear multiple times? Are these multiple indels?

    msg index analysis_type tid lpos rpos center_position realigned_position was_candidate_in_window ref_all nref_all num_reads post_prob_variant qual est_freq logZ hapfreqs indidx msq numOffAll num_indel num_cover_forward num_cover_reverse num_unmapped_realigned var_coverage_forward var_coverage_reverse nBQT nmmBQT mLogBQ nMMLeft nMMRight glf
    ok 17 dip.map Chr1 11843 11962 11903 11903 1 NA -C 6 NA 0.000429941 NA NA NA 0 0 NA NA 0 0 0 0 1 NA NA NA NA NA 0/1:0.000429941
    ok 17 dip Chr1 11843 11962 11903 11892 0 NA R=>T 6 NA NA NA -67.6043 NA 0 24.1039 6 4 0 0 0 0 0 200 0 -6.88 0 0 0/0:-67.6037,0/1:-69.906,1/1:-69.9063
    ok 17 dip Chr1 11843 11962 11903 11900 0 NA R=>C 6 NA NA NA -67.6043 NA 0 24.1039 6 4 0 0 0 0 0 200 0 -6.88 0 0 0/0:-67.6037,0/1:-69.906,1/1:-69.9063
    ok 17 dip Chr1 11843 11962 11903 11903 1 NA -C 6 NA NA NA -67.6043 NA 0 24.1039 6 4 0 0 0 0 1 200 0 -6.88 0 0 0/0:-67.6037,0/1:-67.6134,1/1:-67.6238
    ok 17 dip Chr1 11843 11962 11903 11913 0 NA R=>A 6 NA NA NA -67.6043 NA 0 24.1039 6 4 0 0 0 0 0 200 0 -6.88 0 0 0/0:-67.6037,0/1:-69.906,1/1:-69.9063
    ok 18 dip.map Chr1 13262 13381 13321 13292 0 NA R=>G 294 NA 2545.49 NA NA NA 0 0 NA NA 0 0 0 0 0 NA NA NA NA NA 1/1:15.3077
    ok 18 dip.map Chr1 13262 13381 13321 13298 0 NA R=>G 294 NA 2545.49 NA NA NA 0 0 NA NA 0 0 0 0 0 NA NA NA NA NA 1/1:24.2244
    ok 18 dip.map Chr1 13262 13381 13321 13301 0 NA R=>G 294 NA 2545.49 NA NA NA 0 0 NA NA 0 0 0 0 0 NA NA NA NA NA 1/1:39.2562
    ok 18 dip.map Chr1 13262 13381 13321 13322 1 NA +AGTGAAAGTACCGGTCCATGGTTC 294 NA 2545.49 NA NA NA 0 29 NA NA 79 0 0 68 0 NA NA NA NA NA 1/1:56.3833

    (2) variantCalls.VCF
    - What is the last column, SAMPLE? What does 1/1:124 mean?
    - What is the meaning of GT:GQ? I saw it in the header but still cannot understand it.
    (##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">)


    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
    Chr3 6048 . T TG 1712 PASS DP=93;NF=31;NR=11;NRS=32;NFS=12;HP=1 GT:GQ 1/1:124
    Chr3 6535 . C CG 37 PASS DP=98;NF=1;NR=0;NRS=2;NFS=0;HP=3 GT:GQ 1/1:7
    Chr3 7873 . C CT,CTTT 49 PASS DP=70;NF=4;NR=5;NRS=18;NFS=23;HP=1 GT:GQ 1/2:3
    Chr3 8105 . TAAA T 435 PASS DP=69;NF=12;NR=6;NRS=14;NFS=6;HP=1 GT:GQ 1/1:60
    Chr3 8703 . T TTTA 423 hp10 DP=77;NF=1;NR=7;NRS=3;NFS=12;HP=15 GT:GQ 1/1:16

  • #2
    Originally posted by EHC
    (2) variantCalls.VCF
    - What is the last column, SAMPLE? What does 1/1:124 mean?
    - What is the meaning of GT:GQ? I saw it in the header but still cannot understand it.
    That column gives the genotype (GT) and the genotype quality (GQ) for the sample, in the order listed in the FORMAT column. You can find full information about the format in the VCF specification.

    Briefly, "1/1" means the genotype is homozygous for the alternate allele. GQ = 124 is the Phred-scaled probability that the genotype call is wrong, that is, P(wrong) = 10^(-GQ/10), so a larger number means a lower probability of error.
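To make this concrete, here is a minimal Python sketch that splits a SAMPLE value using the FORMAT column and converts GQ back to an error probability. The function names (`parse_sample`, `phred_to_error_prob`, `describe_genotype`) are made up for illustration and are not part of Dindel or any VCF library; the example values are taken from the VCF lines in the question.

```python
def parse_sample(format_field, sample_field):
    """Pair each FORMAT key (e.g. GT, GQ) with its value in the SAMPLE column."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return dict(zip(keys, values))


def phred_to_error_prob(gq):
    """Convert a Phred-scaled quality to the probability that the call is wrong."""
    return 10 ** (-gq / 10)


def describe_genotype(gt):
    """Give a plain-English label for a diploid GT string such as 0/1 or 1/1."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if all(a == "0" for a in alleles):
        return "homozygous reference"
    if len(set(alleles)) == 1:
        return "homozygous alternate"
    # Covers 0/1 as well as 1/2 (two different alternate alleles).
    return "heterozygous"


if __name__ == "__main__":
    # (FORMAT, SAMPLE) pairs copied from the VCF lines in the question.
    for fmt, sample in [("GT:GQ", "1/1:124"), ("GT:GQ", "1/1:7"), ("GT:GQ", "1/2:3")]:
        fields = parse_sample(fmt, sample)
        p_wrong = phred_to_error_prob(int(fields["GQ"]))
        print(fields["GT"], describe_genotype(fields["GT"]), f"P(wrong) = {p_wrong:.3g}")
```

Running it shows why the GQ column matters: the 1/1 call with GQ 124 is essentially certain, while the calls with GQ 7 and GQ 3 have a substantial chance of being the wrong genotype even though their FILTER value is PASS.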
