  • dindel output

    Hello All,

    I ran Dindel and am now trying to understand the output. I could not find an example describing each column in the output.
    Can someone please point me to the right place or help me understand the glf and VCF formats?

    Thanks!
    EHC

    (1) glf file:
    - What are the columns marked nBQT, nmmBQT, mLogBQ, nMMLeft, nMMRight, and glf?
    - Which is the likelihood score? Can one infer its quality?
    - What is the dip.map line?
    - Why do some positions appear multiple times? Are these multiple indels?

    msg index analysis_type tid lpos rpos center_position realigned_position was_candidate_in_window ref_all nref_all num_reads post_prob_variant qual est_freq logZ hapfreqs indidx msq numOffAll num_indel num_cover_forward num_cover_reverse num_unmapped_realigned var_coverage_forward var_coverage_reverse nBQT nmmBQT mLogBQ nMMLeft nMMRight glf
    ok 17 dip.map Chr1 11843 11962 11903 11903 1 NA -C 6 NA 0.000429941 NA NA NA 0 0 NA NA 0 0 0 0 1 NA NA NA NA NA 0/1:0.000429941
    ok 17 dip Chr1 11843 11962 11903 11892 0 NA R=>T 6 NA NA NA -67.6043 NA 0 24.1039 6 4 0 0 0 0 0 200 0 -6.88 0 0 0/0:-67.6037,0/1:-69.906,1/1:-69.9063
    ok 17 dip Chr1 11843 11962 11903 11900 0 NA R=>C 6 NA NA NA -67.6043 NA 0 24.1039 6 4 0 0 0 0 0 200 0 -6.88 0 0 0/0:-67.6037,0/1:-69.906,1/1:-69.9063
    ok 17 dip Chr1 11843 11962 11903 11903 1 NA -C 6 NA NA NA -67.6043 NA 0 24.1039 6 4 0 0 0 0 1 200 0 -6.88 0 0 0/0:-67.6037,0/1:-67.6134,1/1:-67.6238
    ok 17 dip Chr1 11843 11962 11903 11913 0 NA R=>A 6 NA NA NA -67.6043 NA 0 24.1039 6 4 0 0 0 0 0 200 0 -6.88 0 0 0/0:-67.6037,0/1:-69.906,1/1:-69.9063
    ok 18 dip.map Chr1 13262 13381 13321 13292 0 NA R=>G 294 NA 2545.49 NA NA NA 0 0 NA NA 0 0 0 0 0 NA NA NA NA NA 1/1:15.3077
    ok 18 dip.map Chr1 13262 13381 13321 13298 0 NA R=>G 294 NA 2545.49 NA NA NA 0 0 NA NA 0 0 0 0 0 NA NA NA NA NA 1/1:24.2244
    ok 18 dip.map Chr1 13262 13381 13321 13301 0 NA R=>G 294 NA 2545.49 NA NA NA 0 0 NA NA 0 0 0 0 0 NA NA NA NA NA 1/1:39.2562
    ok 18 dip.map Chr1 13262 13381 13321 13322 1 NA +AGTGAAAGTACCGGTCCATGGTTC 294 NA 2545.49 NA NA NA 0 29 NA NA 79 0 0 68 0 NA NA NA NA NA 1/1:56.3833

    (2) variantCalls.VCF
    - What is the last SAMPLE column? What does 1/1:124 mean?
    - What is the meaning of GT:GQ? I saw this in the header but still cannot understand it.
    (##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">)


    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
    Chr3 6048 . T TG 1712 PASS DP=93;NF=31;NR=11;NRS=32;NFS=12;HP=1 GT:GQ 1/1:124
    Chr3 6535 . C CG 37 PASS DP=98;NF=1;NR=0;NRS=2;NFS=0;HP=3 GT:GQ 1/1:7
    Chr3 7873 . C CT,CTTT 49 PASS DP=70;NF=4;NR=5;NRS=18;NFS=23;HP=1 GT:GQ 1/2:3
    Chr3 8105 . TAAA T 435 PASS DP=69;NF=12;NR=6;NRS=14;NFS=6;HP=1 GT:GQ 1/1:60
    Chr3 8703 . T TTTA 423 hp10 DP=77;NF=1;NR=7;NRS=3;NFS=12;HP=15 GT:GQ 1/1:16

  • #2
    Originally posted by EHC
    (2) variantCalls.VCF
    - What is the last SAMPLE column? What does 1/1:124 mean?
    - What is the meaning of GT:GQ? I saw this in the header but still cannot understand it.
    That column gives the genotype (GT) and the genotype quality (GQ) for the sample, in the order listed in the FORMAT column. You can find full information about the VCF format here.

    Briefly, "1/1" means the genotype is homozygous for the alternate allele. GQ = 124 is the Phred-scaled probability that the genotype call is wrong (a larger number means a lower probability).
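
    As a rough illustration of the above, here is a minimal Python sketch that splits the FORMAT and SAMPLE columns, converts GQ to an error probability using the standard Phred relation P = 10^(-GQ/10), and maps the genotype indices back to alleles (0 = REF, 1 = first ALT, 2 = second ALT). The function names are made up for this example and are not part of Dindel or any VCF library; it also assumes a called, whitespace-separated record like the ones posted above (a missing genotype such as "./." is not handled).

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys (e.g. 'GT:GQ') with the sample values (e.g. '1/1:124')."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def phred_to_error_prob(q):
    """Phred scale: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def genotype_alleles(gt, ref, alt):
    """Translate allele indices in GT into sequences (0 = REF, 1.. = ALT order)."""
    alleles = [ref] + alt.split(",")
    separator = "|" if "|" in gt else "/"  # '|' marks phased genotypes
    return [alleles[int(index)] for index in gt.split(separator)]


# One of the records from the question (columns split on whitespace).
line = "Chr3 7873 . C CT,CTTT 49 PASS DP=70;NF=4;NR=5;NRS=18;NFS=23;HP=1 GT:GQ 1/2:3"
chrom, pos, _id, ref, alt, qual, flt, info, fmt, sample = line.split()

call = parse_sample(fmt, sample)
gq = int(call["GQ"])
print(chrom, pos, genotype_alleles(call["GT"], ref, alt), gq, phred_to_error_prob(gq))
```

    For the first record in the question (1/1:124), the same conversion gives an error probability of roughly 4e-13, whereas GQ = 3 in the record above corresponds to about a 50% chance that the genotype is wrong.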
