  • Cannot pass QC when imputing with Michigan Imputation Server and HRC reference

    Hello All.

    I have 88 samples in 23andMe format. The population is EUR. I would like to run imputation on the Michigan Imputation Server with the Haplotype Reference Consortium (HRC) panel.

    The first thing I did was convert the 23andMe files to VCF with bcftools, using the following command. The reference is hg18.
    bcftools convert -c ID,CHROM,POS,AA -s sampleID -f hg18.fa --tsv2vcf sample.txt -Oz -o sampleID.vcf.gz

    Then I merged the VCFs of all 88 samples into one VCF file with bcftools and uploaded it to the Michigan Imputation Server. I chose the HRC panel for imputation.

    But I got the following errors.

    Input Validation
    1 valid VCF file(s) found.
    Samples: 88
    Chromosomes: 1
    SNPs: 164386
    Chunks: 24
    Datatype: unphased
    Reference Panel: hrc
    Quality Control
    Execution successful
    Statistics:
    Alternative allele frequency > 0.5 sites: 60,835
    Reference Overlap: 1.42%
    Match: 214
    Allele switch: 191
    Strand flip: 191
    Strand flip and allele switch: 215
    A/T, C/G genotypes: 4
    Filtered sites:
    Filter flag set: 0
    Invalid alleles: 62,461
    Duplicated sites: 0
    NonSNP sites: 0
    Monomorphic sites: 0
    Allele mismatch: 630
    SNPs call rate < 90%: 172
    Excluded sites in total: 63,669
    Remaining sites in total: 100,717
    Warning: 24 Chunks excluded: reference overlap < 50% (see statistics.txt for details).
    Remaining chunk(s): 0
    Error: No chunks passed the QC step. Imputation cannot be started!
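
The totals in this report can be cross-checked with a little arithmetic. The sketch below assumes that the excluded total is the sum of invalid alleles, allele mismatches, low call rate sites, and both strand flip categories. That assumption is inferred from the numbers in this report, not from the server's documentation, and the variable names are illustrative.

```python
# Counts copied from the QC report above.
total_snps = 164_386
invalid_alleles = 62_461
allele_mismatch = 630
low_call_rate = 172
strand_flip = 191
strand_flip_and_switch = 215

# Assumption: these five categories are the ones that get dropped.
excluded = (invalid_alleles + allele_mismatch + low_call_rate
            + strand_flip + strand_flip_and_switch)
remaining = total_snps - excluded

print(f"excluded:  {excluded:,}")   # 63,669
print(f"remaining: {remaining:,}")  # 100,717
print(f"invalid-allele share of input: {invalid_alleles / total_snps:.1%}")
```

Under that assumption the sums match the reported "Excluded sites in total" and "Remaining sites in total", so almost all of the loss comes from the invalid allele category.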

    I also got statistics output like this:
    "
    ......
    Invalid Alleles: 1 (C/.)
    Invalid Alleles: 1 (G/.)
    Invalid Alleles: 1 (G/.)
    Invalid Alleles: 1 (C/.)
    Invalid Alleles: 1 (G/.)
    Invalid Alleles: 1 (G/.)
    Invalid Alleles: 1 (C/.)
    Invalid Alleles: 1 (G/.)
    Invalid Alleles: 1 (G/.)
    Invalid Alleles: 1 (G/.)
    Invalid Alleles: 1 (T/.)
    Invalid Alleles: 1 (A/.)
    Invalid Alleles: 1 (G/C,T)
    Invalid Alleles: 1 (A/.)
    Invalid Alleles: 1 (G/.)
    Invalid Alleles: 1 (C/.)
    Invalid Alleles: 1 (C/.)
    Invalid Alleles: 1 (C/.)
    ......
    INFO - Allele switch: rs4970362 - pos: 1084601 (ref: G/A, data: A/G)
    INFO - Allele switch: rs6697886 - pos: 1163474 (ref: A/G, data: G/A)
    FILTER - Low call rate: rs6697886 - pos: 1163474 (0.25)
    FILTER - Allele mismatch: rs12563338 - pos: 1188481 (ref: G/A, data: T/A)
    ......
    chunk_1_0000000001_0010000000 (Snps: 4158, Reference overlap: 0.017069701280227598, low sample call rates: false)
    chunk_1_0010000001_0020000000 (Snps: 4547, Reference overlap: 0.018189692507579038, low sample call rates: false)
    chunk_1_0020000001_0030000000 (Snps: 4094, Reference overlap: 0.01352657004830918, low sample call rates: false)
    chunk_1_0030000001_0040000000 Sample NA06985: call rate: 0.49807037457434733
    chunk_1_0030000001_0040000000 (Snps: 4405, Reference overlap: 0.012578616352201259, low sample call rates: true)
    chunk_1_0040000001_0050000000 (Snps: 3991, Reference overlap: 0.016057312252964428, low sample call rates: false)
    chunk_1_0050000001_0060000000 Sample NA06985: call rate: 0.4794905008635579
    chunk_1_0050000001_0060000000 (Snps: 4632, Reference overlap: 0.01344717182497332, low sample call rates: true)
    chunk_1_0060000001_0070000000 (Snps: 4885, Reference overlap: 0.017154389505549948, low sample call rates: false)
    chunk_1_0070000001_0080000000 (Snps: 3948, Reference overlap: 0.014024542950162784, low sample call rates: false)
    chunk_1_0080000001_0090000000 (Snps: 4674, Reference overlap: 0.013550709294939657, low sample call rates: false)
    chunk_1_0090000001_0100000000 (Snps: 4589, Reference overlap: 0.01058543961978829, low sample call rates: false)
    chunk_1_0100000001_0110000000 (Snps: 3942, Reference overlap: 0.016504126031507877, low sample call rates: false)
    chunk_1_0110000001_0120000000 (Snps: 4830, Reference overlap: 0.014309076042518397, low sample call rates: false)
    chunk_1_0120000001_0130000000 Sample NA06985: call rate: 0.4512820512820513
    chunk_1_0120000001_0130000000 Sample NA07346: call rate: 0.47692307692307695
    chunk_1_0120000001_0130000000 Sample NA12145: call rate: 0.48205128205128206
    chunk_1_0120000001_0130000000 Sample NA12287: call rate: 0.47692307692307695
    chunk_1_0120000001_0130000000 Sample NA12751: call rate: 0.49230769230769234
    chunk_1_0120000001_0130000000 Sample NA12843: call rate: 0.4717948717948718
    chunk_1_0120000001_0130000000 (Snps: 195, Reference overlap: 0.01015228426395939, low sample call rates: true)
    chunk_1_0140000001_0150000000 (Snps: 1360, Reference overlap: 0.002932551319648094, low sample call rates: false)
    chunk_1_0150000001_0160000000 (Snps: 4526, Reference overlap: 0.01504907306434024, low sample call rates: false)
    chunk_1_0160000001_0170000000 (Snps: 5688, Reference overlap: 0.013368055555555555, low sample call rates: false)
    chunk_1_0170000001_0180000000 (Snps: 4290, Reference overlap: 0.011305952930318412, low sample call rates: false)
    chunk_1_0180000001_0190000000 (Snps: 4107, Reference overlap: 0.01516610495907559, low sample call rates: false)
    chunk_1_0190000001_0200000000 (Snps: 4062, Reference overlap: 0.014111922141119221, low sample call rates: false)
    chunk_1_0200000001_0210000000 (Snps: 5175, Reference overlap: 0.01111963190184049, low sample call rates: false)
    chunk_1_0210000001_0220000000 (Snps: 4956, Reference overlap: 0.012385137834598482, low sample call rates: false)
    chunk_1_0220000001_0230000000 Sample NA06985: call rate: 0.4813989752728893
    chunk_1_0220000001_0230000000 (Snps: 4489, Reference overlap: 0.011032656663724626, low sample call rates: true)
    chunk_1_0230000001_0240000000 (Snps: 5860, Reference overlap: 0.01815126050420168, low sample call rates: false)
    chunk_1_0240000001_0250000000 (Snps: 3314, Reference overlap: 0.017533432392273403, low sample call rates: false)

    "
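
To see how far each chunk is from the 50% overlap threshold named in the warning, the per-chunk lines can be parsed with a short script. This is a minimal sketch that assumes the line layout shown in the excerpt above; `parse_chunks` and `failing_chunks` are hypothetical helper names, not part of any imputation tool.

```python
import re

# Pattern for the per-chunk summary lines as pasted above (layout assumed from the excerpt).
CHUNK_RE = re.compile(
    r"^(chunk_\S+) \(Snps: (\d+), Reference overlap: ([0-9.eE+-]+), "
    r"low sample call rates: (true|false)\)$"
)

def parse_chunks(text):
    """Return (name, snp_count, overlap_fraction, low_call_rate) tuples."""
    rows = []
    for line in text.splitlines():
        m = CHUNK_RE.match(line.strip())
        if m:  # per-sample call rate lines do not match and are skipped
            rows.append((m.group(1), int(m.group(2)),
                         float(m.group(3)), m.group(4) == "true"))
    return rows

def failing_chunks(rows, min_overlap=0.5):
    """Chunks whose reference overlap is below the threshold from the warning."""
    return [r for r in rows if r[2] < min_overlap]

sample = """\
chunk_1_0000000001_0010000000 (Snps: 4158, Reference overlap: 0.017069701280227598, low sample call rates: false)
chunk_1_0030000001_0040000000 Sample NA06985: call rate: 0.49807037457434733
chunk_1_0030000001_0040000000 (Snps: 4405, Reference overlap: 0.012578616352201259, low sample call rates: true)
"""

rows = parse_chunks(sample)
for name, snps, overlap, low in failing_chunks(rows):
    print(f"{name}: {snps} SNPs, overlap {overlap:.2%}, low call rates: {low}")
```

Every chunk in the excerpt has an overlap of roughly 1 to 2%, far below 50%, so the failure is uniform across the chromosome rather than limited to a few regions.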


    I do not quite understand the "Invalid alleles: 62,461" line. It seems that I need to clean up the raw data, but I think that would lose 62,461 of the 164,386 SNPs.
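
One way to look at what is being counted is to tally the REF/ALT pairs printed on the "Invalid Alleles" lines. In VCF, an ALT of "." means no alternate allele was recorded at the site, and a comma-separated ALT such as "C,T" means more than one alternate allele. Whether those two cases cover everything the server calls invalid is an assumption based on this excerpt only; `classify` and `tally_invalid` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches lines such as "Invalid Alleles: 1 (G/C,T)" and captures REF and ALT.
INVALID_RE = re.compile(r"Invalid Alleles: \d+ \(([^/]+)/([^)]+)\)")

def classify(alt):
    """Label an ALT field by the kind of site it describes."""
    if alt == ".":
        return "no ALT allele"
    if "," in alt:
        return "multi-allelic"
    return "other"

def tally_invalid(text):
    """Count invalid-allele lines per category."""
    counts = Counter()
    for _ref, alt in INVALID_RE.findall(text):
        counts[classify(alt)] += 1
    return counts

sample = """\
Invalid Alleles: 1 (C/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (G/C,T)
"""

counts = tally_invalid(sample)
print(dict(counts))
```

In the lines pasted above, nearly all entries have "." as ALT, which suggests they are sites where the merged file carries no alternate allele, not sites with bad genotype data.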

    What should I do now? Any help would be greatly appreciated.
    Last edited by illuminaGA; 02-04-2016, 02:04 PM.
