SEQanswers

02-04-2016, 01:01 PM   #1
illuminaGA
Member
 
Location: Atlanta

Join Date: Dec 2012
Posts: 70
Cannot pass QC when imputing with Michigan Imputation Server and HRC reference

Hello All.

I have 88 samples in 23andMe format. The population is EUR, and I would like to impute them with the Michigan Imputation Server using the Haplotype Reference Consortium (HRC) panel.

The first thing I did was convert the 23andMe format to VCF format with bcftools, using the following command. The reference is hg18.
bcftools convert -c ID,CHROM,POS,AA -s sampleID -f hg18.fa --tsv2vcf sample.txt -Oz -o sampleID.vcf.gz

Then I merged the VCFs of all 88 samples into one VCF file with bcftools and uploaded it to the Michigan Imputation Server. I chose the HRC panel for imputation.

But I got the following errors.

Input Validation
1 valid VCF file(s) found.
Samples: 88
Chromosomes: 1
SNPs: 164386
Chunks: 24
Datatype: unphased
Reference Panel: hrc
Quality Control
Execution successful
Statistics:
Alternative allele frequency > 0.5 sites: 60,835
Reference Overlap: 1.42%
Match: 214
Allele switch: 191
Strand flip: 191
Strand flip and allele switch: 215
A/T, C/G genotypes: 4
Filtered sites:
Filter flag set: 0
Invalid alleles: 62,461
Duplicated sites: 0
NonSNP sites: 0
Monomorphic sites: 0
Allele mismatch: 630
SNPs call rate < 90%: 172
Excluded sites in total: 63,669
Remaining sites in total: 100,717
Warning: 24 Chunks excluded: reference overlap < 50% (see statistics.txt for details).
Remaining chunk(s): 0
Error: No chunks passed the QC step. Imputation cannot be started!
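
As a sanity check on the report above, the filter counts can be added up. This is only a minimal sketch: which categories feed into "Excluded sites in total" is inferred from the arithmetic (the sum happens to match), not taken from the server's documentation, and the variable names are made up for illustration.

```python
# Counts copied from the QC report above.
total_snps = 164_386
invalid_alleles = 62_461
allele_mismatch = 630
low_call_rate = 172
strand_flip = 191
strand_flip_and_switch = 215

# Assumption: these five categories make up "Excluded sites in total".
excluded = (invalid_alleles + allele_mismatch + low_call_rate
            + strand_flip + strand_flip_and_switch)
remaining = total_snps - excluded

print("excluded:", excluded)
print("remaining:", remaining)
```

Under that assumption the sum reproduces the reported 63,669 excluded and 100,717 remaining sites, so "Invalid alleles" accounts for nearly all of the filtered sites.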

I also got some statistics output like this:
"
......
Invalid Alleles: 1 (C/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (C/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (C/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (T/.)
Invalid Alleles: 1 (A/.)
Invalid Alleles: 1 (G/C,T)
Invalid Alleles: 1 (A/.)
Invalid Alleles: 1 (G/.)
Invalid Alleles: 1 (C/.)
Invalid Alleles: 1 (C/.)
Invalid Alleles: 1 (C/.)
......
INFO - Allele switch: rs4970362 - pos: 1084601 (ref: G/A, data: A/G)
INFO - Allele switch: rs6697886 - pos: 1163474 (ref: A/G, data: G/A)
FILTER - Low call rate: rs6697886 - pos: 1163474 (0.25)
FILTER - Allele mismatch: rs12563338 - pos: 1188481 (ref: G/A, data: T/A)
......
chunk_1_0000000001_0010000000 (Snps: 4158, Reference overlap: 0.017069701280227598, low sample call rates: false)
chunk_1_0010000001_0020000000 (Snps: 4547, Reference overlap: 0.018189692507579038, low sample call rates: false)
chunk_1_0020000001_0030000000 (Snps: 4094, Reference overlap: 0.01352657004830918, low sample call rates: false)
chunk_1_0030000001_0040000000 Sample NA06985: call rate: 0.49807037457434733
chunk_1_0030000001_0040000000 (Snps: 4405, Reference overlap: 0.012578616352201259, low sample call rates: true)
chunk_1_0040000001_0050000000 (Snps: 3991, Reference overlap: 0.016057312252964428, low sample call rates: false)
chunk_1_0050000001_0060000000 Sample NA06985: call rate: 0.4794905008635579
chunk_1_0050000001_0060000000 (Snps: 4632, Reference overlap: 0.01344717182497332, low sample call rates: true)
chunk_1_0060000001_0070000000 (Snps: 4885, Reference overlap: 0.017154389505549948, low sample call rates: false)
chunk_1_0070000001_0080000000 (Snps: 3948, Reference overlap: 0.014024542950162784, low sample call rates: false)
chunk_1_0080000001_0090000000 (Snps: 4674, Reference overlap: 0.013550709294939657, low sample call rates: false)
chunk_1_0090000001_0100000000 (Snps: 4589, Reference overlap: 0.01058543961978829, low sample call rates: false)
chunk_1_0100000001_0110000000 (Snps: 3942, Reference overlap: 0.016504126031507877, low sample call rates: false)
chunk_1_0110000001_0120000000 (Snps: 4830, Reference overlap: 0.014309076042518397, low sample call rates: false)
chunk_1_0120000001_0130000000 Sample NA06985: call rate: 0.4512820512820513
chunk_1_0120000001_0130000000 Sample NA07346: call rate: 0.47692307692307695
chunk_1_0120000001_0130000000 Sample NA12145: call rate: 0.48205128205128206
chunk_1_0120000001_0130000000 Sample NA12287: call rate: 0.47692307692307695
chunk_1_0120000001_0130000000 Sample NA12751: call rate: 0.49230769230769234
chunk_1_0120000001_0130000000 Sample NA12843: call rate: 0.4717948717948718
chunk_1_0120000001_0130000000 (Snps: 195, Reference overlap: 0.01015228426395939, low sample call rates: true)
chunk_1_0140000001_0150000000 (Snps: 1360, Reference overlap: 0.002932551319648094, low sample call rates: false)
chunk_1_0150000001_0160000000 (Snps: 4526, Reference overlap: 0.01504907306434024, low sample call rates: false)
chunk_1_0160000001_0170000000 (Snps: 5688, Reference overlap: 0.013368055555555555, low sample call rates: false)
chunk_1_0170000001_0180000000 (Snps: 4290, Reference overlap: 0.011305952930318412, low sample call rates: false)
chunk_1_0180000001_0190000000 (Snps: 4107, Reference overlap: 0.01516610495907559, low sample call rates: false)
chunk_1_0190000001_0200000000 (Snps: 4062, Reference overlap: 0.014111922141119221, low sample call rates: false)
chunk_1_0200000001_0210000000 (Snps: 5175, Reference overlap: 0.01111963190184049, low sample call rates: false)
chunk_1_0210000001_0220000000 (Snps: 4956, Reference overlap: 0.012385137834598482, low sample call rates: false)
chunk_1_0220000001_0230000000 Sample NA06985: call rate: 0.4813989752728893
chunk_1_0220000001_0230000000 (Snps: 4489, Reference overlap: 0.011032656663724626, low sample call rates: true)
chunk_1_0230000001_0240000000 (Snps: 5860, Reference overlap: 0.01815126050420168, low sample call rates: false)
chunk_1_0240000001_0250000000 (Snps: 3314, Reference overlap: 0.017533432392273403, low sample call rates: false)

"
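
The per-chunk lines in the excerpt above can be summarized with a short script. The sketch below assumes the line layout is exactly as pasted here; the regular expression and the function name `parse_chunks` are illustrative, not part of the server's tooling. The 0.5 cutoff is the 50% threshold quoted in the server's warning.

```python
import re

# A few lines copied from the statistics excerpt above.
log = """\
chunk_1_0000000001_0010000000 (Snps: 4158, Reference overlap: 0.017069701280227598, low sample call rates: false)
chunk_1_0030000001_0040000000 Sample NA06985: call rate: 0.49807037457434733
chunk_1_0030000001_0040000000 (Snps: 4405, Reference overlap: 0.012578616352201259, low sample call rates: true)
chunk_1_0140000001_0150000000 (Snps: 1360, Reference overlap: 0.002932551319648094, low sample call rates: false)
"""

# Matches only the per-chunk summary lines, not the per-sample call-rate lines.
CHUNK_RE = re.compile(
    r"^(chunk_\S+) \(Snps: (\d+), Reference overlap: ([0-9.eE+-]+), "
    r"low sample call rates: (true|false)\)$"
)

def parse_chunks(text):
    """Return (name, n_snps, overlap, low_call_rate_flag) for each chunk line."""
    rows = []
    for line in text.splitlines():
        m = CHUNK_RE.match(line.strip())
        if m:
            rows.append((m.group(1), int(m.group(2)),
                         float(m.group(3)), m.group(4) == "true"))
    return rows

MIN_OVERLAP = 0.5  # 50% threshold from the server warning

chunks = parse_chunks(log)
failing = [name for name, _, overlap, _ in chunks if overlap < MIN_OVERLAP]

for name, n_snps, overlap, low_cr in chunks:
    print(f"{name}\t{n_snps}\t{overlap:.2%}\tlow call rate: {low_cr}")
print(f"{len(failing)} of {len(chunks)} chunks below {MIN_OVERLAP:.0%} overlap")
```

Run over the full statistics.txt, this would show that every chunk sits at roughly 1 to 2% overlap, far below the threshold, rather than a few chunks failing marginally.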


I do not quite understand the "Invalid alleles: 62,461" line. It seems that I need to clean up the raw data, but I think that would lose 62,461 of the 164,386 SNPs.
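
To see what the flagged sites look like, the allele pairs from the "Invalid Alleles" lines in the excerpt can be tallied. This is a rough sketch over the 18 pasted lines only; the category labels are descriptive guesses based on the REF/ALT strings, not the server's own definitions.

```python
from collections import Counter

# Allele pairs copied from the "Invalid Alleles" lines in the excerpt above.
pairs = ["C/.", "G/.", "G/.", "C/.", "G/.", "G/.", "C/.", "G/.", "G/.",
         "G/.", "T/.", "A/.", "G/C,T", "A/.", "G/.", "C/.", "C/.", "C/."]

def classify(pair):
    """Label a REF/ALT string by the shape of its ALT field."""
    ref, alt = pair.split("/")
    if alt == ".":
        return "no ALT allele"
    if "," in alt:
        return "multiple ALT alleles"
    return "other"

counts = Counter(classify(p) for p in pairs)
for label, n in counts.most_common():
    print(f"{label}: {n}")

# Share of all uploaded SNPs that were flagged as invalid alleles.
print(f"{62_461 / 164_386:.1%} of SNPs flagged")
```

In this excerpt almost every flagged site has "." as its alternate allele, and one site lists two alternate alleles.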

What should I do now? Any help would be greatly appreciated.

Last edited by illuminaGA; 02-04-2016 at 01:04 PM.