I have multiple samples. I ran HaplotypeCaller (GATK) on each sample like this:
java -jar GenomeAnalysisTK.jar \
-T HaplotypeCaller \
-R ./pe.scaffolds.db.fasta \
-I ./AEM.dedupped.bam \
-o ./AEM.raw_variants.g.vcf \
--genotyping_mode DISCOVERY \
-stand_emit_conf 10 \
-stand_call_conf 30 \
--allow_potentially_misencoded_quality_scores \
--emitRefConfidence GVCF \
-variant_index_type LINEAR \
-variant_index_parameter 128000
Then I used GenotypeGVCFs to jointly genotype all samples into one VCF:
java -jar GenomeAnalysisTK.jar \
-T GenotypeGVCFs \
-R pe.scaffolds.db.fasta \
--variant 123480.raw_variants.g.vcf \
--variant 123651.raw_variants.g.vcf \
-o 56species_raw.vcf
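With 56 samples, typing one --variant line per GVCF by hand is error-prone. A minimal sketch, assuming all per-sample GVCFs sit in one directory and follow the *.raw_variants.g.vcf naming used above; gvcf_variant_args is a hypothetical helper, not part of GATK:

```shell
#!/usr/bin/env bash
# Sketch: print one "--variant <file>" pair per per-sample GVCF so every
# sample can be passed to GenotypeGVCFs without listing each file manually.
# Assumption: the GVCFs are in one directory and end in .raw_variants.g.vcf.
gvcf_variant_args() {
  local dir="$1" f
  for f in "$dir"/*.raw_variants.g.vcf; do
    [ -e "$f" ] || continue          # skip the literal pattern if nothing matches
    printf '%s\n' "--variant" "$f"
  done
}

# Example use with the options from the post (mapfile needs bash >= 4):
# mapfile -t args < <(gvcf_variant_args .)
# java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R pe.scaffolds.db.fasta "${args[@]}" -o 56species_raw.vcf
```

Building an array (rather than a plain string) keeps file names with unusual characters intact when they are expanded on the java command line.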
Then I used SelectVariants like this:
java -jar GenomeAnalysisTK.jar \
-T SelectVariants \
-R pe.scaffolds.db.fasta \
-V 56species_raw.vcf \
-L PH01000000 \
-ef \
-o 56species_test.vcf
I got the VCF file and then used vcftools to convert it to a FASTA file. However, in the FASTA file different samples have different numbers of loci, so the sequences are not the same length. I want to do a phylogenetic analysis, and I don't know how to solve this problem.
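One common cause (an assumption here, since the exact vcftools conversion command is not shown) is that sites where a sample has a missing genotype (./.) are dropped for that sample only, so its sequence comes out shorter. A minimal sketch of a conversion that avoids this, assuming an uncompressed multi-sample VCF with GT as the first FORMAT field: every sample gets exactly one character per biallelic SNP, with N for missing calls and IUPAC ambiguity codes for heterozygotes. vcf_to_fasta is a hypothetical function, not a vcftools or GATK command.

```shell
#!/usr/bin/env bash
# Sketch (not a vcftools/GATK command): turn an uncompressed multi-sample VCF
# into a FASTA alignment where every sample has one character per kept site.
# Assumptions: only biallelic SNPs are kept, GT is the first FORMAT field,
# missing or unexpected genotypes become N, heterozygotes become IUPAC codes.
vcf_to_fasta() {
  awk -F'\t' '
    BEGIN {
      # IUPAC ambiguity codes for the two bases of a heterozygous SNP
      iupac["AG"] = "R"; iupac["GA"] = "R"; iupac["CT"] = "Y"; iupac["TC"] = "Y"
      iupac["CG"] = "S"; iupac["GC"] = "S"; iupac["AT"] = "W"; iupac["TA"] = "W"
      iupac["GT"] = "K"; iupac["TG"] = "K"; iupac["AC"] = "M"; iupac["CA"] = "M"
    }
    /^##/ { next }                         # meta-information lines
    /^#CHROM/ {                            # header: sample names start in column 10
      nsamp = NF
      for (i = 10; i <= NF; i++) name[i] = $i
      next
    }
    # keep only sites where REF and ALT are single bases (biallelic SNPs)
    length($4) == 1 && $5 ~ /^[ACGT]$/ {
      allele["0"] = $4; allele["1"] = $5
      for (i = 10; i <= nsamp; i++) {
        split($i, fmt, ":")                # fmt[1] is the GT string, e.g. 0/1 or 1|1
        n = split(fmt[1], gt, "[/|]")
        base = "N"                         # default for missing/unexpected calls
        if (n == 2 && (gt[1] in allele) && (gt[2] in allele)) {
          if (gt[1] == gt[2]) {
            base = allele[gt[1]]           # homozygous: the allele itself
          } else {
            key = allele[gt[1]] allele[gt[2]]
            if (key in iupac) base = iupac[key]
          }
        }
        seq[i] = seq[i] base               # exactly one character per sample per site
      }
    }
    END { for (i = 10; i <= nsamp; i++) printf ">%s\n%s\n", name[i], seq[i] }
  ' "$1"
}
```

Usage would be something like `vcf_to_fasta 56species_test.vcf > 56species_test.fasta` (a bgzipped VCF would need to be decompressed first). All sequences then have the same length, equal to the number of biallelic SNPs kept, which is what most tree-building programs expect from an alignment.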