SEQanswers


jol.espinoz 04-30-2017 11:03 AM

How to generate VCF from HISAT2 pre-built SNP index?
 
My ultimate goal is to get a data matrix of n samples by m SNPs. My plan is to use HISAT2 for the mapping, VCF tools to produce the VCF file, and then parse that file to generate a data matrix I can actually mine.

I'm using the pre-built SNP index for H. sapiens, Ensembl GRCh38 (ftp://ftp.ccb.jhu.edu/pub/infphilo/h...h38_snp.tar.gz). HISAT2 is running smoothly for all of my samples, and I have started reading the downstream pipeline for generating VCF files in the manual (https://ccb.jhu.edu/software/hisat2/manual.shtml).

Code:

samtools mpileup -uf $HISAT2_HOME/example/reference/22_20-21M.fa eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf
How do I get the original FASTA file, or build a VCF file using this index and my SAM/BAM files? I was going to just download the hg38 Ensembl annotated genome, but I don't think that's what I need. I looked at the `make_grch38_snp.sh` file from the tarball I got when downloading the SNP database. I think it builds the SNP index from `Homo_sapiens.GRCh38.dna.primary_assembly.fa`. Is this the file that needs to be used? (ftp://ftp.ensembl.org/pub/release-84...assembly.fa.gz)

Also, if anyone has any insight into how to generate a data matrix from the VCF files, it would be greatly appreciated (but first I need to generate the VCF files).
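
To make the data-matrix step concrete, here is a rough sketch of the kind of parsing I have in mind. It assumes a single multi-sample, uncompressed VCF with a GT entry in the FORMAT column, keeps only biallelic single-base SNPs, and codes each genotype as the number of non-reference alleles (0, 1 or 2, with None for missing calls). The function name `vcf_to_matrix` and the tiny in-memory example are made up for illustration and are not part of HISAT2, samtools or bcftools.

```python
import io


def vcf_to_matrix(handle):
    """Parse a multi-sample VCF into (samples, snp_ids, matrix).

    matrix[i][j] is the alternate-allele count of sample i at SNP j,
    or None when that genotype is missing.
    """
    samples = []
    snp_ids = []
    columns = []  # one list of per-sample values for each retained SNP
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, vid, ref, alt = fields[:5]
        # Assumption: keep only biallelic single-base substitutions.
        if len(ref) != 1 or alt not in ("A", "C", "G", "T"):
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        snp_ids.append(vid if vid != "." else f"{chrom}:{pos}")
        col = []
        for sample_field in fields[9:]:
            gt = sample_field.split(":")[gt_idx]
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                col.append(None)  # missing call
            else:
                col.append(sum(1 for a in alleles if a != "0"))
        columns.append(col)
    # Transpose so that rows are samples and columns are SNPs.
    matrix = [[col[i] for col in columns] for i in range(len(samples))]
    return samples, snp_ids, matrix


# Hypothetical two-sample, two-SNP VCF used only to exercise the parser.
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n"
    "22\t100\trs1\tA\tG\t50\t.\t.\tGT:PL\t0/1:1,0,1\t1/1:9,3,0\n"
    "22\t200\t.\tC\tT\t50\t.\t.\tGT\t0/0\t./.\n"
)
samples, snp_ids, matrix = vcf_to_matrix(io.StringIO(example))
```

For real data the handle would be an opened VCF file instead of the `io.StringIO` example, and SNPs without an ID are labelled by chromosome and position.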

Thanks in advance

