How to generate VCF from HISAT2 pre-built SNP index?

    My ultimate goal is to build a data matrix with n = samples and m = SNPs. My plan was to use HISAT2 for the mapping and VCF tools to produce the VCF file, and then parse it into a data matrix I can actually mine.

    I'm using the pre-built SNP index file for H. sapiens, Ensembl GRCh38 (ftp://ftp.ccb.jhu.edu/pub/infphilo/h...h38_snp.tar.gz). HISAT2 runs smoothly on all of my samples, and I've started reading the downstream pipeline for generating VCF files in the HISAT2 manual (https://ccb.jhu.edu/software/hisat2/manual.shtml):

    Code:
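    # As given in the HISAT2 manual; legacy samtools/bcftools 0.1.x syntax (bcftools 1.x moved calling to `bcftools call`)
    samtools mpileup -uf $HISAT2_HOME/example/reference/22_20-21M.fa eg2.sorted.bam | bcftools view -bvcg - > eg2.raw.bcf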
    How do I get the original FASTA file, or build a VCF file, using this index and my SAM/BAM files? I was going to just download the Ensembl-annotated hg38 genome, but I don't think that's what I need... I looked inside the `make_grch38_snp.sh` file from the tarball I downloaded with the SNP database, and I think it builds the SNP index from `Homo_sapiens.GRCh38.dna.primary_assembly.fa`. Is this the file that needs to be used? (ftp://ftp.ensembl.org/pub/release-84...assembly.fa.gz)
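    For reference, here is a minimal sketch of the same calling step with current bcftools 1.x, where calling moved from `bcftools view` to `bcftools call`. It assumes the reference is the same FASTA the HISAT2 index was built from (the primary assembly named in `make_grch38_snp.sh`), decompressed with `gunzip` first because `samtools faidx` only indexes uncompressed or bgzip-compressed files, and that the BAM is coordinate-sorted. The function name `call_snps` and its arguments are hypothetical, for illustration only.

```shell
#!/bin/sh
# Hypothetical wrapper (name and arguments are illustrative, not from HISAT2):
# call SNPs for one sample with bcftools 1.x, replacing the legacy
# "samtools mpileup -u | bcftools view -bvcg" pipeline.
# Assumes REF is the uncompressed FASTA the HISAT2 index was built from
# and BAM is coordinate-sorted.
call_snps() {
  ref=$1 bam=$2 out=$3
  samtools faidx "$ref"                 # write REF.fai so the reference can be queried
  samtools index "$bam"                 # write BAM.bai for random access
  bcftools mpileup -f "$ref" "$bam" |
    bcftools call -mv -Oz -o "$out"     # -m multiallelic caller, -v variant sites only
  bcftools index "$out"                 # index the bgzipped VCF for later merging
}

# Example (not executed here):
# call_snps Homo_sapiens.GRCh38.dna.primary_assembly.fa eg2.sorted.bam eg2.vcf.gz
```

    The contig names in the FASTA should match the `@SQ` lines in the BAM header (check with `samtools view -H eg2.sorted.bam`); otherwise bcftools cannot fetch the reference sequence for those contigs.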

    Also, any insight into generating a data matrix from the VCF files would be greatly appreciated (though first I need to generate the VCF files).
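    One possible route to the samples-by-SNPs matrix, sketched under assumptions: first combine the per-sample VCFs into one multi-sample VCF (for example with `bcftools merge`, which normally needs bgzipped, indexed inputs), then transpose the genotype columns so that each row is a sample and each cell is the number of alternate alleles (0, 1 or 2, or NA when the genotype is missing). The `vcf_to_matrix` function below is hypothetical and written in plain awk so it needs no extra libraries. It labels each site as `CHROM:POS` and counts any non-reference allele, so multi-allelic sites are collapsed into a single count.

```shell
#!/bin/sh
# Hypothetical helper (not part of HISAT2, samtools or bcftools): read an
# uncompressed multi-sample VCF on stdin and print a tab-separated
# samples x SNPs matrix of alternate-allele counts (NA if GT is missing).
vcf_to_matrix() {
  awk -F '\t' '
    /^##/ { next }                        # skip meta-information lines
    /^#CHROM/ {                           # header: sample names start at column 10
      nsamples = NF - 9
      for (i = 10; i <= NF; i++) sample[i - 9] = $i
      next
    }
    {
      nsnps++
      snp[nsnps] = $1 ":" $2              # label each site as CHROM:POS
      for (i = 10; i <= NF; i++) {
        split($i, field, ":")             # GT is the first FORMAT key when present
        n = split(field[1], allele, "[/|]")
        dose = 0
        for (a = 1; a <= n; a++) {
          if (allele[a] == ".") { dose = "NA"; break }
          if (allele[a] != "0") dose++    # count every non-reference allele
        }
        gt[i - 9, nsnps] = dose
      }
    }
    END {                                 # transpose: one output row per sample
      printf "sample"
      for (j = 1; j <= nsnps; j++) printf "\t%s", snp[j]
      printf "\n"
      for (s = 1; s <= nsamples; s++) {
        printf "%s", sample[s]
        for (j = 1; j <= nsnps; j++) printf "\t%s", gt[s, j]
        printf "\n"
      }
    }'
}

# Tiny two-sample, two-site example VCF generated with printf.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\nchr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\nchr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0|0:12\t./.:0\n' |
  vcf_to_matrix
```

    On real data you would stream the merged file into it, e.g. `bcftools view merged.vcf.gz | vcf_to_matrix > matrix.tsv`. The awk script keeps every genotype in memory before transposing, which is fine for modest site counts but may be heavy genome-wide; if the untransposed SNPs-by-samples layout is enough, `bcftools query -f '%CHROM:%POS[\t%GT]\n'` produces it directly.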

    Thanks in advance
    Last edited by jol.espinoz; 04-30-2017, 11:26 AM.
