Hi,
I've got a fairly unusual dataset: Illumina paired-end 50 bp reads covering a 25 kb region of human genomic DNA, sequenced in ~1,500 samples to very high coverage, typically over 300X. (For the curious, it was generated by barcoding 96 samples per lane.)
I've mapped all of the reads with Maq and finished BAM conversion and pileup with SAMtools. Now my mission is to call genotypes for SNPs, indels and SVs as comprehensively as possible for all of the samples.
I've started playing around with SAMtools varFilter for SNP and indel calling, but before I get too involved in building a pipeline, I was hoping to get some insight from the various experts here: given this dataset, what approaches would you use for SNP, indel and SV calling? Are there established, reliable pipelines and parameters for these analyses that anyone can recommend?
Cheers,
Daniel.