Greetings,
I'm looking for advice on improving de novo assembly and variant calling from 100 bp Illumina reads in genes with low-complexity regions (imperfect repeat sequences).
I am working on comparative genomics with a number of very AT-rich genomes (about 80% AT, from a variety of Plasmodium species). I am also doing some population genetics on these genomes and need an accurate set of SNPs (indels would be nice, too).
I am using Velvet for de novo assembly, BWA for mapping, and SAMtools/bcftools for variant calling, and all three steps have problems with low-complexity regions. Velvet gets them right about 50% of the time (checked against Sanger sequencing), and BWA can't map reads to these regions at all.
I tried masking the regions in the genome using DUST, but it only finds small regions; the low-complexity regions are easier to find in the protein sequence.
Any advice on how to mask these regions or (even better) include them in the analyses and get them right would be appreciated.
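To make the masking question concrete, here is a minimal sketch of the kind of scan I have in mind. It is not DUST and not anything I have validated: it is a plain sliding-window Shannon entropy over 3-mers, and the function names, window size, step, and entropy threshold are all illustrative assumptions rather than values tuned for Plasmodium.

```python
import math
from collections import Counter


def window_entropy(seq, k=3):
    """Shannon entropy (in bits) of the k-mer distribution in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def low_complexity_intervals(seq, window=60, step=10, k=3, max_entropy=2.5):
    """Return merged half-open (start, end) intervals of low-entropy windows.

    window, step and max_entropy are hypothetical defaults; a genome that is
    about 80% AT would likely need a different threshold.
    """
    seq = seq.upper()
    intervals = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, len(seq))
        if window_entropy(seq[start:end], k) <= max_entropy:
            if intervals and start <= intervals[-1][1]:
                # Overlaps or touches the previous flagged window: extend it.
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals


def soft_mask(seq, intervals):
    """Lower-case the bases inside the given intervals (soft masking)."""
    chars = list(seq.upper())
    for start, end in intervals:
        chars[start:end] = [c.lower() for c in chars[start:end]]
    return "".join(chars)
```

Something like `soft_mask(seq, low_complexity_intervals(seq))` would then give a soft-masked sequence, but I don't know whether a nucleotide-level scan like this can catch the imperfect repeats that show up clearly at the protein level.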