Dear all,
I am working with a genome with very low heterozygosity. The assembly is monoploid (its FASTA file contains only reference contigs), but I suspect that the organism is actually tetraploid; multiple independent analyses of SNPs support this suspicion.
As a first step towards testing whether different alleles of the same gene have different expression levels, I would like to map my RNA-seq reads to the reference contigs, and also to a separate set of "alt" contigs that are identical to the reference contigs except that they carry the SNP variants. The idea is to split the RNA reads into two BAM files: one with reads exactly matching the reference genome, and one with reads exactly matching the alternative SNPs.
Is there a straightforward way to take the FASTA file of reference contigs, together with a BAM file (made by mapping Illumina DNA reads back to those contigs), call SNPs with some stringent filtering, and then generate a new FASTA file of "alt" contigs (the contigs with the SNP variants substituted in)? Although this would give no information about phasing, I think that would be acceptable for my genome when looking at differential expression in RNA-seq data, because the SNPs are pretty far apart and there are generally only two alleles per gene.
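To make the last step concrete, here is a rough Python sketch of what I have in mind, assuming the SNPs have already been called and written to a plain-text VCF. The function names (`read_fasta`, `apply_snps`), the `_alt` suffix, and the quality cutoff of 50 are placeholders of my own, not taken from any existing tool, and the sketch ignores the FILTER column, indels, and multi-allelic sites.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict of contig name -> sequence string."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the contig name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def apply_snps(contigs, vcf_lines, min_qual=50.0, suffix="_alt"):
    """Return copies of the contigs with single-base ALT alleles substituted.

    min_qual is an arbitrary, hypothetical cutoff on the VCF QUAL column.
    Indels, multi-allelic records, and records without QUAL are skipped.
    """
    seqs = {name: list(seq) for name, seq in contigs.items()}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, qual = (
            fields[0], int(fields[1]), fields[3], fields[4], fields[5]
        )
        if chrom not in seqs:
            continue
        # Keep only simple biallelic SNPs (one base replaced by one base).
        if len(ref) != 1 or len(alt) != 1 or alt.upper() not in "ACGT":
            continue
        if qual == "." or float(qual) < min_qual:
            continue
        idx = pos - 1  # VCF positions are 1-based
        if idx >= len(seqs[chrom]) or seqs[chrom][idx].upper() != ref.upper():
            raise ValueError(f"REF mismatch at {chrom}:{pos}")
        seqs[chrom][idx] = alt
    return {name + suffix: "".join(bases) for name, bases in seqs.items()}


if __name__ == "__main__":
    reference = read_fasta([">contig1", "ACGTACGT"])
    calls = ["contig1\t2\t.\tC\tT\t99\tPASS\t."]
    for name, seq in apply_snps(reference, calls).items():
        print(f">{name}\n{seq}")
```

Every qualifying SNP on a contig ends up in the same alt sequence, which is exactly the unphased result I described above.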
Thank you for any suggestions you may have.
Best regards,
TylerDodgeball