I have WGS data (2x100bp) that has been aligned to hg19. Using HLA-VBSeq, I HLA typed the data, and now the researcher I am working with is interested in two allele sequences for HLA-E for gene editing purposes. I would really appreciate some feedback on my workflow since I have never done this type of analysis before.
Using samtools, I subset the list of reads covering the genomic region of HLA-E, then used picard FilterSamReads to output the filtered bam file. I converted the bam to paired fastq and then aligned the fastq to the HLA-E genomic DNA sequences for alleles E*01:03:01:01 and E*01:01:01:01 from the IMGT/HLA database using BWA-MEM. I then sorted and indexed my bam files for each allele for viewing in IGV. Now, I have a couple questions:
1. Should I have aligned the filtered bam to the nucleotide coding sequences for both alleles instead of the genomic sequences?
2. The researcher wants all sequences aligned to E*01:03:01:01 and E*01:01:01:01. Would it be appropriate to convert the bam files to fasta files using seqtk to deliver the sequences?
I'm not sure if my workflow is correct, or if delivering the fasta for each allele is appropriate for gene editing downstream. Help?
Using samtools, I subset the list of reads covering the genomic region of HLA-E, then used picard FilterSamReads to output the filtered bam file. I converted the bam to paired fastq and then aligned the fastq to the HLA-E genomic DNA sequences for alleles E*01:03:01:01 and E*01:01:01:01 from the IMGT/HLA database using BWA-MEM. I then sorted and indexed my bam files for each allele for viewing in IGV. Now, I have a couple questions:
1. Should I have aligned the filtered bam to the nucleotide coding sequences for both alleles instead of the genomic sequences?
2. The researcher wants all sequences aligned to E*01:03:01:01 and E*01:01:01:01. Would it be appropriate to convert the bam files to fasta files using seqtk to deliver the sequences?
Code:
samtools bam2fq input.bam | seqtk seq -A > output.fa