Hi,
What would be the fastest way (I have to do this hundreds of millions of times) to extract the following for each aligned read in a BAM file:
1) The positions where the read bases differ from a reference sequence.
2) The PHRED base quality values of these bases. If the difference is an indel, the quality value will, of course, be skipped.
As far as I know, I cannot use mpileup or any other tool I am aware of because of memory limitations: this is a very custom amplicon analysis, with a coverage of more than 500 million reads per base position on the reference amplicon.
In short, I need to apply an efficient approach to extract all differing positions for each aligned read.
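To make the task concrete, the per-read logic can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tuned solution: the function name `read_differences` and the tuple layout are hypothetical, the inputs are plain strings (reference sequence, 0-based alignment start, CIGAR string, read sequence, Phred+33 quality string), and both sequences are assumed to be in the same case. In practice these fields would come from a BAM reader that streams one read at a time, so memory use stays constant regardless of coverage.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_differences(ref, ref_start, cigar, seq, qual):
    """Return a list of (ref_pos, kind, phred) for one aligned read.

    kind is "X" for a mismatch, "I" for an insertion, "D" for a deletion.
    phred is the base quality for mismatches and None for indels.
    ref_pos is 0-based; for an insertion it is the reference position
    immediately after the inserted bases.
    """
    diffs = []
    r = ref_start  # current 0-based position on the reference
    q = 0          # current 0-based position in the read
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned block: compare base by base.
            for i in range(n):
                if seq[q + i] != ref[r + i]:
                    diffs.append((r + i, "X", ord(qual[q + i]) - 33))
            r += n
            q += n
        elif op == "I":
            diffs.append((r, "I", None))  # quality skipped for indels
            q += n
        elif op == "D":
            diffs.append((r, "D", None))
            r += n
        elif op == "N":
            r += n  # skipped reference region
        elif op == "S":
            q += n  # soft-clipped bases are present in seq but not aligned
        # "H" and "P" consume neither the read nor the reference
    return diffs


example = read_differences(
    ref="ACGTACGTAC",
    ref_start=2,
    cigar="3M1I2M1D2M",
    seq="GTTCCGAG",
    qual="II5IIII#",
)
print(example)
```

Because each read is handled independently, nothing has to be piled up per reference position, and the output can be written or aggregated as the reads are streamed.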
Thanks.