I have received a TopHat-aligned BAM file from our sequencing core. It is from paired-end RNA-Seq, and I want to do RNA-editing analysis. I am using SAMtools/bcftools to call the variants, but there seems to be a problem. When I look at the pileup file for the samples, I see a lot of ">>>" and "<<<" symbols in the read-bases column (apart from the standard A ; C ; T ; G ; a ; c ; t ; g ; . ; , ; $ and ^S). I guess these are just the gaps within reads that are mapped across exon-exon junctions.
e.g.
chr1 3995422 t 10 >><<<>>>>< CEFEJJJJJJ
So no read actually has a base aligned at this position, but SAMtools still reports a coverage of 10X for it.
I assume this will be a real problem for variant calling. Is there a way to deal with it?
One way is, of course, to generate the pileup file, remove all the "<" and ">" symbols, adjust the depth accordingly, and then use a tool like VarScan to call the variants. But I would prefer to use bcftools itself.
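For reference, the filtering workaround could look roughly like the Python sketch below. It assumes the usual six-column, tab-separated pileup line (chromosome, position, reference base, depth, read bases, base qualities) with one quality character per read, including the reads shown as ">" or "<". The function name `strip_ref_skips` is made up for illustration and is not part of SAMtools.

```python
def strip_ref_skips(line):
    """Drop '>' and '<' (reference skips) from one pileup line and fix the depth.

    Assumes six tab-separated columns: chrom, pos, ref, depth, bases, quals.
    """
    chrom, pos, ref, _depth, bases, quals = line.rstrip("\n").split("\t")[:6]

    kept_bases = []
    kept_quals = []
    prefix = ""        # pending '^' + mapping-quality marker for the next read
    keep_last = False  # whether the most recent read was kept
    i = 0              # index into the read-bases column
    q = 0              # index into the base-quality column

    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read start: the next character is a mapping quality, not a base.
            prefix = bases[i:i + 2]
            i += 2
        elif ch == "$":
            # Read end: keep the marker only if its read was kept.
            if keep_last:
                kept_bases.append(ch)
            i += 1
        elif ch in "+-":
            # Indel written as +N or -N followed by N bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            if keep_last:
                kept_bases.append(bases[i:j + length])
            i = j + length
        else:
            # One symbol per read, matched by one quality character.
            keep_last = ch not in "<>"
            if keep_last:
                kept_bases.append(prefix + ch)
                kept_quals.append(quals[q])
            prefix = ""
            q += 1
            i += 1

    depth = len(kept_quals)
    # Use '*' placeholders when no read is left at this position.
    new_bases = "".join(kept_bases) if depth else "*"
    new_quals = "".join(kept_quals) if depth else "*"
    return "\t".join([chrom, pos, ref, str(depth), new_bases, new_quals])
```

Run on the example line above (with tabs between the columns), this should leave a depth of 0 with "*" in both the bases and qualities columns, which could then be passed on to a caller such as VarScan.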