I'm looking at reads from genomes that definitely include structural variation (based on BreakDancer and SVDetect), and I'm running into something that I suspect is an error.
I'm looking at the distribution of distances between mates in read pairs that are not properly paired, region by region across the genome. I consistently find that the distribution of mean distances for pairs where either mate maps within the centromere differs from the distribution of mean distances in the arms. There are also more of these improperly paired reads within the centromere, even though the read depth there is lower than in other regions.
I suspect that a larger share of these reads are errors, since the centromere is poorly mapped to begin with. However, I'm not sure how to filter them out if so. Is there a standard analysis step I may be missing in assessing these reads? As far as I can tell, each of the SV detection tools assesses the reads differently.
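For concreteness, the comparison described above can be sketched roughly as follows. This is a minimal illustration, not the exact analysis: the record layout, the `CENTROMERES` coordinates, the function names, and the `min_mapq` cutoff are all hypothetical. Filtering on mapping quality is only one plausible filter, on the assumption that the aligner gives low MAPQ to reads placed ambiguously in repetitive centromeric sequence.

```python
from statistics import mean

# Hypothetical centromere interval per chromosome (start inclusive, end exclusive).
# Real coordinates would come from an annotation for the reference build in use.
CENTROMERES = {"chr1": (1000, 2000)}


def in_centromere(chrom, pos):
    """Return True if pos falls inside the (hypothetical) centromere interval."""
    span = CENTROMERES.get(chrom)
    return span is not None and span[0] <= pos < span[1]


def mean_discordant_distance(pairs, min_mapq=0):
    """Mean mate distance of improperly paired reads, split by region.

    Each record is (chrom, pos, mate_pos, mapq, is_proper_pair) and is assumed
    to describe a pair whose mates map to the same chromosome.
    A pair counts as "centromere" if either mate maps within the centromere.
    """
    dists = {"centromere": [], "arm": []}
    for chrom, pos, mate_pos, mapq, is_proper_pair in pairs:
        # Skip properly paired reads and reads below the MAPQ cutoff.
        if is_proper_pair or mapq < min_mapq:
            continue
        either_in = in_centromere(chrom, pos) or in_centromere(chrom, mate_pos)
        region = "centromere" if either_in else "arm"
        dists[region].append(abs(mate_pos - pos))
    # None marks a region with no pairs left after filtering.
    return {region: (mean(d) if d else None) for region, d in dists.items()}


# Made-up example records, for illustration only.
example_pairs = [
    ("chr1", 1500, 9500, 0, False),   # centromeric, MAPQ 0
    ("chr1", 1200, 1700, 40, False),  # centromeric, confident mapping
    ("chr1", 5000, 6000, 60, False),  # arm, discordant
    ("chr1", 7000, 7300, 60, True),   # properly paired, ignored
]
unfiltered = mean_discordant_distance(example_pairs)
filtered = mean_discordant_distance(example_pairs, min_mapq=20)
```

Comparing `unfiltered` with `filtered` shows how much of the centromere/arm difference survives once low-confidence mappings are dropped; the cutoff of 20 is arbitrary here.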