06-28-2016, 09:43 AM   #5
Senior Member
Location: Washington, D.C. metro area

Join Date: Feb 2010
Posts: 118

My understanding is that the HGVS guidelines were developed several years before NGS was; I remember using them to annotate Sanger sequence reads.

HGVS probably places features at their 3' end because of how a change at the DNA/RNA level affects the translated protein. In a run of T's (TTT repeated n times) that encodes several phenylalanine codons, deleting one base, even at the 5' end of the run, causes a frameshift whose first visible effect at the protein level is an alteration of the last phenylalanine, regardless of where in the run the deletion actually occurred.
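To make the phenylalanine example concrete, here is a minimal Python sketch. The toy coding sequence (ATG, three TTT codons, GCA, stop) and the helper names translate, delete_base and first_changed_residue are made up for illustration; only the standard genetic code is real. It deletes each T of the run in turn and checks that every deletion gives the same mutant, with the first altered residue being the last phenylalanine.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate from position 0, stopping at the first stop codon.
    A trailing incomplete codon is ignored."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete_base(seq, pos):
    """Return seq with the single base at 0-based index pos removed."""
    return seq[:pos] + seq[pos + 1:]


def first_changed_residue(a, b):
    """1-based position of the first differing residue, or None if none differ
    within the shared length."""
    for i, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return i
    return None


# Hypothetical toy transcript: Met, Phe, Phe, Phe, Ala, stop.
ref = "ATG" + "TTT" * 3 + "GCA" + "TAA"
t_run = range(3, 12)  # 0-based indices of the nine T's

# Deleting any one T of the run yields the identical mutant sequence.
mutants = {delete_base(ref, p) for p in t_run}
mutant = next(iter(mutants))

ref_protein = translate(ref)        # "MFFFA"
mut_protein = translate(mutant)     # "MFFLH"
changed = first_changed_residue(ref_protein, mut_protein)  # 4, the last Phe
print(len(mutants), ref_protein, mut_protein, changed)
```

Since all nine deletions are indistinguishable at the sequence level, the position you report is purely a convention, and the 3'-most one happens to line up with where the protein first changes.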

I imagine the algorithms for aligning and variant calling work similarly to the one used in BLAST: they all tend to left-align, which I imagine is more efficient for calculating edit distances between a sequence and its reference, or something to that effect. A side effect is that matching their output to protein-focused HGVS nomenclature is somewhat difficult.

I wouldn't say you're using the wrong software, as any aligner/variant caller (that I'm aware of, anyway) will do the same thing. As to whether there's a piece of software to switch between left- and right-aligned features, I'm not sure. When I'm looking for the same indels across multiple samples or data sets, I just make a note of the left-aligned position and include it in any .bed or .vcf that I use for intersecting/filtering.
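For what it's worth, the shifting itself is simple enough to sketch by hand. The Python below is a rough illustration, not taken from any particular tool: shift_deletion_right, shift_deletion_left and apply_deletion are hypothetical names, coordinates are 0-based, and it only handles simple deletions on the forward strand (no VCF anchor base, no strand flipping, no insertions). A deletion can slide one base in either direction whenever the base entering the deleted window equals the base leaving it.

```python
def apply_deletion(ref, start, length):
    """Return ref with the bases ref[start:start+length] removed."""
    return ref[:start] + ref[start + length:]


def shift_deletion_right(ref, start, length):
    """Slide a deletion to its 3'-most equivalent position (HGVS-style)."""
    while start + length < len(ref) and ref[start] == ref[start + length]:
        start += 1
    return start


def shift_deletion_left(ref, start, length):
    """Slide a deletion to its 5'-most equivalent position (left-aligned)."""
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


# Made-up reference with a CA repeat at indices 2..7.
ref = "GGCACACATT"
left = shift_deletion_left(ref, 4, 2)     # 2
right = shift_deletion_right(ref, 4, 2)   # 6

# Both placements describe the same resulting sequence.
same = apply_deletion(ref, left, 2) == apply_deletion(ref, right, 2)
print(left, right, same)
```

Keeping both coordinates next to each indel would let you match on either convention when intersecting files, though for anything on the minus strand the 3' direction is reversed relative to the reference, which this sketch does not account for.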

Last edited by Jessica_L; 06-28-2016 at 09:44 AM. Reason: clarity