Quote:
Originally Posted by kmkocot
Hi all,
I am trying to find a script or a program that I can call in a pipeline that can remove gap-only sites in a multiple sequence alignment. Can you think of anything I can use? I'd also like to
delete the columns at either end of each alignment where there are only 2 or fewer sequences with non-gap characters if you can think of anything that can do that.
Thanks!
Kevin
If your alignments are in SAM format, you can remove all alignments that contain indels with the C program
"dbamfilter -i". You could also easily parse the CIGAR field of each SAM record and check for any "I" or "D" operators.
What format are you working with?
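In case it helps, here is a rough sketch of the CIGAR check in Python. It is not how dbamfilter works internally, just one way to do it. It assumes plain-text, tab-separated SAM input with the CIGAR string in the 6th column, and the function names (has_indel, filter_indel_free) are made up for this example. Records with an unavailable CIGAR ("*") are kept.

```python
import re

# Matches an insertion (I) or deletion (D) operation in a CIGAR string,
# e.g. "2I" or "15D".
INDEL_OP = re.compile(r"\d+[ID]")


def has_indel(cigar):
    """Return True if the CIGAR string contains an I or D operation."""
    return INDEL_OP.search(cigar) is not None


def filter_indel_free(lines):
    """Yield SAM header lines and alignment records without indels.

    `lines` is any iterable of SAM text lines (hypothetical helper,
    assumes tab-separated fields with CIGAR in column 6).
    """
    for line in lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # CIGAR is the 6th mandatory SAM field (index 5).
        if len(fields) > 5 and not has_indel(fields[5]):
            yield line
```

To use it in a pipeline, you could call sys.stdout.writelines(filter_indel_free(sys.stdin)) in a small script and pipe SAM text into it.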