ETA: The title should read "Finding and segregating coding low-complexity regions in the genome" - Apologies, I need a nap.
Greetings.
I am doing some variant analysis on mapped reads (BWA -> SAMtools -> VCF -> annotation with SnpEff), and I would like to segregate the resulting variants according to whether they are associated with low-complexity protein sequences (as identified by the SEG algorithm).
In other words, I would like the output split into variants that fall in low-complexity coding regions and variants that do not. I would also like to map the low-complexity regions onto the genome.
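To make the goal concrete, here is a rough Python sketch of the two steps I have in mind: converting a SEG hit on a protein (residue coordinates) into genomic intervals, then splitting variants by whether they overlap those intervals. Everything in it is an assumption for illustration: the function names `protein_to_genomic` and `split_variants` are made up, coordinates are 1-based and inclusive, the CDS exons are assumed to begin at the start codon with no frameshifts, and variants are reduced to plain (chromosome, position) pairs rather than parsed from a real VCF.

```python
from typing import List, Tuple

Interval = Tuple[int, int]          # (start, end), 1-based, inclusive
Region = Tuple[str, int, int]       # (chrom, start, end)
Variant = Tuple[str, int]           # (chrom, pos)


def protein_to_genomic(aa_start: int, aa_end: int,
                       cds_exons: List[Interval], strand: str) -> List[Interval]:
    """Map a protein interval (residues aa_start..aa_end) to genomic intervals.

    Hypothetical helper: assumes cds_exons cover the CDS only and that the
    first coding base is the first base of the start codon.
    """
    # Offsets into the spliced CDS, 0-based and half-open.
    cds_from = (aa_start - 1) * 3
    cds_to = aa_end * 3

    # Walk the exons in transcription order.
    exons = sorted(cds_exons, reverse=(strand == "-"))
    hits = []
    offset = 0  # number of CDS bases consumed by earlier exons
    for ex_start, ex_end in exons:
        length = ex_end - ex_start + 1
        lo = max(cds_from, offset)
        hi = min(cds_to, offset + length)
        if lo < hi:  # the low-complexity segment overlaps this exon
            if strand == "+":
                hits.append((ex_start + (lo - offset),
                             ex_start + (hi - offset) - 1))
            else:
                hits.append((ex_end - (hi - offset) + 1,
                             ex_end - (lo - offset)))
        offset += length
    return sorted(hits)


def split_variants(variants: List[Variant],
                   regions: List[Region]) -> Tuple[List[Variant], List[Variant]]:
    """Split variants into those inside and outside the given regions."""
    inside, outside = [], []
    for chrom, pos in variants:
        hit = any(c == chrom and s <= pos <= e for c, s, e in regions)
        (inside if hit else outside).append((chrom, pos))
    return inside, outside


# Toy example: a two-exon CDS encoding six residues; SEG flags residues 3-4.
exons = [(100, 108), (200, 208)]
lc = protein_to_genomic(3, 4, exons, "+")
regions = [("chr1", s, e) for s, e in lc]
in_lc, not_lc = split_variants([("chr1", 107), ("chr1", 150), ("chr1", 201)], regions)
```

The linear scan in `split_variants` is only meant for a toy case; on a real genome I assume the intervals would be written out as BED and intersected with the VCF using an existing interval tool instead.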
There are many ways of doing this, but I would love to hear what others suggest as the best way to proceed and the best tools to work with.
Thanks!