I feel like there must be a simple one-liner with sed/awk that can do this, but I can't think of it, and I hope y'all can help me out.
I have a set of FASTA files for gene sequences, from GenBank. I need to extract all the instances of "GG" plus the twenty bases upstream. The output would be a list of 22-mers, all ending in GG, one per line, covering every instance in each gene.
Any help is greatly appreciated!
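To make the request concrete, here is a minimal sketch of the extraction in Python rather than sed/awk. It rests on several assumptions not stated above: sequence lines may be wrapped and should be joined per record, bases are treated case-insensitively, only the characters A, C, G, T and N count as bases, and overlapping hits (for example the two "GG" instances inside "GGG") should each be reported. The function names `read_fasta`, `gg_22mers` and `main` are made up for illustration.

```python
import re

# Zero-width lookahead so overlapping hits are all found; the capture
# group holds the 20 upstream bases plus the trailing "GG".
PATTERN = re.compile(r"(?=([ACGTN]{20}GG))")

def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)

def gg_22mers(sequence):
    """Return every 22-mer in `sequence` that ends in GG, in order."""
    return [m.group(1) for m in PATTERN.finditer(sequence)]

def main(paths):
    """Print one 22-mer per line for every record in every file."""
    for path in paths:
        with open(path) as handle:
            for _header, seq in read_fasta(handle):
                for kmer in gg_22mers(seq):
                    print(kmer)
```

Calling `main(["genes.fasta"])` (a placeholder file name) would print the 22-mers for each record in turn. A "GG" with fewer than twenty bases before it is skipped, since no full 22-mer exists there.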