Dear All,
I am looking for a way to "merge" two large sequence files. The first is a reference sequence; the second is a consensus sequence in which missing positions are marked with ambiguous nucleotides (Ns). The merged sequence should match the consensus, except that every N should be replaced by the nucleotide at the same position in the reference.
Example:
BAM-consensus: CCCTANNNNNNNNCATCTACATGG
Reference: GTATAGATATCATCATCTACATCC
Output => CCCTAGATATCATCATCTACATGG
I tried EMBOSS (cons, consambig, megamerger), but these tools have problems with the ambiguous nucleotides. I wrote a simple Unix script that worked just fine on small inputs, but not on large sequences (100 MB). Any ideas that would work for larger files?
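To make the rule from the example concrete, here is a minimal Python sketch of what I have in mind. It assumes that both sequences have the same length and are already aligned position by position, and that the inputs are raw sequence text without FASTA headers or line breaks. The names `fill_gaps`, `fill_gaps_stream` and the chunk size are made up for illustration. Reading in fixed-size chunks keeps memory use small, which might help with 100 MB inputs.

```python
CHUNK_SIZE = 1 << 20  # hypothetical chunk size: about 1 million characters per read


def fill_gaps(consensus: str, reference: str) -> str:
    """Return the consensus with every N (or n) replaced by the reference base.

    Assumes both strings are the same length and aligned position by position.
    """
    if len(consensus) != len(reference):
        raise ValueError("consensus and reference must have the same length")
    return "".join(
        ref_base if cons_base in "Nn" else cons_base
        for cons_base, ref_base in zip(consensus, reference)
    )


def fill_gaps_stream(cons_fh, ref_fh, out_fh, chunk_size: int = CHUNK_SIZE) -> None:
    """Merge two text streams chunk by chunk so neither is fully loaded.

    Assumes raw sequence text only (no headers, no newlines); a length
    mismatch between the two streams raises ValueError from fill_gaps.
    """
    while True:
        cons_chunk = cons_fh.read(chunk_size)
        ref_chunk = ref_fh.read(chunk_size)
        if not cons_chunk and not ref_chunk:
            break  # both streams are exhausted
        out_fh.write(fill_gaps(cons_chunk, ref_chunk))
```

With real files, the three handles would be opened in text mode (for example `open("consensus.txt")`, where the file name is a placeholder). FASTA input would need its header and line breaks handled before the position-by-position comparison is valid.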
Thanks for considering my question!