Hi all,
So I'm fairly new to anything bioinformatics-related and I've been kind of muddling my way through so far.
I have extracted the introns from three different species and have their sequences stored in three FASTA files. I need a way to extract the first 10 bases from each of these sequences and put them in a new file. I don't know if it helps or not, but the first 10 bases are in caps while the rest of the sequence is in lowercase. I'm not sure if I could use some sort of regex here or something. So for example,
>Fasta format Identification stuff
ACTTGTACATatgggtatcataatcagggagatcc
>Fasta format Identification stuff
ACTTGTACATatgggtatcataatcagggagatcc
>Fasta format Identification stuff
ACTTGTACATatgggtatcataatcagggagatcc
Ideally I could preserve the ID lines with the extracted 10-base sequences. I have been using UNIX and Perl for some of this (doing the actual extractions), but I also have access to Windows with Python, Biopython, and EMBOSS. Thanks for any help you guys could give!
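For reference, here is a rough sketch of what I'm after, in plain Python with no Biopython needed. It is only one possible approach, not a tested pipeline. The function names (`first_bases`, `leading_uppercase`, `write_prefixes`) and the file names in the usage note are made up for illustration. It assumes each record starts with a `>` ID line and that a sequence may be wrapped over several lines.

```python
import re


def first_bases(fasta_lines, n=10):
    """Yield (id_line, prefix) pairs, where prefix is the first n bases of each record."""
    header, chunks = None, []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new ID line closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)[:n]
            header, chunks = line, []
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)[:n]


def leading_uppercase(seq):
    """Regex alternative: return the run of capital letters at the start of seq."""
    return re.match(r"[A-Z]*", seq).group(0)


def write_prefixes(in_path, out_path, n=10):
    """Write each ID line followed by the first n bases of its sequence."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, prefix in first_bases(src, n):
            dst.write(f"{header}\n{prefix}\n")
```

Calling something like `write_prefixes("introns_species1.fa", "introns_species1_first10.fa")` once per species (file names hypothetical) would keep the ID lines with the 10-base prefixes. Slicing by position does not depend on the case of the bases, while `leading_uppercase` relies on the caps/lowercase convention and would return fewer or more than 10 bases if that convention is broken in a record.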