Hello everybody.
I am a plant biologist trying to switch to bioinformatics (notably RNA-Seq). I discovered the Linux world and its tools only a few months ago. As my analysis progresses, I keep running into bioinformatics problems that probably have more to do with informatics than with biology.
Here is a problem I have been trying to solve for a while with various Linux commands, but I suspect it needs a script.
I have a list of sequences in FASTA format, each with a header of the form XLOC_xxxx. Many of them share the same header. I would like to keep only the first sequence for each distinct header (that is, remove the redundancy).
What do you propose to fix the problem? (Yes, I already have the list of the headers I want to extract.)
Thank you in advance!
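
To make the request concrete, here is a minimal Python sketch of the kind of script this might need. It is only an illustration under some assumptions: the header ID is taken to be the first whitespace-separated token after ">", sequences may span several lines, and the names `first_record_per_header` and `wanted` are made up for this example. The optional `wanted` argument stands in for the list of headers to extract.

```python
def first_record_per_header(lines, wanted=None):
    """Yield the lines of the first FASTA record seen for each header ID.

    lines  -- any iterable of text lines (for example an open file)
    wanted -- optional set of header IDs to keep (hypothetical parameter);
              if None, every distinct header is kept once
    """
    seen = set()
    keep = False  # whether the record currently being read should be output
    for line in lines:
        if line.startswith(">"):
            # Assumption: the ID is the first token after ">", e.g. XLOC_000123
            parts = line[1:].split()
            header = parts[0] if parts else ""
            keep = header not in seen and (wanted is None or header in wanted)
            seen.add(header)
        if keep:
            yield line


def dedupe_fasta(in_path, out_path, wanted=None):
    """Write the first record per header from in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(first_record_per_header(src, wanted))
```

Calling `dedupe_fasta("input.fa", "unique.fa")` with placeholder file names would write one record per header. The input is read line by line, so only the set of header IDs is held in memory.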