Hi,
I would like to remove sequences that are identical to the 5' or 3' end of a longer sequence.
Here is an example of what I would like to do:
INPUT:
Code:
>pi1
AAAAAAAAAATTAAGGGCCAGCTGA
>pi12
AAAAAAAAAATTAAGGGCCAGCTGAA
>pi13
AAAAAAAAACTTGAACTCTACTGC
>pi14
AAAAAAAAATTAAGGGCCAGCTGAA
>pi15
AAAAAAAAATTTTGGATGATCTTAAT
>pi16
AAAAAAAAATTTTGGATGATCTTAATT
>pi17
AAAAAAAACAAGGTCGGCATAAAG
>pi18
AAAAAAAACGAACATGAGAGGATGGA
OUTPUT:
Code:
>pi12
AAAAAAAAAATTAAGGGCCAGCTGAA
>pi13
AAAAAAAAACTTGAACTCTACTGC
>pi16
AAAAAAAAATTTTGGATGATCTTAATT
>pi17
AAAAAAAACAAGGTCGGCATAAAG
>pi18
AAAAAAAACGAACATGAGAGGATGGA
I tried to solve this with PRINSEQ using the following command line, but it didn't work: it only removes reads that have exactly the same sequence.
Code:
perl prinseq-lite.pl -verbose -fasta tmp1.fa -derep 123 -out_format 1
Can someone familiar with this tool help me?
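To make the expected result concrete, here is a minimal Python sketch of the filtering I am after. It is not how PRINSEQ works internally; the function names `parse_fasta` and `remove_end_matches` are hypothetical and only for illustration. It compares every sequence against every other one, so it is probably only practical for small files, and it leaves exact duplicates alone.

```python
# Illustrative sketch only: drop every sequence that is identical to the
# 5' end (prefix) or 3' end (suffix) of a strictly longer sequence.
# Function names are hypothetical; this is not PRINSEQ's implementation.

EXAMPLE = """\
>pi1
AAAAAAAAAATTAAGGGCCAGCTGA
>pi12
AAAAAAAAAATTAAGGGCCAGCTGAA
>pi13
AAAAAAAAACTTGAACTCTACTGC
>pi14
AAAAAAAAATTAAGGGCCAGCTGAA
>pi15
AAAAAAAAATTTTGGATGATCTTAAT
>pi16
AAAAAAAAATTTTGGATGATCTTAATT
>pi17
AAAAAAAACAAGGTCGGCATAAAG
>pi18
AAAAAAAACGAACATGAGAGGATGGA
"""


def parse_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def remove_end_matches(records):
    """Keep only sequences that are not a prefix or suffix of a longer one."""
    kept = []
    for i, (name, seq) in enumerate(records):
        is_end_of_longer = any(
            len(other) > len(seq)
            and (other.startswith(seq) or other.endswith(seq))
            for j, (_, other) in enumerate(records)
            if j != i
        )
        if not is_end_of_longer:
            kept.append((name, seq))
    return kept


kept = remove_end_matches(parse_fasta(EXAMPLE))
for name, seq in kept:
    print(f">{name}\n{seq}")
```

On the example input this keeps pi12, pi13, pi16, pi17 and pi18, which is the output I am looking for.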
Thanks in advance