I have been trying to run a multiple sequence alignment on 80,000 sequences to identify any similarities. All sequences are 36-37 bp long. However, when I run the alignment (I tried Clustal Omega and MUSCLE, but MUSCLE failed), the output file contains records like the following example:
>NC_000853_1_10|NC_023151_4_32
-----------TA---------TACG---T-T-G-TA-GA-AAT-C--GCA-A-A-G---
G---T---G-G-T-GA--TG-T-TA-----------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
------------------------------------------------------------
--------------------------------------
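To put a number on how gappy each record is, a minimal Python sketch along these lines could be used. It assumes the output is aligned FASTA with `-` as the gap character, as in the record above; `gap_stats` is a hypothetical helper written for illustration, not part of Clustal Omega or MUSCLE.

```python
def gap_stats(fasta_text):
    """Return {header: (aligned_length, ungapped_length)} for aligned FASTA text.

    Assumes '-' is the only gap character and that each record starts
    with a '>' header line followed by one or more sequence lines.
    """
    stats = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: start counting from zero.
            header = line[1:]
            stats[header] = (0, 0)
        elif header is not None:
            aligned, ungapped = stats[header]
            stats[header] = (
                aligned + len(line),
                ungapped + len(line) - line.count("-"),
            )
    return stats


if __name__ == "__main__":
    # Tiny made-up record, only to show the shape of the result.
    example = ">demo\n--AC-\nG--\n"
    for name, (aligned, ungapped) in gap_stats(example).items():
        print(name, aligned, ungapped)
```

Comparing the aligned length with the ungapped length for each record shows how much of the alignment width is padding rather than sequence.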
Has anyone else had a similar issue, and what might be causing it?
Any advice on tools that can handle this kind of data and produce correct output would be appreciated.
Thanks,
Tom