SEQanswers


shanshuiii 03-31-2013 05:28 PM

How to connect shared sequences in two large fasta files
 
Hi,

I would like to hear any suggestions on joining two FASTA files. Here is my problem:

I have two large FASTA files, each around 1 GB.

F1.fasta contains:
AB001.1
GTAGTGTGAGGTGTGT
AB002.1
GTAGTGTGAGGTGTGT
AB005.1
GTAGTGTGAGGTGTGT

And F2.fasta contains:
AB005.2
GTAGTGTGAGGTGTGT
AB006.2
GTAGTGTGAGGTGTGT

I made up the sequences, but as you can see, they are actually paired-end reads from an Illumina HiSeq. The files were trimmed, so they contain different numbers of sequences. Now I wonder whether there is any program I can use to find the paired sequences (e.g. AB005.1 + AB005.2) and put a string of "--------------" between them.

So the output FASTA would look like this:
AB005
GTAGTGTGAGGTGTGT-------------------GTAGTGTGAGGTGTGT
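A minimal Python sketch of the pairing described above, offered as one possible approach rather than an existing tool. It assumes the real files use standard FASTA headers starting with ">" (the made-up example omits them), and it uses a hypothetical naming rule: the pair key is the first word of the header with its trailing ".1"/".2" removed, which matches the example IDs. It is a hash join: every record from the first input is loaded into a dictionary, so the smaller file should go first. The helper names (read_fasta, pair_key, join_pairs) are illustrative, and the spacer is a parameter defaulting to the dash string from the question.

```python
from typing import Dict, Iterable, Iterator, Tuple

SPACER = "--------------"  # separator string taken from the question


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; assumes headers start with '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def pair_key(header: str) -> str:
    """Hypothetical naming rule: first word of the header minus its '.1'/'.2' suffix."""
    return header.split()[0].rsplit(".", 1)[0]


def join_pairs(lines1: Iterable[str], lines2: Iterable[str],
               spacer: str = SPACER) -> Iterator[str]:
    """Yield one joined FASTA record for each read ID present in both inputs."""
    # Hash join: all of input 1 is held in memory, so pass the smaller file first.
    mate1: Dict[str, str] = {pair_key(h): s for h, s in read_fasta(lines1)}
    for header, seq2 in read_fasta(lines2):
        key = pair_key(header)
        seq1 = mate1.get(key)
        if seq1 is not None:  # reads whose mate was trimmed away are skipped
            yield f">{key}\n{seq1}{spacer}{seq2}\n"
```

Usage, with a hypothetical output file name: open F1.fasta, F2.fasta and joined.fasta, then call `out.writelines(join_pairs(f1, f2))`. File objects iterate line by line, so they can be passed directly. With ~1 GB inputs the dictionary can need several times the file size in RAM.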

I would appreciate it if you could point me to any program, or any commands I can use in R or Python.
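For two files of around 1 GB, keeping one of them in a dictionary may need more RAM than is available. If both files are sorted by read ID in the same plain lexicographic order (the made-up IDs AB001 to AB006 already are), a streaming merge join can pair the mates while holding only one record from each file at a time. This is a minimal Python sketch under those assumptions, not an existing program: headers are assumed to start with ">", the ".1"/".2" suffix rule is a hypothetical guess based on the example IDs, and the function names are illustrative. If the files are not sorted, this approach will silently miss pairs, so they would need sorting first.

```python
from typing import Iterable, Iterator, Tuple

SPACER = "--------------"  # separator string taken from the question


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; assumes headers start with '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def pair_key(header: str) -> str:
    """Hypothetical naming rule: first word of the header minus its '.1'/'.2' suffix."""
    return header.split()[0].rsplit(".", 1)[0]


def merge_join(lines1: Iterable[str], lines2: Iterable[str],
               spacer: str = SPACER) -> Iterator[str]:
    """Stream joined records from two inputs that are both sorted by pair key."""
    it1, it2 = read_fasta(lines1), read_fasta(lines2)
    rec1, rec2 = next(it1, None), next(it2, None)
    while rec1 is not None and rec2 is not None:
        k1, k2 = pair_key(rec1[0]), pair_key(rec2[0])
        if k1 == k2:
            yield f">{k1}\n{rec1[1]}{spacer}{rec2[1]}\n"
            rec1, rec2 = next(it1, None), next(it2, None)
        elif k1 < k2:
            rec1 = next(it1, None)  # read in input 1 has no mate in input 2
        else:
            rec2 = next(it2, None)  # read in input 2 has no mate in input 1
```

Because both inputs are read lazily, memory use stays roughly constant regardless of file size; the trade-off is the requirement that both files share the same sort order.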

Thanks!


All times are GMT -8.
