Old 03-31-2013, 05:28 PM   #1
shanshuiii
Junior Member
 
Location: Minnesota

Join Date: Nov 2012
Posts: 2
How to connect shared sequences in two large FASTA files

Hi,

I would like to hear any suggestions on connecting two FASTA files. Here is my problem:

I have two large fasta files, each around 1GB.

F1.fasta contains:
AB001.1
GTAGTGTGAGGTGTGT
AB002.1
GTAGTGTGAGGTGTGT
AB005.1
GTAGTGTGAGGTGTGT

And F2.fasta contains:
AB005.2
GTAGTGTGAGGTGTGT
AB006.2
GTAGTGTGAGGTGTGT

I made up the sequences, but as you can see, they are actually paired-end reads from an Illumina HiSeq. The files were trimmed, so they contain different numbers of sequences. Now I wonder if there is any program I can use to find the paired sequences (i.e. AB005.1 + AB005.2) and put a string of "--------------" between them.

So the output FASTA would look like:
AB005
GTAGTGTGAGGTGTGT-------------------GTAGTGTGAGGTGTGT
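
The pairing described above can be sketched in Python roughly as follows. This is only a minimal sketch under assumptions not stated in the post: the real files use standard ">" header lines (the shortened example omits them), mates differ only in the suffix after the last "." of the read name, and all of F1 fits in memory as a dictionary, which may not hold comfortably for 1 GB files. The names read_fasta, base_id, and join_pairs are hypothetical and chosen for illustration.

```python
SPACER = "-" * 14  # arbitrary spacer length (an assumption); adjust as needed


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def base_id(header):
    """Strip the mate suffix, e.g. 'AB005.1' -> 'AB005' (assumed naming scheme)."""
    tokens = header.split()
    name = tokens[0] if tokens else ""
    return name.rsplit(".", 1)[0]


def join_pairs(f1, f2, out, spacer=SPACER):
    """Write one joined record per read ID found in both inputs; return the count."""
    # Assumption: the whole first file is held in memory, keyed by read ID.
    mates = {base_id(h): s for h, s in read_fasta(f1)}
    written = 0
    for header, seq in read_fasta(f2):  # second file is streamed record by record
        key = base_id(header)
        if key in mates:
            out.write(f">{key}\n{mates[key]}{spacer}{seq}\n")
            written += 1
    return written
```

It could be called with open file handles, for example `join_pairs(open("F1.fasta"), open("F2.fasta"), open("joined.fasta", "w"))`, where "joined.fasta" is a made-up output name. Reads present in only one file are silently skipped, which matches the goal of keeping only complete pairs.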

I would appreciate it if you could point me to any program, or any command I can use in R or Python.

Thanks!