SEQanswers

How to connect shared sequences in two large fasta files

Posted by shanshuiii (Junior Member, Minnesota) on 03-31-2013, 05:28 PM

Hi,

I would like to hear any suggestions on joining two FASTA files. Here is my problem:

I have two large fasta files, each around 1GB.

F1.fasta contains:
AB001.1
GTAGTGTGAGGTGTGT
AB002.1
GTAGTGTGAGGTGTGT
AB005.1
GTAGTGTGAGGTGTGT

And F2.fasta contains:
AB005.2
GTAGTGTGAGGTGTGT
AB006.2
GTAGTGTGAGGTGTGT

I made up the sequences, but as you can see from the IDs, they are paired-end reads from an Illumina HiSeq. The two files were trimmed separately, so they contain different numbers of sequences. Is there any program I can use to find the paired sequences (e.g. AB005.1 + AB005.2) and put a string of "--------------" between them?

So the output fasta will look like
AB005
GTAGTGTGAGGTGTGT-------------------GTAGTGTGAGGTGTGT

I would appreciate it if you could point me to any program, or any command I can use in R or Python.

Thanks!
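
One possible way to do this in plain Python is sketched below. It is not a tested pipeline, and it rests on several assumptions that are not stated in the post: each header line starts with ">" as in standard FASTA (the example above omits it), mates share the same ID apart from a trailing ".1" or ".2", and all of F2.fasta fits in memory as a dictionary. The function names `read_fasta`, `base_id`, and `join_pairs` are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (read_id, sequence) tuples from an open FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def base_id(read_id):
    """Strip a trailing '.1' or '.2' (assumed mate suffix) from a read ID."""
    stem, dot, suffix = read_id.rpartition(".")
    return stem if dot and suffix in ("1", "2") else read_id


def join_pairs(f1, f2, out, spacer="-" * 14):
    """Write mate1 + spacer + mate2 for every ID present in both files.

    Reads all of f2 into memory, then streams f1. Returns the pair count.
    """
    mates = {base_id(rid): seq for rid, seq in read_fasta(f2)}
    written = 0
    for rid, seq in read_fasta(f1):
        key = base_id(rid)
        mate = mates.get(key)
        if mate is not None:
            out.write(f">{key}\n{seq}{spacer}{mate}\n")
            written += 1
    return written


if __name__ == "__main__":
    # Small in-memory demo using the made-up records from the question.
    demo_f1 = io.StringIO(">AB001.1\nGTAGTGTGAGGTGTGT\n>AB005.1\nGTAGTGTGAGGTGTGT\n")
    demo_f2 = io.StringIO(">AB005.2\nGTAGTGTGAGGTGTGT\n>AB006.2\nGTAGTGTGAGGTGTGT\n")
    demo_out = io.StringIO()
    join_pairs(demo_f1, demo_f2, demo_out)
    print(demo_out.getvalue(), end="")
```

For the real data, the same function could be called with ordinary file handles, for example `with open("F1.fasta") as f1, open("F2.fasta") as f2, open("joined.fasta", "w") as out: join_pairs(f1, f2, out)`. The output file name is a placeholder. Holding a 1GB file in a dictionary will take noticeably more than 1GB of RAM, so on a small machine an on-disk index or sorting both files by ID first may be a better fit.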