12-16-2009, 06:15 AM   #3
maubp
Peter (Biopython etc)

If you like Python, you could try something like this (untested):

HTML Code:
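#This Python script requires Biopython 1.51 or later
#Uses Python 3 syntax; on Python 2 swap zip for itertools.izip
from Bio import SeqIO

#Setup variables (could parse command line args instead)
file_f = "s_1_1_sequence.txt"
file_r = "s_1_2_sequence.txt"
file_out = "interleaved.fastq"

def interleave(iter1, iter2):
    #Note zip stops at the shorter input, so a length mismatch goes unnoticed
    for (forward, reverse) in zip(iter1, iter2):
        assert forward.id.endswith("/1")
        assert reverse.id.endswith("/2")
        #Remove the /1 and /2 from the identifiers,
        forward.id = forward.id[:-2]
        reverse.id = reverse.id[:-2]
        assert forward.id == reverse.id
        #Clear the descriptions too, since they still hold the old title
        #lines and the FASTQ writer may otherwise append them after the id
        forward.description = ""
        reverse.description = ""
        yield forward
        yield reverse

#Plain text mode; the old "rU" mode is rejected by recent Python 3 releases
handle_f = open(file_f)
handle_r = open(file_r)
records_f = SeqIO.parse(handle_f, "fastq-illumina")
records_r = SeqIO.parse(handle_r, "fastq-illumina")

handle = open(file_out, "w")
count = SeqIO.write(interleave(records_f, records_r), handle, "fastq-sanger")
handle.close()
handle_f.close()
handle_r.close()
print("%i records written to %s" % (count, file_out))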
Based on the Biopython example here:
http://news.open-bio.org/news/2009/1...ith-biopython/
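
To see what the interleave step does without needing Biopython or real FASTQ files, here is a minimal sketch of the same pairing logic on plain (name, sequence) tuples. The function name `interleave_pairs`, the tuple layout and the toy reads are all made up for illustration; they are not part of Biopython.

```python
def interleave_pairs(forward_reads, reverse_reads):
    """Yield forward and reverse reads alternately, with /1 and /2 removed."""
    for (f_name, f_seq), (r_name, r_seq) in zip(forward_reads, reverse_reads):
        # Same sanity checks as the asserts in the script above
        if not (f_name.endswith("/1") and r_name.endswith("/2")):
            raise ValueError("Unexpected read names: %s, %s" % (f_name, r_name))
        if f_name[:-2] != r_name[:-2]:
            raise ValueError("Reads out of order: %s, %s" % (f_name, r_name))
        yield f_name[:-2], f_seq
        yield r_name[:-2], r_seq

# Hypothetical toy data standing in for the two FASTQ files
forward = [("read1/1", "ACGT"), ("read2/1", "GGCC")]
reverse = [("read1/2", "TTAA"), ("read2/2", "CCGG")]

interleaved = list(interleave_pairs(forward, reverse))
for name, seq in interleaved:
    print(name, seq)
```

Each pair comes out as two consecutive entries sharing one name, which is the layout the script writes to interleaved.fastq.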

Note - I'm assuming you have Illumina 1.3+ FASTQ files (PHRED scores with an ASCII offset of 64), not Solexa style FASTQ files. See http://en.wikipedia.org/wiki/FASTQ_format and http://nar.oxfordjournals.org/cgi/co...stract/gkp1137 or search the forum for details.
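
The reason the format matters is that the script reads "fastq-illumina" and writes "fastq-sanger", so the quality strings get re-encoded. As a rough sketch of that re-encoding for Illumina 1.3+ input only (PHRED scores stored with an ASCII offset of 64, rewritten with the Sanger offset of 33), something like the following should be equivalent. The function name `illumina_to_sanger` and the example quality string are made up for illustration, and this does not handle Solexa scores, which need a different conversion.

```python
ILLUMINA_OFFSET = 64  # Illumina 1.3+ FASTQ: PHRED score + 64
SANGER_OFFSET = 33    # Sanger FASTQ: PHRED score + 33

def illumina_to_sanger(quality_string):
    """Re-encode an Illumina 1.3+ quality string using the Sanger offset."""
    converted = []
    for char in quality_string:
        phred = ord(char) - ILLUMINA_OFFSET
        if phred < 0:
            # Characters below "@" cannot be Illumina 1.3+ PHRED scores;
            # the file may be Solexa style or already Sanger encoded
            raise ValueError("Not an Illumina 1.3+ quality character: %r" % char)
        converted.append(chr(phred + SANGER_OFFSET))
    return "".join(converted)

# Hypothetical quality string: PHRED 40, 40, 40, 39, 2, 0
print(illumina_to_sanger("hhhgB@"))
```

If this raises an error on your data, that is a hint the files are not Illumina 1.3+ style and the "fastq-illumina" setting in the script would be wrong for them.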

Last edited by maubp; 12-16-2009 at 06:18 AM. Reason: Adding link