SEQanswers

Old 12-16-2009, 02:38 AM   #1
lindseyjane
Member
 
Location: Oxford

Join Date: Apr 2009
Posts: 28
BFAST input format for paired end reads

I would like to know how to input paired end reads to the BFAST software. The script provided, qseq2fastq.pl, requires qseq files, but I only have the sequence files from the Illumina machine, named e.g.
s_1_1_sequence.txt containing reads from the 1st read in lane 1
s_1_2_sequence.txt containing reads from the 2nd read in lane 1

I can easily convert to fastq using maq sol2sanger, but I understand that the BFAST FASTQ format is different from the standard FASTQ. The manual states it requires the pairs to be listed in order of 5' to 3' and on the same strand. It also requires them to have the same name whereas at the moment they have the same names but slightly different suffixes of #0/1 for read 1 and #0/2 for read 2.

Does anyone know a quick way to get my reads into the right format? Thanks
Old 12-16-2009, 06:06 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542

Just to clarify - you want to interleave the two paired FASTQ files (F1,F2,F3,... and R1,R2,R3,...) into one file where they alternate (F1,R1,F2,R2,F3,R3,...) but also remove the "/1" and "/2" suffixes on the forward and reverse read identifiers to make them the same?

I'd write a script to do this in your language of choice (e.g. Perl perhaps with BioPerl, or Python with Biopython, etc).
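
As a rough illustration of the interleaving described above, here is a minimal sketch in plain Python with no extra libraries. It assumes simple four-line FASTQ records and read names ending in /1 and /2. The function names read_fastq and interleave are made up for this example, and it leaves the quality strings untouched.

```python
def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ handle."""
    while True:
        title = handle.readline().rstrip("\n")
        if not title:
            return  # end of file (a blank line also ends parsing)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield title[1:], sequence, quality  # drop the leading '@'


def interleave(forward_handle, reverse_handle):
    """Yield FASTQ text in the order F1, R1, F2, R2, ... with /1 and /2 removed."""
    # zip() stops at the end of the shorter file, so check read counts separately
    pairs = zip(read_fastq(forward_handle), read_fastq(reverse_handle))
    for (f_name, f_seq, f_qual), (r_name, r_seq, r_qual) in pairs:
        if not (f_name.endswith("/1") and r_name.endswith("/2")):
            raise ValueError("Unexpected suffixes: %s, %s" % (f_name, r_name))
        name = f_name[:-2]
        if name != r_name[:-2]:
            raise ValueError("Names do not match: %s, %s" % (f_name, r_name))
        yield "@%s\n%s\n+\n%s\n" % (name, f_seq, f_qual)
        yield "@%s\n%s\n+\n%s\n" % (name, r_seq, r_qual)
```

To use it, open the two input files and write each yielded string to the output file, for example with handle_out.writelines(interleave(handle_f, handle_r)).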
Old 12-16-2009, 06:15 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542

If you like Python, you could try something like this (untested):

Code:
# This Python 3 script requires Biopython (FASTQ support needs 1.51 or later)
from Bio import SeqIO

# Set up variables (could parse command line args instead)
file_f = "s_1_1_sequence.txt"
file_r = "s_1_2_sequence.txt"
file_out = "interleaved.fastq"


def interleave(iter1, iter2):
    """Yield forward and reverse records alternately, with matching names."""
    # Note that zip() stops silently at the end of the shorter input
    for forward, reverse in zip(iter1, iter2):
        assert forward.id.endswith("/1")
        assert reverse.id.endswith("/2")
        # Remove the /1 and /2 from the identifiers
        forward.id = forward.id[:-2]
        reverse.id = reverse.id[:-2]
        assert forward.id == reverse.id
        # Clear the descriptions too: they still hold the old names, and
        # recent Biopython versions may otherwise add them to the title line
        forward.description = ""
        reverse.description = ""
        yield forward
        yield reverse


with open(file_f) as handle_f, open(file_r) as handle_r, open(file_out, "w") as handle_out:
    records_f = SeqIO.parse(handle_f, "fastq-illumina")
    records_r = SeqIO.parse(handle_r, "fastq-illumina")
    # Reads Illumina 1.3+ qualities and writes Sanger style qualities
    count = SeqIO.write(interleave(records_f, records_r), handle_out, "fastq-sanger")
print("%i records written to %s" % (count, file_out))

Based on the Biopython example here:
http://news.open-bio.org/news/2009/1...ith-biopython/

Note - I'm assuming you have Illumina 1.3+ FASTQ files, not Solexa style FASTQ files. See http://en.wikipedia.org/wiki/FASTQ_format and http://nar.oxfordjournals.org/cgi/co...stract/gkp1137 or search the forum for details.
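
To show what that assumption means in practice, here is a small sketch of the quality conversion itself. Illumina 1.3+ files store PHRED scores with an ASCII offset of 64, while Sanger style FASTQ uses an offset of 33, so each quality character shifts down by 31. The function name illumina13_to_sanger is made up for this example, and it does not handle the older Solexa scores, which need a different formula.

```python
def illumina13_to_sanger(quality):
    """Convert an Illumina 1.3+ (offset 64) quality string to Sanger (offset 33)."""
    converted = []
    for char in quality:
        phred = ord(char) - 64
        if phred < 0:
            # Characters below '@' suggest this is not Illumina 1.3+ data
            # (for example older Solexa style scores)
            raise ValueError("Quality character %r is below offset 64" % char)
        converted.append(chr(phred + 33))
    return "".join(converted)
```

For example, "h" (PHRED 40 at offset 64) becomes "I" (PHRED 40 at offset 33).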

Last edited by maubp; 12-16-2009 at 06:18 AM. Reason: Adding link
Old 12-16-2009, 06:22 AM   #4
lindseyjane
Member
 
Location: Oxford

Join Date: Apr 2009
Posts: 28

OK, thanks. I am learning to use Perl and am not familiar with Python, so I will write a script to do this in Perl.

Just to clarify, do you know whether I need to alter the reverse read to put its sequence on the same strand as the forward read?

E.g. if I have in the sequence.txt files
Forward read AAAATTT
Reverse read CCGGGG

I need to interleave them as:
AAAATTT
CCCCGG
is this correct?
Old 12-16-2009, 06:45 AM   #5
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,542

Quote (originally posted by lindseyjane):
"ok, thanks, I am learning to use perl, not familiar with python, so I will write a script to do this in perl."

Fair enough - you may find BioPerl helpful; it has built-in FASTQ support.

Quote (originally posted by lindseyjane):
"Just to clarify, do you know whether I need to alter the reverse read to make the sequence on the same strand as the forward read?"

Maybe. I haven't used BFAST so I don't know. It shouldn't be too hard to do if required. Again, BioPerl will have built-in reverse complement code, but don't forget to reverse the qualities too.
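
For reference, here is a minimal Python sketch of reverse complementing a read together with its qualities. Whether BFAST actually needs this is not confirmed in this thread, so check the BFAST manual first. The function name reverse_complement_read is made up for this example, and it only handles A, C, G, T and N.

```python
# Translation table for complementing bases (upper and lower case)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_read(sequence, quality):
    """Return (reverse complemented sequence, reversed quality string)."""
    if len(sequence) != len(quality):
        raise ValueError("Sequence and quality lengths differ")
    # Complement each base, then reverse; the qualities are only reversed
    # so that each score stays attached to its base
    return sequence.translate(_COMPLEMENT)[::-1], quality[::-1]
```

With the reverse read from the earlier example, reverse_complement_read("CCGGGG", "abcdef") gives ("CCCCGG", "fedcba").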
Old 12-16-2009, 07:21 AM   #6
lindseyjane
Member
 
Location: Oxford

Join Date: Apr 2009
Posts: 28

Thanks for all your help and rapid responses to my queries