SEQanswers

How to merge paired ends provided in separate files during (bfast) alignment

12-20-2011, 11:12 AM   #1
david.tamborero
Member
Location: Spain
Join Date: Feb 2011
Posts: 60

Hello,

I want to align Illumina paired-end reads with bfast. The two ends are provided in separate .fastq files, and I am not at all sure of the best way to 'join' them during the alignment process. I am using bfast match + bfast localalign + bfast postprocess. According to the bfast manual, the localalign step allows the following:

Code:
bfast localalign -1 file_1.bmf -2 file_2.bmf -A 0 -U > sample.baf
That applies when the .bmf files come from the bwaaln utility. However, when the .bmf files come from bfast match, the following does not seem to work (bfast+bwa-0.6.4e):

Code:
bfast localalign -f hg19.fa -m pair_1.bmf -m pair_2.bmf -A 0 -U > sample.baf
Therefore, I do not know the best way to proceed. My best guess is to align each .fastq file separately and, once I have the two resulting .sam files (one per end), to join them with picard (or samtools) merge.

Any help will be appreciated!

thanks
david
12-21-2011, 05:17 AM   #2
brentp
Member
Location: Salt Lake City, UT
Join Date: Apr 2010
Posts: 72

There is a script that comes with the BFAST distribution, scripts/ill2fastq.pl,
that will convert your *sequence fastq files to bfastq format.
12-21-2011, 09:25 AM   #3
david.tamborero
Member
Location: Spain
Join Date: Feb 2011
Posts: 60

Thank you for your answer, brentp.

I've tried ill2fastq.pl, and as far as I can tell it just merges both fastq files into a single one in which the second end is reverse-complemented. For instance:

pair_1:
Quote:
@HWUSI-EAS1692_0001:1:1:1050:4451#0/1
CAGATTCACANTCCTGAATATCATGTTTTCTTTCCAAGGNATGACATAACGTCTTGGGATCATCCCTTGCTTTAATGAAAATCGTGGCAAATGAA
+HWUSI-EAS1692_0001:1:1:1050:4451#0/1
Ybaac][T^YB[ZZ[SKVZT`bcYbccaccaaa_cZZ[ZB[Z[T_c`cYcc\bcccc^T\a`TcccbL\ac\^a\Ybb`^bY]bb_BBBBB
pair_2:
Quote:
@HWUSI-EAS1692_0001:1:1:1050:4451#0/2
CATGATAATGCACTCCATCTCATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCACTAAAAAGCGGACCTTGGTGTGAAAACATAACACACAC
+HWUSI-EAS1692_0001:1:1:1050:4451#0/2
M_M^ZM\YL]U^L\^VQJIU\a__\``c\cW_aaaaa_R[_\_`W][__BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

is converted to:
Quote:
@HWUSI-EAS1692_0001:1:1:1050:4451#0
CAGATTCACANTCCTGAATATCATGTTTTCTTTCCAAGGNATGACATAACGTCTTGGGATCATCCCTTGCTTTAATGAAAATCGTGGCAAATGAA
+
:CBBD><5?:#<;;<4,7;5ACD:CDDBDDBBB@D;;<;#<;<5@DADD=CDDDD?5=BA5DDDC-=BD=?B=:CCA?C:>CC@#########
@HWUSI-EAS1692_0001:1:1:1050:4451#0
GTGTGTGTTATGTTTTCACACCAAGGTCCGCTTTTTAGTGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAATGAGATGGAGTGCATTATCATG
+
##############################################@@<>8A@=@<3@BBBBB@8D=DAA=@@B=6*+27?=-?6>-:=.;?.@.
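Comparing the records above suggests two changes to the second end: the sequence is reverse-complemented, and the quality string is reversed and shifted down by 31 ASCII positions (which would be consistent with a Phred+64 to Phred+33 re-encoding). The first end seems to get only the quality shift. A minimal Python sketch of that transformation, inferred from the example records and not taken from ill2fastq.pl itself (all function names are illustrative):

```python
# Illustrative sketch inferred from the example records above;
# this is NOT the code of ill2fastq.pl.

# Complement table for DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def shift_quality(qual, old_offset=64, new_offset=33):
    """Re-encode a quality string from one ASCII offset to another."""
    delta = old_offset - new_offset
    return "".join(chr(ord(c) - delta) for c in qual)


def convert_second_end(seq, qual):
    """Reverse-complement the read and reverse its re-encoded qualities."""
    return reverse_complement(seq), shift_quality(qual)[::-1]


if __name__ == "__main__":
    # First five quality characters of pair_1 in the example.
    print(shift_quality("Ybaac"))  # :CBBD
    # First eight bases and quality characters of pair_2 in the example.
    print(convert_second_end("CATGATAA", "M_M^ZM\\Y"))
```

The second call reproduces the last eight bases and quality characters of the converted second record, which is what supports the reading above.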

I'm still wondering if it is ok to obtain the .sam file as follows:

Code:
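# schematic: the reference (-f) and other options are omitted
bfast match end1.fastq > end1.bmf
bfast match end2.fastq > end2.bmf

bfast localalign end1.bmf > end1.baf
bfast localalign end2.bmf > end2.baf

bfast postprocess end1.baf > end1.sam
bfast postprocess end2.baf > end2.sam

# samtools merge takes the output file first, then the (sorted) inputs
samtools merge sample.sam end1.sam end2.sam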
I hope that the subsequent programs in the pipeline will work out from the header info which strand each aligned read in the .sam file is on.

(Note that the pipeline is intended for finding SNPs.)
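
On the strand question: in SAM output, strand and pairing are recorded per read in the FLAG field (column 2), not in the header. Reads aligned as single-end would normally lack the paired/first/second bits, so downstream tools would likely treat them as unpaired even after the two files are merged. A small Python sketch for decoding the relevant bits, using the bit values from the SAM specification (the helper name is illustrative):

```python
# Bit values as defined in the SAM format specification.
PAIRED = 0x1           # read is part of a pair
PROPER_PAIR = 0x2      # both ends aligned as a proper pair
UNMAPPED = 0x4         # this read is unmapped
REVERSE = 0x10         # this read aligns to the reverse strand
FIRST_IN_PAIR = 0x40   # this read is the first end
SECOND_IN_PAIR = 0x80  # this read is the second end


def describe_flag(flag):
    """Decode the pairing and strand bits of a SAM FLAG value."""
    return {
        "paired": bool(flag & PAIRED),
        "proper_pair": bool(flag & PROPER_PAIR),
        "unmapped": bool(flag & UNMAPPED),
        "reverse": bool(flag & REVERSE),
        "first": bool(flag & FIRST_IN_PAIR),
        "second": bool(flag & SECOND_IN_PAIR),
    }


def flag_of(sam_line):
    """Extract the FLAG column from one tab-separated SAM alignment line."""
    return int(sam_line.split("\t")[1])


if __name__ == "__main__":
    # 16: a single-end read on the reverse strand (no pairing bits set).
    print(describe_flag(16))
    # 147 = 0x1 + 0x2 + 0x10 + 0x80: second end of a proper pair, reverse strand.
    print(describe_flag(147))
```

Checking a few FLAG values in end1.sam and end2.sam this way would show whether any pairing information survives the separate alignments.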
01-03-2012, 05:31 AM   #4
david.tamborero
Member
Location: Spain
Join Date: Feb 2011
Posts: 60

In case anyone is interested in this post: everything works fine when the two files, one per paired end, are merged with the ill2fastq.pl script and then passed to the bfast commands.

I still have two open questions, though:

- What is the advantage of doing so compared with aligning each paired end separately and then joining the two resulting sam files (with samtools merge, for instance)?

- Since I've noticed that the ill2fastq.pl script reverse-complements the second paired end, I'm not sure what the correct values are for the -w argument of bfast match ('to find matches on the designed strands') and the -R argument of bfast postprocess ('specifies to expect paired reads to be on reverse strands').
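
On the first question, the converted output shown earlier in the thread places both ends of a fragment next to each other under one shared read name, which is presumably what lets the aligner treat them as a pair. A hedged Python sketch of that interleaving step alone, with no reverse-complementing or quality conversion (the function names, and the assumption that both files list reads in the same order, are mine and not taken from ill2fastq.pl):

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        header, seq, _plus, qual = record
        yield header[1:], seq, qual


def strip_end_suffix(name):
    """Drop a trailing /1 or /2 so both ends share one name."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interleave(handle1, handle2, out):
    """Write end 1 and end 2 of each fragment as adjacent records."""
    pairs = zip(read_fastq(handle1), read_fastq(handle2))
    for (n1, s1, q1), (n2, s2, q2) in pairs:
        base1, base2 = strip_end_suffix(n1), strip_end_suffix(n2)
        if base1 != base2:
            raise ValueError(f"read names differ: {n1} vs {n2}")
        for seq, qual in ((s1, q1), (s2, q2)):
            out.write(f"@{base1}\n{seq}\n+\n{qual}\n")


if __name__ == "__main__":
    # Tiny made-up records, only to show the output layout.
    end1 = io.StringIO("@r1#0/1\nACGT\n+r1#0/1\nIIII\n")
    end2 = io.StringIO("@r1#0/2\nTTGG\n+r1#0/2\nHHHH\n")
    merged = io.StringIO()
    interleave(end1, end2, merged)
    print(merged.getvalue(), end="")
```

Aligning the two files separately discards this adjacency, so the aligner has no chance to use the mate when placing a read.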

cheers,
david