SEQanswers

Old 02-07-2012, 07:39 AM   #1
kgulukota
Member
 
Location: Illinois

Join Date: Oct 2011
Posts: 30
Bowtie2 chokes on -a flag?

I am working on a project where I need to get ALL hits for each read, with a hit defined fairly stringently. I tried to use bowtie2 with a command like this:

bowtie2 --threads 20 --reorder --score-min L,-0.5,-0.2 -a -x trdb -U R15.fq -S 15_tr.sam

My reads are 100 bp long, so the --score-min parameters here are fairly stringent. I expected that bowtie2 might take a while but would complete the job. Without the '-a' flag the job completed in about 30 minutes. With '-a', I waited nearly 3 days and it still had not finished.
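For reference, the threshold implied by --score-min L,-0.5,-0.2 can be worked out by hand. Per the bowtie2 manual, L,a,b denotes the linear function a + b * (read length), and in end-to-end mode (the default, which the command above uses) a perfect alignment scores 0 with penalties subtracted. The sketch below assumes the manual's default maximum mismatch penalty of 6. The function and variable names are illustrative only, and lower-quality bases or a different --mp setting would change the count.

```python
def linear_min_score(intercept: float, slope: float, read_len: int) -> float:
    """Evaluate a bowtie2 'L,<intercept>,<slope>' function for one read length."""
    return intercept + slope * read_len


# Assumption: default maximum mismatch penalty (--mp) of 6, as in the manual.
MISMATCH_MAX_PENALTY = 6

# --score-min L,-0.5,-0.2 applied to a 100 bp read
threshold = linear_min_score(-0.5, -0.2, 100)

# Number of full-penalty mismatches that still fit under the threshold
max_mismatches = int(-threshold // MISMATCH_MAX_PENALTY)

print(threshold)       # -20.5
print(max_mismatches)  # 3
```

So a 100 bp read must score at least -20.5, which leaves room for at most three high-quality mismatches and explains why the setting counts as stringent.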

Judging from the SAM file, bowtie2 completed the alignments for about 70K of the reads (in ~10 minutes) and then kept spinning, with no further writes to the SAM file.

I know the bowtie2 manual says it is not optimized for the -a flag. But this looks much worse than unoptimized. It's unusable. Does anyone have experience with this?

Thanks,
Gulu
__________________
Kamalakar Gulukota,
Director,
Center for Bioinformatics and Computational Biology
NorthShore University Health System, kgulukota@northshore.org
Old 02-07-2012, 11:38 PM   #2
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836

Yep, same results from us. The problem is that bowtie2 handles insertions and deletions, mismatches, and (in local mode) read clipping. That's a lot of error types, and accounting for all of them takes much longer.

What you may be able to try to speed things up is to get bowtie2 to dump all the multiple-mapped reads to another file (e.g. with '-k 2'), and only do the '-a' on those reads.
Old 02-08-2012, 05:54 AM   #3
kgulukota
Member
 
Location: Illinois

Join Date: Oct 2011
Posts: 30

Quote:
Originally Posted by gringer
What you may be able to try to speed things up is to get bowtie2 to dump all the multiple-mapped reads to another file (e.g. with '-k 2'), and only do the '-a' on those reads.
Thanks, gringer! I will try that.
Old 02-09-2012, 09:20 AM   #4
kgulukota
Member
 
Location: Illinois

Join Date: Oct 2011
Posts: 30

An update:
Yes, bowtie2 does have a big issue with the '-a' flag. I ran bowtie2 on about 8.8 million reads. Following gringer's advice, I first ran it with a generous '-k 50' option, i.e.:

bowtie2 --score-min L,-0.5,-0.2 -k 50 -x trdb -U rd.fq -S k50.sam

This finished in about 20 minutes or less. I found that 6,594 of the reads had 50 hits. Next, I created a new FASTQ file with just these reads ("The50s.fq") and re-ran bowtie2 with the -a flag:

bowtie2 --score-min L,-0.5,-0.2 -a -x trdb -U The50s.fq -S 50s_tr.sam
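The post does not say how The50s.fq was produced. One way it could be done is sketched below in Python: count the aligned SAM records per read name, keep the names that reached the -k limit, and copy the matching FASTQ records. The function names are hypothetical, and the sketch assumes unpaired reads, a FASTQ with exactly four lines per record, and that the SAM QNAME equals the FASTQ header up to the first whitespace.

```python
from collections import Counter
from typing import Iterable, Iterator, Set


def reads_at_cap(sam_lines: Iterable[str], k: int) -> Set[str]:
    """Return names of reads with at least k aligned records in a SAM stream."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:               # skip blank or malformed lines
            continue
        if int(fields[1]) & 0x4:          # FLAG bit 0x4: read is unaligned
            continue
        counts[fields[0]] += 1
    return {name for name, n in counts.items() if n >= k}


def filter_fastq(fastq_lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield the lines of 4-line FASTQ records whose read name is in keep."""
    lines = iter(fastq_lines)
    # zip over the same iterator four times groups lines into records;
    # a truncated final record is silently dropped.
    for header, seq, plus, qual in zip(lines, lines, lines, lines):
        name = header[1:].split()[0]      # drop '@', cut at first whitespace
        if name in keep:
            yield from (header, seq, plus, qual)


def build_subset(sam_path: str, fastq_path: str, out_path: str, k: int) -> int:
    """Write the reads that hit the -k cap to out_path; return how many."""
    with open(sam_path) as sam:
        capped = reads_at_cap(sam, k)
    with open(fastq_path) as fq, open(out_path, "w") as out:
        out.writelines(filter_fastq(fq, capped))
    return len(capped)
```

With the file names used in this post, the call would be build_subset("k50.sam", "rd.fq", "The50s.fq", 50).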

It's been running for over 2 hours with no results being output. Overall, beware of the '-a' flag in bowtie2.

Now, the 6,594 sequences do appear a bit repetitive; I'll strengthen my filtering upstream. So it's understandable why bowtie2 is choking. Still, it should be possible to put in some defenses against this flailing, right? So, if anyone active in bowtie2 development sees this, I have a request:

Please have bowtie2 search up to a Max_K parameter and come back more quickly with a message like "6,594 sequences had more than Max_K (1000) hits each - they are being ignored. See filtered.fastq for these sequences".
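Until something like this exists in bowtie2 itself, the requested behaviour can be approximated after the fact. A minimal sketch, assuming the reads were first aligned with -k set to Max_K + 1, so that any read with Max_K + 1 records is known to exceed the limit. The function name apply_max_k is hypothetical, and this is a post-processing step on the SAM output, not a bowtie2 feature.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def apply_max_k(sam_lines: Iterable[str], max_k: int) -> Tuple[List[str], List[str], str]:
    """Split reads from a '-k max_k+1' run into kept and over-limit names."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):            # SAM header line
            continue
        qname, flag = line.split("\t", 2)[:2]
        if int(flag) & 0x4:                 # unaligned read, no hits to count
            continue
        counts[qname] += 1

    over = sorted(name for name, n in counts.items() if n > max_k)
    kept = sorted(name for name, n in counts.items() if n <= max_k)
    message = (
        f"{len(over):,} sequences had more than Max_K ({max_k}) hits each"
        " - they are being ignored."
    )
    return kept, over, message
```

The over-limit names can then be written to a separate FASTQ for inspection, while the kept reads already have their complete hit lists in the SAM file, so no -a run is needed for them.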
All times are GMT -8.

