  • Bowtie2 chokes on -a flag?

    I am working on a project where I need to get ALL hits to each read, defined fairly stringently. I tried to use bowtie2 with a command like this:

    bowtie2 --threads 20 --reorder --score-min L,-0.5,-0.2 -a -x trdb -U R15.fq -S 15_tr.sam

    My reads are 100 bp long, hence the parameters for a match are fairly stringent here. I expected that bowtie2 might take a while but would complete the job. Without the '-a' flag the job completed in about 30 mins. But with -a, I waited nearly 3 days and it still had not finished.

    To judge from the SAM file, bowtie2 completed the alignments for about 70K of the reads (in ~10 mins) and then kept spinning with no further writes to the SAM file.

    I know the bowtie2 manual says it is not optimized for the -a flag. But this looks much worse than unoptimized. It's unusable. Does anyone have experience with this?

    Thanks,
    Gulu
    Kamalakar Gulukota,
    Director,
    Center for Bioinformatics and Computational Biology
    NorthShore University Health System, [email protected]

  • #2
    Yep, same results from us. The problem is that bowtie2 handles insertions and deletions, mismatches, and (in local mode) read clipping. That's a lot of possible differences to account for on every candidate hit, so enumerating all of them takes much longer.

    What you may be able to try to speed things up is to get bowtie2 to dump all the multiple-mapped reads to another file (e.g. with '-k 2'), and only do the '-a' on those reads.
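
A minimal sketch of how the multiple-mapped reads could be picked out of the '-k 2' output, assuming single-end reads and standard SAM records (header lines start with `@`, and FLAG bit 0x4 marks an unmapped read). The helper names `count_alignments` and `multi_mapped` are hypothetical and not part of bowtie2.

```python
from collections import Counter


def count_alignments(sam_lines):
    """Count aligned records per read name in single-end SAM text.

    Assumption: each reported alignment of a read is its own SAM record
    sharing the same QNAME, which is how -k output is expected to look.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        # FLAG bit 0x4 means the read is unmapped; it contributes no hit.
        if flag & 0x4:
            continue
        counts[name] += 1
    return counts


def multi_mapped(counts, min_hits=2):
    """Return the names of reads with at least min_hits alignments."""
    return {name for name, n in counts.items() if n >= min_hits}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_hits.py k2.sam
    with open(sys.argv[1]) as handle:
        for read_name in sorted(multi_mapped(count_alignments(handle))):
            print(read_name)
```

Reads that appear in the result are the ones worth passing to a second, more expensive run.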

    • #3
      Originally posted by gringer
      What you may be able to try to speed things up is to get bowtie2 to dump all the multiple-mapped reads to another file (e.g. with '-k 2'), and only do the '-a' on those reads.
      Thanks, gringer! I will try that.


      • #4
        An update:
        Yes, bowtie2 does have a big issue with the '-a' flag. I ran bowtie2 on about 8.8 million reads. Following gringer's advice I first ran it with a generous '-k 50' option i.e:

        bowtie2 --score-min L,-0.5,-0.2 -k 50 -x trdb -U rd.fq -S k50.sam

        This ran and finished in about 20 mins or less. I found that 6,594 of the reads had 50 hits. Next, I created a new fastq file with just these 50's ("The50s.fq") and re-ran bowtie2 with the -a flag:

        bowtie2 --score-min L,-0.5,-0.2 -a -x trdb -U The50s.fq -S 50s_tr.sam
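
Building a file like The50s.fq from a set of read names can be sketched as follows. This assumes plain four-line FASTQ records and that the read name in the SAM file equals the FASTQ header up to the first whitespace; `filter_fastq` is an illustrative helper, not a bowtie2 feature.

```python
import io


def filter_fastq(fastq_in, fastq_out, wanted):
    """Copy the FASTQ records whose read name is in `wanted`.

    Assumes four-line records (header, sequence, '+', qualities).
    Returns the number of records written.
    """
    kept = 0
    for header in iter(fastq_in.readline, ""):
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = fastq_in.readline()
        plus = fastq_in.readline()
        qual = fastq_in.readline()
        # Drop the leading '@' and anything after the first whitespace.
        name = header[1:].split()[0]
        if name in wanted:
            fastq_out.write(header + seq + plus + qual)
            kept += 1
    return kept


if __name__ == "__main__":
    # Tiny in-memory demonstration with made-up reads.
    reads = io.StringIO("@r1 extra\nACGT\n+\nIIII\n@r2\nTTTT\n+\nIIII\n")
    out = io.StringIO()
    n = filter_fastq(reads, out, {"r2"})
    print(n, "record(s) kept")
    print(out.getvalue(), end="")
```

With real files, the two handles would be opened with `open("rd.fq")` and `open("The50s.fq", "w")`, and `wanted` would hold the names of the reads that reached the '-k' limit.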

        It's been running for over 2 hours with no results being output. Overall, beware of the '-a' flag in bowtie2.

        Now, the 6,594 sequences do appear a bit repetitive, so I'll tighten my filtering upstream, and it's understandable why bowtie2 is choking. Still, it should be possible to put in some defenses against this flailing, right? So, if anyone active in bowtie2 development sees this, I have a request:

        Please have bowtie2 search only up to a Max_K parameter and come back more quickly with a message like "6,594 sequences had more than Max_K (1000) hits each - they are being ignored. See filtered.fastq for these sequences".
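
Until something like that exists, the behaviour can be approximated outside bowtie2. The sketch below assumes a first pass run with '-k' set to the chosen cap and a dictionary of per-read hit counts taken from its SAM output. `split_by_cap` and `cap_report` are hypothetical names. Note that a read with exactly Max_K hits cannot be told apart from one with more, so the message says "at least".

```python
def split_by_cap(counts, max_k):
    """Split reads into those below the cap and those that reached it.

    counts: mapping of read name -> number of reported alignments
            from a run with '-k max_k'.
    Returns (kept, capped): a dict of reads under the cap and a sorted
    list of read names that reached it.
    """
    kept = {name: n for name, n in counts.items() if n < max_k}
    capped = sorted(name for name, n in counts.items() if n >= max_k)
    return kept, capped


def cap_report(capped, max_k, path="filtered.fastq"):
    """Build a summary message in the spirit of the request above."""
    return (
        f"{len(capped):,} sequences had at least Max_K ({max_k}) hits each"
        f" - they are being ignored. See {path} for these sequences"
    )


if __name__ == "__main__":
    # Made-up counts purely for illustration.
    demo_counts = {"readA": 3, "readB": 50, "readC": 1, "readD": 50}
    kept, capped = split_by_cap(demo_counts, max_k=50)
    print(cap_report(capped, 50))
    print("reads kept:", sorted(kept))
```

The capped names can then be written to a separate FASTQ file and either dropped or examined on their own, instead of being handed to '-a'.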
