  • bwa - fastq file size limit? and more...

    hello,

    running bwa v0.5.9 i get this error:
    fail to open file 'long_path/filename.fastq'. Abort!
    Abort (core dumped)

    yes, the file does actually exist. i think the problem is that it's too big - 4.6G.
    this is one side of illumina paired end reads - 100bp reads and there are 18,517,594 reads in the file. i shortened the file to ~2 million reads and bwa handled it just fine. is there a limit to the size of the fastq input file?

    if there is a limit i was thinking i could break up each of the paired end files and create multiple .sai files - but i think that would be problematic on the 'bwa sampe' end. i don't think the 'sampe' command supports multiple pairs of .sai files. maybe i could cat all the .sai files together?

    anyone else have a problem like this? too much data!

    thanks,
    mike

  • #2
    I think what you should do is split the FASTQ files, since that seems to work, and after you generate multiple BAMs, use samtools to merge them into one file.
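The splitting step suggested above could be sketched roughly as follows. This is only an illustration, not something posted in the thread: the function name `split_fastq`, the output naming scheme, and the default chunk size are all made up here, and it assumes plain (uncompressed) FASTQ with exactly four lines per read.

```python
from pathlib import Path


def split_fastq(src, out_dir, reads_per_chunk=2_000_000):
    """Split a 4-line-per-record FASTQ file into chunks of at most
    reads_per_chunk reads. Returns the chunk paths in order.

    Hypothetical helper: name, defaults and file naming are assumptions.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be at least 1")
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    lines_per_chunk = 4 * reads_per_chunk
    chunks = []
    out = None
    n_lines = 0
    try:
        with src.open("r") as handle:
            # Stream line by line so memory use stays small.
            for n_lines, line in enumerate(handle, start=1):
                if (n_lines - 1) % lines_per_chunk == 0:
                    # Start a new chunk on a record boundary.
                    if out is not None:
                        out.close()
                    path = out_dir / f"{src.stem}.part{len(chunks):04d}.fastq"
                    out = path.open("w")
                    chunks.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()

    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; truncated record?")
    return chunks
```

If both mate files are split with the same `reads_per_chunk`, chunk N of one file should hold the mates of chunk N of the other, provided the two files list their reads in the same order. Each chunk pair can then be aligned separately and the resulting BAMs combined with samtools, as suggested above.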



    • #3
      Sounds like a system limitation - 64-bit bwa can handle much bigger files than this with quite modest RAM. Are you running a 32-bit OS, or might you have built a 32-bit executable?
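One way to check the second question is to look at the executable's header. The sketch below assumes a Linux-style ELF binary; the function name `elf_class` is made up for illustration. In an ELF file the first four bytes are the magic `\x7fELF`, and the fifth byte (EI_CLASS) is 1 for a 32-bit object and 2 for a 64-bit one.

```python
def elf_class(path):
    """Return 32 or 64 for an ELF executable, or None if the file is
    not ELF or the class byte is unrecognised.

    Hypothetical helper; only meaningful for ELF binaries (e.g. Linux).
    """
    with open(path, "rb") as fh:
        ident = fh.read(5)
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: 32, 2: 64}.get(ident[4])
```

Calling `elf_class` on the path to the bwa binary would then show which kind of executable was built. On other platforms (macOS, Windows) the executable format differs and this check does not apply.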



      • #4
        thanks very much to both of you for your replies.

        i was indeed using a 32-bit build of bwa. i added "-m64" to the gcc options in the makefile and recompiled, creating a 64-bit executable.

        the large fastq file was read without error. problem solved. thanks again for the help.
        mike
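A likely explanation for the original failure, offered here as an assumption rather than something confirmed in the thread, is that a 32-bit build without large-file support uses a signed 32-bit file offset, which tops out at 2^31 - 1 bytes (just under 2 GiB). The arithmetic with the numbers from the first post can be sketched like this; the helper names are made up for illustration.

```python
import os

# Largest value a signed 32-bit file offset can hold (2 GiB minus 1 byte).
INT32_MAX = 2**31 - 1


def exceeds_32bit_offset(size_bytes):
    """True if a file of this size cannot be addressed with a signed
    32-bit offset."""
    return size_bytes > INT32_MAX


def file_exceeds_32bit_offset(path):
    """Same check, using the size of an existing file on disk."""
    return exceeds_32bit_offset(os.path.getsize(path))


# Numbers from the first post: a 4.6G file holding 18,517,594 reads.
full_size = int(4.6e9)
# Rough size of the shortened ~2 million read file, scaled by read count.
short_size = int(full_size * 2_000_000 / 18_517_594)

full_too_big = exceeds_32bit_offset(full_size)
short_too_big = exceeds_32bit_offset(short_size)
```

Under these assumptions the full file is over the limit while the shortened one (roughly 0.5 GB) is well under it, which matches what was observed before and after recompiling.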
