  • bwa - fastq file size limit? and more...

    hello,

    running bwa v0.5.9 i get this error:
    fail to open file 'long_path/filename.fastq'. Abort!
    Abort (core dumped)

    yes, the file does actually exist. i think the problem is that it's too big - 4.6G.
    this is one side of illumina paired end reads - 100bp reads and there are 18,517,594 reads in the file. i shortened the file to ~2 million reads and bwa handled it just fine. is there a limit to the size of the fastq input file?

    if there is a limit i was thinking i could break up each of the paired end files and create multiple .sai files - but i think that would be problematic on the 'bwa sampe' end. i don't think the 'sampe' command supports multiple pairs of .sai files. maybe i could cat all the .sai files together?

    anyone else have a problem like this? too much data!

    thanks,
    mike

  • #2
    I think what you should do is split the fastq files, since that seems to work, align each pair of chunks separately, and after you generate multiple bams, use samtools merge to combine them into one file.
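A minimal sketch of the splitting step, assuming plain (uncompressed) fastq with exactly four lines per record. The function name `split_fastq`, the chunk naming scheme, and the chunk size are hypothetical and not part of bwa or samtools. If both paired-end files are split with the same `reads_per_chunk`, chunk N of one file should still line up read-for-read with chunk N of the other, provided the two input files were in the same order to begin with.

```python
import itertools


def split_fastq(path, reads_per_chunk, prefix):
    """Split a fastq file into chunks of at most reads_per_chunk records.

    Assumes the plain 4-line-per-record layout (no wrapped sequences).
    Returns the list of chunk file names that were written.
    """
    chunk_paths = []
    with open(path) as handle:
        for index in itertools.count():
            # Each record is 4 lines: header, sequence, '+', qualities.
            lines = list(itertools.islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            chunk_path = f"{prefix}.{index:04d}.fastq"
            with open(chunk_path, "w") as out:
                out.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Each pair of chunks can then be aligned on its own and the resulting bams merged afterwards, as suggested above.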

    • #3
      Sounds like a system limitation - 64-bit bwa can handle much bigger files than this with quite modest RAM. Are you running a 32-bit OS, or might you have built a 32-bit executable?
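One way to check which kind of executable was built, sketched under the assumption that bwa was compiled on Linux (or another system using ELF binaries). The function name `elf_class` is hypothetical. It only reads the ELF identification bytes: byte 4 (`EI_CLASS`) is 1 for a 32-bit binary and 2 for a 64-bit one. For context, a 32-bit build without large-file support typically cannot address file offsets beyond 2^31 - 1 bytes (about 2 GiB), which would be consistent with a 4.6G file failing while a much smaller one works.

```python
# Largest offset a signed 32-bit file offset can represent (just under 2 GiB).
MAX_32BIT_OFFSET = 2**31 - 1


def elf_class(path):
    """Return 32 or 64 for an ELF executable, or None if it is not ELF."""
    with open(path, "rb") as handle:
        header = handle.read(5)
    # ELF files start with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    return {1: 32, 2: 64}.get(header[4])
```

For example, `elf_class("/path/to/bwa")` (a placeholder path) would return 32 for a 32-bit build. On non-ELF platforms it returns `None`, and a different check would be needed.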

      • #4
        thanks very much to both of you for your replies.

        i was indeed using a 32 bit version of bwa. i added "-m64" to the makefile gcc options and recompiled, creating a 64 bit executable.

        the large fastq file was read without error. problem solved. thanks again for the help.
        mike
