  • BWA fail to open file

    Hi all,
    I am trying to align a .fq file to a reference genome (.fa).
    I started by building the index like this:

    bwa index -a bwtsw exome/exome.fa -p exome

    and I got the 8 index files.

    Then I tried the "aln" command like this:

    bwa aln exome s_1_1_sequence.fq > s_1_1_sequence.sai

    and I got this error: [bwa_seq_open] fail to open file 's_1_1_sequence.fq'. Abort!

    I don't know what is wrong, so I would be very thankful if someone could give me the right command line for this.
    Thanks in advance,
    Wassim

  • #2
    Likely a relative path issue, or the file name given is not the true file name.

    Are both the exome index files you created and the s_1_1_sequence.fq file in the same directory, as the command you posted would require? If so, it should work, provided s_1_1_sequence.fq is the true file name. It's not really a .txt file, is it?
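
A minimal sketch of the path and file name check suggested here, assuming a POSIX-style shell on Linux. `check_reads_file` is a hypothetical helper name, not part of bwa; it only reports whether the path resolves to a readable regular file from the current directory.

```shell
# Hypothetical helper (not part of bwa): report why a reads file might not open.
check_reads_file() {
    local f="$1"
    if [ ! -e "$f" ]; then
        # Relative paths are resolved against the current working directory.
        echo "missing: $f (looked in $(pwd))"
        return 1
    elif [ ! -f "$f" ]; then
        echo "not a regular file: $f"
        return 1
    elif [ ! -r "$f" ]; then
        echo "not readable: $f"
        return 1
    fi
    echo "ok: $f"
}
```

Calling `check_reads_file s_1_1_sequence.fq` from the directory where `bwa aln` is run should print `ok: s_1_1_sequence.fq` if the name and location are right.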

    • #3
      Yes, they are in the same directory. s_1_1_sequence.fq was a .txt file; I changed the extension because I saw in the manual that "aln" supports .fq files. But I really don't know why BWA cannot open the file!
      NOTE: I am working on an Ubuntu machine.

      • #4
        It should work then. I'm guessing it opens with "less s_1_1_sequence.fq". If so, the only thing I can suggest is to look at the index files: are they named "exome.ann, exome.amb, etc."? Maybe the error message is not correctly identifying the true issue. I've never used the -p option during indexing, so in my case exome.fa is indexed to "exome.fa.ann, exome.fa.amb, etc." So that would be my next suggestion. The file extension really doesn't matter, so if it started as a .txt file there is no need to change the extension unless you have a pipeline that requires it.
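
The index file check described in this reply could be sketched as below. `check_bwa_index` is a hypothetical helper, and the list of extensions is an assumption: it covers the core files (.amb, .ann, .bwt, .pac, .sa), while older bwa releases also write reverse-index files that are not checked here.

```shell
# Hypothetical helper: show which core bwa index files exist for a prefix.
# Assumption: the core set is .amb .ann .bwt .pac .sa; extra files written
# by older bwa releases are ignored.
check_bwa_index() {
    local prefix="$1"
    local ext
    local missing=0
    for ext in amb ann bwt pac sa; do
        if [ -s "$prefix.$ext" ]; then
            echo "found   $prefix.$ext"
        else
            echo "missing $prefix.$ext"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every core file exists and is non-empty.
    [ "$missing" -eq 0 ]
}
```

With the commands from the first post, `check_bwa_index exome` tests the prefix given with -p, and `check_bwa_index exome/exome.fa` tests the default prefix that is used when -p is omitted. Whichever prefix succeeds is the one to pass to `bwa aln`.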

        • #5
          Eyeball your fq, and see if it looks like it should. Perhaps I'm mixing up bwa with some other application, but I had a similar error come up with a fq file that had a bunch of "--" in it from grep.
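
A quick way to look for the stray "--" lines mentioned here, as a sketch. grep prints "--" between groups of matches when context options such as -A are used, so a FASTQ file built that way can contain them. `count_grep_separators` is a hypothetical helper name.

```shell
# Hypothetical helper: count lines consisting only of "--", the separator
# grep inserts between match groups when -A/-B/-C context options are used.
count_grep_separators() {
    # The "--" argument ends option parsing so the pattern is not read as a flag.
    grep -c -- '^--$' "$1"
}
```

A count above zero suggests the file needs cleaning, for example with `grep -v -- '^--$' in.fq > cleaned.fq`. This assumes no real sequence or quality line is exactly "--", which is worth confirming by eye first.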

          • #6
            In fact, I have noticed that the problem is with the size of my reads file.
            The file is 7.1 GB; when I made a file with just the first 50000 lines, it worked fine.
            I searched for a maximum number of lines or maximum file size but didn't find anything.
            Does anyone know the maximum file size supported by BWA?
            Thanks,
            Wassim

            • #7
              Probably 4GB (2^32 bytes). The source code can be modified to handle larger files but you're probably best just splitting it in half.
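
If splitting is the route taken, the cut has to fall on a read boundary. The sketch below assumes 4 lines per read and a line count that is a multiple of 4; `split_fastq_in_half` is a hypothetical helper, and the `.part_` output suffix is an arbitrary choice.

```shell
# Hypothetical helper (not part of bwa): split a FASTQ file into two parts
# on a read boundary. Assumes 4 lines per read and a complete final record.
split_fastq_in_half() {
    local f="$1"
    local lines reads half
    lines=$(wc -l < "$f" | tr -d ' ')
    reads=$((lines / 4))
    if [ "$reads" -lt 2 ]; then
        echo "need at least 2 reads to split: $f" >&2
        return 1
    fi
    # Lines in the first part: half the reads (rounded up) times 4.
    half=$(( (reads + 1) / 2 * 4 ))
    # Writes $f.part_aa and $f.part_ab next to the input file.
    split -l "$half" "$f" "$f.part_"
}
```

Each part can then be aligned separately with `bwa aln`.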

              • #8
                I've used files from HiSeq runs in the 7-9 GB range with no problem. The 4 GB limitation is for indexing; there should not be a limit on the size of the read files, as bwa only processes 200k reads at a time. If the first 50,000 lines work fine, my best guess is that you have an issue in the FASTQ file. Do you still have the original (.txt) file from the sequencer? Maybe go back to that, or do a line count (wc -l) and make sure the count is a multiple of 4 to start. If it is, try cutting out the first 50% or 75%, and/or try everything but the last 10 reads (40 lines). Most often the issue will be a messed-up newline or something at the end of the file.
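
The checks suggested in this reply can be sketched roughly as follows. The helper names are hypothetical, and the sketch assumes plain 4-line FASTQ records (no wrapped sequence lines).

```shell
# Hypothetical helpers for narrowing down a damaged FASTQ file.
# Assumption: every record is exactly 4 lines.

# Print the line count; succeed only if it is a multiple of 4.
fastq_line_count_ok() {
    local n
    n=$(wc -l < "$1" | tr -d ' ')
    echo "$1: $n lines"
    [ $((n % 4)) -eq 0 ]
}

# Print the first line number where the record layout breaks:
# line 1 of each record must start with "@", line 3 with "+".
first_bad_record_line() {
    awk 'NR % 4 == 1 && !/^@/ { print NR; exit }
         NR % 4 == 3 && !/^\+/ { print NR; exit }' "$1"
}

# Write the first N reads (4*N lines) to stdout, e.g. to bisect the file.
first_n_reads() {
    head -n "$(( $2 * 4 ))" "$1"
}
```

For example, `first_n_reads s_1_1_sequence.fq 12500 > subset.fq` reproduces the 50000-line subset from the earlier post, and growing the read count step by step shows roughly where the file stops working. On GNU systems such as Ubuntu, `head -n -40 file.fq` drops the last 40 lines.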

                • #9
                  Good point Jon. Given that the error happens prior to any processing (I assume the author would have stated it runs fine for a while first), the error likely occurs within the first 262144 reads.

                  • #10
                    Hi all,

                    Thanks for the suggestions, but the file works fine on a server that isn't more powerful than my machine! In fact, the error comes up as soon as I type the "aln" command line, before it has processed any part of the file (the first 262144 reads).

                    I am wondering why it works fine on another machine and not on mine!
                    Even the other command, "sampe", gives the same problem.

                    I tried another .fq file and it still doesn't work.
                    It's probably a problem with permission to use physical memory, because I am working on a PC in my lab and I think there's a limit on using all the RAM (4 GB).

                    • #11
                      Hi, everyone! I encountered a similar problem recently. The command I ran is below.

                      echo "bwa aln -t 15 /data/hg19/human_g1k_v37.fasta.gz /data/lane1.R1.clean.fq.gz > lane1.aln1.sai" | qsub -l nodes=node8:ppn=15

                      And I got the following error:

                      [bwa_aln] 17bp reads: max_diff = 2
                      [bwa_aln] 38bp reads: max_diff = 3
                      [bwa_aln] 64bp reads: max_diff = 4
                      [bwa_aln] 93bp reads: max_diff = 5
                      [bwa_aln] 124bp reads: max_diff = 6
                      [bwa_aln] 157bp reads: max_diff = 7
                      [bwa_aln] 190bp reads: max_diff = 8
                      [bwa_aln] 225bp reads: max_diff = 9
                      [bwa_seq_open] fail to open file '/data/lane1.R1.clean.fq.gz'. Abort!

                      /opt/gridview/pbs/dispatcher/mom_priv/jobs/671.node1.SC: line 1: 31281 Aborted

                      bwa aln -t 15 /data/hg19/human_g1k_v37.fasta.gz /data/lane1.R1.clean.fq.gz > lane1.aln1.sai

                      Thanks a million!

                      • #12
                        Hi wencanh,
                        I found the cause of this problem (in my case):

                        1) If you take a look at the system requirements for BWA: http://bio-bwa.sourceforge.net/bwa.shtml
                        They say that it requires a minimum of 3.5 GB of RAM.

                        2) If you check the link from "sourceforge": http://sourceforge.net/apps/mediawik...e=SAM_protocol
                        They say that you should have "A computer with 64-bit CPU, 8GB or more memory and 100GB free disk space".

                        So I think the problem is with the 64-bit CPU. I suggest you check your system specs.

                        Wassim
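
A sketch of how the system specs mentioned here could be checked from a shell. One possible reading of the thread (an assumption, not confirmed by any poster) is that a 32-bit build without large-file support cannot open files above 2 GiB, which would fit a 7.1 GB file failing while a 50000-line subset works. `report_word_size` is a hypothetical helper name.

```shell
# Hypothetical helper: report the machine architecture, the userland word
# size, and (if bwa is on PATH) what kind of binary it is.
report_word_size() {
    echo "kernel arch: $(uname -m)"
    echo "userland bits: $(getconf LONG_BIT)"
    if command -v bwa >/dev/null 2>&1; then
        # "file" describes the executable, e.g. whether it is a 32-bit or 64-bit ELF.
        file "$(command -v bwa)"
    else
        echo "bwa not found on PATH"
    fi
}
```

Running `report_word_size` on both the machine that fails and the server that works makes the difference, if there is one, easy to spot.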

                        • #13
                          Thank you for your prompt response.

                          I don't think it is the same in my case. When one of my workmates (A) submitted it to the server, it worked. That means the system requirements for BWA are met. But when I submitted it to the server, I got the same error I mentioned before.

                          BTW, the file "lane1.R1.clean.fq.gz" mentioned before was submitted by my workmate A. But we are in the same group and I can open and edit the file.
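
When the same command works for one user and fails for another, it can help to check access along the whole path, as the user and on the host where the job actually runs. The sketch below is one way to do that; `check_path_access` is a hypothetical helper, and whether permissions are the cause in this case is only a guess.

```shell
# Hypothetical helper: check that the current user can read a file and
# traverse every directory above it, then report who and where we are.
check_path_access() {
    local p="$1"
    [ -r "$p" ] || echo "cannot read (or missing): $p"
    p=$(dirname "$p")
    while :; do
        # Directories need execute permission to be traversed.
        [ -x "$p" ] || echo "cannot traverse: $p"
        case "$p" in
            /|.) break ;;
        esac
        p=$(dirname "$p")
    done
    echo "checked as user $(id -un) on host $(uname -n)"
}
```

Running `check_path_access /data/lane1.R1.clean.fq.gz` from inside a submitted job, rather than on the login node, would show whether the compute node sees the file the same way.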
