  • #16
    Has anyone found a solution to this problem? I've just tried this C program, which seems to work well, but I am still getting the segmentation fault. I have 32GB of RAM on my system, so again it's not memory.

    [p@c0-0]$ bwa samse /share/apps/genome/human/bowtie/hg18/hg18.fa /data/Mk/FpMb.sai /data/Mk/FpMb.part1.fastq > /data/Mk/FpMb.2.sam
    [bwa_aln_core] convert to sequence coordinate... 5.31 sec
    [bwa_aln_core] refine gapped alignments... Segmentation fault


    • #17
      solution

      The fastq file is the problem.

      You need to use a third-party script or program to convert your reads to a fastq file. For example, when processing reads from the SOLiD machine, in my case the MAQ-to-fastq command didn't work, and I had to use a third-party program or script (Tofasta in this case).

      Hope this works.


      • #18
        I tried the provided solid2fastq.pl script from both bwa and maq (they're the same). I diff'd various versions, and they're all identical. They all threw segmentation faults during the bwa samse step. I saw the last response posted about using the attached C program; that was my last failed attempt. The fastq file is in a completely different order, so I can't quite tell whether the two outputs are much different. The file sizes, however, are quite different: the BWA version gives me roughly 6.4 GB and the C version 6.8 GB of data. I don't see how QValues alone could make such a difference...

        What other third party tools are there that convert csfasta and qval files to fastq? The BWA tool and the C version posted on this thread are the only ones I have been able to find... Thanks!


        • #19
          C script fixed

          I made a 'mistake' in the C script. Every 3rd line in a FASTQ file begins with a '+', and the rest of that line is an optional comment. However, I put the name of the read there, but in a shorter form than on the first ('@') line. During alignment this is no problem with BWA; however, with MAQ and the postprocessing with BWA/SAMtools this gives segmentation errors.

          I fixed this in the script. The '+' line now contains only the '+'. This is also the reason why the FASTQ file from this script is that much bigger than the one from the perl script. I have been using this script for weeks now and so far it has worked every time.
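
As a minimal sketch of the record layout described above (not the actual csfastaToFastq code), the following C function formats one FASTQ record with a bare '+' on the third line. The function name `format_fastq_record` and its parameters are hypothetical, chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one FASTQ record into buf with a bare '+' line.
 * Returns the number of characters written, or -1 if the sequence and
 * quality strings differ in length or the record does not fit in buf. */
int format_fastq_record(char *buf, size_t bufsize,
                        const char *name, const char *seq, const char *qual)
{
    int n;

    /* A FASTQ record needs one quality character per sequence character. */
    if (strlen(seq) != strlen(qual))
        return -1;

    /* Line 1: '@' + name, line 2: sequence, line 3: '+' only, line 4: qualities. */
    n = snprintf(buf, bufsize, "@%s\n%s\n+\n%s\n", name, seq, qual);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return n;
}
```

Keeping the third line to a bare '+' avoids any chance of it disagreeing with the name on the '@' line, which is the situation the post describes.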

          I changed the attachment above. However I'll repost it below too:

          csfastaToFastq.tar.gz

          And no, so far I also have not found any other means to convert (cs)fasta to Fastq.


          • #20
            solved for me

            This is just to share that I was having the same segfault error when running 'bwa aln' with a fastq file produced by the solid2fastq (C version) script (BFAST 0.6.2a, downloaded Jan/2010), on a 32 GB RAM machine with 2 quad-core Intel Xeon processors, for an 800 MB reference genome and 300,000,000 25 bp-long SOLiD reads.

            Using the last csfastaToFastq script provided by fpruzius to produce the fastq file solved the problem.

            (Please, nilshomer, fix it in your great tool package distribution; it does not deserve such a disturbing, although minor, problem.)


            • #21
              When checking the file produced by csfastaToFastq in my case, I realized that it trims off the last 4 characters of the read names, so that, for example, reads named '2_6_241_F3' and '2_6_242_F3' in the csfasta and qual files both get the same name in the resulting fastq file, namely '2_6_24'.

              Checking the cpp source code of csfastaToFastq, I saw that it removes the _F3 and _R3 suffixes, using the underscore to locate the suffix, so that read names containing underscores produce strange results.

              The portion of the code doing that stuff can be seen at the following lines:

              // this construction removes '_F3' or '_R3' from the sequence name
              while (csFastaLine[c] != '\n' && csFastaLine[c] != 'F' && csFastaLine[c] != 'R'){
                  if (csFastaLine[c] != '_'){
                      underscorePosition = c;
                  }
                  seqName[c] = csFastaLine[c];
                  c++;
              }
              seqName[underscorePosition] = '\n';

              Changing them to the following lines seems to work fine for me:

              // this construction does NOT remove '_F3' or '_R3' from the sequence name
              while (csFastaLine[c] != '\n'){
                  seqName[c] = csFastaLine[c];
                  c++;
              }
              seqName[c] = '\n';

              So my question is: is it necessary to get rid of the _F3 and _R3 suffixes for downstream analysis?

              Thanks in advance.
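
Tracing the quoted loop by hand suggests why '2_6_241_F3' ends up as '2_6_24': the test `!= '_'` records the position of the last non-underscore character before the first 'F' or 'R', not the position of the underscore itself. The sketch below reproduces that behaviour and, purely as an assumption about what may have been intended (not confirmed by the csfastaToFastq author), shows a variant that tests `== '_'`. The function names are hypothetical, names are passed without the leading '>', and a NUL terminator is written instead of '\n' so the results can be compared as C strings.

```c
/* Loop as quoted in the post: underscorePosition is updated on every
 * character that is NOT an underscore. */
void name_as_quoted(const char *line, char *out)
{
    int c = 0, underscorePosition = 0;

    while (line[c] != '\n' && line[c] != '\0' &&
           line[c] != 'F' && line[c] != 'R') {
        if (line[c] != '_')
            underscorePosition = c;
        out[c] = line[c];
        c++;
    }
    /* Cuts at the last non-underscore character seen, losing one name character. */
    out[underscorePosition] = '\0';
}

/* Assumed intent: remember the position of the last underscore instead. */
void name_with_equality(const char *line, char *out)
{
    int c = 0, underscorePosition = 0;

    while (line[c] != '\n' && line[c] != '\0' &&
           line[c] != 'F' && line[c] != 'R') {
        if (line[c] == '_')
            underscorePosition = c;
        out[c] = line[c];
        c++;
    }
    /* Cuts at the underscore that precedes the 'F' or 'R'. */
    out[underscorePosition] = '\0';
}
```

With the input "2_6_241_F3\n", `name_as_quoted` yields "2_6_24" and `name_with_equality` yields "2_6_241". Both variants still stop at the first 'F' or 'R' anywhere in the name, so a name containing those letters earlier would be cut short either way.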


              • #22
                It depends. For aligners like BFAST, to recognize reads that are from the same DNA fragment (mate or pairs) the read names must be the same. Other aligners separate the mates into two different files.

                Why not just use the "solid2fastq" program included in BFAST (I am the author)?
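
For aligners that need both mates to carry the same name, one way to drop the suffix without touching the rest of the name is to compare only the last three characters. This is a minimal sketch under that assumption; `strip_mate_suffix` is a hypothetical helper, not part of BFAST, BWA, or csfastaToFastq.

```c
#include <string.h>

/* Hypothetical helper: remove a trailing "_F3" or "_R3" from name in place.
 * Returns 1 if a suffix was removed, 0 if the name was left unchanged. */
int strip_mate_suffix(char *name)
{
    size_t len = strlen(name);

    /* Only the final three characters are inspected, so underscores or the
     * letters 'F' and 'R' elsewhere in the name are left alone. */
    if (len >= 3 &&
        (strcmp(name + len - 3, "_F3") == 0 ||
         strcmp(name + len - 3, "_R3") == 0)) {
        name[len - 3] = '\0';
        return 1;
    }
    return 0;
}
```

Applied to "2_6_241_F3" and "2_6_241_R3", both names become "2_6_241", while a name without either suffix is returned unchanged.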


                • #23
                  Originally posted by nilshomer View Post
                  Why not just use the "solid2fastq" program included in BFAST (I am the author)?
                  Of course, I tried it before, but I got segmentation faults (as indicated also by other users previously in this thread), so I shifted to csfastaToFastq, which seemed to fix the problem.

                  In a previous post from November 2008 in this thread, you said that you were going to fix the problem in solid2fastq and provide the fix in a later release. I downloaded BFAST in January 2010, so I think the problem is not completely fixed. Can you confirm that?


                  • #24
                    I apologize, I did not read the context. Could you try the latest release of BFAST to see if solid2fastq (the C-version) works for you? I have not had any problems with converting SOLiD data for BFAST (over 1 Trillion bases and counting).

                    Nils


                    • #25
                      I've tried the solid2fastq C-version of BFAST 0.6.3a. It apparently worked fine, since read names are not truncated as they are by the original csfastaToFastq script. However, the fastq produced by solid2fastq still raises the segmentation fault error mentioned in this thread when running 'bwa aln', while the fastq from the modified csfastaToFastq is fine.

                      Please note that the error arises when running bwa (not BFAST) with the fastq file produced by BFAST's solid2fastq script.

                      Any idea?


                      • #26
                        Have you tried the solid2fastq.pl included in BWA? I apologize if I am repeating myself.


                        • #27
                          I didn't realize that bwa includes its own solid2fastq.pl...

                          I've just tried it and it seems to work fine: running 'bwa aln' with the fastq file produced in this way does not raise the segmentation fault error.

                          By the way, I've noticed that sequences in the fastq file produced by BFAST's solid2fastq script are in standard color space (0123.), while the ones produced by either bwa's solid2fastq or the csfastaToFastq script are in double-encoded color space (ACTGN). Could that be the problem? I cannot see any parameter in the 'bwa aln' command to specify the encoding expected in the fastq file, other than '-c' to work in color space.
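
To make the difference between the two encodings concrete, here is a small sketch of double encoding, in which each color digit is replaced by a letter. The mapping 0→A, 1→C, 2→G, 3→T, .→N is the commonly used convention and is stated here as an assumption rather than something taken from this thread; `double_encode` is a hypothetical name, and handling of the leading primer base in a csfasta read is deliberately left out.

```c
/* Hypothetical helper: convert a color-space string (digits 0-3 and '.')
 * to double-encoded letters in place.
 * Assumed mapping: 0->A, 1->C, 2->G, 3->T, .->N.
 * Returns the number of characters that were not recognized (left as-is). */
int double_encode(char *colors)
{
    int unknown = 0;
    char *p;

    for (p = colors; *p != '\0'; p++) {
        switch (*p) {
        case '0': *p = 'A'; break;
        case '1': *p = 'C'; break;
        case '2': *p = 'G'; break;
        case '3': *p = 'T'; break;
        case '.': *p = 'N'; break;
        default:  unknown++; break;  /* e.g. a primer base such as 'T' */
        }
    }
    return unknown;
}
```

A non-zero return value is a quick hint that the input still contains something other than color calls, such as the primer base at the start of a csfasta read.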


                          • #28
                            Originally posted by javijevi View Post
                            I've just tried it and it seems to work fine: running 'bwa aln' with the fastq file produced in this way does not raise the segmentation fault error.
                            I'm sorry for the previous message. It is not true. The fastq file produced by the solid2fastq.pl script shipped with the bwa distribution also causes a segmentation fault on my computer.


                            • #29
                              You may try solid2fastq.pl here. The "-1" issue should be resolved, although I have not tested this on real data and I do not know if segfault is caused by other issues.

                              Download Burrows-Wheeler Aligner for free. BWA is a program for aligning sequencing reads against a large reference genome (e.g. human genome).


                              In addition, there are bugs in bwa-0.5.5. You'd better use the SVN version, which will become 0.5.6 in the near future.


                              • #30
                                solved

                                In my case, using the solid2fastq.pl shipped with the SVN version indicated above solved the problems: the fastq file is correctly produced (read names are properly trimmed), and using that fastq file does not raise segmentation fault errors.

                                Thanks a lot to everybody for the good work.
