  • FastX: fastq_quality_filter problem

    Hi,
    I'm trying to filter reads using the FASTX-Toolkit.
    The command is the following:
    fastq_quality_filter -Q33 -q 20 -p 100 -v -i filename_1 -o filename_2

    However, I often get one of two error messages:
    Segmentation fault (core dumped)
    or
    fastq_quality_filter: bug: got empty array at fastq_quality_filter.c:97

    It seems that I get the segmentation fault for files with more than (roughly) 50-60M reads.
    Is there a size limit for fastq_quality_filter? Any ideas about this issue?
    Thanks.

  • #2
    Well, I found the cause. It turned out that I had a mix of FASTQ files, some with Sanger-format quality scores and some with Illumina 1.5+ scores. If you run Sanger-format reads without the -Q33 parameter, you get the error message "fastq_quality_filter: Invalid quality score value...", which clearly indicates the pitfall.
    But in the opposite case, when you wrongly use -Q33 for reads in the Illumina 1.5+ quality format, error messages like
    Segmentation fault (core dumped)
    or
    fastq_quality_filter: bug: got empty array at fastq_quality_filter.c:97
    do not give a clue what is going wrong.
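A quick check of the quality encoding before running the filter can catch this kind of mix-up. The following is only a heuristic sketch, not part of the FASTX-Toolkit: the function names and the thresholds (ASCII 59 and 74) are assumptions based on the usual ranges of phred+33 and phred+64 data from Illumina-style short reads, and it assumes plain 4-line FASTQ records.

```python
from itertools import islice


def quality_lines(lines, max_records=100_000):
    """Yield the quality string (4th line) of the first max_records records.

    Assumes strict 4-line FASTQ records (no wrapped sequences).
    """
    for index, line in enumerate(islice(lines, 4 * max_records)):
        if index % 4 == 3:
            yield line.rstrip("\r\n")


def guess_phred_offset(quality_strings):
    """Guess the ASCII offset of quality strings: 33, 64, or None if unclear.

    Heuristic (an assumption of this sketch):
      * any character below ASCII 59 cannot come from a phred+64 file -> 33
      * otherwise, any character above ASCII 74 ('J', Q41 at offset 33) -> 64
      * otherwise the observed range fits both encodings -> None
    """
    lowest = None
    highest = None
    for quality in quality_strings:
        for ch in quality.strip():
            code = ord(ch)
            if lowest is None or code < lowest:
                lowest = code
            if highest is None or code > highest:
                highest = code
    if lowest is None:
        return None  # no quality characters seen at all
    if lowest < 59:
        return 33
    if highest > 74:
        return 64
    return None
```

A result of 33 suggests the file needs -Q33; a result of 64 suggests it should be run without it. A result of None means the sampled reads do not settle the question, and more records should be inspected.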



    • #3
      Just wanted to add to this, as I got the same error message from fastq_quality_filter, but I knew that I had the correct quality format because it worked on a differently processed version of the same library.

      It turned out that I had reads in my library that were empty after removing adapter sequences with cutadapt. Once I got rid of the empty entries, the issue with fastq_quality_filter went away.



      • #4
        Originally posted by kerhard View Post
        Just wanted to add to this, as I got the same error message from fastq_quality_filter, but I knew that I had the correct quality format because it worked on a differently processed version of the same library.

        It turned out that I had reads in my library that were empty after removing adapter sequences with cutadapt. Once I got rid of the empty entries, the issue with fastq_quality_filter went away.
        The fastq_quality_filter help screen does not list the -Q parameter. What is it for, and what does Q33 mean? The reason I ask is that I use -q to set the minimum quality, but without the -Q33 parameter I also get the errors you received.

        Joe White



        • #5
          Yeah, I found out about the -Q parameter on SEQanswers; it's "undocumented" in the FASTX-Toolkit. If the quality scores for your libraries are in the FASTQ Sanger format (ascii(phred+33)) rather than the FASTQ Illumina format (ascii(phred+64)), you would use the -Q33 parameter. By default, fastq_quality_filter assumes FASTQ Illumina quality scores. See here for the original explanation:

          http://seqanswers.com/forums/showthread.php?t=6701
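To make the offset concrete, here is a minimal Python sketch of how a quality string is decoded and how a "-q 20 -p 80" style filter could be evaluated. It is not the toolkit's source code: the function names are hypothetical, and the rule "keep a read if at least p percent of its bases have a score of at least q" is taken from the tool's help text as understood here.

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string to a list of integer phred scores."""
    return [ord(ch) - offset for ch in quality]


def passes_filter(quality, min_score=20, min_percent=80, offset=33):
    """Return True if at least min_percent of bases have score >= min_score.

    Empty reads are rejected here; that is a choice made for this sketch.
    """
    scores = decode_quality(quality, offset)
    if not scores:
        return False
    good = sum(1 for score in scores if score >= min_score)
    # Integer comparison avoids floating-point rounding at the boundary.
    return 100 * good >= min_percent * len(scores)


# 'I' is ASCII 73, '5' is ASCII 53, '#' is ASCII 35.
print(decode_quality("I5#", offset=33))  # [40, 20, 2]
print(decode_quality("I5#", offset=64))  # [9, -11, -29]
```

Decoding Sanger data with an offset of 64 produces negative scores, which fits the "Invalid quality score value" message reported in this thread. Decoding phred+64 data with an offset of 33 instead inflates every score by 31, which plausibly explains why that mistake fails in a less obvious way.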



          • #6
            Originally posted by kerhard View Post
            Yeah, I found out about that -Q parameter on SEQanswers, it's "undocumented" in the Fastx toolkit. If the quality scores for your libraries are in the fastq sanger format (ascii(phred+33)), rather than the fastq illumina format (ascii(phred+64)), you would use the -Q33 parameter. fastq_quality_filter automatically assumes fastq illumina quality scores. See here for original explanation:

            http://seqanswers.com/forums/showthread.php?t=6701
            Thanks! That should be documented.



            • #7
              Hi all,
              I would like to add one question here regarding fastq_quality_filter

              I used the command:

              fastq_quality_filter -i R1_QC.fastq -o R1_QC_Filter.fastq -q 20 -p 80 -Q 33 -v
              fastq_quality_filter -i R2_QC.fastq -o R2_QC_Filter.fastq -q 20 -p 80 -Q 33 -v

              The result is fine, but the number of reads left in each of the paired-end files is different.

              When I do further trimming (or any other preprocessing) and eventually mapping, do both reads of each pair have to be present?

              Thank you in advance for your help!
              cheers
              CN



              • #8
                Originally posted by Chirag View Post
                Hi all,
                I would like to add one question here regarding fastq_quality_filter

                I used the command:

                fastq_quality_filter -i R1_QC.fastq -o R1_QC_Filter.fastq -q 20 -p 80 -Q 33 -v
                fastq_quality_filter -i R2_QC.fastq -o R2_QC_Filter.fastq -q 20 -p 80 -Q 33 -v

                The result is fine, but the number of reads left in each of the paired-end files is different.

                When I do further trimming (or any other preprocessing) and eventually mapping, do both reads of each pair have to be present?
                Yes, they do! This is a constant problem when trimming paired-end data. I have switched to using Trimmomatic, which trims reads in a "pair-aware" manner: it outputs four files, namely R1 and R2 files that are still paired, plus an R1 singleton file and an R2 singleton file.
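For files that have already been filtered separately, the pairing can be restored afterwards. The sketch below illustrates the idea in Python; it is not how Trimmomatic works internally. It assumes 4-line FASTQ records, that the pair key is the first whitespace-delimited token of the header with any trailing /1 or /2 removed, and that both files fit in memory. All function names are hypothetical.

```python
def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from 4-line records.

    A truncated trailing record (fewer than 4 lines) is silently dropped.
    """
    it = iter(lines)
    for record in zip(it, it, it, it):
        yield tuple(field.rstrip("\r\n") for field in record)


def read_id(header):
    """Return the pair key: first header token without '@' and without /1 or /2."""
    parts = header[1:].split()
    token = parts[0] if parts else ""
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def split_pairs(r1_records, r2_records):
    """Return (paired_r1, paired_r2, single_r1, single_r2).

    Input order is preserved, so if both inputs keep the original read order
    (as per-file filtering does), the two paired lists stay in sync.
    """
    r1 = list(r1_records)
    r2 = list(r2_records)
    shared = {read_id(r[0]) for r in r1} & {read_id(r[0]) for r in r2}
    paired_r1 = [r for r in r1 if read_id(r[0]) in shared]
    paired_r2 = [r for r in r2 if read_id(r[0]) in shared]
    single_r1 = [r for r in r1 if read_id(r[0]) not in shared]
    single_r2 = [r for r in r2 if read_id(r[0]) not in shared]
    return paired_r1, paired_r2, single_r1, single_r2
```

Holding both files in memory keeps the sketch short; for very large libraries a streaming approach or a pair-aware trimmer is the more practical route.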



                • #9
                  Thanks, Kmcarr!
                  I will try that tool and see how it works.

                  In the meantime, I have posted a question about over-represented k-mers at
                  Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

                  Could you please help if you have a better understanding of it?


                  regards
                  CN



                  • #10
                    Originally posted by kerhard View Post
                    Just wanted to add to this, as I got the same error message from fastq_quality_filter, but I knew that I had the correct quality format because it worked on a differently processed version of the same library.

                    It turned out that I had reads in my library that were empty after removing adapter sequences with cutadapt. Once I got rid of the empty entries, the issue with fastq_quality_filter went away.
                    Hi Kerhard, I have the same problem. Could you please tell me how you got rid of the empty entries?



                    • #11
                      removing empty entries

                      Originally posted by monkey_SEQ View Post
                      Hi Kerhard, I have the same problem. Could you please tell me how you got rid of the empty entries?
                      In cutadapt, the program I used to remove adapter sequences from the reads, there is a parameter (-m, --minimum-length) that discards reads that are too short after the adapter has been removed.

                      For example, -m 20 would keep only reads that are at least 20 bp long after removing the adapter. This would of course exclude any adapter-only reads, which are the empty entries.
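If the reads have already been trimmed and re-running cutadapt is inconvenient, the same kind of length filter can be applied afterwards. This is a minimal Python sketch, not cutadapt's implementation; the function name is hypothetical and it assumes plain 4-line FASTQ records.

```python
def drop_short_reads(lines, min_length=1):
    """Yield the lines of FASTQ records whose sequence has >= min_length bases.

    With the default min_length=1, only empty reads are removed.
    A truncated trailing record (fewer than 4 lines) is silently dropped.
    """
    it = iter(lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        if len(seq.rstrip("\r\n")) >= min_length:
            yield from (header, seq, plus, qual)
```

Because the function accepts any iterable of lines and yields lines, it can be fed an open input file, and its output can be passed to the writelines method of an open output file.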



                      • #12
                        Originally posted by kerhard View Post
                        In cutadapt, the program I used to remove adapter sequences from the reads, there is a parameter (-m, --minimum-length) that allows you to remove reads that are too short after removing the adapter.

                        For example -m 20 would only give you reads that are 20 bp after removing the adapter. This would of course exclude any adapter only reads, the empty entries.
                        Wow! Thanks, that worked perfectly! So simple.

                        I am also using the quality parameter (-q, --quality-cutoff) of cutadapt to remove low-quality bases from the ends of the reads. But it seems that this parameter only removes bases from the 3' end of the read. What program can I use to also filter the 5' ends of reads, and even the middle of the read?
