  • Prinseq-lite problem

    Dear all,
    I am using prinseq (v.0.20.4, CentOS) to filter my paired-end data with the following commands.
    [root@localhost]# perl prinseq-lite.pl -fastq S1_read1.fq -trim_right 1 -trim_qual_right 10 -min_len 100 -derep 123 -out_format 3 -out_good S1_read1.trim -out_bad null -log
    Estimate size of input data for status report (this might take a while for large files)
    done
    Parse and process input data
    done
    Check for duplicates
    done
    Write results to output file(s)
    done
    Clean up empty files
    done
    Input and filter stats:
    Input sequences: 489,463
    Input bases: 49,435,763
    Input mean length: 101.00
    Good sequences: 292,060 (59.67%)
    Good bases: 29,206,000
    Good mean length: 100.00
    Bad sequences: 197,403 (40.33%)
    Bad bases: 19,937,703
    Bad mean length: 101.00
    Sequences filtered by specified parameters:
    derep: 197403

    Output: log file and S1_read1.trim.fastq

    [root@localhost]# perl prinseq-lite.pl -fastq S1_read2.fq -trim_right 1 -trim_qual_right 10 -min_len 100 -derep 123 -out_format 3 -out_good S1_read2.trim -out_bad null -log
    Estimate size of input data for status report (this might take a while for large files)
    done
    Parse and process input data
    done
    Clean up empty files
    Use of uninitialized value $fhmappings in unlink at ../prinseq-lite-0.20.4/prinseq-lite.pl line 1833.
    done
    Input and filter stats:
    Input sequences: 489,463
    Input bases: 46,498,985
    Input mean length: 95.00
    Good sequences: 0 (0.00%)
    Bad sequences: 489,463 (100.00%)
    Bad bases: 46,498,985
    Bad mean length: 95.00
    Sequences filtered by specified parameters:
    min_len: 489463

    Output: only log file
    Could anybody let me know what the problem is and how to solve it?
    Last edited by kumar03; 12-09-2015, 06:40 PM.

  • #2
    You would be much better off using a program that can handle paired reads, like BBDuk. Prinseq is obsolete, IMO.

    • #3
      No output is produced because of -min_len 100. According to FastQC, one of my paired-end files contains 101 bp reads and the other contains 95 bp reads, which may be why no output was generated. In this case, how should I set the minimum length?
      Note: the Illumina machine initially produced an average of 101 bp per sample with paired-end sequencing.
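The read lengths reported by FastQC can also be checked directly from the FASTQ file. Below is a minimal sketch, assuming standard four-line FASTQ records with unwrapped sequence lines; `fastq_length_table` is a hypothetical helper name and is not part of prinseq.

```shell
# Hypothetical helper: tabulate read lengths in a FASTQ file.
# Assumes 4 lines per record (header, sequence, '+', qualities).
fastq_length_table() {
    # Line 2 of every 4-line record is the sequence; count reads per length.
    awk 'NR % 4 == 2 { count[length($0)]++ }
         END { for (len in count) print len "\t" count[len] }' "$1" | sort -n
}

# Example (file name taken from the first post):
# fastq_length_table S1_read2.fq
```

Each output line holds a read length and the number of reads with that length. If every read in S1_read2.fq is 95 bp, as the log in the first post suggests, none can pass -min_len 100, which matches the "min_len: 489463" line in that log.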

      • #4
        Prinseq handles paired-end files; it is not obsolete. You need to specify them correctly on the command line, though, with '-fastq' and '-fastq2' for the two files of each pair. It generates beautiful graphs of the data, which give you a much deeper understanding of the results than anything else. Most trimming programs either fail or succeed silently, which is useless for quality control and leads to a lot of wasted time trying to understand the output.

        Yes, I think setting the minimum length to 100 is too high, especially if your reads are 95 bp. I would set it lower, like 40, so you do not discard too many good reads by length after quality trimming.
        Last edited by SES; 12-10-2015, 05:45 AM.
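The advice in post #4 can be sketched as a single paired-end invocation. This is an untested sketch: only '-fastq2' and the lower '-min_len' value come from post #4, the remaining options are copied unchanged from the first post, and the thread does not confirm how each of them behaves in paired mode. `prinseq_paired_cmd` and the output prefix `S1.trim` are hypothetical names; the helper only prints the command so it can be reviewed before running.

```shell
# Hypothetical helper: print (do not run) a paired-end prinseq-lite command.
# Arguments: read-1 file, read-2 file, output prefix, optional minimum length.
prinseq_paired_cmd() {
    r1=$1
    r2=$2
    out=$3
    min_len=${4:-40}   # 40 is the value suggested in post #4
    printf 'perl prinseq-lite.pl -fastq %s -fastq2 %s -trim_right 1 -trim_qual_right 10 -min_len %s -derep 123 -out_format 3 -out_good %s -out_bad null -log\n' \
        "$r1" "$r2" "$min_len" "$out"
}

# Print the command for the files from the first post.
prinseq_paired_cmd S1_read1.fq S1_read2.fq S1.trim
```

Processing both files in one run lets the tool keep the two reads of each pair together, whereas the two separate runs in the first post filter each file independently.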

        • #5
          Thanks for your reply.
