  • Prinseq-lite problem

    Dear all,
    I am using prinseq (v.0.20.4, CentOS) to filter my paired-end data with the following commands.
    [root@localhost]# perl prinseq-lite.pl -fastq S1_read1.fq -trim_right 1 -trim_qual_right 10 -min_len 100 -derep 123 -out_format 3 -out_good S1_read1.trim -out_bad null -log
    Estimate size of input data for status report (this might take a while for large files)
    done
    Parse and process input data
    done
    Check for duplicates
    done
    Write results to output file(s)
    done
    Clean up empty files
    done
    Input and filter stats:
    Input sequences: 489,463
    Input bases: 49,435,763
    Input mean length: 101.00
    Good sequences: 292,060 (59.67%)
    Good bases: 29,206,000
    Good mean length: 100.00
    Bad sequences: 197,403 (40.33%)
    Bad bases: 19,937,703
    Bad mean length: 101.00
    Sequences filtered by specified parameters:
    derep: 197403

    Output: log file and S1_read.trim.fastq

    [root@localhost]# perl prinseq-lite.pl -fastq S1_read2.fq -trim_right 1 -trim_qual_right 10 -min_len 100 -derep 123 -out_format 3 -out_good S1_read2.trim -out_bad null -log
    Estimate size of input data for status report (this might take a while for large files)
    done
    Parse and process input data
    done
    Clean up empty files
    Use of uninitialized value $fhmappings in unlink at ../prinseq-lite-0.20.4/prinseq-lite.pl line 1833.
    done
    Input and filter stats:
    Input sequences: 489,463
    Input bases: 46,498,985
    Input mean length: 95.00
    Good sequences: 0 (0.00%)
    Bad sequences: 489,463 (100.00%)
    Bad bases: 46,498,985
    Bad mean length: 95.00
    Sequences filtered by specified parameters:
    min_len: 489463

    Output: only log file
    Could anybody let me know what the problem is and how to solve it?
    Last edited by kumar03; 12-09-2015, 06:40 PM.

  • #2
    You would be much better off using a program that can handle paired reads, like BBDuk. Prinseq is obsolete, IMO.


    • #3
      No output was produced because of -min_len 100. According to FastQC, one of my paired-end files contains 101 bp reads and the other contains 95 bp reads, which may be why no output was generated. In this case, how should I set the minimum length?
      Note: the Illumina machine initially produced an average of 101 bp per sample with paired-end sequencing.


      • #4
        Prinseq handles paired-end files; it is not obsolete. You need to specify them correctly on the command line, though, with '-fastq' for one file of the pair and '-fastq2' for the other. It generates beautiful graphs of the data, which give you a much deeper understanding of the results than anything else. Most trimming programs either fail or succeed silently, which is useless for quality control and leads to a lot of wasted time trying to understand the output.
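        As a rough illustration of the paired-end invocation described above, the following Python sketch only builds and prints the command line; it does not run prinseq. The helper name `prinseq_paired_command` and the output prefix `S1.trim` are made up for this example. Every option is copied from the original poster's commands plus '-fastq2', and whether each one (for instance -derep) behaves the same way in paired mode is an assumption that has not been checked here.

```python
import shlex


def prinseq_paired_command(read1, read2, out_prefix, min_len=40):
    """Build (but do not run) a paired-end prinseq-lite command line.

    Hypothetical helper: options are taken from the thread, not from
    a verified reading of the prinseq documentation.
    """
    return [
        "perl", "prinseq-lite.pl",
        "-fastq", read1,          # first file of the pair
        "-fastq2", read2,         # second file of the pair
        "-trim_right", "1",
        "-trim_qual_right", "10",
        "-min_len", str(min_len),  # 40 is the value suggested in this thread
        "-derep", "123",
        "-out_format", "3",
        "-out_good", out_prefix,   # assumed prefix, chosen for illustration
        "-out_bad", "null",
        "-log",
    ]


cmd = prinseq_paired_command("S1_read1.fq", "S1_read2.fq", "S1.trim")
print(shlex.join(cmd))
```

        Printing the command first makes it easy to review before pasting it into a shell.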

        Yes, I think setting -min_len to 100 is too high, especially if your reads are 95 bp. I would set it lower, like 40, so you do not discard too many good reads by length after quality trimming.
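        To see why every read in the 95 bp file fails, here is a minimal Python sketch of the length arithmetic. It assumes trimming is applied before the length filter (the read 1 stats, 101.00 going to 100.00, suggest this) and it ignores quality trimming entirely, so real survival rates can only be lower. The function names and the toy records are invented for the example.

```python
from collections import Counter


def read_lengths(fastq_lines):
    """Yield the sequence length of each record in 4-line FASTQ text."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of every record is the sequence
            yield len(line.rstrip("\n"))


def surviving_fraction(lengths, trim_right, min_len):
    """Fraction of reads still >= min_len after removing trim_right bases."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    kept = sum(1 for n in lengths if n - trim_right >= min_len)
    return kept / len(lengths)


# Toy records standing in for S1_read2.fq: every read is 95 bp.
toy = []
for i in range(3):
    toy += [f"@read{i}\n", "A" * 95 + "\n", "+\n", "I" * 95 + "\n"]

lengths = list(read_lengths(toy))
print(Counter(lengths))
print(surviving_fraction(lengths, trim_right=1, min_len=100))  # 0.0
print(surviving_fraction(lengths, trim_right=1, min_len=40))   # 1.0
```

        With a real file, passing an open file handle to `read_lengths` gives the length distribution, which helps pick a -min_len below the shortest reads you want to keep.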
        Last edited by SES; 12-10-2015, 05:45 AM.


        • #5
          Thanks for your reply.
