
  • #16
    Originally posted by Jose Blanca
    Sorry, I have not explained myself well enough.
    clean_reads uses two different algorithms for quality trimming: one for long reads (lucy) and a different one for short reads. If you're cleaning long reads, the applicable parameters are the lucy parameters: lucy_error, lucy_window and lucy_bracket. These are the parameters that you should tweak to modify the cleaning behaviour when dealing with 454 and Sanger reads.

    For Illumina and SOLiD we didn't manage to use lucy, so we implemented a sliding window trimming function. Its parameters are qual_window, qual_threshold, and only_3_end. That's why these parameters can only be used with short reads.
    Hi Jose,
    I am going with your explanation. I issued the following command:
    clean_reads -i Pair01_fastq_format.fastq -o ./clean_reads/output_q20_len_50 -p 454 -f fastq -g fastq -min_length 20 --lucy_error 0.025,0.02 --lucy_bracket 10,0.02 --lucy_window 1,0.02

    It seems to run fine, but there is no output. clean_reads.error says that the input file "Pair01_fastq_format.fastq" was not found ("no such file or directory"), even though the file is in the same directory from which I issued the command.
    Is something wrong with the command itself?
    I really appreciate your help. Thanks.
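For reference, the sliding window idea described in the quoted post can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `sliding_window_trim` is hypothetical, the argument names are borrowed from the clean_reads options mentioned above, and the exact rule clean_reads applies internally is not stated in this thread, so the "mean quality of the window" criterion below is a guess at a typical implementation.

```python
def sliding_window_trim(quals, qual_window=1, qual_threshold=20, only_3_end=False):
    """Return (start, end), a half-open slice of the read to keep.

    Assumption: a window is "good" when its mean quality is at least
    qual_threshold. The kept region runs from the first good window
    to the end of the last good window.
    """
    n = len(quals)
    if n < qual_window:
        return (0, 0)  # read shorter than one window: nothing to keep

    # One boolean per window start position.
    good = [
        sum(quals[i:i + qual_window]) / qual_window >= qual_threshold
        for i in range(n - qual_window + 1)
    ]
    if not any(good):
        return (0, 0)  # no window passes: discard the whole read

    # With only_3_end the 5' end is left untouched.
    first = 0 if only_3_end else good.index(True)
    last = len(good) - 1 - good[::-1].index(True)
    return (first, last + qual_window)


quals = [5, 10, 30, 30, 30, 8, 3]
start, end = sliding_window_trim(quals)
print("ACGTACG"[start:end])
```

The same slice can be applied to both the sequence and the quality string of a read. This does not describe lucy, which uses its own error-based parameters.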

    Comment


    • #17
      Are you sure the file is in there? Could it be a problem with letter case? On Unix, case matters: Pair01_fastq_format.fastq and pair01_fastq_format.fastq would be different files.
      Can you run the following command successfully?
      head Pair01_fastq_format.fastq
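If that command fails, a short script can list files whose names differ from the expected one only in letter case. This is a minimal sketch and not part of clean_reads; the helper name `find_case_variants` is made up for illustration.

```python
import os


def find_case_variants(filename, directory="."):
    """List entries in `directory` whose names match `filename` ignoring case."""
    wanted = filename.lower()
    return sorted(name for name in os.listdir(directory) if name.lower() == wanted)


# Prints every name in the current directory that matches when case is ignored.
print(find_case_variants("Pair01_fastq_format.fastq"))
```

An empty list means the file really is not in that directory; a name with different capitalisation means the command should use that exact spelling.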

      Comment


      • #18
        Hi Jose,
        It's fine now. The command is working. But I have a question: if I want to clean using a minimum quality score threshold, how can I do that? Since --qual_threshold does not work for 454, how does clean_reads work without quality threshold information?
        Thanks

        Comment


        • #19
          Take a look at the lucy documentation, because you have to use their parameters.

          Comment


          • #20
            Hi Jose,

            Another question for you. I've been testing clean_reads and it works quite nicely. However, when I try to use multiple threads I get errors which I believe are related to psubprocess. Could you help me with this (since it would speed up the work)? The error output is as follows:

            "clean_reads -i mp1_M1.fastq -o mp1_M1cr1.fastq -p illumina -t 4 -a adaptors.fasta
            The command was:
            /usr/local/bin/clean_reads -i mp1_M1.fastq -o mp1_M1cr1.fastq -p illumina -t 4 -a adaptors.fasta
            /usr/local/bin/clean_reads version: 0.2.1
            Running pipeline illumina with the following parameters:
            --platform: illumina
            --seq_in: mp1_M1.fastq
            --seq_out: mp1_M1cr1.fastq
            --adaptors_file: adaptors.fasta
            --threads: 4
            --disable_quality_trimming: False
            --qual_threshold: 20
            --qual_window: 1
            --only_3_end: False
            --filter_identity: 95.0
            --filter_length_percentage: 75.0
            --error_log: clean_reads.error
            An unexpected error happened.
            The clean_reads developers would appreciate your feedback
            Please send them the error log and take a look at it: clean_reads.error

            [Errno 2] No such file or directory: '/tmp/tmpTYEr3e/tmpKIFhW2/tmpGHXvok'
            /usr/local/lib/python2.6/dist-packages/franklin/utils/cgitb.py:245: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
              value = pydoc.text.repr(getattr(evalue, name))
            Traceback (most recent call last):
              File "/usr/local/bin/clean_reads", line 857, in <module>
                main(stdout, stderr)
              File "/usr/local/bin/clean_reads", line 840, in main
                processes=threads)
              File "/usr/local/lib/python2.6/dist-packages/franklin/pipelines/pipelines.py", line 339, in seq_pipeline_runner
                processes)
              File "/usr/local/lib/python2.6/dist-packages/franklin/pipelines/pipelines.py", line 287, in _parallel_process_sequences
                retcode = process.wait()
              File "/usr/local/lib/python2.6/dist-packages/psubprocess/prunner.py", line 374, in wait
                self._collect_output_streams()
              File "/usr/local/lib/python2.6/dist-packages/psubprocess/prunner.py", line 407, in _collect_output_streams
                joiner(out_file, part_out_fnames)
              File "/usr/local/lib/python2.6/dist-packages/psubprocess/prunner.py", line 490, in default_cat_joiner
                in_fhand = open(in_file_, 'r')
            IOError: [Errno 2] No such file or directory: '/tmp/tmpTYEr3e/tmpKIFhW2/tmpGHXvok' "

            Thanks
            P

            Comment


            • #21
              I think that I have fixed the problem. The fix is included in the psubprocess version that I have just released. Could you try reinstalling psubprocess with this version?

              Comment


              • #22
                How does GS Assembler determine qual cutoff?

                Originally posted by Himalaya
                Hi Jose
                I am trying to do quality trimming and filtering of 454 reads. The adaptors, primers and barcode sequences are already removed. I am not allowed to specify a minimum quality threshold to clean bad-quality reads. I don't understand why. How does it do quality trimming?
                Thanks
                I am wondering this myself. We have a Junior with v.2.5p1 software and it appears (from scanning many qual scores) that the lower cutoff is 10, although I see a few zeros in there (is this a glitch?). I looked through the GS Run section of the manual on filtering and could not find a place to set the qual score threshold, nor an explanation of what the cutoff is. I'd hate to assume it is 10 based on a visual scan of some qual scores. Does anybody have an idea?

                Comment


                • #23
                  I would recommend making a quality boxplot to understand how each run went. You have an example of a boxplot here:



                  It is the third chart.
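If no plotting tool is at hand, the numbers behind such a boxplot can be computed directly. The sketch below is an assumption-laden illustration, not the tool linked above: the function names `read_fastq_quals` and `per_position_summary` are hypothetical, the FASTQ is assumed to use four lines per record, and qualities are assumed to be Phred+33 encoded (change `offset` for other encodings).

```python
import statistics


def read_fastq_quals(path, offset=33):
    """Yield one list of Phred scores per read from a 4-line-per-record FASTQ."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 3:  # the fourth line of each record holds the qualities
                yield [ord(c) - offset for c in line.rstrip("\n")]


def per_position_summary(qual_lists):
    """Return (min, q1, median, q3, max) of the qualities at each read position."""
    columns = []
    for quals in qual_lists:
        for pos, q in enumerate(quals):
            if pos == len(columns):
                columns.append([])
            columns[pos].append(q)

    summary = []
    for col in columns:
        if len(col) >= 2:
            q1, med, q3 = statistics.quantiles(col, n=4, method="inclusive")
        else:
            q1 = med = q3 = col[0]  # a single value has no spread
        summary.append((min(col), q1, med, q3, max(col)))
    return summary
```

Calling `per_position_summary(read_fastq_quals("reads.fastq"))` (with a hypothetical file name) gives one five-number tuple per base position, which is what each box in the boxplot represents. The minimum column also answers the question of what the lowest quality score in a run actually is, without relying on a visual scan.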

                  Comment


                  • #24
                    Thank you, Jose. After looking at the link you sent, I found that our bioinformatics center has a program to do just that. I am completely new to sequencing so I appreciate your help.

                    Comment


                    • #25
                      Originally posted by essvee
                      I suggest trying SeqTrim.
                      You can set minimum quality based on a defined window size, minimum length, etc.
                      You can also run it from the command line, or online.
                      www.scbi.uma.es/seqtrim/
                      Hi essvee,
                      Do you know how SeqTrim cleans up the low-quality bases? I mean the actual steps or methodology. My sequences are already clean of primers and adaptors; I just need to remove low-quality bases. I couldn't find this in the published papers (2007 and 2010) or in the downloaded tar file. Please let me know where I can find the working principle for cleaning low-quality bases.
                      Thanks

                      Comment


                      • #26
                        Originally posted by robs
                        I like PRINSEQ (http://prinseq.sourceforge.net/). It comes as a web and a standalone version and does all the QC and data pre-processing that you need.

                        The application note also contains a short comparison with similar tools (http://bioinformatics.oxfordjournals.../27/6/863.long).
                        Hi Rob,
                        Can you show me where I can find the working principle of PRINSEQ? I would like to know how PRINSEQ trims off the low-quality bases.

                        Comment


                        • #27
                          I finally got QTrim to quality trim the 454 sequence reads. It also outputs graphical plots showing the quality trend of the reads before and after quality trimming. QTrim is available here:
                          hiv.sanbi.ac.za/software/qtrim

                          Comment


                          • #28
                            Hi,
                            Does anyone know how to change parameters in SeqTrim from the command line?
                            Thanks so much
                            Mary Luz

                            Comment


                            • #29
                              Any 454 data should always be re-called using PyroBayes. It is a much more accurate base caller and significantly improves data quality.

                              Comment
